-
Notifications
You must be signed in to change notification settings - Fork 5
Add a permissions system #2543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a permissions system #2543
Conversation
a2b83f1
to
5643572
Compare
Deploying databuilder-docs with
|
Latest commit: |
a476462
|
Status: | ✅ Deploy successful! |
Preview URL: | https://9e5fc93d.databuilder.pages.dev |
Branch Preview URL: | https://evansd-table-permissions.databuilder.pages.dev |
5643572
to
28a9cce
Compare
28a9cce
to
2e033e3
Compare
2e033e3
to
733b48b
Compare
733b48b
to
5faa3ad
Compare
5faa3ad
to
39465cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just had a look to see the shape of the change, so these comments are fairly bikesheddy rather than a thorough review!
|
||
from ehrql import claim_permissions | ||
|
||
claim_permissions("some_permission", "another_permission") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will researchers know what values to use? Would it be worth linking to the schema docs below and/or including a couple of real examples like "waiting_list"
and "isaric"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that the first time researchers would encounter this function would be from the warning (later error) message they get locally which will tell them exactly what names to use. And I worried that the real names might be a bit cryptic to anyone not familiar with the particular tables in question. I wanted to have something in the docstring, but I don't think it's ever going to serve as a comprehensive introduction to the topic.
class SelectTable(ManyRowsPerPatientFrame): | ||
name: str | ||
schema: TableSchema | ||
required_permission: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safer for the default permission to be some placeholder value rather than None
, so that all tables have to override it? Or is this something we'll definitely consider when we add a new table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default here, in the query model, has to be None
otherwise every single table we ever construct (which we do many times in tests) needs to be updated.
If we're worried about forgetting we could have a separate check that looks at every table in our official schemas and make sure they all explicitly specify a required_permission
attribute, which may be None
for many tables.
ehrql/permissions.py
Outdated
f"permissions assigned by the OpenSAFELY team. For more information see:\n" | ||
f"https://docs.opensafely.org/ehrql/reference/language/#permissions" | ||
) | ||
# For the inital rollout of this feature we issue a warning locally but continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: inital
ehrql/permissions.py
Outdated
f"https://docs.opensafely.org/ehrql/reference/language/#permissions" | ||
) | ||
# For the inital rollout of this feature we issue a warning locally but continue | ||
# running. Eventually we want to make this a hard error so that it can't be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Is doing this part of #2515 or should we make a new issue to make sure we don't forget it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to wait for feedback from the co-pilots before creating an issue for this because I'm not quite sure what steps we need to do first before we can reasonably make this a hard error. But yeah, I think it's a separate step which warrants a separate ticket.
39465cd
to
68083ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This broadly makes sense to me. Interesting to see how it is plumbed in to the table definitiones and qm.
Left a couple of questions, more for my own understanding of future intent.
required_permission = "special_perm" | ||
|
||
|
||
@function_body_as_string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, I had not see this decorator before! Evil/Genius hack! I love it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I was quite pleased with that. Makes defining test fixtures much more pleasant I think.
raise Error("Schema class must subclass either `PatientFrame` or `EventFrame`") | ||
|
||
try: | ||
table_name = cls._meta.table_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that using a inner class is fine, we at least have the precedent of django to follow, folks should be somewhat familiar.
But for my own education, could you give an example of a speculative meta inner classes method that would be used for future dummy data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I haven't got as far as thinking what the API might look like. But there are a couple of tables which are currently special-cased in the dummy data generator:
ehrql/ehrql/dummy_data/generator.py
Lines 203 to 223 in 29f56df
def rows_for_patients(self, table_info): | |
row = { | |
"date_of_birth": self.date_of_birth, | |
"date_of_death": self.date_of_death, | |
} | |
# Apply any FirstOfMonth constraints | |
for key, value in row.items(): | |
if key in table_info.columns and value is not None: | |
if table_info.columns[key].get_constraint(Constraint.FirstOfMonth): | |
row[key] = value.replace(day=1) | |
return [row] | |
def rows_for_practice_registrations(self, table_info): | |
# TODO: Generate more interesting registration histories; for now, we just | |
# assume that every patient is permanently registered with a single practice | |
# from birth | |
row = { | |
"start_date": self.events_start, | |
"end_date": None, | |
} | |
return [row] |
And my thought was that we can move some of this logic to the table definitions themselves, and then apply it more broadly to different kinds of table.
ehrql/permissions.py
Outdated
|
||
|
||
def parse_permissions(environ): | ||
return set(environ.get("EHRQL_PERMISSIONS", "").split(",")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AIUI, this supports a single purpose list, e.g. dataset permissions.
If we want to do T100 or old-patient-id flags, presumably we'd need to either have multiple env var for each, or have a more structured value for EHRQL_PERMISSIONS. This is obviously out of scope for this PR, but I was wondering how you had been thinking to handle that when we get to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, strictly speaking I think we can still handle those things with just a flat list of permissions e.g. t1oo,untagged_patient_ids,event_level_data
. There's nothing specifically about datasets in the permissions, that just happens to be the only thing which is currently checked.
But I do agree that we may want more structured data here and I wondered about switching to a JSON list rather than comma separated strings. I figured it would be easy enough to do that later and I didn't want to over-complicate things. but now you've raised it, maybe we would be better off tackling this now. It would be a trivial change in the controller.
68083ea
to
c1a57c0
Compare
Prompted by Simon's question, I've switch the format of the |
c1a57c0
to
d8caaf8
Compare
We need to be able to do this when enforcing permissions. This is a slight improvement on the status quo I think in that we now at least have a proper type to represent a collection of measures and some consistency between `Dataset()` and `Measure()` in that both now have a `_compile()` method. Measures are still in a strange half-state in being a lot like query model objects, but not actually being part of the query model. But sorting that all out is beyond the scope of this PR.
We do this via a Django-ish approach of defining an inner class. There are other approaches we could take, but the big advantage of using a class is that it makes it easy to include methods among the metadata which I think we'll want to do for dummy data generation purposes. In the first instance the only metadata property we support is supplying a table name which differs from the class name.
This is slightly cleaner than the previous approach.
This offsets what is, to my mind, the biggest downside of the inner class approach which is silent failure on typos.
When referencing this in the user docs I've tried to use some suitable generic text which doesn't suggest that you can just ask for permission and expect to be granted it. We may well want to tweak this on a per-table basis though, depending on the exact reason for the restriction.
I had previously tried to auto-generate the documentation here, but I think we're better off having the flexibility to explain different permissions in different ways. However I don't want us to be able to forget to document them, or to typo the permission name, so we now enforce this at documentation build time.
It's helpful if we can surface the error type to the user directly in Job Server, as we do for certain specific kinds of database error.
d8caaf8
to
a476462
Compare
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543 Note that this PR should not be merged until the related `research-action` PR has been merged, which removes the one usage of the `opensafely check` command: * opensafely-core/research-action#115
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543 Note that this PR should not be merged until the related `research-action` PR has been merged, which removes the one usage of the `opensafely check` command: * opensafely-core/research-action#115
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543 Note that this PR should not be merged until the related `research-action` PR has been merged, which removes the one usage of the `opensafely check` command: * opensafely-core/research-action#115 Note that we deliberately leave the `repository_permissions.yaml` file in place to avoid triggering errors on clients which haven't yet updated and are still trying to fetch this file.
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543
This is now superseded by the ehrQL permissions system: * opensafely-core/ehrql#2543 Note that this PR should not be merged until the related `research-action` PR has been merged, which removes the one usage of the `opensafely check` command: * opensafely-core/research-action#115 Note that we deliberately leave the `repository_permissions.yaml` file in place to avoid triggering errors on clients which haven't yet updated and are still trying to fetch this file. I've set a reminder to remove this file in a couple of weeks.
This adds a simple, generic permissions system to ehrQL. In this first instance this is only used to restrict access to certain tables, but I anticipate extending it to e.g. gate access to Event Level Data and for the various opt-out controls.
ehrQL table definitions can now be annotated with metadata using an internal
_meta
class. This can included arequired_permission
attribute e.g.A check at the documentation build stage ensures that every restricted table references the required permissions somewhere in the table documentation.
In Production
The RAP Controller already supplies an
EHRQL_PERMISSIONS
environment variable with a JSON-encoded list of permissions that the current job has. If a dataset or measure definition uses a restricted table without the relevant permission then this results in an immediate error, before any queries are executed. For example:ehrQL then exits with a specific status code which the RAP Agent can identify and use to inform the user that their code failed due to a permissions error.
Locally
Running the same code locally currently results in a warning in the logs:
The intention is that once this change has been circulated and users have had time to made the necessary changes we can make this a hard error rather than a warning so it will be impossible to miss in future.
See the documentation preview here:
https://evansd-table-permissions.databuilder.pages.dev/reference/language/#permissions
Note that the
claim_permissions()
function acts globally, rather than being a dataset/measure specific method. This seems like a better fit with the project-wide nature of permissions and less fiddly for the user who just has to stick the relevant line somewhere in their Python file.Note also the deliberate decision to use strings for permissions rather than a dedicated enum type. The autocomplete benefits of enums are largely obviated by the fact that the error message gives the user exactly the code they need to copy. And the typo-resistance is less important given that mistyped permissions will result in immediate permissions failures anyway. Attempting to use a dedicated
Permission
type turned out to be architecturally awkward and made for a more cumbersome user story as well, so strings felt like the better compromise.Closes: #2514 #2515