Skip to content

Conversation

evansd
Copy link
Contributor

@evansd evansd commented Sep 19, 2025

This adds a simple, generic permissions system to ehrQL. In this first instance this is only used to restrict access to certain tables, but I anticipate extending it to e.g. gate access to Event Level Data and for the various opt-out controls.

ehrQL table definitions can now be annotated with metadata using an internal _meta class. This can included a required_permission attribute e.g.

@table
class some_table(EventFrame):
    class _meta:
        required_permission = "some_restricted_dataset"

A check at the documentation build stage ensures that every restricted table references the required permissions somewhere in the table documentation.

In Production

The RAP Controller already supplies an EHRQL_PERMISSIONS environment variable with a JSON-encoded list of permissions that the current job has. If a dataset or measure definition uses a restricted table without the relevant permission then this results in an immediate error, before any queries are executed. For example:

EHRQLPermissionError: You do not currently have all the permissions needed for this action.

Missing permissions are:

    * appointments: required for access to the `tpp.appointments` table
    * waiting_list: required for access to the `tpp.wl_clockstops` and `tpp.wl_openpathways` tables

If you think this is a mistake and that you should have these permissions please contact OpenSAFELY support.

ehrQL then exits with a specific status code which the RAP Agent can identify and use to inform the user that their code failed due to a permissions error.

Locally

Running the same code locally currently results in a warning in the logs:

Some of the tables or features you are using require special permission to use with real
patient data. The permissions needed are:

    * appointments: required for access to the `tpp.appointments` table
    * waiting_list: required for access to the `tpp.wl_clockstops` and `tpp.wl_openpathways` tables   

You can continue to work on your code using dummy data by “claiming” the required permisions:

    from ehrql import claim_permissions
    claim_permissions("appointments", "waiting_list")

The intention is that once this change has been circulated and users have had time to made the necessary changes we can make this a hard error rather than a warning so it will be impossible to miss in future.

See the documentation preview here:
https://evansd-table-permissions.databuilder.pages.dev/reference/language/#permissions

Note that the claim_permissions() function acts globally, rather than being a dataset/measure specific method. This seems like a better fit with the project-wide nature of permissions and less fiddly for the user who just has to stick the relevant line somewhere in their Python file.

Note also the deliberate decision to use strings for permissions rather than a dedicated enum type. The autocomplete benefits of enums are largely obviated by the fact that the error message gives the user exactly the code they need to copy. And the typo-resistance is less important given that mistyped permissions will result in immediate permissions failures anyway. Attempting to use a dedicated Permission type turned out to be architecturally awkward and made for a more cumbersome user story as well, so strings felt like the better compromise.

Closes: #2514 #2515

Copy link

cloudflare-workers-and-pages bot commented Sep 19, 2025

Deploying databuilder-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: a476462
Status: ✅  Deploy successful!
Preview URL: https://9e5fc93d.databuilder.pages.dev
Branch Preview URL: https://evansd-table-permissions.databuilder.pages.dev

View logs

Copy link
Contributor

@suzannehamilton suzannehamilton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just had a look to see the shape of the change, so these comments are fairly bikesheddy rather than a thorough review!


from ehrql import claim_permissions

claim_permissions("some_permission", "another_permission")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will researchers know what values to use? Would it be worth linking to the schema docs below and/or including a couple of real examples like "waiting_list" and "isaric"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that the first time researchers would encounter this function would be from the warning (later error) message they get locally which will tell them exactly what names to use. And I worried that the real names might be a bit cryptic to anyone not familiar with the particular tables in question. I wanted to have something in the docstring, but I don't think it's ever going to serve as a comprehensive introduction to the topic.

class SelectTable(ManyRowsPerPatientFrame):
name: str
schema: TableSchema
required_permission: str | None = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safer for the default permission to be some placeholder value rather than None, so that all tables have to override it? Or is this something we'll definitely consider when we add a new table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default here, in the query model, has to be None otherwise every single table we ever construct (which we do many times in tests) needs to be updated.

If we're worried about forgetting we could have a separate check that looks at every table in our official schemas and make sure they all explicitly specify a required_permission attribute, which may be None for many tables.

f"permissions assigned by the OpenSAFELY team. For more information see:\n"
f"https://docs.opensafely.org/ehrql/reference/language/#permissions"
)
# For the inital rollout of this feature we issue a warning locally but continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: inital

f"https://docs.opensafely.org/ehrql/reference/language/#permissions"
)
# For the inital rollout of this feature we issue a warning locally but continue
# running. Eventually we want to make this a hard error so that it can't be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Is doing this part of #2515 or should we make a new issue to make sure we don't forget it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to wait for feedback from the co-pilots before creating an issue for this because I'm not quite sure what steps we need to do first before we can reasonably make this a hard error. But yeah, I think it's a separate step which warrants a separate ticket.

Copy link
Member

@bloodearnest bloodearnest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broadly makes sense to me. Interesting to see how it is plumbed in to the table definitiones and qm.

Left a couple of questions, more for my own understanding of future intent.

required_permission = "special_perm"


@function_body_as_string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, I had not see this decorator before! Evil/Genius hack! I love it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was quite pleased with that. Makes defining test fixtures much more pleasant I think.

raise Error("Schema class must subclass either `PatientFrame` or `EventFrame`")

try:
table_name = cls._meta.table_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that using a inner class is fine, we at least have the precedent of django to follow, folks should be somewhat familiar.

But for my own education, could you give an example of a speculative meta inner classes method that would be used for future dummy data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I haven't got as far as thinking what the API might look like. But there are a couple of tables which are currently special-cased in the dummy data generator:

def rows_for_patients(self, table_info):
row = {
"date_of_birth": self.date_of_birth,
"date_of_death": self.date_of_death,
}
# Apply any FirstOfMonth constraints
for key, value in row.items():
if key in table_info.columns and value is not None:
if table_info.columns[key].get_constraint(Constraint.FirstOfMonth):
row[key] = value.replace(day=1)
return [row]
def rows_for_practice_registrations(self, table_info):
# TODO: Generate more interesting registration histories; for now, we just
# assume that every patient is permanently registered with a single practice
# from birth
row = {
"start_date": self.events_start,
"end_date": None,
}
return [row]

And my thought was that we can move some of this logic to the table definitions themselves, and then apply it more broadly to different kinds of table.



def parse_permissions(environ):
return set(environ.get("EHRQL_PERMISSIONS", "").split(","))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIUI, this supports a single purpose list, e.g. dataset permissions.

If we want to do T100 or old-patient-id flags, presumably we'd need to either have multiple env var for each, or have a more structured value for EHRQL_PERMISSIONS. This is obviously out of scope for this PR, but I was wondering how you had been thinking to handle that when we get to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, strictly speaking I think we can still handle those things with just a flat list of permissions e.g. t1oo,untagged_patient_ids,event_level_data. There's nothing specifically about datasets in the permissions, that just happens to be the only thing which is currently checked.

But I do agree that we may want more structured data here and I wondered about switching to a JSON list rather than comma separated strings. I figured it would be easy enough to do that later and I didn't want to over-complicate things. but now you've raised it, maybe we would be better off tackling this now. It would be a trivial change in the controller.

@evansd
Copy link
Contributor Author

evansd commented Oct 1, 2025

Prompted by Simon's question, I've switch the format of the EHRQL_PERMISSIONS variable to be JSON rather than comma-separated strings. This gives us a bit more future-proofing and generally seems more robust. This means we need to merge and deploy this RAP Controller change first:

evansd added 16 commits October 1, 2025 12:39
We need to be able to do this when enforcing permissions.

This is a slight improvement on the status quo I think in that we now at
least have a proper type to represent a collection of measures and some
consistency between `Dataset()` and `Measure()` in that both now have a
`_compile()` method.

Measures are still in a strange half-state in being a lot like query
model objects, but not actually being part of the query model. But
sorting that all out is beyond the scope of this PR.
We do this via a Django-ish approach of defining an inner class. There
are other approaches we could take, but the big advantage of using a
class is that it makes it easy to include methods among the metadata
which I think we'll want to do for dummy data generation purposes.

In the first instance the only metadata property we support is supplying
a table name which differs from the class name.
This is slightly cleaner than the previous approach.
This offsets what is, to my mind, the biggest downside of the inner
class approach which is silent failure on typos.
When referencing this in the user docs I've tried to use some suitable
generic text which doesn't suggest that you can just ask for permission
and expect to be granted it. We may well want to tweak this on a
per-table basis though, depending on the exact reason for the
restriction.
I had previously tried to auto-generate the documentation here, but I
think we're better off having the flexibility to explain different
permissions in different ways. However I don't want us to be able to
forget to document them, or to typo the permission name, so we now
enforce this at documentation build time.
It's helpful if we can surface the error type to the user directly in
Job Server, as we do for certain specific kinds of database error.
@evansd evansd force-pushed the evansd/table-permissions branch from d8caaf8 to a476462 Compare October 1, 2025 11:39
@evansd evansd enabled auto-merge October 1, 2025 11:41
@evansd evansd merged commit 7e5a1af into main Oct 1, 2025
9 checks passed
@evansd evansd deleted the evansd/table-permissions branch October 1, 2025 11:56
evansd added a commit to opensafely-core/research-action that referenced this pull request Oct 2, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543
evansd added a commit to opensafely-core/opensafely-cli that referenced this pull request Oct 2, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543

Note that this PR should not be merged until the related
`research-action` PR has been merged, which removes the one usage of the
`opensafely check` command:
 * opensafely-core/research-action#115
evansd added a commit to opensafely-core/opensafely-cli that referenced this pull request Oct 2, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543

Note that this PR should not be merged until the related
`research-action` PR has been merged, which removes the one usage of the
`opensafely check` command:
 * opensafely-core/research-action#115
evansd added a commit to opensafely-core/opensafely-cli that referenced this pull request Oct 2, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543

Note that this PR should not be merged until the related
`research-action` PR has been merged, which removes the one usage of the
`opensafely check` command:
 * opensafely-core/research-action#115

Note that we deliberately leave the `repository_permissions.yaml` file
in place to avoid triggering errors on clients which haven't yet updated
and are still trying to fetch this file.
evansd added a commit to opensafely-core/research-action that referenced this pull request Oct 3, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543
evansd added a commit to opensafely-core/research-action that referenced this pull request Oct 3, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543
evansd added a commit to opensafely-core/opensafely-cli that referenced this pull request Oct 3, 2025
This is now superseded by the ehrQL permissions system:
 * opensafely-core/ehrql#2543

Note that this PR should not be merged until the related
`research-action` PR has been merged, which removes the one usage of the
`opensafely check` command:
 * opensafely-core/research-action#115

Note that we deliberately leave the `repository_permissions.yaml` file
in place to avoid triggering errors on clients which haven't yet updated
and are still trying to fetch this file. I've set a reminder to remove
this file in a couple of weeks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent jobs from querying restricted datasets without permission
3 participants