
Conversation

@aragilar

This takes a different approach than #10 by specifying the required RFCs and directing readers to existing resources rather than giving inline examples. It also specifies how it interacts with OpenID Connect (those endpoints can be used, but cannot be assumed), and calls out RFCs that should be used to improve security (PKCE plus the "Best Current Practice" RFCs) as well as flagging the future transition to OAuth 2.1 (which will affect at least one VO Data Provider).

I'm not sure what's the expected way to cite the RFCs in RECs, so that will need to be fixed. Also, I've probably been more terse than is good, so do feel free to expand the text.

I'm also not sure whether we want to create a page on the twiki (and link to it here) documenting experiences implementing OAuth 2.x/OIDC (e.g. the JWT advice comes from personal experience of not being able to log into one of our services because the JWT grew so large it broke the redirects), publish a note, or add such material to this document. None of these would be normative, but having implemented OAuth 2.x/OIDC clients and resource servers in both Python and JS/TS, you do need to be quite picky about libraries (I expect Java would be similar), and it would be good to collect such experiences somewhere.

Details of how OAuth 2.x and OIDC can be used within the VO are added,
namely:
* a list of the required 6 OAuth 2.0 RFCs needed for interoperability
* a list of the 3 OAuth 2.0 RFCs that should be implemented to improve
  security
* short discussions of OIDC (OpenID Connect) and OAuth 2.1
* links to tutorials and resources
* a list of RFCs that are worth considering, but which are not required
  for interoperability

This will need some changes to cite RFCs via bibtex (unclear what the
standard form is).
Whilst the two initial ``core'' OAuth 2.0 RFCs (namely \rfc{6749} and \rfc{6750})
assume that the three roles in the system\footnote{The client, the authorization
server and the resource server; for the VO a resource server would be a
service providing DALI endpoints, and the authorization server the OAuth 2.0

Are the leading spaces on lines 574-575 intentional?

Author


That's how my editor formatted the footnote, they can be removed.


Well, it's your PR, so you should remove them.

@mbtaylor
Member

@aragilar, on the whole I agree with keeping the text in section 3 focussed on normative requirements by reference to existing RFCs, though as you say the text here could maybe be expanded a bit.

I think there might also be mileage in a worked example of what using these standards looks like, but that should appear in section 5 with the other examples. If an end-to-end example appropriate to the usage we want here already exists in some other document, maybe we could point there instead?

However: I think it would be better to make sure we have something that works at least in prototype and is acceptable to client and service providers before working too hard on the wording.

My main question about the content is: does this recommendation provide an answer to the problem of how the client knows where it can use the token it acquires? This is what I've called "scope" in section 3.1, though I should probably rename it "domain" to avoid confusion. Without that, this is too insecure to be used. RFC 9728 mentions this issue, but I can't currently see how to apply its advice to avoid phishing-type attacks by malicious servers.

Also: do you have a service that implements the system outlined here? I've been working with @jesusjuansalgado at SKA who has a service mostly aligned with what you've written, except that it doesn't currently use RFC 9728, instead a custom ivoa_bearer scheme pointing to RFC 8414 authorization server metadata as described in PR #10.

@aragilar
Author

https://www.rfc-editor.org/rfc/rfc9728.html#name-authorization-server-metada specifies on the authorization server what resource servers are associated with the authorization server (and starting with the resource server directing the client to the authorization server, you have a consistent loop). That's why I stopped suggesting my original design, as the missing origin bit is provided by RFC 9728 (so there are no VO-specific components in the system).

I've been really busy the last few months (and will be until the end of October most likely) with the AAO move to main campus, but there's been some progress (I'm not sure exactly how far, but it's definitely not done) on the server side of this with Data Central. As far as I know no-one has done anything on the Python client side for OAuth 2.x/OIDC; my plan (when I have time to get back to coding) is to add this to requests-oauthlib/oauthlib (given it's all standard RFCs, and I think RFC 9728 is the only part not supported yet) so pyvo wouldn't need to have custom logic for OAuth 2.x/OIDC (and any other Python tools would be able to reuse the changes as well).

@mbtaylor
Member

The protected_resources list seems to be in the wrong place, since it's supplied by a document defined by the resource server, and not reliably supplied by the server dispensing the tokens.

Probably I'm missing something, but I don't see what's to stop the resource_metadata from the www-authenticate challenge pointing to a spoof authorization server metadata document at evil.com that references endpoints from a real service astro.org.

For instance: client requests a resource from a service that it doesn't know is malicious:

   GET /data
   Host: s1.evil.com
   
   HTTP/1.1 401 Unauthorized
   WWW-Authenticate: Bearer resource_metadata="https://s2.evil.com/.well-known/auth-protected-resource/data"

Client retrieves malicious authorization server metadata as directed:

   GET /.well-known/auth-protected-resource/data HTTP/1.1
   Host: s2.evil.com
   
   HTTP/1.1 200 OK
   Content-type: application/json
   
   {
     "issuer": "https://s2.evil.com/data",   // not sure about this
     "token_endpoint": "https://astro.org/token",
     "registration_endpoint": "https://astro.org/iam/api/client-registration",
     "device_authorization_endpoint": "https://astro.org/devicecode",
     "protected_resources": ["https://s1.evil.com/data"],
     ...
   }

The client does dynamic registration followed by device authentication to obtain a token <TOKEN> from astro.org, in accordance with the endpoints in the ASM, then presents it to the original resource server to get the resource it originally asked for:

   GET /data
   Host: s1.evil.com
   Authorization: Bearer <TOKEN>

And the malicious server now has the token. Is there some check that prevents this sequence? Or am I wrong about how this interaction plays out?

@andamian
Contributor

I think (hope) I might have a possible solution for this @mbtaylor. I've been playing a bit with Keycloak so some of this is based on literature, some on my own admittedly limited experience with it.
The key here is for the VO ecosystem to mandate the use of the aud claim in tokens. Every VO resource will have to be registered with the auth service they advertise for login. The discovery mechanism by which clients can get an access token with the appropriate aud for the resource server they are trying to access is not specified in OAuth 2.0/OIDC, but RFC 8707 provides one.

The request for an access token to astro.org must include the resource=https://s1.evil.com/<path_for_multitenant_resources> parameter. If that resource is registered with astro.org then an access token with aud=my-resource is issued; otherwise the request fails (TBD). At a minimum, every resource server must check whether its resource ID is in the aud claim of the access token before proceeding. Note that the resource ID is the same as the client ID that is registered with the authorization server. That can be used by the resource service further down to do credential delegation by exchanging the token for a different one intended for a different resource/audience; the auth service checks that the two match.

RFC 8707 is just one mechanism for getting the right aud. The use of audience parameter in the token request seems to be popular as well.

I haven't tried too hard to poke holes in this mechanism, but it's the most promising I've come across. Not only are rogue players dismissed, but the issued token is also limited in what services it can be used for (and potentially what scopes it has within those services).
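The resource-server-side audience check described above could be sketched as follows. This is a minimal illustration only: signature verification, which a real server must do before trusting the payload, is omitted, and all names (`toy_jwt`, `my-resource`) are made up.

```python
import base64
import json

def aud_allows(access_token: str, my_resource_id: str) -> bool:
    """Check that this resource server's ID appears in the token's aud claim.
    NOTE: a real server must verify the token's signature first; this sketch
    only shows the audience test itself."""
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    aud = claims.get("aud", [])
    if isinstance(aud, str):                       # aud may be a string or a list
        aud = [aud]
    return my_resource_id in aud

def toy_jwt(claims: dict) -> str:
    """Build an unsigned token purely for illustration."""
    def enc(d: dict) -> str:
        return base64.urlsafe_b64encode(json.dumps(d).encode()).rstrip(b"=").decode()
    return enc({"alg": "none"}) + "." + enc(claims) + "."

token = toy_jwt({"sub": "user1", "aud": "my-resource"})
aud_allows(token, "my-resource")    # True: request may proceed
aud_allows(token, "evil-resource")  # False: reject the request
```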

One interesting question is whether tokens from linked IdPs can be exchanged for tokens issued by the proxy auth service. Example: a data provider (DP) runs their own "proxy" auth service. The auth service allows for linking of user accounts with external IdPs, let's say GitHub. If I have my account linked with GH, and have a GH access token but want to access a VO resource in the realm of the DP, can the client automatically exchange the GH token for an access token with the proxy auth service, instead of me logging into the DP "proxy" auth service manually? Probably not, at least for non-browser clients; browsers might use cookies to make the flow look more automatic.

Has any of you considered this mechanism? Do you see a showstopper?

@mbtaylor
Member

The suggestion to use the aud claim has come up before in these discussions. It may form part of the solution, but there are a few complications.

One is that mandating an aud claim in the token would require it to be a JWT, which may not be the token technology that all providers want to use.

That aside I see a couple of ways that audience restriction could work. The problem that needs to be solved is to prevent the client from using a token for out-of-domain resources.

One option is for the client to request audience restriction as described by RFC 8707. Then services could reject requests that specify an audience outside their domain (phishing). That would work for RFC 8707-aware services, but it requires the client to know in advance what protected resources it's going to want to access with the token. That's OK for e.g. TAP, but not good if there are multiple protected resources, perhaps on different hosts, that may be served by the same authentication, and the client doesn't know in advance which ones it will want to access; for example the DataLink scenario. RFC 8707 is the wrong way round for our requirements - we want the server to tell the client what resources the token is valid for rather than the other way round. But in practice it could be made to work to an extent, with one token request per resource server.

Another option is to require services to issue a JWT with an aud claim and require clients to decode the JWT and only use the token in accordance with its declared audience. That could work, subject to the comments above about mandating JWTs. I haven't so far seen a suitable description for the syntax of aud claims (maybe I've not looked in the right place), but if necessary we could make something up like a space-separated list of permitted origins. If we think that mandating JWTs is going to be acceptable across providers it could be worth pursuing this.

(I thought for a moment that RFC 7662 OAuth 2.0 Token Introspection might form an alternative to mandating JWTs here, but it doesn't work because for good reason the token introspection endpoint has to itself be protected by authorization).

@andamian
Contributor

andamian commented Oct 31, 2025

I don't think the mechanism is applicable to JWT tokens only, although we tend to prefer them for efficiency reasons (reduced number of network requests).
Opaque tokens are similar to JWT tokens except that they don't contain the actual information. Instead they are a key that can be used to access the info at the /introspect endpoint. So they can still have aud claims associated with them, which makes authorization on /introspect work as well: only resource services with a client ID that matches the audience of the opaque token can resolve them. In a way, opaque tokens seem more secure because they don't reveal anything about aud or other claims the way their JWT counterparts do.

As for services that depend on each other (TAP-DataLink-SODA): auth services like Keycloak can be configured to return multiple entries in aud when they are related, so that the same token can be used to call the TAP service and the DataLink service. Alternatively, the TAP service and the corresponding DataLink service can be configured with the same client ID (and form a common aud).

So I think this still looks promising. The next step is to figure out SSO in a federated environment, but that's probably for a different thread.

@rra

rra commented Oct 31, 2025

I'm not sure I understand your proposed design with opaque tokens. Are you saying that every time a client obtains an opaque token, it needs to present that token to some server endpoint and get a list of URLs to which that token can be safely sent, and then before performing any client-side operation, it needs to check the URL against that list? What URL matching algorithm are you proposing — just scheme, hostname, and port, or also path?

I don't think you can assume in OAuth that the aud claim is a URL at all, or is comprehensible to the client, so presumably that would have to be an additional restriction standardized by the IVOA. Likewise with this hypothetical /introspect endpoint, I think (it's similar to the tokeninfo endpoint in OAuth, but isn't that endpoint only defined for ID tokens, not access tokens?).

Alternatively, the tap service and the corresponding datalink service can be configured with the same client id (and form a common aud).

This loses the security properties you're trying to create, no? Your scheme relies, so far as I can tell, on aud being exactly and precisely a list of API URLs to which it is safe to send that token, so that the client can check it against a service it is thinking about contacting. So services can't just use a common aud unless they're hosted at the same URL.

@andamian
Contributor

andamian commented Oct 31, 2025

I don't know much about opaque tokens, but token introspection is defined in RFC 7662.
The generic flow I have in mind for opaque tokens (with the typical actors: user, client application (PyVO, TOPCAT), auth server (AuthS), resource VO service (RS)):

1. Client accesses the capabilities endpoint of RS to find the corresponding AuthS
2. Client requests an access token for the user with resource=<capabilitiesURL> # this prevents phishing
3. User authenticates with AuthS
4. AuthS has a configured map of capabilitiesURL to RS client IDs, so if it finds the entry, it generates an opaque token with the corresponding `aud` = `RS client ID` and returns the token
5. Client gets the access token from AuthS and uses it in a request to RS (sync, async or any of the other endpoints)
6. RS calls AuthS `/introspect` to get user details
7. AuthS looks up the opaque token in its database, finds the corresponding `aud` and checks that it matches the client ID of the calling RS before returning all the info associated with the token back to RS.

Observations:

  • aud and RS client IDs configured in AuthS are arbitrary strings. They can be the RS URL for straightforward mapping, but that's not required.
  • The mapping of capabilities URL to client IDs in step 4 is a matter of configuration in AuthS. Multi-tenant services from the same domain can share the same client ID or have separate ones.
  • It's up to the client to keep track of RS/access tokens if it wants to cache and reuse them for subsequent calls. Sending them to the wrong services has no effect, as those services do not have the appropriate client ID / client secret (symmetric/asymmetric keys etc.) to be able to resolve them at the /introspect endpoint. The opaque tokens only work with the RS they were created for.
  • The access token is issued for the RS and is not bound to a domain (like cookies are). So an RS can have endpoints in different domains as long as those endpoints are configured to access the AuthS with the RS client credentials (client ID, client secret, etc.)
  • JWT tokens work in a similar manner except that the /introspect call is not necessary. They are however more transparent (everyone can peek into the payload) and cannot be revoked the way opaque ones can.
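Step 7 (the AuthS releasing claims only to the resource server named in the token's aud) could be sketched like this, using a toy in-memory store; all names are illustrative:

```python
# Toy RFC 7662-style introspection: the opaque token is just a database key,
# and the claims are released only to a caller whose (already authenticated)
# client ID appears in the token's aud.
TOKEN_DB = {  # illustrative store: opaque token -> associated claims
    "tok-abc123": {"active": True, "sub": "user1", "aud": ["rs-client-id"]},
}

def introspect(token: str, caller_client_id: str) -> dict:
    record = TOKEN_DB.get(token)
    if record is None or caller_client_id not in record["aud"]:
        # RFC 7662 says to reveal nothing beyond "inactive" in this case
        return {"active": False}
    return record

introspect("tok-abc123", "rs-client-id")  # full claims for the right RS
introspect("tok-abc123", "rogue-client")  # {"active": False}
```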

@pdowler

pdowler commented Oct 31, 2025

I find the use of "client" confusing, but I think in step 6 the RS calls AuthS /introspect using "client credentials" (its own) and the incoming token... so every RS is known to AuthS and can prove it. Is that what you mean here?

@andamian
Contributor

I find the use of "client" confusing, but I think in step 6 the RS calls AuthS /introspect using "client credentials" (its own) and the incoming token... so every RS is known to AuthS and can prove it. Is that what you mean here?

I think that's a way (the only one?) for the AuthS to be able to issue service-specific access tokens and prevent misuse, accidental or not.

@rra

rra commented Oct 31, 2025

Thanks, I didn't know about RFC 7662.

  1. RS calls AuthS /introspect to get user details

You have just reintroduced the security problem we're trying to fix, no? The whole point is that you CANNOT do anything to solve this problem at the RS because it's too late. At that point the client has already sent the token to the attacker, who has replayed it to the RS. The check HAS to be done on the client.

This is why the aud strings have to be URLs that the client can match. That's central to your whole approach; otherwise, I don't think you're addressing the security concern Mark raised.

Sending them to wrong services has no effect as those services do not have the appropriate client ID / client secret

I think you are missing that the whole point of the attack is that the attacker relays the token to its legitimate service, thus allowing it to take arbitrary actions as the user on any service authorized in the aud list. The attacker doesn't talk to the AuthS system directly.

@rra

rra commented Oct 31, 2025

I think this approach would work, although it's kind of painful to the client. But I can't think of anything better.

  1. User client follows some process to get an access token (whether OpenID Connect or OAuth 2)
  2. Authentication server provides an introspection endpoint along the lines of RFC 7662.
  3. User client sends the token to that endpoint (discovered somehow, maybe through the OAuth /.well-known endpoint? I haven't checked if it's present there) and looks at the aud result, interpreting it as a list of web origins (URLs without regard to the path component). It can do this once and store the results alongside the token or it can do it each time, depending on the situation.
  4. Whenever the client is about to make a request, it verifies that the URL to which it is sending the token matches an entry on that list. (This check must be done for every hop when following redirects.) If it's not on the list, it refuses to send the request with the token.

Note that step 3 has to be done a little carefully to ensure that the attacker can't insert its own introspection endpoint that returns bogus results. The user client has to be sure that it's reaching the trusted metadata endpoint for the service.

I think trying to register all possible paths under a web origin is not going to be viable at a lot of sites, so I would lean towards just matching web origins and saying you cannot run trusted and untrusted services from the same origin. That's generally already the case for unrelated JavaScript reasons, so that shouldn't be too much of a hardship.
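The origin check in steps 3 and 4 might look like the sketch below, assuming the aud result has already been interpreted as a list of web origins; the names are illustrative:

```python
from urllib.parse import urlsplit

def origin(url: str) -> tuple:
    """Web origin: scheme, hostname and port, with default ports filled in."""
    p = urlsplit(url)
    port = p.port or {"http": 80, "https": 443}.get(p.scheme)
    return (p.scheme, p.hostname, port)

def may_send_token(request_url: str, allowed_origins: list) -> bool:
    """Step 4: release the token only if the target's origin is on the list
    derived from the aud claim. Must be re-checked on every redirect hop."""
    return origin(request_url) in {origin(o) for o in allowed_origins}

allowed = ["https://astro.org", "https://data.astro.org:8443"]
may_send_token("https://astro.org/tap/sync", allowed)  # True (path is ignored)
may_send_token("https://s1.evil.com/data", allowed)    # False: do not send
```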

@andamian
Contributor

andamian commented Oct 31, 2025

There are a lot of details to sort out here. For example, the OAuth 2.0 core spec does not define any discoverability mechanism; that's only defined in RFC 8414 as .well-known/oauth-authorization-server. In OIDC it's part of the core standard as .well-known/openid-configuration. The two have common info but are not exactly the same. Keycloak supports both. We just need to be mindful of this in AuthVO.

The approach you are suggesting uses the cookie model and relies on the client not to cross domains. I'm thinking more of a service-oriented model where a client gets the capabilities of the service and maps all its endpoints (regardless of their domain) to the requested access token that it gets from the AuthS. As such, it sends the token with every request whose URL starts with one of those endpoints.

As for the attack:

  1. If the attacker gets access to the bearer token from the client then there's little we can do about that. It is known that a bearer token is like a short-term password for that resource service
  2. If the attacker is a rogue server/service to which the client sends a request with the token (by mistake, since phishing is avoided), and the token is opaque, the attacker doesn't know which resource service it was minted for or what scope that token has (scope is another line of defence that we haven't touched on), and it doesn't have the credentials to access the /introspect endpoint, nor does it know which AuthS generated the token. Without that info, there's little they can do with the opaque token. Or are there other scenarios that I haven't considered?

@rra

rra commented Oct 31, 2025

I think we're agreed that the goal is to prevent well-behaved clients from sending the token to an attacker's service. Bearer token authentication relies on the client safeguarding the token; there's not much that can be done on a server if the token has been stolen.

I don't quite follow the specific details of your proposed alternative service-oriented model and how it prevents phishing. Maybe you could make it more concrete by laying out the steps the client would take the way that I did for the aud filtering model I proposed above? In particular, I am not sure what capabilities endpoint the client is retrieving, how it knows to associate that endpoint with a specific token provider, and what it is doing with the information that it retrieves so that it knows which token is associated with which services and to not send that token to other services. I'm particularly curious how this works when the client interaction is initiated by a user entering the base URL for some (possibly attacker-controlled) IVOA service that returns a WWW-Authenticate header to the client that is expected to be sufficient information for the client to authenticate.

I don't think any weight can be put on the opacity of the token. That's just security via obscurity. We have to assume the attacker knows exactly what service they're attacking or can just try all the services in some finite candidate pool until it finds one that works.

@andamian
Contributor

andamian commented Nov 1, 2025

When a client attempts to access a VO service (for example, tap), it begins by probing the service’s /capabilities endpoint as described in AuthVO. For token-based access the response could be (this is my suggestion):

GET /tap/capabilities
   Host: rs.com
   
   HTTP/1.1 200 OK
   WWW-Authenticate: ivoa_oidc issuer=https://auths.com

This response indicates that the service (rs.com/tap) relies on the IdP issuer https://auths.com for authentication.

(I’ll skip the details of how the client discovers and authenticates to the IdP — those would have to be defined within AuthVO.)

A client then contacts the IdP (auths.com) to obtain an access token, specifying the intended resource: resource=https://rs.com/tap. If https://rs.com/tap is a registered resource with auths.com, the authorization server associates the resource’s registered client ID with the token’s aud claim when issuing the access token — whether opaque or JWT-based.

As a result, the issued token is valid only for that resource (https://rs.com/tap) or for other resource servers that share the same registered client ID, but not for unrelated or unregistered services that properly enforce audience checks.

This mechanism effectively limits phishing and token misuse: no access token will be issued for an untrusted or spoofed VO service unless it has been explicitly registered with a trusted IdP. In that sense, the trustworthiness of resource services is anchored in the registration policy of the IdP.
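The AuthS-side decision in this flow could be sketched as follows. The registration table and names are illustrative; invalid_target is the error code RFC 8707 defines for an unknown resource:

```python
# Toy AuthS: map the requested resource (RFC 8707 "resource" parameter) to
# its registered client ID and stamp that into the token's aud claim.
REGISTERED_RESOURCES = {  # illustrative registration table
    "https://rs.com/tap": "rs-tap-client-id",
}

def issue_token(sub: str, resource: str) -> dict:
    client_id = REGISTERED_RESOURCES.get(resource)
    if client_id is None:
        # RFC 8707 defines the invalid_target error for unknown resources
        raise ValueError("invalid_target")
    return {"sub": sub, "aud": client_id}

issue_token("user1", "https://rs.com/tap")
# -> {"sub": "user1", "aud": "rs-tap-client-id"}
# issue_token("user1", "https://s1.evil.com/data") raises invalid_target
```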

At the next level, as @mbtaylor noted earlier in the thread, lies the question of trust between IdPs themselves — an area being addressed by ongoing federation efforts in OAuth2/OIDC (e.g., OpenID Federation 1.0). For now, VO clients (such as domain-specific services) can be configured to trust only a set of known IdPs within a federation, ensuring they never interact with unrecognized ones.

However, for general-purpose clients like PyVO or TOPCAT, this still is an issue. A malicious site could easily spoof the login page of a legitimate IdP and obtain user credentials. To mitigate this risk, such clients should avoid initiating device or browser-based flows on their own and instead rely on access tokens obtained through trusted, out-of-band mechanisms.

@rra

rra commented Nov 1, 2025

Oh, I see. Yes, that works too, at the cost of making the client go back to the auth server for a new token every time it wants to contact an additional service. I think that will end up a bit like Kerberos where the client has a whole cache of tokens and has to know which one to use for any given service, although maybe one could add some API to add a scope to an existing token. (That should preserve the same security properties unless I'm missing something.)

I think I like this better than the aud approach. It's still a bit awkward, but I think it will be harder to mess up and it should be pretty effective at pushing simple clients to do the right thing. The client is going to need a fair bit of code support, though, given that every time it follows a DataLink record or whatnot it's going to have to get a new token (or enhance its existing one).

I have some ideas about how to handle device auth flow in a way that isn't quite so vulnerable to phishing (by making the user go to the authentication site and obtain a per-client private URL to use for device auth flow first), but that does come at the cost of not allowing everything to be handled automatically by the client after it's pointed to a service URL. I don't think there's any way to both have phishing resistance and to automatically initiate an auth flow after nothing more than pointing a client at an IVOA service. Those two goals seem fairly deeply conflicted to me, although maybe there's some neat approach I'm not seeing.

@jesusjuansalgado

jesusjuansalgado commented Nov 3, 2025

Hi @aragilar, it's fine to reference the RFCs that could be applied within the specification, but without a clearly defined workflow I don't think we're actually solving the problem. This new approach of listing the applicable OAuth2 RFCs doesn't really clarify what changes are needed. I still believe this work should continue within ivoa-std/AuthVO#10, adding the RFC references used but providing a clear guideline on how to use them in the IVOA scope.

A few specific comments on the discussion:

  • Token introspection cannot be performed using dynamically registered clients. That limits the use of this approach.
  • Yes, the audience claim could be the key to solving the security issue, but it could be complex to implement. Setting a default audience for dynamically registered clients could be a path forward, but that requires changes to the token issuers' configurations (which are used for a lot more things than the VO)
  • We should avoid requiring configuration changes in existing token issuers like Indigo IAM or Keycloak — these systems are outside the scope of IVOA, and the less we depend on them being modified, the better.
  • I think we should focus only on authentication tokens (OIDC), as proposed by @andamian in the past. After testing prototypes, I agree that this is the right direction. The VO client only needs to convey “who the user is” to the VO resource; authorisation decisions are more complex and handled separately.

One possible way to address the security concern would be to introduce a new IVOA registry entity — for example, an authorisation provider — that maintains a list of services allowed to request tokens (or cookies) from a given issuer.
The flow would be:

  • The client receives a token issuer URL.
  • The client queries the IVOA registry to verify that the target service (its top-level domain or identifier) is included in the list of trusted services for that issuer.

So, similar to the solution discussed, but solving the problem at the IVOA level. This approach avoids changing or complicating standard OAuth behaviour, keeps the solution within the IVOA scope, and could even be reused for other authentication types. For example, cookie-based authentication could use a similar rule, as now we are checking that the login domain matches the service domain and this may become a limiting factor in the future.
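A hypothetical sketch of that registry check follows; the registry entity, the lookup result, and the hostname-based matching rule are all assumptions for illustration, not existing IVOA standards:

```python
from urllib.parse import urlsplit

# Hypothetical answer from an IVOA registry query: for each token issuer,
# the set of service domains trusted to request tokens from it.
TRUSTED_SERVICES_BY_ISSUER = {
    "https://iam.example.org/": {"astro.org", "data.astro.org"},
}

def service_trusted_for_issuer(service_url: str, issuer: str) -> bool:
    """Proceed with the auth flow only if the target service's host is in
    the issuer's trusted list as published via the registry."""
    hosts = TRUSTED_SERVICES_BY_ISSUER.get(issuer, set())
    return urlsplit(service_url).hostname in hosts

service_trusted_for_issuer("https://astro.org/tap", "https://iam.example.org/")    # True
service_trusted_for_issuer("https://s1.evil.com/data", "https://iam.example.org/") # False
```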

@mbtaylor
Member

mbtaylor commented Nov 3, 2025

Use of RFC8707 as suggested (where the client specifies a target resource for each token requested) does not play all that nicely with the RFC 8628 Device Authorization Grant a.k.a. device flow.

As noted, the client will need a different bearer token for each different resource or group-of-resources it needs to access - where the maximum size of a group-of-resources is all the resources at a given origin. So in practice if it needs to access resources across multiple hosts it will need to acquire multiple bearer tokens. But in Device Authorization Grant the token request to the Authorization Server requires user interaction via a browser (navigating to a web page and entering a code). So every time the client needs to access resources from a new origin served by the same AuthS, the user has to do the business with the browser and the code. This isn't impossible, but it could be a bit tedious for the user.

Possibly this might be not so bad if RFC8252 Native Apps flow (suitable for some usage scenarios but not others) is used instead? I haven't tried that out so I'm not sure.

@jesusjuansalgado

jesusjuansalgado commented Nov 3, 2025

If I remember correctly, we decided not to use the Native App Flow (RFC 8252) because it requires a browser to be available on the client side, which is unsuitable for headless tools or scripts. For those cases, the Device Code Flow (RFC 8628) remains the recommended approach.

When combined with RFC 8707 (Resource Indicators for OAuth 2.0), a workflow that needs to access data from multiple origins must indeed repeat the device flow for each issuer/resource combination. This is a strong limitation, although it provides better token scoping and prevents misuse across unrelated services. Also, support for the resource parameter is still inconsistent — for example, Keycloak 21+ only supports it partially, and Indigo IAM varies depending on configuration. It is optional for many issuers.

In a typical Device Code Flow, a device registration is tied to the issuer rather than to each individual service. Once registered, the same client/device can be reused for all services linked to that issuer; this association can be resolved through the IVOA Registry, as I proposed. The client can also reuse the same OIDC ID token for all related VO services, as long as the token is valid (or refreshed when needed). Yes, users are forced to create a device registration per issuer, providing their credentials in an external browser, but this is in fact the more secure approach, as credentials are never entered anywhere other than on issuer-controlled pages.

This is why I continue to advocate for covering only OIDC-based authentication in the IVOA specification — letting the VO client simply prove "who the user is" while leaving authorisation (access rights, resource policies, etc.) to the communication between the VO service and the IAM service. This approach greatly simplifies the client workflow and is consistent with the prototypes we are developing within SRCNet.

Finally, I believe the IVOA Registry approach (proposed in the previous comment) solves the cross-origin security problem in a VO context. It provides a federated trust mechanism that does not require any modification to external IAM services like Indigo IAM or Keycloak, keeping the solution lightweight, interoperable, and specific to IVOA needs. It is, in summary, moving the "allowed_origins" to the IVOA registry, so we do not need to touch anything inside our IAM services or modify the token structure by adding optional fields. In a previous discussion, the IVOA registry approach was described as checking the server registration, which does not work for DataLink services since they are not registered; but searching for the token issuer in the IVOA registry and getting the services that are allowed to ask for tokens from this issuer solves the problem completely, in my view.

An example of this registration would be:

<ri:Resource
    xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0"
    xmlns:auth="http://www.ivoa.net/xml/AuthVO/v1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="auth:CredentialsProvider"
    status="active"
    updated="2025-10-06">

  <ri:identifier>ivo://ivoa.net/auth/iam.example.org</ri:identifier>

  <ri:curation>
    <ri:publisher>Example Credentials Issuer Service</ri:publisher>
    <ri:contact>
      <ri:name>Authentication Support</ri:name>
      <ri:email>[email protected]</ri:email>
    </ri:contact>
  </ri:curation>

  <!-- Bearer token authentication (OAuth2/OIDC) -->
  <auth:method type="bearer_token">
    <auth:issuer>https://iam.example.org/</auth:issuer>
    <auth:discoveryURL>https://iam.example.org/.well-known/openid-configuration</auth:discoveryURL>

    <!-- VO services allowed to advertise this IAM in their challenges -->
    <auth:allowedServices>
      <auth:service>https://data.example.org/tap</auth:service>
      <auth:service>https://data.example.org/datalink</auth:service>
      <auth:service>https://archive.example.org/soda</auth:service>
    </auth:allowedServices>
  </auth:method>

  <!-- Cookie-based authentication (for browser sessions) -->
  <auth:method type="cookie">
    <auth:issuer>https://cookie.example.org/login?</auth:issuer>
    <auth:allowedServices>
      <auth:service>https://portal.example.org/</auth:service>
      <auth:service>https://notebook.example.org/</auth:service>
    </auth:allowedServices>
  </auth:method>

</ri:Resource>

And the workflow is:
When a VO client encounters:
WWW-Authenticate: Bearer discovery_url="https://iam.astron.nl/.well-known/openid-configuration"
it should:

  • Fetch the discovery document.
  • Extract the "issuer" value.
  • Query the IVOA Registry for a matching CredentialsProvider with that issuer and authentication method.
  • Verify the service it’s contacting (https://data.astron.nl/tap) is listed under auth:allowedServices.
    If yes → proceed to token acquisition via the discovery or registration endpoints.
    If not → warn the user (asking whether to trust this service if it is under development, since it may not yet be officially registered) or reject (potential phishing).
  • Continue with the device code flow
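A minimal sketch of the allowedServices check in the workflow above, with a made-up dict standing in for the parsed CredentialsProvider registry record (the field names here are illustrative, not a schema):

```python
from urllib.parse import urlsplit

def check_allowed(registry_entry: dict, service_url: str) -> bool:
    """True if service_url is covered by the entry's allowedServices.

    Compares scheme+host (the origin) plus a path prefix, so that
    sub-paths like /tap/sync under a registered /tap still match.
    A real client would be stricter about path-prefix boundaries.
    """
    target = urlsplit(service_url)
    for allowed in registry_entry.get("allowed_services", []):
        a = urlsplit(allowed)
        if (a.scheme, a.netloc) == (target.scheme, target.netloc) and \
                target.path.startswith(a.path):
            return True
    return False

# Hypothetical parsed registry entry, mirroring the XML example above
entry = {
    "issuer": "https://iam.example.org/",
    "allowed_services": ["https://data.example.org/tap"],
}
```

On a match the client proceeds to token acquisition; on a miss it warns or rejects as described in the last step.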

BTW, this discussion will be covered during the next DSP session at the Interop, so RESERVE THE DATE/TIME as we need to have all of you there!

@rra

rra commented Nov 3, 2025

The client queries the IVOA registry to verify that the target service (its top-level domain or identifier) is included in the list of trusted services for that issuer.

I think this works if and only if the IVOA registry is trusted. If the client hard-codes the URL to the IVOA Registry of Registries or some similar global service (and does TLS validation properly, etc.), requires the token issuer be listed there, and refuses to send the tokens to any services not part of the same registration entry, then that moves the problem to the vetting process for incorporation into that IVOA registry. That's probably a reasonable bar as long as it involves some level of human validation (I don't know what the current process is). An attacker could almost certainly socially engineer this with enough determination, but this is probably a higher bar than the model attacker of an IVOA service would bother with.

If the client supports authentication to any untrusted registries or unregistered services, though, this breaks down. The attacker can just create a registry listing all the evil.com services with trusted.com as a token issuer and we're back to the same problem provided the attacker can get the user to start from their registry (via typo-squatting or the like).

It's a trust bootstrapping problem, basically. You have to start from a trusted source and then validate each link, or the attacker will just replace the first hop past an untrusted link. You can assume trust in the authentication server because at least in theory the user is doing mutual auth to the authentication server (in practice, this is a whole other area of security problems, but those are at least outside the scope of the IVOA and have to be solved anyway to prevent general phishing of the auth provider for all uses). But that's the only piece you get for free from the protocol; you have to validate each hop after that point.

If you can declare the registry trusted, that introduces a second trust point and makes the problem much easier, but then you do have to follow through and make sure the registry really is trustworthy.

Note that you can combine the two models to allow for additional registries for unregistered services. The authentication server could, instead of listing all the origins of all services in the aud claim, instead list the origin of a registry in the aud claim. That allows bootstrapping the link to the registry server from the auth server, and then the registry server can vouch for all the services. That might be a bit more straightforward for the client if it already has a registry client implementation (but it would probably need to support publishing registries since I suspect not everyone with unregistered services is going to want to run a local searchable registry).

@andamian
Contributor

andamian commented Nov 3, 2025

@mbtaylor – you’re right that the device flow and RFC 8707 don’t work together very smoothly. Unfortunately, there’s no straightforward fix for this, but there are a few possible workarounds:

  1. The authorization server (authS) could be configured to return multiple related aud values. RFC 8707 allows—and even recommends—this in Section 2.2. The client should check the aud of any access token it already holds before deciding to re-run the device flow. This implies that aud values must be resource URLs.
  2. Another option is to perform a token exchange when accessing a new service, assuming the client already has a valid access token from the authS. IMO, this approach is more generic, but since it’s not standardized, it can be harder to implement consistently.
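The aud check in option 1 could look something like this sketch: before re-running the device flow, inspect the aud claim of a token already in hand. This decodes the JWT payload without signature verification, which is acceptable here only because the client is deciding whether to reuse its own token, not trusting the claims for access control:

```python
import base64
import json

def token_covers(access_token: str, resource_url: str) -> bool:
    """Check whether a JWT access token's aud claim names resource_url."""
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    aud = claims.get("aud", [])
    if isinstance(aud, str):   # aud may be a single string or a list
        aud = [aud]
    return resource_url in aud
```

Note this only works when the access token is a JWT; opaque tokens would need introspection instead.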

Some services may only require authentication for convenience. For example, public TAP services might use authentication simply to let users list their UWS jobs, in which case enforcing aud checks might not be necessary. As a result, RFC 8707 support could be optional in the service’s discovery entry.

If RFC 8707 isn’t supported (for example, by the authS), an alternative is to include the resource directly in the token scope request. For instance, scope=https://rs.com/tap/table.readonly could be used to return the appropriate aud for the resource https://rs.com/tap.

@jesusjuansalgado – creating a VO federation could be an excellent idea, and the VO registry provides the necessary technology to enable it. However, a trusted federation would also require active oversight and enforcement. Someone would need to ensure that all registered participants adhere to a minimum set of standards regarding security, privacy, and related policies. For example, CILogon, a similar identity federation, implements the AARC Blueprint Architecture and the REFEDS Assurance Framework, and maintains policies and dedicated staff to enforce compliance. It’s something worth considering.

I’m looking forward to some interesting discussions at the Interop, but I also think it’s helpful if we start brainstorming ideas and identifying potential issues here in advance. That way, we’ll have time to experiment and research before we meet.

@rra

rra commented Nov 3, 2025

warn user (ask to enable trust for this service if the service is under development as it could not be officially part of the registration yet)

Be very careful with warnings of this type, since if they happen for more than a very small percentage (<5%? <1%?) of legitimate use cases, you are just training the users to blindly click through the warning and you've essentially undermined the security mechanism. A lot of users will do that anyway even if the warning is safely <1% of legitimate cases because that well has been poisoned by lots of other warning fatigue in other software and the user has already been trained to dismiss all warnings without reading them.

A nice advantage of the aud system is that, since it doesn't require any external registration process, it's less likely that you will need to temporarily be in a state where you need to allow an insecure access with a warning.

@jesusjuansalgado

jesusjuansalgado commented Nov 3, 2025

@rra

warn user (ask to enable trust for this service if the service is under development, as it could not be officially part of the registration yet)
Be very careful with warnings of this type, since if they happen for more than a very small percentage (<5%? <1%?) of legitimate use cases,

When a user tries a non-registered service, it is because an engineer is developing a server and is testing it before formally registering it (or is using a validator). The user is the service developer. So, I would say there is no way for this to be exploited in a phishing exercise.

About the trusted IVOA registry, registering services like a credentials issuer (or whatever we call it) would be quite unusual: one per project, with occasional sporadic updates. It could be coordinated with the registry team, but I think it could be totally controlled. It is a solution for VO clients only, but that is in fact our goal.
To exploit this possible vulnerability, an evil server would need to be created, plus an evil IVOA registry entry naming a real token issuer and including the evil service as a valid service, all without raising any alert (and then a user would have to try to access the evil service). The number of registrations in the IVOA registry is not so high, and registrations do not happen so often, that such an entry would go unnoticed. I think an extra verification control by registry operators could be done for the credentials issuers.

Asking for modifications to our token issuers because of a problem found with VO services (special behaviours, optional metadata, or any other trick), although probably more elegant, could be blocked, as those servers are used for more than just the IVOA. This is why I am reluctant to go down this path unless it is something so clear and so easy that we will not face problems proposing it to our organisations.

@rra

rra commented Nov 3, 2025

The user is the service developer. So, I would say that there is no option to have this problem in a phishing exercise

Service developers and testers who would regularly run into this use case are exactly the people I'm the most concerned about getting phished because they usually have higher levels of credentials. Developers are not immune to warning fatigue.

That said, if the registration with the IVOA is by web origin, hopefully everyone will just register all of their domains and that would cover undeployed services and unregistered services as well, without needing to support "warn and continue" in the client. Although those services are not in a regular registry entry, they're still covered by the origin policy on the authentication mechanism.

For the rest, I think you're saying the same thing that I was saying, in which case we agree: if the client always starts from the IVOA-maintained registry, the security bar is probably fine. The question is whether that's an acceptable client policy, since it's quite restrictive. I would have assumed that accessing unregistered services would be a fairly common need, but maybe I'm wrong about that.

I'm probably biased by the fact that currently none of our services are registered because we've not gotten around to standing up a publishing registry yet, but of course this is just a temporary state of affairs while we're getting things ready. But I will say that we already have around 10 separate deployments each with their own authentication services that would require separate registration entries to be usable by a standard IVOA client in this design, and will probably be adding more. I think we would have to register every environment that someone might want to use a standard IVOA client with (TOPCAT, PyVO, etc.), which would include internal and test/development environments. Is that going to be okay with the folks maintaining the IVOA registry?

@jesusjuansalgado

jesusjuansalgado commented Nov 3, 2025

The user is the service developer. So, I would say that there is no option to have this problem in a phishing exercise
Service developers and testers who would regularly run into this use case are exactly the people I'm the most concerned about getting phished because they usually have higher levels of credentials. Developers are not immune to warning fatigue.

Maybe I have not explained it properly. The vulnerability is a possible evil service created by a hacker. In the testing phase (before registration), the tester is the hacker.
If the service is a real one, the engineer will see a warning saying that the service is not registered. Services are invisible (not findable) to real users before registration.

In any case, we could deny access and ask validators to add an option to just warn (validation before registration is also executed by the engineer who is developing the service).

@rra

rra commented Nov 3, 2025

If the service is a real one, the engineer will see the warning telling that the service is not registered. Services are invisible (not findable) for the real users before registration.

The concern since the start of this discussion has been an attacker convincing a user to go to an attacker-controlled service, and then using that to steal their credentials. The defense against that in your design is that the client refuses to send credentials to unregistered services unless the user says "yes, this is okay." But if the user is used to regularly dismissing that warning and connecting anyway because they regularly test services under development, that user is now vulnerable to exactly the phishing attack that we have been trying to prevent in this discussion: The attacker via one mechanism or another gets them to visit an attacker-controlled service in a client, and they dismiss the warning without reading it because that's what they always do.

I'm not sure where our assumptions differ. Maybe you're assuming that users will always start with a registry search to find a service? I don't think that's true. (Also it's worth noting that developers and testers frequently are also users of other random astronomy services and may well want to look at some interesting image from some SIA server or whatever. Lots of people in astronomy wear multiple hats and switch contexts constantly.)

There are other things we can do that have better security properties than warn and continue but are still entirely on the client, like adding a configuration allow list of unregistered services for a particular token provider and making developers and testers edit a config file. They have various different UI and friction trade-offs. The thing I like about a strategy involving the aud claim (or even some other extension claim in a JWT ID token) is that you can automate this whole process via configuration on the auth server and avoid making the user deal with any UI issues, and also avoid having to round-trip changes through the IVOA for internal deployments.

I understand that you don't like this approach because you don't want to assume that sites can change the behavior of their auth servers, and I am sympathetic to that. But I can change my auth server, so I'd rather adopt a solution that removes the problem from the user UI and can be enforced in software.

@jesusjuansalgado

@rra, I get your point, but I think it is not possible. If a service has been created to do phishing, it has to be exposed and findable in some way by users before the phishing can happen. There is no way for a client to find it unless it is exposed, i.e. registered (if it is a DataLink service that is not registered, the service is obtained inside a registered TAP, but again it will be under a registered service, the TAP). No one knows about this service except the hacker until it is findable.
For the validator tools, the credentials to be used are the ones testing the service (again, the hacker).
In general, there should not be lists of unregistered services.

As said, we can make denying access the normal behaviour (the warning was an "or"). We could enable a special mode for validators that changes this to a warning for testing, requiring a fully deliberate action by the engineer, flagged like "do not use for services other than yours".

In a separate conversation with Mark, I also explored the "aud" approach some weeks ago, but adding this aud only to tokens for VO clients (as we could have other non-VO dynamically registered clients, and we want normal behaviour for them) was tricky. How do you recognise a VO client? We did not want a list of pre-registered clients due to the complexity of storing secrets in public clients.
In general, trying to solve the problem by modifying the behaviour of general services (like all the token issuers that could be involved in VO tokens), adding optional configuration, forcing versions compatible with the update, and so on, only to cover a problem for VO clients, is a path that, in my view, overcomplicates the solution and is difficult to justify to our organisations/federations. Having a solution that uses existing VO services like the registry looks more reasonable and controllable.

@aragilar
Author

aragilar commented Nov 4, 2025

Yes, having to do VO-specific things is definitely not ideal, hence why I've pushed this RFC-based route (so that the code can be upstreamed to the various projects, if the support is not already there). On who supports RFC 9728 (which is the newest RFC), it does look like Keycloak will gain support for it soonish, as support for the Model Context Protocol Authorization spec is being added (which depends on RFC 9728, see keycloak/keycloak#41521 and https://modelcontextprotocol.io/specification/draft/basic/authorization). I can't see anything in Indigo IAM about RFC 9728, but it would seem better to me to support RFC 9728 than to do our own thing.

On JWTs, the issue we've found is that if you include user profile information in them, they become excessively large and cause strange failures (because they need to be attached to GET requests). If they were minimal and just included sub, iss, etc., I don't think they would be excessively large.

@andamian
Contributor

andamian commented Nov 4, 2025

@aragilar - I don’t think Keycloak’s lack of support for RFC 9728 is a major issue in this context. As far as I can tell, the main missing feature is support for the protected_resources metadata. However, this gap can largely be addressed by using RFC 8707 (Resource Indicators) and aud (audience) claims, if those are important for the resource server.

One open question is whether there’s a way to indicate in the resource metadata that a given resource requires RFC 8707 support. The info seems important for clients trying to decide whether to re-use an existing access token or exchange it.

The bottom line - do we converge on this:

www-authenticate: ivoa_bearer meta_url="https://example.com/.well-known/oauth-authorization-resource"

or similar?

The trust issue will remain regardless. As the specification itself notes, potential attacks such as phishing are still possible. For now, we can simply highlight these risks in the document.
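If we converge on a challenge of that shape, the client-side parsing is the easy part. A rough sketch (the scheme name and meta_url parameter are from the proposal in this thread, not from any published RFC; a production parser would handle multiple challenges and unquoted values per RFC 9110):

```python
import re

def parse_challenge(header: str) -> tuple:
    """Split a single WWW-Authenticate value into (scheme, {param: value})."""
    scheme, _, rest = header.partition(" ")
    # naive quoted-parameter extraction: param="value" pairs only
    params = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return scheme, params
```

The hard part, as discussed above, is what the client is then allowed to trust in the metadata it fetches from that URL.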

@jesusjuansalgado

@rra The scenario you describe corresponds to an email phishing attack, albeit with a relatively complex workflow. In practice, phishing attempts are far more likely to succeed through the use of simple malicious links sent by email than through links to protected data products that require opening a VO client (e.g. TOPCAT) to obtain a short-lived token. In the latter case, an attacker could send a link that prompts the client to request a token, but the process would fail because the service is not registered with the token issuer—or because the issuer itself is unregistered. Under normal VO client configurations, such an attack would generate an alert and fail gracefully.

A more realistic phishing attempt would be a traditional one, for example:

“Dear user, due to increased resource availability, you can now raise your quota in the SRCNet. Please log in at http://iam.evil.com/
to activate it.”

Such attacks, which mimic legitimate identity provider pages to capture credentials, remain common but are outside the scope of the specific problem we are addressing here.

As I mentioned to Mark and intend to discuss further at the next Interop, it is technically quite straightforward to modify or replace a VO client (including those embedded in science platforms) so that it captures credentials entered in plain text. This risk primarily affects clients that accept credentials directly—for instance, for Basic Authentication or certificate-based login. I am also concerned about other unverified methods. In contrast, OpenID Connect mitigates these risks by ensuring that users enter their credentials only on the identity provider’s web page, not within the client itself.

Regarding the use of the aud (audience) claim, I agree it is an elegant solution in principle, but in practice I see significant difficulties in restricting it to VO clients, as there is no straightforward way to identify them. Moreover, not all authorization servers support this feature consistently. Since VO relies on existing, general-purpose identity infrastructures, it would be challenging to persuade all identity provider administrators to modify their configurations in a VO-specific way. For that reason, I currently believe this path is unlikely to be feasible.

@jesusjuansalgado

jesusjuansalgado commented Nov 4, 2025

@aragilar @andamian
I think using RFC 9728 can be hacked (I think Mark already mentioned this) with this flow:
First:

% curl https://evil-server.com/data
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer resource_metadata="https://evil-server.com/.well-known/oauth-protected-resource/data"

Then, we read the URL

% curl https://evil-server.com/.well-known/oauth-protected-resource/data
{
  "resource": "https://evil-server.com/data",
  "authorization_servers": [
    "https://evil-server-auth.com/"
  ],
  ...
}

And, finally, reading that evil openid-configuration:

https://evil-server-auth.com/.well-known/openid-configuration

the output is my real SKA openid-configuration including my token issuer URL, plus:

  "protected_resources": [
    "https://evil-server.com/data",
    ...
  ]
}

So, the original evil service is authorised to get the token

@rra

rra commented Nov 4, 2025

I think we're talking past each other about phishing. I can't figure out if you just think my example is unlikely or if you don't agree that it works. Here it is step-by-step:

  1. Attacker posts to a forum or sends an email or whatever saying "hey, these TAP results are interesting, here's my query" and includes a tinyurl link (not inherently an unreasonable thing to do given that sync TAP queries with the ADQL in the query parameter can result in long and annoying URLs).
  2. Target thinks "oh let me open this up in my favorite VOTable viewer / script / whatever" and pastes in that URL
  3. The URL goes to, I dunno, data.1sst.cloud instead of data.lsst.cloud and returns a 401 response
  4. The target's client or library or whatever looks at the WWW-Authenticate header, parses the ivoa_bearer challenge, sees that it's asking for data.lsst.cloud (the correct name) credentials, and goes "oh, I already have those stored"
  5. Target's client sends the request again with its bearer token, and now bad things have happened.

At what step do you think this attack would fail? Or do you think all the steps do work but we just think we shouldn't care about it because targeted phishing attacks like this are unlikely? Honest question: I'm not sure which is your objection.

I agree it is an elegant solution in principle, but in practice I see significant difficulties in restricting it to VO clients, as there is no straightforward way to identify them

Why are you restricting it to VO clients? I still don't understand why you would even try to do this.

@frossie

frossie commented Nov 4, 2025

I can reassure @rra that the attack vector he is trying to address absolutely can happen in astronomy. In fact, "click here to see the same search results" is routinely one of our most asked-for features, and users make use of them all the time where available (by posting them on slack etc). Additionally, it is now routine to include such clickable/pasteable queries in papers, and even issue DOIs to them which trains users not to object to not seeing the whole URL before clicking it.

@aragilar
Author

aragilar commented Nov 4, 2025

@jesusjuansalgado I don't think your example works: evil-server.com is serving a service identifier whose origin doesn't match. This is explicitly called out in https://datatracker.ietf.org/doc/html/rfc8414#section-6.2 as something clients MUST check.

I do think the attack @rra describes is the attack we should be concerned about (and I think it's the same concern @mbtaylor raised previously?). We need some process to establish a two-way binding between the resource and authorization servers that the client can verify. Either the token contains the information (which requires a specific format for the tokens), or both the resource and authorization servers need to have information pointing to each other (and the client must verify both directions before starting whichever OAuth flow is used).

@rra

rra commented Nov 4, 2025

BTW, I have the same concern that @mbtaylor raised about RFC 9728: it looks great when used with protected_resources on the authorization server side as well, except that I'm not seeing why the attacker can't provide fake RFC 8414 metadata with an attacker-controlled issuer that matches the attacker-controlled domain, but that points to a real authorization_endpoint and token_endpoint on a different domain.

Is it because the resulting ID token will have an iss claim that must match the issuer in the metadata and the client can then detect that mismatch and realize after authentication that it has been using invalid metadata and cannot proceed? It feels like there needs to be a verifiable link here, and there probably is and I just don't know the details of the protocol well enough to spot it.

@aragilar
Author

aragilar commented Nov 4, 2025

Maybe I'm missing something then? You have 4 (sets of) urls:

  1. The initial url (on the resource server)
  2. The RFC 9728 url under /.well-known/ (on the resource server)
  3. The RFC 8414 url under /.well-known/ (on the authorization server)
  4. The various OAuth challenge urls (on the authorization server)

1 and 2 must have the same origin (https://datatracker.ietf.org/doc/html/rfc9728#name-impersonation-attacks), and 3 and 4 must have the same origin (https://datatracker.ietf.org/doc/html/rfc8414#section-6.2).

Therefore, if we have a verifiable link between 2 and 3, we have a verifiable link from 1 to 4, correct (which is really all we care about)? RFC 9728 then specifies that link via the authorization_servers field on the resource server and protected_resources on the authorization server.
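As a toy sketch of those checks, with plain dicts standing in for the fetched .well-known documents (field names follow RFC 9728 and RFC 8414; whether these checks suffice is exactly what is being debated in the following comments):

```python
from urllib.parse import urlsplit

def origin(url: str) -> tuple:
    s = urlsplit(url)
    return (s.scheme, s.netloc)

def verify_binding(resource_url: str, resource_meta: dict, auth_meta: dict) -> bool:
    # 1 and 2: the protected-resource metadata must describe a resource
    # on the same origin the client actually contacted
    if origin(resource_meta["resource"]) != origin(resource_url):
        return False
    # 2 -> 3: the resource must name the authorization server...
    if auth_meta["issuer"] not in resource_meta["authorization_servers"]:
        return False
    # 3 -> 2: ...and the authorization server must name the resource back
    return resource_meta["resource"] in auth_meta.get("protected_resources", [])
```

With these checks, the evil-server example earlier fails at the 3 -> 2 step, since the real authorization server never lists the evil resource.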

@rra

rra commented Nov 4, 2025

How does section 6.2 of RFC 8414 guarantee that 3 and 4 have the same origin? That's what I'm missing. If that's guaranteed, then I agree, that handles the problem and this is great and solves all of our issues, I think. But so far as I can tell, all section 6.2 requires is that the issuer have the same origin as the metadata. Great, but what requires the authorization_endpoint and token_endpoint to have the same origin as the issuer? Those are the endpoints we actually care about because that's where we're going to get the token from that the attacker is trying to steal.

I think 6.2 is trying to solve the reverse problem where the attacker is trying to steal submitted credentials by pointing at their own authorization_endpoint but using the real issuer. But in our case the service discovery of what issuer to use is under the attacker's control, so they don't have to use the real issuer and they do want to use the real authorization_endpoint.

I think the answer may be https://datatracker.ietf.org/doc/html/draft-ietf-oauth-mix-up-mitigation-01, although that's not published as an RFC (yet). This seems to confirm my guess that the client has to check the iss returned from the authentication interaction and ensure that it matches the issuer from the metadata, and that's the last piece to ensure everything is consistent. (And likewise for the token issuer, I presume.)

I have not checked if there's a similar way to verify the issuer identifier through the device auth flow, but I assume there is.

If this is the right solution, that was pretty hard to find (at least for me) and it adds a pretty important additional client requirement, so we'll want to make sure we document this very explicitly.
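If the RFC 9207 check is indeed the missing piece, the client-side requirement is small but easy to overlook. A minimal sketch for the authorization-code case (the redirect URL and issuer value are made-up examples): the iss parameter returned with the authorization response must match the issuer from the (possibly attacker-supplied) discovery metadata, and the client must abort on a mismatch.

```python
from urllib.parse import parse_qs, urlsplit

def iss_matches(redirect_url: str, expected_issuer: str) -> bool:
    """RFC 9207: compare the iss authorization-response parameter
    against the issuer taken from the discovery metadata."""
    query = parse_qs(urlsplit(redirect_url).query)
    # absent iss must be treated as a failure, not a pass
    return query.get("iss", [None])[0] == expected_issuer
```

An equivalent check on the iss claim of the returned tokens would be needed for flows (like the device flow) that have no front-channel redirect.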

@aragilar
Author

aragilar commented Nov 4, 2025

Huh, you're right that RFC 8414 doesn't actually explicitly say that they all need to be the same origin (neither does the OIDC discovery spec), so you could have different origins for each of the endpoints (which is an interesting choice). I think the OAuth mix-up draft RFC is replaced by https://datatracker.ietf.org/doc/html/rfc9207, which is referred to in RFC 9700 (which I had on the "should" list, but maybe it needs to move to the "must" list?).

I wonder if the best solution is for me to email the IETF oauth working group (https://datatracker.ietf.org/wg/oauth/about/) and ask if this was intended or was an oversight (or if there is somewhere in one of the various RFCs which means the origins in RFC 8414 must be the same, in which case they should probably make an errata noting this).

@aragilar
Author

aragilar commented Nov 4, 2025

https://mailarchive.ietf.org/arch/msg/oauth/RjbSwFRmLsk0EgAY2Ter-nw66EY/ and https://danielfett.de/2020/05/04/mix-up-revisited/ would imply @rra is right about RFC 8414, and RFC 9207 is required. Keycloak implements this (https://www.keycloak.org/securing-apps/specifications), I can't see what Indigo IAM supports (there doesn't appear to be a document listing what it supports).

@jesusjuansalgado

jesusjuansalgado commented Nov 4, 2025

Hi @rra,

about the attack you clearly describe:

  • Attacker posts to a forum or sends an email or whatever saying "hey, these TAP results are interesting, here's my query" and includes a tinyurl link (not inherently an unreasonable thing to do given that sync TAP queries with the ADQL in the query parameter can result in long and annoying URLs).
    OK
  • Target thinks "oh let me open this up in my favorite VOTable viewer / script / whatever" and pastes in that URL
    OK
  • The URL goes to, I dunno, data.1sst.cloud instead of data.lsst.cloud and returns a 401 response
    OK
  • The target's client or library or whatever looks at the WWW-Authenticate header, parses the ivoa_bearer challenge, sees
    that it's asking for data.lsst.cloud (the correct name) credentials, and goes "oh, I already have those stored"

Well,... in my proposal, the client should parse the discovery file, get the issuer URL (or, even better, the token URL, e.g. iam.lsst.cloud/token), go to the IVOA registry to find the "credentials issuer" entry associated with iam.lsst.cloud/token, check the allowed services, and then find that data.1sst.cloud is not part of the list associated with this credentials provider's IVOA registry entry, so it will raise an error. The cycle of the device code flow (register client and device, adding credentials via the issuer page) should always be done.
If we want to reuse tokens, I would say that tokens could be reused only if the (in-memory, per-session) relation already appears in the list. In this case the pair <data.1sst.cloud>, <iam.lsst.cloud> is new.

  • Target's client sends the request again with its bearer token, and now bad things have happened.
    I hope not!

Do you think this is fine or are there security holes in the workflow in your opinion?
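As a non-normative illustration of the registry check described above (the lookup function and the shape of what it returns are hypothetical, invented for this sketch):

```python
from urllib.parse import urlsplit

def allowed_by_registry(service_url, issuer, registry_lookup):
    """Return True only if the service's host appears in the registry's
    allowed-services list for this credentials issuer (hypothetical schema)."""
    allowed_hosts = registry_lookup(issuer)  # e.g. {"data.lsst.cloud", ...}
    return urlsplit(service_url).hostname in allowed_hosts

# A client would refuse to send a stored token when this returns False, e.g.
# allowed_by_registry("https://data.1sst.cloud/tap",
#                     "https://iam.lsst.cloud/token", lookup) -> False
```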

@jesusjuansalgado

@aragilar also, in the original proposal we went directly to the configuration file:

% curl https://server.com/data
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer discovery_url="https://iam-server.com/.well-known/openid-configuration"

skipping one step (the discovery URL is obtained in the second step of the RFC 9728 workflow).
Which metadata or protection do we gain by going first to .well-known/oauth-protected-resource/data?
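For comparison, with RFC 9728 the 401 challenge carries a resource_metadata parameter instead of a direct discovery_url, and the client's first move is simply to parse that parameter out of the challenge. A minimal, non-normative sketch (the header value is illustrative, and a production client would want a full RFC 9110 auth-param parser):

```python
import re

def challenge_param(www_authenticate, name):
    """Pull a quoted auth-param out of a WWW-Authenticate challenge.
    (Simplified; real headers may need a full RFC 9110 parser.)"""
    m = re.search(rf'{name}="([^"]*)"', www_authenticate)
    return m.group(1) if m else None

hdr = 'Bearer resource_metadata="https://server.com/.well-known/oauth-protected-resource/data"'
# challenge_param(hdr, "resource_metadata")
#  -> "https://server.com/.well-known/oauth-protected-resource/data"
```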

@mbtaylor
Member

mbtaylor commented Nov 4, 2025

@jesusjuansalgado suggests using the IVOA Registry as a repository which a client can use to map protected resources to their corresponding Authorization Servers. If such mappings existed in the registry and if the registry could be trusted, I believe that would indeed solve the problem we're trying to address here.

@rra raises the question of whether clients might examine a rogue (searchable) registry; this isn't very likely, there are only a small number of searchable registries, they reside at long-lived endpoints, and these are generally hard-coded into VO clients.

However, whether the content of the registry is trustworthy is a different matter. Searchable registries are populated in an unsupervised way by harvesting from multiple publishing registries (there are currently a few tens of these) and new publishing registries can be added to the Registry of Registries by a user-initiated process which checks standards compliance but not details of content. I don't believe that such additions are reviewed by humans; human review on security grounds could anyway raise various problems including political ones ("I don't like the look of your home institution so I'm not going to let you into the registry"). Publishing registries own a unique ivo: namespace but this does not enforce restrictions on which service endpoints their records can refer to, so a malicious publishing registry could set up records mapping e.g. protected LSST resources to its own auth servers. Thus an attacker willing to register a new publishing registry, or able to compromise the content of one of the existing publishing registries, could compromise the content of the IVOA registry in a way that could enable token theft etc of the kind we are trying to prevent here.

Disclaimer: there are a few assertions in the above paragraph that I'm not sure about, Registry WG experts should be consulted for definitive answers. But in any case: content of the registry has not till now been considered as a security sensitive matter, and if we want to treat it as such a careful analysis would be required.

There is also the issue already raised that services under development might not have corresponding registry records which could necessitate some kind of abusable backdoor or exception mechanism; the flip side of that is that service developers would need to manipulate registry records in line with their working services, which may be onerous.

I don't say that using the registry to solve this issue is a complete non-starter, but it certainly raises some serious questions that would need to be answered first, in consultation with the Registry WG.

@jesusjuansalgado

yes, it depends on the Registry WG team to answer.

Just to flag that the proposal is not asking for a supervised registration control in general, but a control on the new "credentials provider" entries that should not be very high in number (I think there are around 10 data providers in total in the current IVOA with services that require authorisation). I would say that new registrations of this kind of resource could be in pending status until authorised. Once created, the maintenance of the content of these registrations (adding or removing allowedServers) depends on the owner of the authority.

Solving the security problems with just that, on services that the IVOA owns, and without asking for modifications to the configuration of all the IAM systems, is I think an acceptable price to pay. Let's see what our Registry colleagues think. I will point them to this issue, so they are present in the discussion during the Interop.

@rra

rra commented Nov 4, 2025

Ah! RFC 9207 does indeed complete the picture. Good heavens there are a lot of OAuth RFCs.

Okay, given that, after all of these side-tracks, I am +1 on the original proposal (maybe with some tweaks) in this PR and believe we should handle the phishing concern with pure OAuth standards, namely the combination of RFCs 9728, 8414, and 9207, the relevant portions of which should all be mandatory to implement for the IVOA bearer token auth method, combined with a recommendation to use the device auth flow to get the token in the first place if the site wants to support automatic registration of non-browser clients.

If I understand all the pieces correctly, the protocol then looks like (this is essentially section 5 of RFC 9728 with some additional commentary):

  1. Client contacts an unknown IVOA service and receives a 401 response with a WWW-Authenticate challenge whose resource_metadata parameter carries the URL of the protected resource metadata, as defined in RFC 9728. Note that this means we no longer need a separate authentication scheme; we can just use bearer. (We can of course also send a separate scheme as well if we want to include some other non-auth-related parameters.)
  2. Client retrieves the resource metadata and verifies it following the rules in RFC 9728 (which, because it's following an explicit pointer via WWW-Authenticate, only means that the client must not follow redirects when making this request).
  3. Client determines the authorization server URL (or URLs) that this service requires from the authorization_servers field of the protected resource metadata and constructs a .well-known URL from that to retrieve the authorization server metadata following the rules in RFC 8414. Note that the contents of authorization_servers in the protected resource metadata is a list of issuer identifiers (I had to read it a couple of times to be sure).
  4. Client retrieves the RFC 8414 authorization server metadata from that constructed URL and verifies it according to RFC 8414.
  5. Client verifies that the resource value in the protected resource metadata retrieved in step 2 matches an entry in the protected_resources list in the authorization server metadata retrieved in step 4. This is the critical step that prevents the phishing attacks we've been talking about.
  6. Now the client has all the information required to securely associate an issuer string with the protected resource, verified in both directions. If the client already has a stored token for that issuer, it can now just repeat the original request in step 1, sending that token as a bearer token, and be reasonably assured that all is good.
  7. If it does not already have a token, it should now use the authorization server metadata in step 4 to start an OAuth or OpenID Connect authentication flow, such as device auth. When starting that authentication flow, it MUST verify that the issuer of the authorization server (returned in iss per RFC 9207) matches the issuer that it discovered in step 3 and verified in step 4. If it does not, it must stop the protocol there and not continue with authentication.
  8. Once completing that authentication, it can now return to make a direct resource request to the URL in step 1, providing the bearer token.
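If it helps implementers, the discovery and cross-checking part of the steps above (2 through 5) fits in a few lines. This is a non-normative sketch: the JSON field names follow RFCs 9728 and 8414, while the HTTP fetcher is left abstract and assumed not to follow redirects, per RFC 9728:

```python
from urllib.parse import urlsplit, urlunsplit

def discover_issuer(metadata_url, fetch_json):
    """Steps 2-5: fetch and cross-check the RFC 9728 protected resource
    metadata and the RFC 8414 authorization server metadata, returning the
    verified issuer.  fetch_json(url) must not follow redirects."""
    rs_meta = fetch_json(metadata_url)               # step 2
    issuer = rs_meta["authorization_servers"][0]     # step 3: list of issuer ids
    p = urlsplit(issuer)
    as_meta_url = urlunsplit((p.scheme, p.netloc,    # RFC 8414 path construction
        "/.well-known/oauth-authorization-server" + p.path.rstrip("/"), "", ""))
    as_meta = fetch_json(as_meta_url)                # step 4
    if as_meta["issuer"] != issuer:                  # RFC 8414 validation
        raise ValueError("issuer mismatch in authorization server metadata")
    # step 5: the authorization server must vouch for this resource
    if rs_meta["resource"] not in as_meta.get("protected_resources", []):
        raise ValueError("resource not listed in protected_resources")
    return issuer
```

After this returns, the client has a verified issuer-to-resource association and can decide whether to reuse a stored token (step 6) or start an authentication flow (step 7).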

@rra

rra commented Nov 4, 2025

Hopefully I have now fixed all the errors in my last comment (sigh). Also apologies to @aragilar: The PR provides all the necessary documents if I'd just gone and read them thoroughly to start with. Thank you very much for your patience in pointing out which documents I'd missed!

@jesusjuansalgado

In general, anything that comes from trusted metadata containing the services allowed to request tokens will work.
That was the case with the allowed_origins proposal (in the token), here with protected_resources (in the configuration), or in the IVOA registry.

The problem, as said before, is imposing this metadata on services that are used for other things (not only VO activities), and the metadata is not totally standard.

Here is a quick survey of which IAM systems could produce a configurable protected_resources list in their output (automatically generated, so not fully trustworthy; versions not included; maybe you have more knowledge of the compatibility):

| IAM Service | Can publish protected_resources in .well-known/openid-configuration? | Notes / how it works |
| --- | --- | --- |
| OAuth 2.0 / OpenID Connect (generic) | ❌ No | Not part of the standard; would require a custom extension. |
| Keycloak | ✅ Yes | Supports custom OpenID Provider metadata extensions; requires SPI or custom provider. |
| Indigo IAM | ✅ Likely | Can include resource lists in its resource server metadata; configurable per service. |
| Azure AD | ❌ No | Does not expose protected_resources; uses scopes and app roles instead. |
| AWS IAM / Cognito | ❌ No | Metadata does not include protected resources; access via roles/scopes. |
| Google Cloud IAM | ❌ No | Discovery documents only include endpoints; resource access via roles/permissions. |
| Auth0 | ✅ Possible | Custom fields can be added to OIDC discovery metadata; protected_resources possible via extension. |
| Okta | ✅ Possible | Custom metadata extensions supported; can define protected_resources per app/API. |
| Ping Identity | ✅ Possible | Supports custom OIDC metadata extensions; protected_resources can be added via configuration or API. |

So, I am not sure if I see the glass half full or half empty. The future could go in a direction that simplifies this. We use Indigo IAM, so we could probably implement it, but I am not sure if this makes us dependent on a certain IAM implementation.

Also, I would like to guarantee that the behaviour we are adding is not global and does not affect other non-VO services. If I am using the scope mail for my mail application, I do not want to add strange metadata in the discovery URL because I am not sure of the consequences. Is my other enterprise service affected by this non-standard extension? Is it going to ignore it?

This is why I prefer to touch only the VO part and use our services for this without making changes in these services also used for other purposes.

@andamian
Contributor

andamian commented Nov 4, 2025

@rra - yes, I believe that would work, but I'd like to make an observation about point 4: RFC 8707 could be an alternative to that - the mapping of resource to aud essentially confirms that the resource is part of the protected_resources. RFC 8707 could also work alongside protected_resources - they are not mutually exclusive. The language around protected_resources in RFC 9728 makes me a bit hesitant to place it centre stage in our approach, but the advantages could be worth the risks:

protected_resources
OPTIONAL. JSON array containing a list of resource identifiers for OAuth protected resources that 
can be used with this authorization server. Authorization servers MAY choose not to advertise 
some supported protected resources even when this parameter is used. In some use cases, the set 
of protected resources will not be enumerable, in which case this metadata parameter will not be present.

The weak point of the approach is that step 4 cannot be enforced (maybe that's why protected_resources is optional). We rely on clients to do it, but it's not guaranteed. With RFC 8707, the resource service gets an assurance through the aud claim that the client followed the standard. It makes it mandatory.

@rra

rra commented Nov 4, 2025

I just noticed a bit of an interesting problem that I hadn't fully internalized: The RFC 9728 resource identifier has to exactly match the URL that the client was attempting to access, and that in turn must exactly match an entry in protected_resources, to prevent the attacker from pointing to someone else's resource metadata. I misread the protocol initially and thought it was talking about the URL to the resource metadata, but it's not (and of course that wouldn't have made sense). RFC 9728 seems to indicate this resource URL is not an origin; it includes the full path. That's going to be a problem for, e.g., REST services where the path can contain arbitrary parameters, making the resources not enumerable. And even without that, having to list dozens of different IVOA services sounds not fun.

I think we need to provide some guidance here about how the client should construct the expected resource identifier used to validate the metadata, and I think it needs to be more generous than the apparent wording of RFC 9728 using the exact URL of the original request and instead take advantage of the additional wiggle room allowed in https://www.rfc-editor.org/rfc/rfc8707#section-2, which basically says clients should default to using the HTTP origin of the request (the URL without the path, query string, or fragments). This makes the resource identifiers semantically equivalent to the restrictions in @jesusjuansalgado's proposal. I think that rule makes this far more feasible to implement, at the cost of making it entirely insecure if secure resources are sharing an origin with possibly attacker-controlled resources. But as previously mentioned, that's already unsafe for many web resources due to the browser security model, so I think we can simply document that as an IVOA constraint.
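The more generous matching rule suggested here — compare HTTP origins rather than full URLs — could look like the following. This is a sketch of the proposed IVOA constraint, not wording from any RFC:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def http_origin(url):
    """Reduce a URL to its origin: scheme + host (+ non-default port),
    dropping the path, query string, and fragment."""
    p = urlsplit(url)
    scheme, host = p.scheme.lower(), (p.hostname or "").lower()
    if p.port and p.port != DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}:{p.port}"
    return f"{scheme}://{host}"

# http_origin("https://Data.LSST.Cloud:443/api/tap/sync?QUERY=...")
#  -> "https://data.lsst.cloud"
```

Under this rule a client would compare http_origin() of the original request URL, of the metadata's resource value, and of the protected_resources entries, rather than requiring exact URL matches.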

@jesusjuansalgado Now that we've established that it's possible to get the right semantics through standard OAuth protocols, I'm -0.9 on any proposal that the IVOA design its own security protocol instead. If we had to, we had to, but there are a lot of ways to make mistakes in security protocol design and we don't have anywhere near the resources the IETF has for security review by experts. If there's an existing expert-designed protocol with fairly widespread implementation (which I think your chart indicates), I think we should adopt it and push remaining vendors to implement it.

I don't agree with your concern about limiting this metadata to VO clients. It is not VO-specific; implementing it for all protected resources without regard to whether they're being used for VO would be an improvement in security anyway. So I am -0 on making that a design goal.

@andamian I agree, client support of RFC 8707 also solves the same problem and would work in the absence of protected_resources. I'm +1 on documenting that as an alternative and encouraging clients to implement it as a fallback if protected_resources is absent.

The primary problem with RFC 8707 in practice is that, when used in conjunction with device auth, getting a new token for each new resource is extremely expensive to the point of potentially being infeasible. (I think this was @mbtaylor's point earlier.) The way to address that is for as many protected resources as possible to share the same resource identifier, which I think is probably true at many sites (but not all). So I think this approach is more usable for sites where all their IVOA resources can be grouped under a single HTTP origin and therefore the obtained token is valid for all services.
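For clients that do implement RFC 8707, the change is small: a resource parameter is added to the authorization and token requests so the authorization server can scope the token's aud claim. A sketch of a device-flow token request body (all values are placeholders):

```python
from urllib.parse import urlencode

# RFC 8707: the client names the protected resource it wants the token for;
# the authorization server then restricts the token's aud claim accordingly.
body = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
    "device_code": "placeholder-device-code",
    "client_id": "placeholder-client-id",
    # HTTP origin shared by the site's services, per the matching rule above
    "resource": "https://data.example.org",
})
# POST this body to the token_endpoint from the RFC 8414 metadata
```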

(The +1/-0.9/etc. stuff is https://www.apache.org/foundation/voting shorthand.)

@andamian
Contributor

andamian commented Nov 4, 2025

Again, multiple auds, root aud for multiple related services and token exchange are all possible solutions to make RFC 8707 work in practice. And I can see protected_resources, if available, allowing further client optimizations. But I think it's the alternative that allows the resource service to rely solely on trusted authorization servers (and spec compliant clients).

The only gap for dynamic resource/auth discovery that I can see with the current RFC ecosystem is that while RFC 9728 specifies the auth services associated with a resource, it doesn't provide a mechanism for discovering how to request access to it. More specifically: what are the required or supported scopes, resource indicators, audiences, etc. that can or must be sent with the token request to the auth server. This is a gap that we need to fill in AuthVO, in anticipation of the OAuth Working Group catching up with a "resource server metadata" spec analogous to the authorization server metadata in RFC 8414.

@jesusjuansalgado

jesusjuansalgado commented Nov 5, 2025

@rra As said, anything that comes from trusted metadata containing the services allowed to request tokens will work, so I am fine with the approach if it is something that can be implemented in a variety of IAM services.

To illustrate the problem I am trying to explain, and going to the real implementation (not the autogenerated compatibility table): Indigo IAM implements RFC 8414, not RFC 9728 (no protected_resources). I cannot find a way to do it in the configuration or similar (which makes sense, as it is implementing a different standard), and the only option I can see at this moment is to override the class with something like:

// Hypothetical Spring-style override injecting the optional RFC 9728
// protected_resources list into the discovery document
@GetMapping("/.well-known/openid-configuration")
public Map<String, Object> openidConfiguration() {
    Map<String, Object> config = defaultConfig();
    config.put("protected_resources", List.of(
        "https://example.org/api",
        "https://example.org/access"
    ));
    return config;
}

Obviously, if I need to patch the code of the server to add this optional list because of an IVOA standard, it could be difficult (impossible?) to sell this to the SKAO security department. For other teams using different IAM servers, asking them to work out how to add optional metadata to servers that are not prepared for it and that they do not develop looks tricky.

So, although I am happy if this approach can be implemented, I still think it is easier to modify the services that we develop (like the registry) rather than the ones we just use. I will contact the Indigo IAM team.

@jesusjuansalgado

OK, I have opened a discussion at the Indigo IAM collaboration page:
indigo-iam/iam#1077

@aragilar
Author

aragilar commented Nov 5, 2025

@jesusjuansalgado The issue is that anything bespoke the VO does has to be maintained by the VO providers (and likely forever), whereas if we follow what other groups are doing and use the specs that others have created and are implementing (or, if need be, contribute the change so it can be in the systems we depend on), we are in a much better position with respect to ongoing changes around the OAuth specs (it's worth reading through OAuth 2.1 and seeing what's now required: most of the security RFCs are now core, and the RFCs that are made core are referred to in the text). I agree that in the short term it's easier to make changes to our own software than to that of others, but in the long term it's far better for us to be able to defer security improvements to those who do it as their main job, so that when vulnerabilities are found we're not the ones needing to create the fixes (instead it's a matter of updating a version in a config file).

@jesusjuansalgado

jesusjuansalgado commented Nov 5, 2025

@aragilar I fully understand that it is better to use OAuth standards than to do it our own way. The IVOA has reinvented the wheel many times. Once we have understood that what we need is to obtain the metadata of the allowed/protected services from a trusted origin, any solution is fine. The problem is that the proposed solution does not depend on us, not from the theoretical but from the practical point of view. We cannot guarantee that the IAM servers are going to be upgraded to include an optional parameter from an emerging RFC just because the IVOA considers it crucial. In some cases we can apply some direct pressure (SKA has a collaboration agreement with WLCG, so I can probably influence them to include this in Indigo IAM). Obviously, if we want to have this implemented in AWS Cognito or Google Cloud IAM, our influence is nil.

Take into account that the origin of this discussion is that I wanted bearer tokens to be included in AuthVO (they are excluded in the current version), so all the steps in this direction are very much appreciated. I am totally open to finding the best approach that fulfills what, in my view, was a need.

Also, I think we need to write in AuthVO how to use the standards (not only the RFCs involved). We know that there are "interpretations", like the domain of the RFC 8414 document having to be the same as that of the token issuer to prevent attacks (I am not totally sure this is in the RFC). Also, we need to treat the entries in the protected_resources list as parent prefixes: datalink URLs look like http://mydataaccess.com/access?file=1192112, so what we need to add to the list is http://mydataaccess.com/access, with the interpretation that everything behind it is an allowed protected service (which, again, I think is a pragmatic interpretation, not the one in the RFC). This is why, although you disliked it in the past, our pull request should describe the exact workflow, so VO client and server implementors know exactly what to do and how the RFCs involved are interpreted. This is why I think we need to write something with a clear description of all the steps.

So my position is:

  • RFC 9728: I like it, and it is great from the theoretical point of view. It contains what we need in the optional protected_resources metadata. Unfortunately, the implementation depends on non-IVOA resources and implies changes in a multi-purpose security service. Conclusion: I am not sure I am going to be able to implement it.
  • Registry approach: Not very happy with the solution, but it solves the problem. IVOA members can implement it in a very short period. We have total control of the development of the resources involved.

My approach now is to evaluate the feasibility of implementing RFC 9728, which means not hacking the IAM services ourselves (which I totally dislike) but finding out how open the IAM dev teams are to implementing it. With this info, I will have a clearer view.

Also, as said, I am more worried about the security protocols we are already using in the IVOA, where a compromised or evil client can get the user credentials in plain text.

@mbtaylor
Member

mbtaylor commented Nov 5, 2025

This all sounds positive for the RFC-only solution, which I'm all in favour of (as long as it can be used in practice - see @jesusjuansalgado's concerns). However I have some doubts about use of RFC 9207.

RFC 9207 is not intended to address the problem that we're looking at here. It's written in terms of (and is only applicable to?) the Authorization Code Grant, which works by getting the user-agent to call back to the client with auth secrets, and solves the problem that the client might misinterpret the secrets coming from one AS with those from another, i.e. it doesn't know which of several ASs it's talking to. As it says "Mix-up attacks are only relevant to clients that interact with multiple authorization servers." The situation we have involves only a single AS, and the client knows which one it is, the problem is that it might be erroneously associated with the wrong Resource Server. I have not so far convinced myself that use of RFC 9207 can prevent this, though if @rra and @aragilar have then I'm prepared to believe it's so.

Even if it does do the job, we'd need to abuse the text to some degree to work with device flow. Section 2.4 talks about the Authorization Code Grant response, which is in application/x-www-form-urlencoded form, and says the client MUST extract the iss param from this form. If we're going to use it in the context of the Device Authorization Grant, is the idea to pull the iss parameter out of the Device Authorization Response JSON document instead?

@rra

rra commented Nov 5, 2025

We know that there are "interpretations" like the domain of the RFC 8414 document should be the same of the token_issuer to prevent attacks

@jesusjuansalgado I thought that for a moment too, but then I convinced myself that no, this is not necessary and the RFCs are correct. To see why, put yourself in the place of an attacker and try to figure out how to introduce metadata that points to an attacker-controlled token issuer. You'll see that it's not possible under the RFC verification rules; either you keep the issuer of the server you're attacking, in which case it is rejected by the validation in RFC 8414 because you are serving it from a different domain, or you have to change the issuer to match where you're serving the altered metadata from, in which case the authentication will be rejected by the RFC 9207 check because the authorization server will return a different issuer.

Basically, what happens is that the RFC 9207 check ties the metadata to the authorization server so the authorization server essentially vouches for the metadata and everything in it, including the URL to the token issuer. So it can point to a different domain and you can still be assured that it is the intended token issuer for that authorization server.

@mbtaylor I think we are indeed in the case where the client interacts with multiple authorization servers, no? I think that language just means that this doesn't apply to clients that only ever use a single authorization server under all circumstances, because they can just hard-code things like the issuer string, but TOPCAT has to interact with the authentication systems of multiple sites and cannot do that. So that's the multiple-AS case.

It would indeed be nice if RFC 9207 had examples for all the different flows, not just the authorization code flow, but the language seems unambiguous and general:

In authorization responses to the client, including error responses, an authorization server supporting this specification MUST indicate its identity by including the iss parameter in the response.

That doesn't seem to be limited to one flow; that's all of them. So yes, I would interpret that as saying that iss must be included in the JSON Device Authorization Response, since that is a response from an authorization server.
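Under that reading, the client-side RFC 9207 check in the device flow reduces to a single comparison against the issuer verified during discovery. A non-normative sketch (the response fields follow RFC 8628, with iss added per RFC 9207; the values are illustrative):

```python
def check_issuer(device_auth_response, expected_issuer):
    """RFC 9207 mix-up defence: abort the flow if the authorization server
    does not identify itself as the issuer we discovered and verified."""
    if device_auth_response.get("iss") != expected_issuer:
        raise RuntimeError("authorization server issuer mismatch; aborting flow")

resp = {"device_code": "...", "user_code": "WDJB-MJHT",
        "verification_uri": "https://iam.example.org/device",
        "iss": "https://iam.example.org"}
check_issuer(resp, "https://iam.example.org")  # passes silently
```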

@jesusjuansalgado

jesusjuansalgado commented Nov 6, 2025

@rra, yes, I think the simpler recommendation could be "always verify the RFC 8414 discovery document from its canonical place" or similar. In this case, mix-up attacks that mix real and fake metadata in a discovery document could be stopped. You can be redirected to fake RFC 8414 discovery documents (with evil protected resources), but once you have the issuer (which should be the real one, because you can detect it later with RFC 9207) the client should verify the real discovery document. In my view, we need to enumerate a clear list of the checks in the spec so that all engineers understand them (in case the support is done natively and not with an existing library; well, even using a dedicated library we cannot be sure the correct checks are done):

  • Discovery (RFC 8414): widely implemented (Keycloak, INDIGO IAM, Auth0, Google, etc.)
  • Resource indicators (RFC 8707): less universal. Some IdPs support it; others require configuration or extensions. We cannot assume it exists.
  • protected_resources / RFC 9728: still emerging
  • RFC 9207: A basic implementation (iss inside the token) is present for most (all?) IAMs
  • Introspection endpoint: commonly present in discovery docs and a valuable fallback, BUT in some IAMs it cannot be invoked by dynamically registered clients (e.g. Keycloak, INDIGO IAM and CILogon disable introspection for dynamically registered clients for security reasons). So, if the token is opaque, I assume we cannot read aud. Is JWT a requirement for the workflow?
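On the introspection point: when the access token is a JWT, a client can read aud locally without the introspection endpoint, since the payload is just base64url-encoded JSON. Note this only reads the claims; it does not verify the signature (which remains the resource server's job). A sketch with a throwaway token built in place:

```python
import base64, json

def jwt_claims(token):
    """Decode the payload of a JWT without verifying its signature
    (enough for a client to inspect its own token's aud claim)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# Building a throwaway token just to show the round trip:
claims = {"iss": "https://iam.example.org", "aud": "https://data.example.org"}
enc = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
token = f"header.{enc}.signature"
# jwt_claims(token)["aud"] -> "https://data.example.org"
```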

@rra

rra commented Nov 6, 2025

I completely agree with spelling out the checks. It sounds like the next revision of OAuth may do some of the work of consolidating the documents, but right now it's very hard to piece these things together. I think we should spell out a "happy path" for implementers based on our research, something like "if you do the following, you're in the center of the RFCs and your stuff should work; if you want to stray off that path for whatever reason, here are the RFCs to read to see the full possible design space."

I don't think we need the introspection endpoint any more. It's helpful for a client using RFC 8707 so that they can detect if the authorization server has expanded aud beyond what they requested, but I think that's not strictly necessary (and in protocols that return a JWT ID token, they can check that instead of the possibly-opaque access token).

@andamian
Contributor

andamian commented Nov 6, 2025

I believe we agree that mutual trust between auth services and the resource services that use them is essential. In practice, these trust relationships can vary: some may follow strict standards (enforcing multiple RFCs and policies), while others may adopt a more relaxed approach to reduce overhead, complexity, and barriers to adoption.

Our goal should be to support as many of these scenarios as possible. Data providers need to understand the available options and choose the one that best fits their needs. At the same time, general-purpose tools such as PyVO and TOPCAT should be able to automatically discover and work with these different configurations.

Now that we understand the available options, perhaps our next step could be to share our current visions of how this might work at our respective centres and compare our approaches. This would help us identify the concrete requirements for AuthVO. Of course, this is just a suggestion for those already leading the document revisions, with the hope that we can converge on the two proposed approaches (PRs).
