Skip to content

Conversation

tombentley
Copy link
Member

No description provided.

Signed-off-by: Tom Bentley <[email protected]>
@k-wall
Copy link
Member

k-wall commented Jun 26, 2025

With the audit use-case in mind, I can't help but think about IP address. IP address might be from network stack or the Proxy Protocol.

I'm not trying to extend the scope of this piece of work, but I wonder how we'd model it in case it alters direction. @rgodfrey expressed an idea to model IP address as a Principal. If we had that, the IP address could be added to to the Subject.

Alternatively, we could have method(s) on the FilterContext to retrieve the IP.

@k-wall
Copy link
Member

k-wall commented Jun 26, 2025

Nice work, @tombentley. A couple of questions/comments left.

Copy link
Member

@SamBarker SamBarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the core of the proposal but I think we need to get into few more of the details around what principals are actually expected to be available and how multiple ClientSubjectAwareFilters interact.

Signed-off-by: Tom Bentley <[email protected]>
@tombentley
Copy link
Member Author

So I pushed a significant rewrite, which lays out much more of the possible landscape of use cases and APIs. Let me know what you think.

Copy link
Member

@k-wall k-wall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm liking the proposed APIs. Very thorough work, @tombentley, well done.

Copy link
Member

@SamBarker SamBarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run out of brain damn jet lag.

Copy link
Member

@SamBarker SamBarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry review is WIP but I've run out of time.

Comment on lines +233 to +234
* @return The authenticated principal for the client connection, or empty if the client
* has not successfully authenticated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this return when a client presents a client TLS cert and then something performs SASL initiation? As presumably the person configuring the proxy would have configured SASL as the principal type?

On that note I guess a plugin wanting to use the client TLS cert to initiate SASL would need to use the clientTlsContext rather than the principal.

The timing dependent nature of this leaves me a feeling a bit concerned. I need to mull it further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could allow filters to specify their own principalType in the filterDefinition? By default they'd receive the virtual cluster principalType, but could override it.

You would have to amend the startup validations, since a VirtualCluster with principalType=SASL containing a filter that has specified principalType=X500 would need to ensure mTLS is enabled and SASL components are in the filter chain. Plus you'd want to validate that a filterfactory that consumes X509Principals can't declare it wants SASL principalType.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this return when a client presents a client TLS cert and then something performs SASL initiation? As presumably the person configuring the proxy would have configured SASL as the principal type?

This proposal uses the "SASL initiation" to refer to the proxy making its own SASL exchange with the server. This method is the clientPrincipal() so is unrelated to any SASL initiation. If you're asking about what it returns for a client-facing connection that is TLS+SASL then:

  • If the end user has configured VirtualCluster.principalType to SASL then the concrete type would be SaslPrincipal, and it would become non-empty as a result of a SASL terminator or initiator calling FilterContext.clientSaslAuthenticationSuccess(). So a filter intercepting both ApiVersions and Metadata would see it as empty and later non-empty.
  • If the end user has configured VirtualCluster.principalType to TLS then the concrete type would be X500Principal for the duration of the filter's life.

On that note I guess a plugin wanting to use the client TLS cert to initiate SASL would need to use the clientTlsContext rather than the principal.

If you're writing a filter that does care specifically about TLS stuff then it should be using the ClientTlsContext it gets from FilterContext.clientTlsContext().

This method is about exposing an identity to filters that don't need to care about the precise concrete type, and for the choice about how clients are identified to be given to end users (via VirtualCluster.principalType).

The timing dependent nature of this leaves me a feeling a bit concerned.

If you mean that a filter can see it as empty and later non-empty... well that's just a consequence of the fact that the Kafka protocol allows some APIs to be user prior to a successful SASL authenticate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could allow filters to specify their own principalType in the filterDefinition? By default they'd receive the virtual cluster principalType, but could override it.

We could, but how does that makes the end user's life any easier? If I'm setting up a proxy/virtual cluster I want to be able to say "We're going to use SASL OAUTHBEARER for authentication". From that point onwards I know what those user names mean, there's a single source of truth about where they come from and how they're managed. I don't really want one filter to use a TLS-based principal and other to use SASL, because that's really confusing. The exception is for things like audit logging where I think someone might want to know correlate client certs and SASL identity.

Copy link
Member

@robobario robobario Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case I think we're pondering is:

  1. using mTLS from client to proxy (no SASL)
  2. then taking the mTLS identity and using it to select a SASL client identity, then initiating SASL auth with the target server.
  3. other filters consume the SASL identity

And I think Tom you're saying that in this case they would make a TlsSaslInitiator that uses the TLS apis directly, the virtual cluster would declare it's principalType as SASL, and upstream filters would see the SASL principal through FilterContext#clientPrincipal, is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case I don't think the configuration validations hang together perfectly, we would know that the filter chain is well formed with a SASL producer and consumer, but we would ideally want to check the gateway transport uses mTLS.

Copy link
Member

@robobario robobario Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes me wonder a little if we should have separate annotations for Filters that consume the specific APIs so we can check that consumers of the TLS specific API have an mTLS gateway, and so that we can check consumers of the SASL specific API are after in a chain containing a SASL producer.

like @SaslAware and @MutualTLSAware

moved to a new thread https://github.com/kroxylicious/design/pull/71/files#r2214615029

let's keep this Thread about TLS -> SASL. I'm happy enough with the limitations, it will still be possible to do TLS -> SASL swapping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean that a filter can see it as empty and later non-empty... well that's just a consequence of the fact that the Kafka protocol allows some APIs to be user prior to a successful SASL authenticate.

Yes this was the definition I was concerned about, but I think the proposal is right, the principal is just null until SASL has completed. I was mulling over the idea that it should fall back to mTLS instead of null. However I found @tombentley answer to Rob persuasive

From that point onwards I know what those user names mean, there's a single source of truth about where they come from and how they're managed. I don't really want one filter to use a TLS-based principal and other to use SASL, because that's really confusing.

* @return The authenticated principal for the server connection, or empty if the server
* has not successfully authenticated.
*/
Optional<? extends Principal> serverPrincipal();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proxy is both a client and a server! I think serverPrincipal is really ambiguous.

downstreamPrincipal and upstreamPrincipal seem much clearer to me or even receivedDownstreamPrincipal and sentUpstreamPrincipal.

Copy link
Member

@robobario robobario Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on the ambiguity, but I think we should continue to shy away from upstream/downstream. It feels like the language and what it implies is cultural/individual and has caused confusion in the past (despite it being common terminology in networking).

Maybe the pain is that we've named the entity talking to the proxy client. If we have multiple things acting as clients, calling one of them 'the client' is confusing.

What if we introduced a new term for the client like user? And the upstream server is the target. And the proxy is the proxy. Then we'd have:

userClientPrincipal
proxyClientPrincipal
proxyServerPrincipal
targetServerPrincipal

edit or maybe origin as a kind of opposite to target? originClientPrincipal.

Copy link
Member

@robobario robobario Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note, in this context I think upstreamServer would also be pretty unambiguous, it's just once we start having pervasive upstream/downstream language it gets harder to understand imo. target matches our config file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe upstream and downstream language to describe the entities is fine in this networking context, IIRC the confusion in the past was more about how we were using it in documentation to describe the direction of flow of messages through filters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually grown to like upstream and downstream terminology.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think the, upstream downstream language gets a bit less helpful when describing the data flows, but it is much less ambiguous when talking about concerns within the proxy itself.

@robobario
Copy link
Member

robobario commented Jul 17, 2025

Thanks for the work @tombentley feels like its nearly there.

Participants: I've gone and resolved a stack of outdated/addressed old threads, please have a further look at your open threads and see if there's more outdated content there. Since we've got so many threads I think it's worth trying to keep only alive/unaddressed threads alive and aim to resolve them all.

Signed-off-by: Tom Bentley <[email protected]>
@tombentley
Copy link
Member Author

I've resolved all the comments which I reasonable could, and responded to almost all the others. If you think my responses address your comment please mark them resolved. Otherwise please leave a follow-up response.

@k-wall
Copy link
Member

k-wall commented Jul 17, 2025

@tombentley I reviewed your branch yesterday. I think it looks good. Thank you for the effort. Are you thinking to land all of it in one go, or chunk it?


// TODO Why not use the JDK's `X500PrivateCredential`?

### Protections for those configuring a virtual cluster
Copy link
Member

@robobario robobario Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving a comment out of another thread:

Should we have separate annotations for Filters that consume the specific APIs so we can check that consumers of the TLS specific API have an mTLS gateway, and so that we can check consumers of the SASL specific API are after in a chain containing a SASL producer.

For example @ClientSaslAware for filters that want to use Optional<ClientSaslContext> clientSaslContext(); and @ClientTlsAware for filters that want to use Optional<ClientTlsContext> clientTlsContext();.

Copy link
Member

@robobario robobario Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we should be so harsh on the clientTlsContext() since it's either present or not. Maybe they could use the annotation, optionally, to declare mTLS is a requirement for the Filter to operate to get that early feedback on Proxy start. (edit: and if it's optional, it could be omitted from this design proposal and added later if we need it)

Whereas for SASL since there's all this other machinery required like a SASL Producer, it would make more sense to be a requirement that the Filter be annotated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I see on your javadoc comment you are saying that @ClientPrincipalConsumer has that function already

}
```

// TODO Why not use the JDK's `X500PrivateCredential`?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems X500PrivateCredential contains the cert so this feels fine, not sure what is the benefit

@kidpollo
Copy link

Looks good to me! great job

Copy link
Member

@SamBarker SamBarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've be working with a different mental model for what serverPrincipal represented, which has only come out chatting with @k-wall and re-reading the comment threads.

To me I think there are two principals, maybe three now I think about what is currently serverPrincipal.

  1. clientPrincipal aka downstreamPrincipal -> the principal/identity the client sent to the proxy and is thus acting as.
    2.effectivePrincipal (what I thought serverPrincipal was) -> the principal/identity the proxy sent to the broker and thus is acting as
  2. serverPrincipal -> the identity the broker sent to the proxy (which is why mutual auth matters here).

When running in pass through mode clientPrincipal and effectivePrincipal are the same identity. There are however still two logical identities.

If we mandated a SaslInspector at the upstream tail of the filter chain we could ensure that the effectivePrincipal is always populated and potentially allows us to make serverSaslAuthenticationSuccess & serverSaslAuthenticationFailure a private API.

I think whats outlined in this PR will work, so I'm +0.5, but I am contending that the separatedPrincipals I mentioned above make life easier for plug-in authors.

Comment on lines +544 to +545
* Configuring multiple SASL terminators and/or SASL inspectors in the `filters` of a virtual cluster (because there should be a single producer of client identity). Similarly for SASL initiators (becausde there should be a single producer of server identity).

Copy link
Member

@SamBarker SamBarker Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are SASL Initiators implicitly SASL Terminators? Or should the runtime be validating that if an initiator is present then there is also a terminator before it in the chain?

The case I'm worried about is a SASL initiator observes an API versions request and thus initiates a SASL exchange with the upstream broker. At some point after that starts the client sends a SaslHandshakeRequest. Without a terminator in the picture the broker is going to recieve both SaslHandshakeRequests and who knows what state that will drive it into.

* @return The authenticated principal for the server connection, or empty if the server
* has not successfully authenticated.
*/
Optional<? extends Principal> serverPrincipal();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think the, upstream downstream language gets a bit less helpful when describing the data flows, but it is much less ambiguous when talking about concerns within the proxy itself.

Comment on lines +319 to +321
* @param saslPrincipal The authenticated principal.
*/
void serverSaslAuthenticationSuccess(String mechanism, String serverName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc says saslPrincipal is a parameter, however its not in the method signature. I think the javadoc is right and it should be a parameter. Otherwise how does serverPrincipal get initialised.

@k-wall
Copy link
Member

k-wall commented Jul 18, 2025

For a thought experiment, @SamBarker and I were discussing a use-case where you'd want to use the proxy to change the SASL mechanism. How would you do that with this model?

Say an organisation want to improve its security posture by mandating SCRAM-SHA. It has a large legacy of clients connecting using SASL_PLAIN. A big bang change is impractical. It needs to phase the introduction of SCRAM, so it deploys the proxy with filter(s) capable of rewriting SASL exchanges that use PLAIN to using SCRAM. The legacy clients using PLAIN keep working until they can be upgraded. For clients using SCRAM, the filter would be effectively a no-op.

How would you do this with this API? Could you implement a filter that is both a SASL Terminator and SASL Initiator?
I think the answer is yes. You wire the filter to incept the SaslHandshake and SaslAuthenticate. Any interactions that use SCRAM go straight through unchanged. The filter catches the PLAIN handshake/authentication attempts and initiates its own SCRAM authentication with the Broker. The filter returns a SaslAuthenticate response to the client with outcome based on outcome of the initiated SCRAM authentication. The filter wouldn't need the @ServerSaslProducer annotation as it wouldn't need to call serverSaslAuthenticationSuccess.

If there were other filters in the chain that need to know the resulting client principal, you could add an SaslInterceptor filter to chain (at position 2). It would hear both the passed through SCRAM SASL interactions and the initiated SCRAM SASL ones. It would be annotated @ClientSaslProducer. Any other filter can then call clientPrincipal().

Copy link
Member

@k-wall k-wall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough work @tombentley, thank you.

Copy link
Member

@robobario robobario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tom, I'm +1

}
```

### APIs for using client principals in a generic way
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing @SamBarker's comment here, as I can't thread off the top-level comment

  1. clientPrincipal aka downstreamPrincipal -> the principal/identity the client sent to the proxy and is thus acting as.
    2.effectivePrincipal (what I thought serverPrincipal was) -> the principal/identity the proxy sent to the broker and thus is acting as

to make sure I've got it:

The ClientPrincipal in the proposal is a different concept again from what you've listed @SamBarker, it's a chosen (by the proxy operator) principal for the downstream client.

For example when the proxy terminate mTLS and the user says their cluster has principalType TLS, it will be the x500 certificates, regardless of whether they've had to initiate SASL with the upstream. In this case it would be the same as your downstreamPrincipal but could be different from the effectivePrincipal.

When the proxy terminates mTLS and the user says their cluster has principalType SASL which they implement with an initiator plugin, the ClientPrincipal will be the SASL principle after initiation has succeeded. In this case it is the same as your effectivePrincipal, but different from the raw downstreamPrincipal.

Similar with a SASL swapping scenario, we could have an effectivePrincipal different from the raw downstreamPrincipal and the clientPrincipal could be either of them depending how/where we intercept the principal.

I think we could add in the "raw" principals received and sent by the proxy later if required, it would just be a bit of an ugly clash with the chosen naming.

I think whats outlined in this PR will work, so I'm +0.5, but I am contending that the separatedPrincipals I mentioned above make life easier for plug-in authors.

I think the proposed ClientPrincipal concept makes life easier for a class of Filters because the Filters don't have to know when to pick between the downstream principal and the proxyClient/effective principal for generic cases like authn, it's done by configuration. Also, by having principalType in terms of SSL and SASL we can introduce these extra protections at startup. I was thinking on whether the user could configure principalType: downstream|effective, but then you can only validate that there is some downstream auth set up, or some upstream auth mechanism. Maybe for the SASL swapping case it would be clearer to specify an upstream or downstream principal, rather than "SASL" 🤔 .

Some usages for the separate principals might be audit, I'm interested in logging out the raw downstream principal alongside the effective principle. As proposed I don't think we could get at the downstream SASL principal if we swapped for a different SASL principal.

@sorooshme-happening
Copy link

Hi everyone! Thanks for all the effort that you're putting into this project.

One of the use-cases that we have is the following:

User --(TLS + SASL)--> Kroxylicious (authN and authZ) --(no TLS / no Auth)--> Cluster

After reading the design, my understanding is that a filter chain like this should do the job?

  1. AuthenticationFilter (which is a SASLTerminator)
  2. AuthorizationFilter (which uses the newly available principal to perform authorization)

I'm mostly asking this question because of the following table in the design:
image

If my understanding is correct, in a scenario where the proxy is doing the SASL Termination and the virtual cluster principalType is none (basically the proxy "hiding" the authentication from the actual cluster), the behavior should be accepted?

@tombentley
Copy link
Member Author

@sorooshme-happening thanks for chiming in!

This proposal covers authentication, but not authorization.

To "hide" authentication you'd need a filter acting as a SASL terminator. That would handle the SaslHandshake and SaslAuthenticate requests locally in the proxy, so they won't propagate to the broker. How that appears to the broker depends on how the broker is configured.

  • If it's configured with security.protocol=PLAIN then the broker would see the proxy as an Anonymous client, which means you can't really do much authorization in the broker either.
  • If it's configured with security.protocol=SSL then the broker will use the TLS client certificate presented by the proxy to construct its KafkaPrincipal. This proposal adds a mechanism for a proxy plugin to select the certificate it presents (based on the TLS connection it has with the Kafka client, or the SASL authentication with the client), so you could propagate some kind of client identity to the broker.
  • If you want to configure the broker with security.protocol=SASL_PLAINTEXT or SASL_SSL then the proxy needs to perform some kind of SASL interaction with the broker. In this case there are two options.
    • Use a SASL Initiator as well as a SASL Terminator. This decouples the client-proxy and the proxy-broker SASL exchanges. So for example you could, for example do SASL PLAIN with the Kafka client but SCRAM-SHA-512 with the broker.
    • Use a SASL Inspector instead of a SASL Terminator. This means the Kafka client's SaslHandshake and SaslAuthenticate requests propagate to the broker, so the proxy and the broker will have a common idea of who the Kafka client is. (This case is not hiding authentication, of course, so it's not what you're looking for, but I mention it for completeness).

It's worth spelling out that this proposal only covers the API between filters and the proxy runtime. It doesn't mean the Kroxylicious project will itself provide any filters that act as SASL Terminators, SASL Inspector or SASL Initiators. There are some basic implementations of these things in the tests which could serve as inspiration. I think it's likely the project would provide implementations of these things in the fullness of time. But right now I don't want to hold back the API work.

So back to the "not authorization" part... I intend to make a proposal concerning authorization once I've implemented this proposal. Right now the idea is:

  • Introduce a SubjectBuilder plugin interface, which allows to construct a Subject from the ClientTlsContext and/or ClientSaslContext. This interface should also allow looking up Principals from remote systems (e.g. RolePrincipals from LDAP/Active Directory).
  • Have some kind of API for Filters to be able to ask "Is the current subject allowed to perform $ACTION on $RESOURCE?" (e.g. "connect to target cluster bar", or "read from topic foo")
  • Actions and resources should be extensible, so someone writing a Filter can define their own types of resource and enumerate the possible actions it supports.
  • Allow end users of the proxy to write their authorization policies in a single place (such as the proxy configuration file).
  • All this would have been relatively easy if Java 17 hadn't deprecated (and Java 21 removed) most of the authorization half of JAAS along with the SecurityManager.

All of the above is subject to change, and I don't know yet whether it's really going to be possible to do all those things. But if you have specific use cases for authorization that you're willing to share it would great if you could open an Issue describing them.

@tombentley
Copy link
Member Author

@ujjwalkalia did you have any feedback on the proposal? It would great to confirm that this meets your needs.

@ujjwalkalia
Copy link

@ujjwalkalia did you have any feedback on the proposal? It would great to confirm that this meets your needs.

Hey @tombentley , yes the proposal satisfies our requirement of making the mTLS based principal available to the filter context.

@tombentley
Copy link
Member Author

Replaced by multiple other, more narrow PRs, #72, #73, #74, #75

@tombentley tombentley closed this Jul 29, 2025
(e.g. SASL termination and SASL initiation in the same virtual cluster)

What's missing is an API where the server-facing TLS client certificate is chosen based on the client's identity.
For this purpose we will add a new plugin interface, `ServerTlsCredentialSupplierFactory`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robobario I was just looking at 0.14.0 and did not find a way/plugin to propagate TLS credentials to upstream. Having the ClientTlsContext in the Filter context is great! We can add our custom filtering logic that way however we would not be able to then propagate same creds upstream so that the brokers can use those creds for ACLs enforcement, throttling etc.

Is this api still being considered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidpollo that part of this proposal got factored into #75. I'd still like to implement that, but I've been working on other things recently, so I don't have an implementation. I'd like there to be at least a partial implementation before we commit to that proposal.

Copy link
Member

@robobario robobario Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @kidpollo just to refresh me, did you want mTLS on the upstream side too?

If you want mTLS downstream and SASL upstream it's a different story, you could implement an initiator filter that uses the mTLS identity to drive a SASL handshake.

edit: I take same creds to mean mTLS

@k-wall k-wall added this to 2025_Q3 Sep 15, 2025
@k-wall k-wall moved this to Done in 2025_Q3 Sep 15, 2025
@k-wall k-wall mentioned this pull request Sep 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

9 participants