`proposals/010-authorizer.md` (+229)

# Authorizer Filter

The Authorizer filter adds authorization checks to a Kafka system, enforced by the proxy.

## Current situation

It is possible for a filter to implement its own business rules, enforcing authorization in some custom manner. However,
that approach does not have good separation of concerns. Authorization checks are an orthogonal concern, and security
best practice is to separate their enforcement from business logic.

## Motivation

We are identifying use-cases where making authorization decisions at the proxy is desirable. Examples include cases where one wishes to restrict a virtual cluster to a subset of the resources (say, topics) of the cluster.

## Proposal

The Authorizer filter layers authorization checks onto a Kafka system, with those checks enforced by the filter. These checks are in addition to any that may be imposed by the Kafka cluster itself. This means that for an action to be allowed, both the proxy's authorizer and the Kafka broker's authorizer must reach an ALLOW decision.

The Authorizer filter allows for authorization checks to be made in the following form:

`Principal P is [Allowed/Denied] Operation O On Resource R`.

where:

* Principal is the authenticated user.
* Operation is an action such as, but not limited to, Read, Write, Create, Delete.
* Resource identifies one or more resources, such as, but not limited to, Topic, Group, Cluster, TransactionalId.
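
For example: `Principal User:alice is Allowed Read on Topic orders`.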

Unlike the Apache Kafka authorizer system, the `from host` predicate is omitted. This adheres to the modern security principle that there are no privileged network locations.

### Request authorization

The Authorizer filter will intercept all request messages that perform an action on a resource, and all response messages that list resources.

On receipt of a request message from the downstream, the filter will make an asynchronous call to the authorizer for the resource(s) involved in the request. If the authorization result for all resources is `ALLOWED`, the filter will forward the request to the broker.
If the authorization result is `DENIED` for any resource in the request, the filter will produce a short-circuit error response denying the request, using the appropriate authorization-failed error code. The Authorizer filter must not forward requests that fail authorization.
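
To make the request-side flow concrete, here is a minimal sketch. It assumes the `Authorizer` API proposed below; `actionsFor` and `errorResponseFor` are hypothetical helpers, and the `FilterContext` usage is an assumption about how the Kroxylicious filter API would be applied, not a definitive implementation.

```java
// Sketch only: actionsFor() and errorResponseFor() are hypothetical helpers;
// authorizer and requestContext are assumed to be fields of the filter.
public CompletionStage<RequestFilterResult> onRequest(ApiKeys apiKey, RequestHeaderData header,
                                                      ApiMessage request, FilterContext context) {
    List<Action> actions = actionsFor(apiKey, request); // resources the request acts upon
    return authorizer.authorize(requestContext, actions).thenCompose(results -> {
        if (results.stream().allMatch(r -> r == AuthorizationResult.ALLOWED)) {
            return context.forwardRequest(header, request); // all resources allowed
        }
        // Deny: short-circuit an error response; the request is never forwarded.
        return context.requestFilterResultBuilder()
                .shortCircuitResponse(errorResponseFor(apiKey, request))
                .completed();
    });
}
```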

### Response resource filtering

On receipt of a response message from the upstream, the Authorizer filter will filter the resources so that the downstream receives only those resources it is authorized to `DESCRIBE`.

Some Kafka responses contain authorized operations values (originally introduced by [KIP-430](https://cwiki.apache.org/confluence/display/KAFKA/KIP-430+-+Return+Authorized+Operations+in+Describe+Responses)).
These give the client a way to know which operations are supported for a resource without having to try the operation.
Authorized operation values are bit fields where each position in the bitfield corresponds to a different operation.
The broker computes authorized operations values only when requested. The client sets an include authorized operations flag in the request
and the broker then computes the result which is returned in the response.

If authorized operations values are present in the response (`>0`), the filter must compute the effective authorized operations value
from both the value returned by the upstream and the value computed from the ACLs imposed by the filter itself.
The response must be updated with the effective value before it is returned to the client.
This equates to a bitwise AND of the two values.
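
As a sketch, assuming a hypothetical helper name (the combination itself is just an integer AND):

```java
// Each bit position corresponds to an operation (per KIP-430). The effective value
// is the set of operations allowed by BOTH the broker and the proxy's ACLs.
int effectiveAuthorizedOperations(int upstreamValue, int proxyValue) {
    return upstreamValue & proxyValue;
}
// e.g. if the upstream allows {READ, WRITE, DESCRIBE} and the proxy's ACLs allow
// {READ, DESCRIBE}, the client is shown {READ, DESCRIBE}.
```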

### Pluggable API

The Authorizer filter will have a pluggable API that allows different Authorizer implementations to be plugged in. This proposal will deliver a simple implementation of the API that allows authorization rules to be expressed in a separate file. Future work may
deliver alternative implementations that, say, delegate authorization decisions to external systems (such as OPA), or implement other
authorization schemes (such as RBAC).

### Operation/Resource Matrix

For the initial version, the system will be capable of making authorization decisions for topic operations and cluster connections only.
Future versions may support authorization decisions for other Kafka resource types (e.g. consumer group and transactional id).
The Authorizer will be designed to be open for extension so that it may be used to make authorization decisions about other entities (beyond those defined by Apache Kafka).

The table below sets out the authorization checks that will be implemented.

| Operation | Resource Type | Kafka Message |
|-----------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| READ | Topic | Fetch, ShareFetch, ShareGroupFetch, ShareAcknowledge, AlterShareGroupOffsets, DeleteShareGroupOffsets, OffsetCommit, TxnOffsetCommit, OffsetDelete |
| WRITE | Topic | Produce, InitProducerId, AddPartitionsToTxn |
| CREATE | Topic | CreateTopics |
| DELETE | Topic | DeleteTopics |
| ALTER | Topic | AlterConfigs, IncrementalAlterConfigs, CreatePartitions |
| DESCRIBE  | Topic         | ListOffsets, OffsetFetch, OffsetForLeaderEpoch, DescribeProducers, ConsumerGroupHeartbeat, ConsumerGroupDescribe, ShareGroupHeartbeat, ShareGroupDescribe, Metadata, DescribeTopicPartitions |
| CONNECT | Cluster | SaslAuthenticate |

**Reviewer:** Is Cluster here a virtual cluster or a target cluster?

**Reviewer:** If this were target cluster, it's hard to identify currently. We don't name our target cluster in the proxy configuration; it's an object embedded in a VirtualCluster containing a bootstrapServers. Also, how would the Filter know which Target Cluster it's connected to? Maybe it implies we would have to elevate Target Cluster to a named entity in its own right and give Filters knowledge of the Target Cluster they are connected to.

If it's virtual cluster, we can identify it by name in config and via the Filter API.

**Reviewer:** I think for current purposes it would be fine if this were virtual cluster. But we should have explicit names (i.e. a VIRTUAL_CLUSTER resource type) so that it's unambiguous and we could add `TARGET_CLUSTER` in the future without upset.

**Author:** My intent was to allow access to the virtual cluster to be authorized. So a VIRTUAL_CLUSTER resource type is sensible. Do we actually have a use-case for applying permissions to the TARGET_CLUSTER?

In general, the filter will make access decisions in the same manner as Kafka itself. This means it will apply the same authorization checks that Kafka enforces and generate error responses in the same way.
From the client's perspective, it will be impossible to distinguish between the proxy and the Kafka cluster itself.
It will also use the same implied operation semantics as implemented by Kafka itself, such as where `ALTER` implies `DESCRIBE`, as described by
`org.apache.kafka.common.acl.AclOperation`.

**Reviewer (on lines +81 to +82):** We should define what has responsibility for this. It could be:

* a part of the contract of `interface Authorizer`, or
* an implementation detail of the particular Authorizer implementation being proposed, so that other Authorizers are not required to respect this implication.

If the former, then it's natural to model the implication relation on the operation enum. If the latter, then probably we're saying that the Authorizer implementation has to special-case those topic operations.


There is one deviation. The filter will implement the `CONNECT` authorization check on the `CLUSTER` resource early, as the connection is made, once the principal is known. This allows the Authorizer filter to be used to gate access to virtual clusters.

* In the case of SASL, this will be performed on receipt of the SaslAuthenticate response. If the authorization check fails, the authentication will fail with an authorization failure and the connection will be closed.
* In the case of TLS client-auth, this will be performed on receipt of the first request message. If the authorization check fails, a short-circuit response will be sent containing an authorization failure and the connection will be closed. This feature won't be part of the initial scope.

The filter will support messages that express topic identity using topic ids (i.e. those building on [KIP-516](https://cwiki.apache.org/confluence/display/KAFKA/KIP-516%3A+Topic+Identifiers)). It will resolve the topic id into a topic name before making the authorization check.
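
As a sketch of that resolution step, with `TopicNameCache` as a hypothetical component (for example, populated from observed Metadata responses) rather than part of the proposed API:

```java
// Hypothetical: maps a KIP-516 topic id to the topic name the proxy has observed.
interface TopicNameCache {
    Optional<String> nameForId(Uuid topicId); // empty if the id is unknown to the proxy
}
```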

### File based Authorizer implementation

The initial scope will include an Authorizer implementation that is backed by authorization rules expressed in a separate file. This file will associate principals with ACL rules capable of expressing an allow-list of resources.
The initial version will be restricted to expressing allow-lists of topics, but future versions will extend this to allow rules to be expressed about other resource types.

### APIs

#### Authorizer Filter

Filter Configuration:

```yaml
type: AuthorizerFilter
config:
  authorizer: FileBasedAllowListAuthorizer
  authorizerConfig:
    rulesFile: /path/to/allow-list.yaml
```

**Reviewer:** For the operator, I guess the user would need to put the file in a ConfigMap which we'd need to mount. I think that should all just work via the interpolation syntax, right?

**Author:** Yeah, I believe it will.

Java APIs:

```java
// Inspired by org.apache.kafka.server.authorizer.Authorizer, except asynchronous in nature.
interface Authorizer {
    CompletionStage<List<AuthorizationResult>> authorize(AuthorizableRequestContext context, List<Action> actions);
}
```

```java
// Inspired by org.apache.kafka.server.authorizer.AuthorizableRequestContext
interface AuthorizableRequestContext {
    String principal();
    // scope for methods such as requestType(), requestVersion() etc. to be added in future.
}
```

**Reviewer:** You need to consider how you handle an anonymous client. In particular, what is their principal?

**Author (@k-wall, Sep 22, 2025):** Good point. I hadn't thought about that at all. I suppose you might have two gateways, one attached to a SASL listener and one attached to a listener using anonymous. You might want to give the anonymous connections fewer privileges.

I need to think about this one.

Two things come to mind immediately which I want to note down:

* As things stand now, as Kafka doesn't use https://www.rfc-editor.org/rfc/rfc4505, there's no SASL negotiation, so there's nothing to actually call `io.kroxylicious.proxy.filter.FilterContext#clientSaslAuthenticationSuccess`. `#clientSaslContext` would return an empty optional.
* Kafka uses User:ANONYMOUS in its ACLs, but that feels weak. Somebody could define a user called ANONYMOUS and get the rights ascribed to ANONYMOUS. How useful is such an attack vector? I'm not sure.

**Reviewer:** I think this is a little too simplistic. Currently the proxy runtime may know any of:

* A SASL authorization id
* A TLS subject DN
* A set of TLS SANs

What you're saying here is that something knows how to turn those things into a single principal string. In general there is not a single answer to that, so it needs to be pluggable.

Furthermore, there's the (relatively) easy opportunity to make this API more general, allowing other information to be loaded from a remote system to augment what's known about the client (think LDAP lookup).

Moreover, we should decide explicitly whether we are following:

  • the "Kafka way", where there's a single principal but this context can convey additional information like the client id, remote peer address, etc,
  • or the "JAAS way", where there is a Subject object which can hold a set of principals which can be used to model that additional information. In other words, you could have ClientId and ClientAddress as principals, and to make authorization decisions using that info.

As motivation, a use case for making an access control decision which takes into account clientId is: Suppose there is some team A in Example Corp which runs the internal KaaS, and some team B which runs a bunch of applications. Team B just has a single service account and uses different client ids for each of their apps. Team A find that one of Team B's applications is causing trouble in the cluster, and need to disable it from connecting until they can get Team B to take a look. So they need to deny the (authorizedId, clientId)-pair, so that B's other (well-behaved) applications still run.

The difference between the two ways of exposing client id in the API crops up mainly in the configuration language you propose for granting access. It's more natural to be able to say "deny CONNECT on CLUSTER to User:A and ClientId:naughty" (i.e. you just need to support *and* and *or* on principal predicates in your configuration language). In contrast, exposing it via additional properties makes the language less elegant, because you've lost the regularity of the `${PrincipalType}:${PrincipalName}` syntax.

**Author:**

> What you're saying here is that something knows how to turn those things into a single principal string. In general there is not a single answer to that, so it needs to be pluggable.

I don't disagree at all. However, isn't it basically the same conversation that we had during the Authenticate API work (#71) (whether to use Subject, Principal, Set)? There we decided to be content with a String until such time as we need something richer. The proposal takes the same approach.

We've got the principalType in the rules yaml, so that gives us wiggle room to refactor the APIs and implementation later, without breaking user-config.

**Reviewer:**

> However, isn't it basically the same conversation that we had during the Authenticate API work (#71) (whether to use Subject, Principal, Set)?

Yes, more or less.

> There we decided to be content with a String until such time as we need something richer.

I think now could be such a time, or at least it's worth discussing it again now.

Let me put my cards on the table: I like the JAAS model where you have a Subject which can have 0 or more Principals associated with it, and where the framework itself is agnostic about the concrete Principals which applications might define. I don't know why something different was done for authz in Kafka. They ended up having to solve similar problems anyway, such as exposing things like client id to authorizers, when they rewrote their Authorizer API.

I'm not advocating using JAAS for authorization. I don't think JAAS itself gives us anything useful, especially with the removal of the security manager and all the associated classes. But we could easily reuse the model (Subject is a trivial record type, and Principal is a trivial interface), and doing so would seem to give us a route to making access control decisions on properties about subjects which are held externally, such as their roles.

But stepping away from JAAS specifically, my point is that:

* authN currently gives us an identity as a string,
* and different users will want to base their authZ decisions on different sources of identity (SASL, X509).

So we need some way of saying which identity should be used to grant access:

* This could be as simple as an enum in the config which says USE_SASL or USE_X509,
* It could be a PrincipalBuilder interface if we want a system which only admits a single kind of principal (the Kafka way),
* It could be a SubjectBuilder if we want a system which admits multiple kinds of principal (the JAAS way).

Both the latter two have the option of supporting a lookup of some additional information in some external system (e.g. introspection endpoint, or AD/LDAP or whatever).

**Author:** I think we should go the JAAS way. We should add a method `FilterContext#authenticatedSubject` which will return the SASL and/or X509 Principals established on the channel. The subject could be mutable, allowing, say, an LDAP plugin to augment the Subject with Group principals.

The `#authorize` method will accept a Subject rather than a String. It will be up to the authorizer service implementation which Principal types it supports. The implementation proposed by this proposal will initially handle the SASL principals, but the door should be open to support X509 principals later. We'll probably need something analogous to Kafka's `ssl.principal.mapping.rules`.

**Reviewer:** I've taken a look at adding:

    interface FilterContext {
        // ...
        CompletionStage<Subject> authenticatedSubject();
    }

It needs to be a CompletionStage if we want to support augmentation. But this has consequences in terms of the FilterHandler's state machine. Specifically, when a Filter impl's `on*()` method asks for the `authenticatedSubject()` and the stage is not done, then we should be toggling autoread off. Doing that means autoread might be toggled for multiple different reasons, and we need to keep track of what the state ought to be. For example, a filter might do both a call to `authenticatedSubject()` and return a deferred response. In this case the completion of the stage for the subject should not turn autoread back on. That should only happen when the deferred response is completed too. I don't think it's terribly complicated, but it's not entirely trivial either.

I think to keep things simplish we should avoid making the creation of the subjects pluggable for now. This means that although `authenticatedSubject()` can return `CompletionStage<Subject>` in the API, we can assume for now that those futures are, in fact, always completed.

**Author:** I don't understand why `authenticatedSubject` needs to return a `CompletionStage`. Won't there always be a mutable Subject, which might be empty? Filters or plugins will add to the Subject as they learn more about the connection.

Aside: is it worth separating out the new FilterContext API work from the authorizer? That would allow the Audit Filter work to continue.

**Reviewer:** The problem with that approach is that it introduces non-determinism into how the proxy can behave. For example, different ordering of events between the subject being updated and the subject being queried by a filter (and making an authz decision based on that subject) might lead to false negative (or, worse, false positive) access control decisions:

1. A filter provides an authorized id, which initiates a request to AD to get some roles.
2. A filter queries the authenticated subject, finds only a SaslId, and makes a decision based on that.
3. The AD response is received.

We cannot block (because netty), nor otherwise prevent a filter making progress at step 1. That's because the API for providing the SASL authorized id is not async. Pausing invocation of the rest of the filters in the pipeline after the authenticating filter returns does not work, because nothing prevents the same filter, having provided the id, from immediately asking for the subject.

Therefore to avoid the badness in step 2 we have to avoid actually providing the subject until we have received the response from AD. But we can't simply block the thread (because netty), so returning a CompletionStage seems to me to be the only choice we have.


```java
// Inspired by org.apache.kafka.server.authorizer.Action
record Action(
        AclOperation aclOperation,
        ResourcePattern resourcePattern) {
}
```

```java
// The following types are inspired by the Kafka classes of the same name and have the same role. However, interfaces are used
// rather than enums to allow for extensibility (using the pattern suggested by https://www.baeldung.com/java-extending-enums).
interface AclOperation {
    String operationName();
}

enum CoreAclOperation implements AclOperation {
    CREATE("Create"),
    DELETE("Delete"),
    READ("Read")/* ,... */;

    private final String operationName;

    CoreAclOperation(String operationName) {
        this.operationName = operationName;
    }

    @Override
    public String operationName() {
        return operationName;
    }
}

interface ResourceType {
    String resourceType();
}


enum CoreResourceType implements ResourceType {
    TOPIC("Topic"),
    CLUSTER("Cluster");

    // constructor and resourceType() accessor analogous to CoreAclOperation
}

interface PatternType {
    boolean matches(String pattern, String resourceName);
}


enum CorePatternType implements PatternType {
    LITERAL() {
        @Override
        public boolean matches(String pattern, String resourceName) {
            return pattern.equals(resourceName);
        }
    },
    MATCH() { /* ... */ },
    PREFIXED() {
        // assumed prefix semantics, as in Kafka's PREFIXED ACL pattern type
        @Override
        public boolean matches(String pattern, String resourceName) {
            return resourceName.startsWith(pattern);
        }
    }
}

record ResourceNamePattern(PatternType patternType, String pattern) {
    boolean matches(String resourceName) {
        return patternType.matches(pattern, resourceName);
    }
}
```

**Reviewer:** One thing I don't like about this is that there's no safety mechanism to ensure that the operation is compatible with the resource type.

I was playing with a pattern last week where we would represent a resource type simply as an enum of the operations it supports (your ResourceType becomes a Class of that enum). Thus TopicResource ends up with READ, WRITE, DESCRIBE etc. and ClusterResource ends up with CONNECT, DESCRIBE etc., and you can construct a type-safe API where asking for an authorization decision on TopicResource.CONNECT is a compile-time error.

**Author (@k-wall, Sep 22, 2025):** I like the sound of that suggestion. I was intending to iterate on the Java interfaces in a pull request. You've started that already. That's absolutely fine, of course.
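
Returning to the pattern types defined above, a brief usage sketch (assuming the prefix semantics sketched for `PREFIXED`):

```java
ResourceNamePattern literal = new ResourceNamePattern(CorePatternType.LITERAL, "foo");
ResourceNamePattern prefixed = new ResourceNamePattern(CorePatternType.PREFIXED, "bar");

literal.matches("foo");         // true: exact match
literal.matches("foobar");      // false
prefixed.matches("bar-events"); // true: name starts with "bar"
```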

#### Rules File

The rules file expresses a mapping between principals (user type only with exact match) and an allow-list of resources.
If there is no permission expressed in the rules, then the operation is denied.

For the initial scope, only resource rules of type TOPIC are supported.

In order to allow future versions to support additional resource types without changing the meaning of existing rules files, the rules files are versioned.
The version described here is version 1. The user must specify the version number in their rules file.

The `CONNECT` authorization check on the `CLUSTER` resource will be implemented implicitly. The check will return `ALLOW` if there is at least one resource rule for the principal. If there are no resource rules for the principal, the authorizer will return `DENY`.
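
A sketch of that implicit decision, assuming hypothetical `ResourceRule` and `AuthorizationResult` types:

```java
// ALLOW iff the principal has at least one resource rule, otherwise DENY.
AuthorizationResult authorizeConnect(String principal,
                                     Map<String, List<ResourceRule>> rulesByPrincipal) {
    List<ResourceRule> rules = rulesByPrincipal.getOrDefault(principal, List.of());
    return rules.isEmpty() ? AuthorizationResult.DENIED : AuthorizationResult.ALLOWED;
}
```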

```yaml
version: 1 # Mandatory, must be 1. Version 1 is defined as supporting resourceType TOPIC only.
definitions:
  - principals: [User:bob, User:grace] # Only User: prefixed principals will be supported.
    resourceRules:
      - resourceType: TOPIC # Only the TOPIC resourceType is permitted.
        operations: [READ]
        patternType: LITERAL
        resourceName: foo
      - resourceType: TOPIC
        operations: [ALL]
        patternType: PREFIXED
        resourceName: bar
      - resourceType: TOPIC
        operations: [ALL]
        patternType: MATCH
        resourceName: baz*
```

**Reviewer:** I wonder whether YAML really is the right choice here. By saying it's YAML we end up in the space of "programming in YAML", which is at best painful. I think what you've specified is not bad, but it's not bad because it's limited and inflexible. For example, it wouldn't cope with the clientId use case I mentioned earlier, because:

* there's no way to say "ALLOW all these things EXCEPT for all those things". That can end up being very limiting, because the only way around it is to enumerate exactly what you want to allow, which forces expansion of patterns, which could require very, very many literals.
* principals means "a subject with any of these principals". There's no way to match only subjects with principal serviceAccount:A and also clientId:naughty.

Of course you can model something more sophisticated using YAML, but really my point is that evolving the schema for YAML DSLs like this is just really, really painful, and the resulting YAML gets less and less easy to read and write.

Perhaps it would be better to have a proper grammar (e.g. using Antlr), so that future evolution is relatively easy:

    allow READ on Topic with name='foo' to subjects with UserPrincipal in ('bob', 'grace')
    allow * on Topic with name startingWith 'bar' to subjects with UserPrincipal in ('bob', 'grace')
    allow * on Topic with name matching /baz*/ to subjects with UserPrincipal in ('bob', 'grace')

## Affected/not affected projects

The kroxylicious repo.

## Compatibility

No issues: this proposal introduces a new filter.

## Rejected alternatives

### Reuse of the Kafka ACL interfaces/enumerations

The Kafka client library includes interfaces and enumerations such as `org.apache.kafka.server.authorizer.Action`
and `org.apache.kafka.common.acl.AclOperation`. It would be technically possible to base the Authorizer's interfaces
on these types. This would have the advantage of helping ensure that the ACL model of the proxy followed
that of Kafka, but it would also mean restricting the Authorizer to making access control decisions for the same
entities as Kafka does. We want to leave open the possibility of making access decisions about other resource types, beyond
those considered by Kafka today (such as record-level ACLs).