Skip to content

Conversation

fabio-rizzo-01
Copy link
Contributor

@fabio-rizzo-01 fabio-rizzo-01 commented Oct 13, 2025

What changes were proposed in this pull request?

The addition of a kms key used to encrypt data in the s3 bucket

Why are the changes needed?

For anyone wish to use data encryption.

Does this PR introduce any user-facing change?

Yes, the addition of new param in the catalog creation : kms key arn

How was this patch tested?

Tested in AWS EKS

CHANGELOG.md

Update catalogs creation to include AWS kms key to be used in S3 data encryption

@jbonofre jbonofre self-requested a review October 13, 2025 14:27
Comment on lines +249 to +256
IamStatement.Builder allowKms =
IamStatement.builder()
.effect(IamEffect.ALLOW)
.addAction("kms:GenerateDataKeyWithoutPlaintext")
.addAction("kms:Encrypt")
.addAction("kms:DescribeKey")
.addAction("kms:Decrypt")
.addAction("kms:GenerateDataKey");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add the kms encryption context to this statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you suggesting to allow the user to specify a new param during catalog creation for the encryption context condition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean for SSE, the encryption context is always the path to the file. In our IAM policy statement, we can specify that the key can be used only to en/decrypt files in the encrpytion context pattern we specify - e.g., see https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html and https://docs.aws.amazon.com/kms/latest/developerguide/encrypt_context.html#encryption-context-authorization . The user doesn't need to specify this parameter - it's automatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is needed, the policy for the key and S3 bucket is restrictive enough and adding more restriction would cause the GenerateDataKey operation during the assume role to probably fail.

This is an example of a policy generated while creating a table:

IamPolicy(version=2012-10-17, statements=[

IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:PutObject), IamAction(value=s3:DeleteObject)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)]), 

IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=kms:GenerateDataKeyWithoutPlaintext), IamAction(value=kms:DescribeKey), IamAction(value=kms:Decrypt), IamAction(value=kms:GenerateDataKey), IamAction(value=kms:Encrypt)], resources=[IamResource(value=arn:aws:kms:us-east-1:*********:key/id)]), 

IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:ListBucket)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)], conditions=[IamCondition(operator=StringLike, key=s3:prefix, value=fns/t3/*)]), 

IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetBucketLocation)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)]), 

IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetObject), IamAction(value=s3:GetObjectVersion)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)])])


Also in https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html#encryption-context
it says that the bucket path is added by default in the encryption context.

Checking the encryption context in CloudTrail for the GenerateDataKey request I can see:

"encryptionContext": {
"aws:s3:arn": "arn:aws:s3:::app-id-*****"
}

Comment on lines +253 to +255
.addAction("kms:Encrypt")
.addAction("kms:DescribeKey")
.addAction("kms:Decrypt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we give encrypt or decrypt based on the priviledge ? for example TABLE_READ should get decrypt and TABLE_READ_AND_WRITE should both encrypt and decrypt ?

example: "arn:aws:iam::123456789001:user/abc1-b-self1234"
kmsKeyArn:
type: string
description: the aws kms key arn used to encrypt s3 data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by s3 data we mean data stored in s3 or iceberg data files ?
if its just data stored in s3, a part of iceberg metadata is generated by polaris itself, do we need to plumb the encryption to the fileIO polaris uses ?

if its just iceberg data files how are we making sure its not used against iceberg metadata files such as manifest lists ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the key is used by s3 to do encryption of any data in the bucket at rest, so as long as the IAM role has got the right entitlements on the key everything will work fine. I don't think anything needs to be changed on the polaris side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am talking of setting these properties in the FileIO which polaris uses - https://iceberg.apache.org/docs/nightly/aws/#s3-server-side-encryption or are you relying on these catalog properties seeping automatically to the FileIO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow I might be missing something, how would you specify the kms key in Polaris without adding in the storage config, also we need to add the policy in there to be able to use right? thx

@dimas-b
Copy link
Contributor

dimas-b commented Oct 14, 2025

What's the relationship of this PR and #1424 ?

type: string
description: the aws user arn used to assume the aws role
example: "arn:aws:iam::123456789001:user/abc1-b-self1234"
kmsKeyArn:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, adding REST API parameters needs to be discussed on the dev ML. Could you start a thread there and reference this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a thread for this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake... I did not see it 🤦

For ref: https://lists.apache.org/thread/to7lbdw1k6dym5c6gnk1nm32vqdw24qk

@fabio-rizzo-01
Copy link
Contributor Author

What's the relationship of this PR and #1424 ?

sorry I didn't see that one.

@dimas-b
Copy link
Contributor

dimas-b commented Oct 14, 2025

Re: #1424 - that PR appears to be dormant, so it's fine to offer an alternative. I was just trying to understand whether the feature is the same or not :)

Co-authored-by: Dmitri Bourlatchkov <[email protected]>

String arnPrefix = arnPrefixForPartition(awsPartition);
String arnPrefix = arnPrefixForPartition(storageConfigurationInfo.getAwsPartition());
String kmsKeyArn = storageConfigurationInfo.getKmsKeyArn();
Copy link
Contributor

@dimas-b dimas-b Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can / will this work with MinIO KMS? Cf. https://github.com/minio/minio/blob/master/docs/kms/README.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to test that.

Copy link
Contributor

@dimas-b dimas-b Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine either way, but if it did not work with MinIO it would be good to mention AWS (only) in the PR description.

Also, I'm pretty sure we'll have to support KMS in MinIO at some point, so if current changes were reusable it would be great (but not required).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants