-
Notifications
You must be signed in to change notification settings - Fork 316
adding support to use a kms key for s3 buckets data encryption #2802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
IamStatement.Builder allowKms = | ||
IamStatement.builder() | ||
.effect(IamEffect.ALLOW) | ||
.addAction("kms:GenerateDataKeyWithoutPlaintext") | ||
.addAction("kms:Encrypt") | ||
.addAction("kms:DescribeKey") | ||
.addAction("kms:Decrypt") | ||
.addAction("kms:GenerateDataKey"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add the kms encryption context to this statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you suggesting to allow the user to specify a new param during catalog creation for the encryption context condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean for SSE, the encryption context is always the path to the file. In our IAM policy statement, we can specify that the key can be used only to en/decrypt files in the encrpytion context pattern we specify - e.g., see https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html and https://docs.aws.amazon.com/kms/latest/developerguide/encrypt_context.html#encryption-context-authorization . The user doesn't need to specify this parameter - it's automatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is needed, the policy for the key and S3 bucket is restrictive enough and adding more restriction would cause the GenerateDataKey operation during the assume role to probably fail.
This is an example of a policy generated while creating a table:
IamPolicy(version=2012-10-17, statements=[
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:PutObject), IamAction(value=s3:DeleteObject)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=kms:GenerateDataKeyWithoutPlaintext), IamAction(value=kms:DescribeKey), IamAction(value=kms:Decrypt), IamAction(value=kms:GenerateDataKey), IamAction(value=kms:Encrypt)], resources=[IamResource(value=arn:aws:kms:us-east-1:*********:key/id)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:ListBucket)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)], conditions=[IamCondition(operator=StringLike, key=s3:prefix, value=fns/t3/*)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetBucketLocation)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket)]),
IamStatement(effect=IamEffect(value=Allow), actions=[IamAction(value=s3:GetObject), IamAction(value=s3:GetObjectVersion)], resources=[IamResource(value=arn:aws:s3:::app-id-bucket/fns/t3/*)])])
Also in https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html#encryption-context
it says that the bucket path is added by default in the encryption context.
Checking the encryption context in CloudTrail for the GenerateDataKey request I can see:
"encryptionContext": {
"aws:s3:arn": "arn:aws:s3:::app-id-*****"
}
.addAction("kms:Encrypt") | ||
.addAction("kms:DescribeKey") | ||
.addAction("kms:Decrypt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we give encrypt or decrypt based on the priviledge ? for example TABLE_READ should get decrypt and TABLE_READ_AND_WRITE should both encrypt and decrypt ?
example: "arn:aws:iam::123456789001:user/abc1-b-self1234" | ||
kmsKeyArn: | ||
type: string | ||
description: the aws kms key arn used to encrypt s3 data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by s3 data we mean data stored in s3 or iceberg data files ?
if its just data stored in s3, a part of iceberg metadata is generated by polaris itself, do we need to plumb the encryption to the fileIO polaris uses ?
if its just iceberg data files how are we making sure its not used against iceberg metadata files such as manifest lists ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key is used by s3 to do encryption of any data in the bucket at rest, so as long as the IAM role has got the right entitlements on the key everything will work fine. I don't think anything needs to be changed on the polaris side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am talking of setting these properties in the FileIO which polaris uses - https://iceberg.apache.org/docs/nightly/aws/#s3-server-side-encryption or are you relying on these catalog properties seeping automatically to the FileIO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow I might be missing something, how would you specify the kms key in Polaris without adding in the storage config, also we need to add the policy in there to be able to use right? thx
What's the relationship of this PR and #1424 ? |
type: string | ||
description: the aws user arn used to assume the aws role | ||
example: "arn:aws:iam::123456789001:user/abc1-b-self1234" | ||
kmsKeyArn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention, adding REST API parameters needs to be discussed on the dev
ML. Could you start a thread there and reference this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a thread for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake... I did not see it 🤦
For ref: https://lists.apache.org/thread/to7lbdw1k6dym5c6gnk1nm32vqdw24qk
sorry I didn't see that one. |
Re: #1424 - that PR appears to be dormant, so it's fine to offer an alternative. I was just trying to understand whether the feature is the same or not :) |
Co-authored-by: Dmitri Bourlatchkov <[email protected]>
|
||
String arnPrefix = arnPrefixForPartition(awsPartition); | ||
String arnPrefix = arnPrefixForPartition(storageConfigurationInfo.getAwsPartition()); | ||
String kmsKeyArn = storageConfigurationInfo.getKmsKeyArn(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can / will this work with MinIO KMS? Cf. https://github.com/minio/minio/blob/master/docs/kms/README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine either way, but if it did not work with MinIO it would be good to mention AWS (only) in the PR description.
Also, I'm pretty sure we'll have to support KMS in MinIO at some point, so if current changes were reusable it would be great (but not required).
What changes were proposed in this pull request?
The addition of a kms key used to encrypt data in the s3 bucket
Why are the changes needed?
For anyone wish to use data encryption.
Does this PR introduce any user-facing change?
Yes, the addition of new param in the catalog creation : kms key arn
How was this patch tested?
Tested in AWS EKS
CHANGELOG.md
Update catalogs creation to include AWS kms key to be used in S3 data encryption