diff --git a/iceberg/integrate-databricks-glue.mdx b/iceberg/integrate-databricks-glue.mdx
new file mode 100644
index 00000000..bd5d0c0d
--- /dev/null
+++ b/iceberg/integrate-databricks-glue.mdx
@@ -0,0 +1,76 @@
+---
+title: "Sink data into Iceberg and query with Databricks"
+sidebarTitle: "Databricks (Glue)"
+description: "Learn how to sink data from RisingWave into Iceberg and query it using Databricks."
+---
+
+This guide shows how to sink data from RisingWave into an Iceberg table backed by AWS Glue (catalog) and Amazon S3 (warehouse), and then query the table using Databricks.
+
+## Prerequisites
+
+Before you begin, make sure you have:
+
+- A running RisingWave cluster.
+
+- (Optional) An Iceberg compactor if you plan to sink upsert streams. Contact our [support team](mailto:cloud-support@risingwave-labs.com) or [sales team](mailto:sales@risingwave-labs.com) if you need this.
+
+- A Databricks cluster.
+
+- An Amazon S3 bucket.
+
+- An AWS Glue Data Catalog.
+
+## Iceberg catalog and warehouse
+
+Use AWS Glue as the Iceberg catalog. For the warehouse, we recommend Amazon S3.
+
+## Sink data from RisingWave into Iceberg
+
+Follow the [instructions](/iceberg/deliver-to-iceberg#use-with-different-storage-backends) to create a sink that writes your data into an Iceberg table.
+
+Below are two examples.
+
+```sql Glue + S3 (append-only)
+CREATE SINK glue_sink FROM my_data
+WITH (
+    connector = 'iceberg',
+    type = 'append-only',
+    warehouse.path = 's3://my-bucket/warehouse',
+    database.name = 'my_database',
+    table.name = 'my_table',
+    catalog.type = 'glue',
+    catalog.name = 'my_catalog',
+    s3.access.key = 'your-access-key',
+    s3.secret.key = 'your-secret-key',
+    s3.region = 'us-west-2'
+);
+```
+
+```sql Glue + S3 (upsert)
+CREATE SINK glue_sink FROM my_data
+WITH (
+    connector = 'iceberg',
+    type = 'upsert',
+    primary_key = 'id',
+    warehouse.path = 's3://my-bucket/warehouse',
+    database.name = 'my_database',
+    table.name = 'my_table',
+    catalog.type = 'glue',
+    catalog.name = 'my_catalog',
+    s3.access.key = 'your-access-key',
+    s3.secret.key = 'your-secret-key',
+    s3.region = 'us-west-2',
+    write_mode = 'copy-on-write',
+    enable_compaction = true,
+    compaction_interval_sec = 300
+);
+```
+
+For the `upsert` sink type, Databricks does not support reading position delete or equality delete files, so use copy-on-write mode (`write_mode = 'copy-on-write'`) and enable Iceberg compaction. Because copy-on-write mode relies on Iceberg compaction, `compaction_interval_sec` determines the freshness of the Iceberg table.
+
+## Query Iceberg table in Databricks
+
+Follow [Unity Catalog Lakehouse Federation](https://docs.databricks.com/aws/en/query-federation/hms-federation-glue) to federate the AWS Glue catalog into Databricks.
+
+Once configured, you can query the Iceberg table directly from Databricks.
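+
+For example, assuming the federated Glue catalog is registered in Unity Catalog as `my_catalog` (this name, like the database and table names below, is a placeholder):
+
+```sql
+-- Query the Iceberg table through the federated catalog.
+-- `my_catalog`, `my_database`, and `my_table` are placeholder names.
+SELECT * FROM my_catalog.my_database.my_table LIMIT 10;
+```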
\ No newline at end of file
diff --git a/iceberg/integrate-databricks-managed.mdx b/iceberg/integrate-databricks-managed.mdx
new file mode 100644
index 00000000..35f16b29
--- /dev/null
+++ b/iceberg/integrate-databricks-managed.mdx
@@ -0,0 +1,82 @@
+---
+title: "Sink data into Databricks managed Iceberg tables"
+sidebarTitle: "Databricks (Managed)"
+description: "Learn how to sink data from RisingWave into Databricks managed Iceberg tables."
+---
+
+This guide shows how to sink data from RisingWave into Databricks managed Iceberg tables using Unity Catalog.
+
+## Enable external data access in Unity Catalog
+
+1. Configure your Unity Catalog metastore to allow external data access. See [Enable external data access on the metastore](https://docs.databricks.com/aws/en/external-access/admin#external-data-access) for more details.
+
+2. Grant a principal Unity Catalog privileges on Databricks.
+
+   ```sql
+   -- Users
+   GRANT EXTERNAL USE SCHEMA ON SCHEMA catalog_name.schema_name TO `user@company.com`;
+   -- Service principal
+   GRANT EXTERNAL USE SCHEMA ON SCHEMA catalog_name.schema_name TO `32ab2e99-69a0-45bc-a110-123456eae110`;
+   ```
+
+3. Acquire Databricks credentials. See [Access Databricks tables from Apache Iceberg clients](https://docs.databricks.com/aws/en/external-access/iceberg) for more information.
+
+   You need to fetch these parameters:
+
+- `catalog.uri`
+  - Format: `<workspace-url>/api/2.1/unity-catalog/iceberg-rest`
+  - Value: `<workspace-url>` is the Databricks [workspace URL](https://docs.databricks.com/aws/en/workspace/workspace-details#workspace-url).
+- `catalog.oauth2_server_uri`
+  - Format: `https://<workspace-url>/oidc/v1/token`
+- `catalog.credential`
+  - Format: `<client-id>:<client-secret>`
+  - Value: `<client-id>` is the OAuth client ID for the authenticating principal; `<client-secret>` is the OAuth client secret for the authenticating principal.
+- `catalog.scope`: `all-apis`
+- `warehouse.path`
+  - Format: `<uc-catalog-name>`
+  - Value: The name of the catalog in Unity Catalog that contains your tables.
+
+## Sink data into Databricks managed Iceberg table
+
+```sql
+CREATE TABLE t_test1 (
+    a INT PRIMARY KEY,
+    b INT
+) APPEND ONLY;
+
+INSERT INTO t_test1 VALUES (1, 2), (3, 4);
+
+CREATE SINK ice_sink FROM t_test1
+WITH (
+    primary_key = 'a',
+    type = 'append-only',
+    connector = 'iceberg',
+    create_table_if_not_exists = true,
+    s3.region = 'ap-southeast-1',
+    catalog.type = 'rest_rust',
+    catalog.uri = 'https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest',
+    warehouse.path = '<uc-catalog-name>',
+    database.name = 'default',
+    table.name = 't_test1',
+    catalog.oauth2_server_uri = 'https://<workspace-url>/oidc/v1/token',
+    catalog.credential = '<client-id>:<client-secret>',
+    catalog.scope = 'all-apis',
+    commit_checkpoint_interval = 3
+);
+```
+
+## Query Databricks managed Iceberg table
+
+Query from Databricks:
+
+```sql
+SELECT * FROM <uc-catalog-name>.default.t_test1;
+```
+
+## Limitation
+
+You can only use an `append-only` sink with Databricks managed Iceberg tables.
diff --git a/iceberg/integrate-snowflake.mdx b/iceberg/integrate-snowflake.mdx
new file mode 100644
index 00000000..33fba86a
--- /dev/null
+++ b/iceberg/integrate-snowflake.mdx
@@ -0,0 +1,84 @@
+---
+title: "Sink data into Iceberg and query with Snowflake"
+sidebarTitle: "Snowflake"
+description: "Learn how to sink data from RisingWave into Iceberg and query it using Snowflake."
+---
+
+This guide shows how to sink data from RisingWave into Iceberg tables and query them using Snowflake. It covers both AWS Glue and REST catalog options, with S3 as the recommended warehouse.
+
+## Prerequisites
+
+Before you begin, make sure you have:
+
+- A running RisingWave cluster.
+
+- (Optional) An Iceberg compactor if you plan to sink upsert streams. Contact our support team if needed.
+
+- A Snowflake account.
+
+- An Amazon S3 bucket.
+
+- AWS Glue or an Iceberg REST catalog.
+
+## Catalog and warehouse
+
+Recommended Iceberg catalogs:
+
+- AWS Glue catalog.
+- REST catalog: any catalog compatible with the [Iceberg REST OpenAPI specification](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml).
+
+Recommended warehouse: Amazon S3.
+
+## Sink data from RisingWave into Iceberg
+
+Follow the [instructions](/iceberg/deliver-to-iceberg#use-with-different-storage-backends) to create a sink that writes your data into an Iceberg table.
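+
+For the Glue option, the Snowflake side might look like the following sketch. These are not your exact commands: the integration name, IAM role ARN, Glue catalog ID, region, external volume, and table names are all placeholder assumptions; see the linked Snowflake docs for the exact steps.
+
+```sql
+-- Register the AWS Glue Data Catalog as a Snowflake catalog integration.
+-- All identifiers, IDs, and the role ARN below are placeholders.
+CREATE CATALOG INTEGRATION glue_catalog_int
+  CATALOG_SOURCE = GLUE
+  CATALOG_NAMESPACE = 'my_database'
+  TABLE_FORMAT = ICEBERG
+  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-glue-role'
+  GLUE_CATALOG_ID = '123456789012'
+  GLUE_REGION = 'us-west-2'
+  ENABLED = TRUE;
+
+-- Expose the Iceberg table written by RisingWave. `my_s3_volume` is an
+-- external volume assumed to be created over the S3 warehouse path.
+CREATE ICEBERG TABLE my_table
+  EXTERNAL_VOLUME = 'my_s3_volume'
+  CATALOG = 'glue_catalog_int'
+  CATALOG_TABLE_NAME = 'my_table';
+
+SELECT * FROM my_table LIMIT 10;
+```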
+
+Below are two examples.
+
+```sql Glue + S3 (append-only)
+CREATE SINK glue_sink FROM my_data
+WITH (
+    connector = 'iceberg',
+    type = 'append-only',
+    warehouse.path = 's3://my-bucket/warehouse',
+    database.name = 'my_database',
+    table.name = 'my_table',
+    catalog.type = 'glue',
+    catalog.name = 'my_catalog',
+    s3.access.key = 'your-access-key',
+    s3.secret.key = 'your-secret-key',
+    s3.region = 'us-west-2'
+);
+```
+
+```sql REST catalog + S3 (upsert)
+CREATE SINK rest_sink FROM my_data
+WITH (
+    connector = 'iceberg',
+    type = 'upsert',
+    primary_key = 'id',
+    warehouse.path = 's3://my-bucket/warehouse',
+    database.name = 'my_database',
+    table.name = 'my_table',
+    catalog.type = 'rest',
+    catalog.uri = 'http://rest-catalog:8181',
+    catalog.credential = 'username:password',
+    s3.access.key = 'your-access-key',
+    s3.secret.key = 'your-secret-key',
+    write_mode = 'copy-on-write',
+    enable_compaction = true,
+    compaction_interval_sec = 300
+);
+```
+
+For the `upsert` sink type, Snowflake does not support reading equality delete files, so use copy-on-write mode (`write_mode = 'copy-on-write'`) and enable Iceberg compaction. Because copy-on-write mode relies on Iceberg compaction, `compaction_interval_sec` determines the freshness of the Iceberg table.
+
+## Configure Snowflake catalog integration
+
+Use a Snowflake catalog integration to query Iceberg data from either catalog:
+
+1. [Glue catalog](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-glue)
+2. [REST catalog](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-rest)
+
+Once you have finished the configuration, you can query your Iceberg table in Snowflake.
+
diff --git a/mint.json b/mint.json
index 90e77ba1..8beac8ef 100644
--- a/mint.json
+++ b/mint.json
@@ -214,6 +214,14 @@
         "iceberg/deliver-to-iceberg"
       ]
     },
+    {
+      "group": "Integrate Iceberg with external engines",
+      "pages": [
+        "iceberg/integrate-databricks-glue",
+        "iceberg/integrate-databricks-managed",
+        "iceberg/integrate-snowflake"
+      ]
+    },
     {
       "group": "Create and manage Iceberg tables natively",
       "pages": [