76 changes: 76 additions & 0 deletions iceberg/integrate-databricks-glue.mdx
@@ -0,0 +1,76 @@
---
title: "Sink data into Iceberg and query with Databricks"
sidebarTitle: "Databricks (Glue)"
description: "Learn how to sink data from RisingWave into Iceberg and query it using Databricks."
---

This guide shows how to sink data from RisingWave into an Iceberg table backed by AWS Glue (catalog) and Amazon S3 (warehouse), and then query the table using Databricks.

## Prerequisites

Before you begin, make sure you have:

- A running RisingWave cluster.

- (Optional) An Iceberg compactor if you plan to sink upsert streams. Contact our [support team](mailto:[email protected]) or [sales team](mailto:[email protected]) if you need this.

- A Databricks cluster.

- An Amazon S3 bucket.

- AWS Glue.

## Iceberg catalog and warehouse

Use AWS Glue as the Iceberg catalog. For the warehouse, we recommend Amazon S3.

## Sink data from RisingWave into Iceberg

Follow the [instructions](/iceberg/deliver-to-iceberg#use-with-different-storage-backends) to create a sink that writes your data into an Iceberg table.

Below are two examples.
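
Both examples write from a relation named `my_data`. As a hypothetical sketch (the schema is an assumption; substitute your own table, source, or materialized view):

```sql
-- Hypothetical upstream table; `id` matches the primary key used in the upsert example.
CREATE TABLE my_data (
id INT PRIMARY KEY,
value VARCHAR
);
```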

```sql Glue + S3 (append-only)
CREATE SINK glue_sink FROM my_data
WITH (
connector = 'iceberg',
type = 'append-only',
warehouse.path = 's3://my-bucket/warehouse',
database.name = 'my_database',
table.name = 'my_table',
catalog.type = 'glue',
catalog.name = 'my_catalog',
s3.access.key = 'your-access-key',
s3.secret.key = 'your-secret-key',
s3.region = 'us-west-2'
);
```

```sql Glue + S3 (upsert)
CREATE SINK glue_sink FROM my_data
WITH (
connector = 'iceberg',
type = 'upsert',
primary_key = 'id',
warehouse.path = 's3://my-bucket/warehouse',
database.name = 'my_database',
table.name = 'my_table',
catalog.type = 'glue',
catalog.name = 'my_catalog',
s3.access.key = 'your-access-key',
s3.secret.key = 'your-secret-key',
s3.region = 'us-west-2',
write_mode = 'copy-on-write',
enable_compaction = true,
compaction_interval_sec = 300
);
```

<Note>
For the `upsert` sink type, Databricks cannot read position delete or equality delete files, so use copy-on-write mode (`write_mode = 'copy-on-write'`) and enable Iceberg compaction. Because copy-on-write relies on compaction, `compaction_interval_sec` determines the freshness of the Iceberg table.
</Note>

## Query Iceberg table in Databricks

Follow the [Unity Catalog Lakehouse Federation guide](https://docs.databricks.com/aws/en/query-federation/hms-federation-glue) to make the Iceberg data in AWS Glue queryable from Databricks.

Once configured, you can directly query the Iceberg table from Databricks.
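
For example, assuming the federated catalog is registered in Unity Catalog as `my_catalog` (the names below match the sink example; adjust them to your setup):

```sql
-- Run in Databricks SQL: read the Iceberg table through the federated Glue catalog.
SELECT * FROM my_catalog.my_database.my_table LIMIT 10;
```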
82 changes: 82 additions & 0 deletions iceberg/integrate-databricks-managed.mdx
@@ -0,0 +1,82 @@
---
title: "Sink data into Databricks managed Iceberg tables"
sidebarTitle: "Databricks (Managed)"
description: "Learn how to sink data from RisingWave into Databricks managed Iceberg tables."
---

This guide shows how to sink data from RisingWave into Databricks managed Iceberg tables using Unity Catalog.

## Enable external data access in Unity Catalog

1. Configure your Unity Catalog metastore to allow external data access. See [Enable external data access on the metastore](https://docs.databricks.com/aws/en/external-access/admin#external-data-access) for more details.

2. Grant a principal the `EXTERNAL USE SCHEMA` privilege in Unity Catalog.

```sql
-- Users
GRANT EXTERNAL USE SCHEMA ON SCHEMA catalog_name.schema_name TO `[email protected]`;
-- Service principal
GRANT EXTERNAL USE SCHEMA ON SCHEMA catalog_name.schema_name TO `32ab2e99-69a0-45bc-a110-123456eae110`;
```

3. Acquire Databricks credentials. See [Access Databricks tables from Apache Iceberg clients](https://docs.databricks.com/aws/en/external-access/iceberg) for more information.

You need to fetch these parameters:

- `catalog.uri`
- Format: `<workspace-url>/api/2.1/unity-catalog/iceberg-rest`
- Value: `<workspace-url>` is the Databricks [workspace URL](https://docs.databricks.com/aws/en/workspace/workspace-details#workspace-url).
- `catalog.oauth2_server_uri`
- Format: `<workspace-url>/oidc/v1/token`
- Value: The OAuth2 token endpoint of your Databricks workspace.
- `catalog.credential`
- Format: `<oauth_client_id>:<oauth_client_secret>`
- Value: `<oauth_client_id>` is the OAuth client ID for the authenticating principal; `<oauth_client_secret>` is the OAuth client secret for the authenticating principal.
- `catalog.scope`: `all-apis`
- `warehouse.path`:
- Format: `<uc-catalog-name>`
- Value: The name of the catalog in Unity Catalog that contains your tables.

## Sink data into Databricks managed Iceberg table

```sql
CREATE TABLE t_test1 (
a INT PRIMARY KEY,
b INT
) APPEND ONLY;

INSERT INTO t_test1 VALUES (1, 2), (3, 4);

CREATE SINK ice_sink FROM t_test1
WITH (
primary_key = 'a',
type = 'append-only',
connector = 'iceberg',
create_table_if_not_exists = true,
s3.region = 'ap-southeast-1',
catalog.type = 'rest_rust',
catalog.uri = 'https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest',
warehouse.path = '<uc-catalog-name>',
database.name = 'default',
table.name = 't_test1',
catalog.oauth2_server_uri = 'https://<workspace-url>/oidc/v1/token',
catalog.credential = '<oauth_client_id>:<oauth_client_secret>',
catalog.scope = 'all-apis',
commit_checkpoint_interval = 3
);
```

## Query Databricks managed Iceberg table

Query from Databricks:

```sql
SELECT * FROM <uc-catalog-name>.default.t_test1;
```

## Limitation

Only `append-only` sinks are supported when writing to Databricks managed Iceberg tables.
84 changes: 84 additions & 0 deletions iceberg/integrate-snowflake.mdx
@@ -0,0 +1,84 @@
---
title: "Sink data into Iceberg and query with Snowflake"
sidebarTitle: "Snowflake"
description: "Learn how to sink data from RisingWave into Iceberg and query it using Snowflake."
---

This guide shows how to sink data from RisingWave into Iceberg tables and query them using Snowflake. It covers both AWS Glue and REST catalog options, with S3 as the recommended warehouse.

## Prerequisites

Before you begin, make sure you have:

- A running RisingWave cluster.

- (Optional) An Iceberg compactor if you plan to sink upsert streams. Contact our support team if needed.

- A Snowflake cluster.

- An Amazon S3 bucket.

- AWS Glue or an Iceberg REST catalog.

## Catalog and warehouse

Recommended Iceberg catalogs:

- AWS Glue catalog.
- REST catalog. This can be any catalog compatible with the [Iceberg REST OpenAPI spec](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml).

Recommended warehouse: AWS S3.

## Sink data from RisingWave into Iceberg

Follow the [instructions](/iceberg/deliver-to-iceberg#use-with-different-storage-backends) to create a sink that writes your data into an Iceberg table. Below are two examples.
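
Both examples write from a relation named `my_data`. As a hypothetical sketch (the schema is an assumption; substitute your own table, source, or materialized view):

```sql
-- Hypothetical upstream table; `id` matches the primary key used in the upsert example.
CREATE TABLE my_data (
id INT PRIMARY KEY,
value VARCHAR
);
```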

```sql Glue + S3 (append-only)
CREATE SINK glue_sink FROM my_data
WITH (
connector = 'iceberg',
type = 'append-only',
warehouse.path = 's3://my-bucket/warehouse',
database.name = 'my_database',
table.name = 'my_table',
catalog.type = 'glue',
catalog.name = 'my_catalog',
s3.access.key = 'your-access-key',
s3.secret.key = 'your-secret-key',
s3.region = 'us-west-2'
);
```

```sql REST catalog + S3
CREATE SINK rest_sink FROM my_data
WITH (
connector = 'iceberg',
type = 'upsert',
primary_key = 'id',
warehouse.path = 's3://my-bucket/warehouse',
database.name = 'my_database',
table.name = 'my_table',
catalog.type = 'rest',
catalog.uri = 'http://rest-catalog:8181',
catalog.credential = 'username:password',
s3.access.key = 'your-access-key',
s3.secret.key = 'your-secret-key',
write_mode = 'copy-on-write',
enable_compaction = true,
compaction_interval_sec = 300
);
```

<Note>
For the `upsert` sink type, Snowflake cannot read equality delete files, so use copy-on-write mode (`write_mode = 'copy-on-write'`) and enable Iceberg compaction. Because copy-on-write relies on compaction, `compaction_interval_sec` determines the freshness of the Iceberg table.
</Note>

## Configure Snowflake catalog integration

Use a Snowflake catalog integration to query Iceberg data from either catalog:

1. [Glue catalog](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-glue)
2. [REST catalog](https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-rest)

Once you finish the configuration, you can query your Iceberg table in Snowflake.
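
As a sketch of the Glue path, under assumed placeholder names (the integration name, role ARN, account ID, and external volume below are all hypothetical; see the Snowflake docs linked above for the full set of options):

```sql
-- Register the Glue catalog in Snowflake (names and ARN are placeholders).
CREATE CATALOG INTEGRATION glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-glue-role'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-west-2'
  ENABLED = TRUE;

-- Create an externally managed Iceberg table pointing at the sinked table.
-- EXTERNAL_VOLUME must be an external volume configured for the S3 warehouse.
CREATE ICEBERG TABLE my_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'glue_catalog_int'
  CATALOG_TABLE_NAME = 'my_table';

SELECT * FROM my_table;
```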

8 changes: 8 additions & 0 deletions mint.json
@@ -214,6 +214,14 @@
"iceberg/deliver-to-iceberg"
]
},
{
"group": "Integrate Iceberg with external engines",
"pages": [
"iceberg/integrate-databricks-glue",
"iceberg/integrate-databricks-managed",
"iceberg/integrate-snowflake"
]
},
{
"group": "Create and manage Iceberg tables natively",
"pages": [