Commit 19c36ad

Author: Dmitriy Fingerman
HIVE-29285: Iceberg: Add docker-compose examples for REST Catalog integrations with Gravitino and Polaris.
1 parent 82e2d61 commit 19c36ad

14 files changed, +1177 -0 lines changed
Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
# Hive + Gravitino + Keycloak: Docker-Compose Setup

This repository contains a Docker Compose setup that integrates Apache Hive, the Gravitino Iceberg REST server, and Keycloak for OAuth2 authentication. It lets Hive use an Iceberg REST catalog secured by Keycloak.

## Table of Contents
- Architecture Overview
- Prerequisites
- Quickstart
- Configuration
  - Keycloak
  - Gravitino
  - Hive
- Networking Notes

## Architecture Overview
This diagram illustrates the key docker-compose components and their interactions in this setup:

```
                                   OAuth2 (REST API)
         +-------------------------------------------------------------------+
         |                                                                   |
         |                                                                   v
+--------+----------+               +-------------------+            +-----------------+
|                   |  RESTCatalog  |                   |  OAuth2    |                 |
|       Hive        |  (REST API)   |     Gravitino     | (REST API) |     Keycloak    |
|   (HiveServer2)   +-------------->|  Iceberg REST     +----------->|  OAuth2 Auth    |
|                   |               |     Server        |            |    Server       |
+--------+----------+               +---------+---------+            +-----------------+
         |                                    |
  writes |                    writes metadata |
   data  +------------------------------------+
         |
         v
+-------------------+               +-------------------+
|                   |  creates dir  |                   |
|    /warehouse     |<--------------+       init        |
|  (Docker volume)  |   sets        |    container      |
|                   |  permissions  |                   |
+-------------------+               +-------------------+
```

- Hive:
  - Runs HiveServer2 and connects to Gravitino via the Iceberg REST catalog.
  - Writes Iceberg data files to the shared warehouse volume.
- Gravitino:
  - Exposes the REST API for the Iceberg catalog.
  - Writes Iceberg metadata files (.metadata.json) to the shared warehouse volume.
  - Doesn't support serving as an OAuth2 provider, so this example uses an external OAuth2 provider (Keycloak).
- Keycloak:
  - OAuth2 server providing authentication and token issuance for Hive/Gravitino.
- /warehouse:
  - Shared Docker volume for Iceberg table data and metadata.
- Init container:
  - Creates the shared /warehouse folder and sets filesystem permissions as a one-time initialization step.

## Prerequisites
- Docker & Docker Compose
- Java (for the local Hive Beeline client)
- ```$HIVE_HOME``` environment variable pointing to a Hive installation (used to run Beeline)
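
A quick pre-flight check can catch a missing prerequisite before any container is started. The sketch below assumes a POSIX shell; `require_cmd` and `check_prereqs` are hypothetical helper names and are not part of this example's scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: reports every missing prerequisite and
# returns non-zero if at least one is missing.
check_prereqs() {
  missing=0
  for cmd in docker docker-compose java; do
    if ! require_cmd "$cmd"; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  # Beeline is launched from $HIVE_HOME/bin in the Quickstart.
  if [ ! -x "${HIVE_HOME:-}/bin/beeline" ]; then
    echo "HIVE_HOME is unset or has no executable bin/beeline" >&2
    missing=1
  fi
  return "$missing"
}
```

Running `check_prereqs && echo "ready"` before the Quickstart prints `ready` only when every check passes.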

## Quickstart
### STEP 1: Build Hive Docker Image
- Currently, only Hive 4.2.0-SNAPSHOT has the required Iceberg REST catalog client.
- To build the Hive Docker image with the Iceberg REST catalog client on your local machine, follow these steps:
  - Build Hive using the Docker profile.
  - Copy ```apache-hive-4.2.0-SNAPSHOT-bin.tar.gz``` to the ```packaging/cache``` folder.
  - Execute ```build.sh``` from the ```packaging/src/docker/``` folder.

### STEP 2: Export the Hive version
```shell
export HIVE_VERSION=4.2.0-SNAPSHOT
```

### STEP 3: Start services
```shell
docker-compose up -d
```
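
The compose file defines healthchecks for the `keycloak` and `gravitino` containers, so their readiness can be polled after startup. The following is a minimal sketch that assumes the container names from this example; `health_status` and `wait_healthy` are hypothetical helpers, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical helper: print the Docker health status of a container.
# Containers without a healthcheck have no .State.Health section,
# so this prints nothing useful for them.
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null
}

# Hypothetical helper: poll until the container reports "healthy"
# or the attempts run out.
# Arguments: container name, max attempts (default 30), delay in seconds (default 2).
wait_healthy() {
  name=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(health_status "$name")" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$name did not become healthy" >&2
  return 1
}
```

For example, `wait_healthy keycloak && wait_healthy gravitino` returns once both services report a healthy state.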

### STEP 4: Connect to Beeline
```shell
"${HIVE_HOME}/bin/beeline" -u "jdbc:hive2://localhost:10001/default" -n hive -p hive
```
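
Once connected, a short round trip through the catalog confirms that table metadata and data are written. The sketch below is illustrative only: the database and table names are made up, `run_smoke_test` is a hypothetical helper, and whether every statement is supported by the REST catalog client in your build is an assumption.

```shell
#!/bin/sh
# Illustrative statements; demo_db and events are invented names.
SMOKE_SQL='
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE IF NOT EXISTS demo_db.events (id INT, msg STRING) STORED BY ICEBERG;
INSERT INTO demo_db.events VALUES (1, "hello"), (2, "iceberg");
SELECT * FROM demo_db.events ORDER BY id;
'

# Hypothetical helper: write the statements to a temporary file and run them
# through Beeline with the connection settings used in the Quickstart.
run_smoke_test() {
  sql_file=$(mktemp)
  printf '%s\n' "$SMOKE_SQL" > "$sql_file"
  "${HIVE_HOME}/bin/beeline" -u "jdbc:hive2://localhost:10001/default" \
    -n hive -p hive -f "$sql_file"
  status=$?
  rm -f "$sql_file"
  return "$status"
}
```

After `run_smoke_test` succeeds, the shared `/warehouse` volume should contain the new table's data and metadata files.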

### STEP 5: Stop services
```shell
docker-compose down -v
```

### Configuration

#### Keycloak

- Realm: hive
- Client: iceberg-client
- Secret: iceberg-client-secret
- Protocol: OpenID Connect
- Audience: hive-iceberg
- Imported via `realm-export.json` in Keycloak container.
- Port: 8080
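
To check the realm import from the host, a token can be requested with the client credentials listed above. This is a sketch under assumptions: it presumes the client allows the client-credentials grant and that the `catalog` scope is defined in the imported realm; `extract_access_token` and `fetch_token` are hypothetical helpers, and the `sed` expression expects the token response on a single line.

```shell
#!/bin/sh
# Host-side token endpoint, derived from the 8080 port mapping in this example.
KEYCLOAK_TOKEN_URL="${KEYCLOAK_TOKEN_URL:-http://localhost:8080/realms/hive/protocol/openid-connect/token}"

# Hypothetical helper: pull the access_token field out of a JSON token
# response read from stdin.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: request a token using the client-credentials grant.
# The scope value is an assumption based on the Gravitino configuration.
fetch_token() {
  curl -fsS -X POST "$KEYCLOAK_TOKEN_URL" \
    -d grant_type=client_credentials \
    -d client_id=iceberg-client \
    -d client_secret=iceberg-client-secret \
    -d scope=catalog | extract_access_token
}
```

`TOKEN=$(fetch_token)` then holds a bearer token that can be presented to the REST catalog.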

#### Gravitino

- HTTP port: 9001
- Catalog backend: JDBC H2 (/tmp/gravitino_h2_db)
- Warehouse: /warehouse (shared with Hive)
- Iceberg REST Catalog Backend config:
```
# Backend type for the catalog. Here we use JDBC (H2 database) as the metadata store.
gravitino.iceberg-rest.catalog-backend = jdbc

# JDBC connection URI for the H2 database storing catalog metadata.
gravitino.iceberg-rest.uri = jdbc:h2:file:/tmp/gravitino_h2_db;AUTO_SERVER=TRUE

# JDBC driver class used to connect to the metadata database.
gravitino.iceberg-rest.jdbc-driver = org.h2.Driver

# Database username for connecting to the metadata store.
gravitino.iceberg-rest.jdbc-user = sa

# Database password for connecting to the metadata store (empty here).
gravitino.iceberg-rest.jdbc-password = ""

# Whether to initialize the catalog schema on startup.
gravitino.iceberg-rest.jdbc-initialize = true

# --- Warehouse Location (shared folder) ---

# Path to the Iceberg warehouse directory shared with Hive.
gravitino.iceberg-rest.warehouse = file:///warehouse
```
- OAuth2 config pointing to Keycloak:
```
# Enables OAuth2 as the authentication mechanism for Gravitino.
gravitino.authenticators = oauth

# URL of the Keycloak realm to request tokens from.
gravitino.authenticator.oauth.serverUri = http://keycloak:8080/realms/hive

# Path to the OAuth2 token endpoint on Keycloak.
gravitino.authenticator.oauth.tokenPath = /protocol/openid-connect/token

# OAuth2 scopes requested when obtaining a token. Includes "openid" and the custom "catalog" scope.
gravitino.authenticator.oauth.scope = openid catalog

# OAuth2 client ID registered in Keycloak.
gravitino.authenticator.oauth.clientId = iceberg-client

# OAuth2 client secret associated with the client ID.
gravitino.authenticator.oauth.clientSecret = iceberg-client-secret

# Java class used to validate incoming JWT tokens using the JWKS endpoint.
gravitino.authenticator.oauth.tokenValidatorClass = org.apache.gravitino.server.authentication.JwksTokenValidator

# URL to fetch JSON Web Key Set (JWKS) for verifying token signatures.
gravitino.authenticator.oauth.jwksUri = http://keycloak:8080/realms/hive/protocol/openid-connect/certs

# Identifier for the OAuth2 provider configuration in Gravitino.
gravitino.authenticator.oauth.provider = default

# JWT claim field(s) to extract as the principal/username (here, 'sub' claim).
gravitino.authenticator.oauth.principalFields = sub

# Acceptable clock skew (in seconds) when validating token expiration times.
gravitino.authenticator.oauth.allowSkewSecs = 60

# Expected audience claim in the token to ensure it is intended for this service.
gravitino.authenticator.oauth.serviceAudience = hive-iceberg
```
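
One way to see the OAuth2 settings take effect is to call the catalog's configuration endpoint with and without a bearer token and compare the HTTP status codes. The sketch below assumes the host-side base URI `http://localhost:9001/iceberg` (from the port mapping and the Hive catalog URI in this example) and the `v1/config` route from the Iceberg REST catalog specification; `catalog_url` and `config_status` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: join a base URI and a path with exactly one slash.
catalog_url() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Hypothetical helper: print the HTTP status of GET <base>/v1/config.
# A bearer token is sent only when one is given as the second argument.
config_status() {
  url=$(catalog_url "$1" v1/config)
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $2" "$url"
  else
    curl -s -o /dev/null -w '%{http_code}' "$url"
  fi
}
```

With authentication enabled, `config_status http://localhost:9001/iceberg` should be rejected, while the same call with a valid Keycloak token as the second argument should succeed.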

#### Hive

- Uses ```HiveRESTCatalogClient``` for connecting to Iceberg REST catalog (Gravitino).
- Catalog configuration in ```hive-site.xml```:
```
<property>
  <name>metastore.catalog.default</name>
  <value>ice01</value>
  <description>Sets the default Iceberg catalog for Hive. Here, "ice01" is used.</description>
</property>

<property>
  <name>metastore.client.impl</name>
  <value>org.apache.iceberg.hive.client.HiveRESTCatalogClient</value>
  <description>Specifies the client implementation to use for accessing Iceberg via REST.</description>
</property>

<property>
  <name>iceberg.catalog.ice01.uri</name>
  <value>http://gravitino:9001/iceberg</value>
  <description>URI of the Iceberg REST server (Gravitino). Hive will send catalog requests here.</description>
</property>

<property>
  <name>iceberg.catalog.ice01.type</name>
  <value>rest</value>
  <description>Defines the catalog type as "rest", indicating it uses a REST API backend.</description>
</property>

<!-- Iceberg REST Catalog: OAuth2 authentication -->

<property>
  <name>iceberg.catalog.ice01.rest.auth.type</name>
  <value>oauth2</value>
  <description>Configures Hive to use OAuth2 for authenticating requests to the REST catalog.</description>
</property>

<property>
  <name>iceberg.catalog.ice01.oauth2-server-uri</name>
  <value>http://keycloak:8080/realms/hive/protocol/openid-connect/token</value>
  <description>URL of the Keycloak OAuth2 token endpoint used to request access tokens.</description>
</property>

<property>
  <name>iceberg.catalog.ice01.credential</name>
  <value>iceberg-client:iceberg-client-secret</value>
  <description>Client credentials (ID and secret) used to authenticate with Keycloak.</description>
</property>
```
- HiveServer2 port: 10000 (mapped to 10001 in Docker Compose)

## Networking Notes

- All containers share a custom bridge network ```hive-net```.
- Services communicate via container names: hive, gravitino, keycloak.
- Ports mapped for host access:
  - Keycloak → 8080
  - Gravitino → 9001
  - HiveServer2 → 10001

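
The published host ports can be confirmed from a running stack with `docker-compose port`, which prints the host address bound to a service's container port. A minimal sketch follows; `port_of` and `published_port` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: keep only the port from an "address:port" line on stdin.
port_of() {
  sed 's/.*://'
}

# Hypothetical helper: print the host port published for a service's
# container port, e.g. service "hive" and container port 10000.
published_port() {
  docker-compose port "$1" "$2" | port_of
}
```

With the mappings in this example, `published_port hive 10000` is expected to print the host port used in the Beeline JDBC URL.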
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#!/bin/sh -x

apk add --no-cache acl

mkdir -p /tmp/hive/jars
mkdir -p "$WAREHOUSE"
chmod 777 "$WAREHOUSE"

# Give the hive user id full rwx access to all existing files and directories under $WAREHOUSE
setfacl -R -m u:"$HIVE_USER_ID":rwx "$WAREHOUSE"

# Ensure all new files/directories created inside $WAREHOUSE automatically grant rwx access to hive user id
setfacl -d -m u:"$HIVE_USER_ID":rwx "$WAREHOUSE"
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
version: "3.9"

services:
  keycloak:
    image: quay.io/keycloak/keycloak:25.0.1
    container_name: keycloak
    environment:
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: admin
    volumes:
      - ./keycloak/realm-export.json:/opt/keycloak/data/import/realm-export.json
    ports:
      - "8080:8080"
    networks:
      - hive-net
    command: [
      "start-dev",
      "--import-realm",
      "--health-enabled=true"
    ]
    healthcheck:
      test: "exec 3<>/dev/tcp/localhost/9000 && \
             echo -e 'GET /health/ready HTTP/1.1\\r\\nHost: localhost\\r\\nConnection: close\\r\\n\\r\\n' >&3 && \
             cat <&3 | grep -q '200 OK'"
      interval: 5s
      timeout: 2s
      retries: 15

  gravitino:
    image: apache/gravitino-iceberg-rest:1.0.0
    container_name: gravitino
    environment:
      JAVA_OPTS: "-Dlog4j2.formatMsgNoLookups=true"
    volumes:
      - ./gravitino:/tmp/gravitino
      - warehouse:/warehouse
    ports:
      - "9001:9001"
    networks:
      - hive-net
    entrypoint: /bin/bash /tmp/gravitino/init.sh
    healthcheck:
      test: [ "CMD", "/tmp/gravitino/healthcheck.sh" ]
      interval: 5s
      timeout: 60s
      retries: 5
      start_period: 20s

  hive:
    image: apache/hive:${HIVE_VERSION}
    container_name: hive
    depends_on:
      keycloak:
        condition: service_healthy
      gravitino:
        condition: service_healthy
    environment:
      SERVICE_NAME: hiveserver2
    volumes:
      - ./hive/hive-site.xml:/opt/hive/conf/hive-site.xml
      - warehouse:/warehouse
    ports:
      - "10001:10000"
    networks:
      - hive-net
    entrypoint: '/bin/sh -c "/opt/hive/bin/schematool -dbType derby -initOrUpgradeSchema && /entrypoint.sh"'

  init:
    image: alpine/curl
    container_name: init
    user: "0:0" # run as root
    environment:
      - WAREHOUSE=/warehouse
      - HIVE_USER_ID=1000
    volumes:
      - ./common/:/common
      - warehouse:/warehouse
    entrypoint: '/bin/sh -c /common/init.sh'

networks:
  hive-net:
    driver: bridge

volumes:
  warehouse:
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# --- HTTP Server ---
gravitino.iceberg-rest.httpPort = 9001

# --- Iceberg REST Catalog Backend (JDBC/H2) ---
gravitino.iceberg-rest.catalog-backend = jdbc
gravitino.iceberg-rest.uri = jdbc:h2:file:/tmp/gravitino_h2_db;AUTO_SERVER=TRUE
gravitino.iceberg-rest.jdbc-driver = org.h2.Driver
gravitino.iceberg-rest.jdbc-user = sa
gravitino.iceberg-rest.jdbc-password = ""
gravitino.iceberg-rest.jdbc-initialize = true

# --- Warehouse Location (shared folder) ---
gravitino.iceberg-rest.warehouse = file:///warehouse

# --- OAuth2 Authentication ---
gravitino.authenticators = oauth

gravitino.authenticator.oauth.serverUri = http://keycloak:8080/realms/hive
gravitino.authenticator.oauth.tokenPath = /protocol/openid-connect/token
gravitino.authenticator.oauth.scope = openid catalog
gravitino.authenticator.oauth.clientId = iceberg-client
gravitino.authenticator.oauth.clientSecret = iceberg-client-secret

gravitino.authenticator.oauth.tokenValidatorClass = org.apache.gravitino.server.authentication.JwksTokenValidator
gravitino.authenticator.oauth.jwksUri = http://keycloak:8080/realms/hive/protocol/openid-connect/certs
gravitino.authenticator.oauth.provider = default
gravitino.authenticator.oauth.principalFields = sub
gravitino.authenticator.oauth.allowSkewSecs = 60
gravitino.authenticator.oauth.serviceAudience = hive-iceberg

# --- Logging ---
gravitino.logging.level = INFO

