# Hive + Gravitino + Keycloak: Docker Compose Setup

This repository contains a Docker Compose setup that integrates Apache Hive, the Apache Gravitino Iceberg REST server, and Keycloak for OAuth2 authentication. It lets Hive use an Iceberg REST catalog secured by Keycloak.

## Table of Contents
- Architecture Overview
- Prerequisites
- Quickstart
- Configuration
  - Keycloak
  - Gravitino
  - Hive
- Networking Notes

## Architecture Overview
This diagram shows the main Docker Compose components and how they interact:

```
                                    OAuth2 (REST API)
         +---------------------------------------------------------------------+
         |                                                                     |
         |                                                                     v
+--------+----------+                +-------------------+            +-----------------+
|                   |  RESTCatalog   |                   |   OAuth2   |                 |
|       Hive        |  (REST API)    |     Gravitino     | (REST API) |    Keycloak     |
|   (HiveServer2)   +--------------->|   Iceberg REST    +----------->|   OAuth2 Auth   |
|                   |                |      Server       |            |     Server      |
+--------+----------+                +---------+---------+            +-----------------+
         |                                     |
  writes |                     writes metadata |
   data  +-------------------------------------+
         |
         v
+-------------------+                +-------------------+
|                   |  creates dir   |                   |
|    /warehouse     |<---------------+       init        |
|  (Docker volume)  |      sets      |     container     |
|                   |  permissions   |                   |
+-------------------+                +-------------------+
```

- Hive:
  - Runs HiveServer2 and connects to Gravitino through the Iceberg REST catalog client.
  - Writes Iceberg data files to the shared warehouse volume.
- Gravitino:
  - Exposes the Iceberg REST catalog API.
  - Writes Iceberg metadata files (`*.metadata.json`) to the shared warehouse volume.
  - Cannot act as an OAuth2 provider itself, so this example uses an external OAuth2 provider (Keycloak).
- Keycloak:
  - OAuth2 server that issues access tokens to Hive and publishes the signing keys (JWKS) that Gravitino uses to validate them.
- /warehouse:
  - Shared Docker volume for Iceberg table data and metadata.
- Init container:
  - Creates the shared /warehouse folder and sets its filesystem permissions as a one-time initialization step.

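To see the split between data and metadata described above, you can list the files under `/warehouse` from inside a running container. This is a sketch with several assumptions. `list_warehouse_files` is a hypothetical helper. The container is assumed to be named `gravitino` (per the Networking Notes) and to ship `find`. Hive is assumed to write Parquet data files, which is Iceberg's default file format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Iceberg files in the shared /warehouse volume.
# Assumes a running container (default name "gravitino") that mounts /warehouse
# and has `find` available.
list_warehouse_files() {
  local container=${1:-gravitino}
  echo "# metadata files (written by Gravitino)"
  docker exec "$container" find /warehouse -name '*.metadata.json'
  echo "# data files (written by Hive; Parquet assumed as the file format)"
  docker exec "$container" find /warehouse -name '*.parquet'
}
```

Pass a different container name (for example `hive`) as the first argument to check the same volume from the Hive side.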
## Prerequisites
- Docker and Docker Compose
- Java (for the local Hive Beeline client)
- `$HIVE_HOME` environment variable pointing to a local Hive installation (used to launch Beeline)

## Quickstart
### STEP 1: Build the Hive Docker Image
- Currently, only Hive 4.2.0-SNAPSHOT includes the required Iceberg REST catalog client.
- To build a Hive Docker image with the Iceberg REST catalog client on your local machine:
  - Build Hive using the Docker profile.
  - Copy `apache-hive-4.2.0-SNAPSHOT-bin.tar.gz` into the `packaging/cache` folder.
  - Run `build.sh` from the `packaging/src/docker/` folder.

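The last two sub-steps can be scripted. This is a sketch, not the project's own build tooling. `build_hive_image` is a hypothetical helper, and it assumes Hive has already been built with the Docker profile. It also assumes the binary tarball was written to `packaging/target/`; adjust that path if your build puts it elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stage the Hive binary tarball and run build.sh.
# Prerequisite (not shown): Hive has been built with the Docker profile so the
# tarball exists. The packaging/target/ location is an assumption.
build_hive_image() {
  local hive_src=$1                     # root of the Hive source checkout
  local version=${2:-4.2.0-SNAPSHOT}
  local tarball="apache-hive-${version}-bin.tar.gz"

  # Copy the tarball into packaging/cache, where this README says it must be.
  mkdir -p "${hive_src}/packaging/cache"
  cp "${hive_src}/packaging/target/${tarball}" "${hive_src}/packaging/cache/"

  # Run build.sh from its own folder, in a subshell so the caller's cwd is unchanged.
  (cd "${hive_src}/packaging/src/docker" && ./build.sh)
}
```

Usage: `build_hive_image ~/src/hive`. The path is an example checkout location.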
### STEP 2: Export the Hive version
```shell
export HIVE_VERSION=4.2.0-SNAPSHOT
```

### STEP 3: Start services
```shell
docker-compose up -d
```

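`docker-compose up -d` returns once the containers have started, which can be before HiveServer2, Gravitino, and Keycloak are ready to serve requests. Retrying a real client request is a more reliable readiness check than a bare port check. Below is a minimal sketch of such a retry loop. `retry` is a hypothetical helper, not part of this setup, and it requires bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final failure.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example: up to 30 attempts, 5 seconds apart, until HiveServer2 answers a query.
# retry 30 5 "${HIVE_HOME}/bin/beeline" -u "jdbc:hive2://localhost:10001/default" \
#   -n hive -p hive -e "SELECT 1"
```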
### STEP 4: Connect with Beeline
```shell
"${HIVE_HOME}/bin/beeline" -u "jdbc:hive2://localhost:10001/default" -n hive -p hive
```

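For non-interactive use, Beeline's `-e` option runs the given statements and exits. The wrapper below reuses the connection string from STEP 4. `run_beeline`, the `demo` database, and the `ice_t` table are hypothetical names. `STORED BY ICEBERG` is Hive 4's syntax for creating an Iceberg table.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run HiveQL through Beeline against the HiveServer2
# port published by Docker Compose (10001 on the host).
run_beeline() {
  "${HIVE_HOME:?HIVE_HOME must point to a Hive installation}/bin/beeline" \
    -u "jdbc:hive2://localhost:10001/default" -n hive -p hive -e "$1"
}

# Example smoke test (requires the stack to be running):
# run_beeline "CREATE DATABASE IF NOT EXISTS demo;
#              CREATE TABLE demo.ice_t (id INT, name STRING) STORED BY ICEBERG;
#              INSERT INTO demo.ice_t VALUES (1, 'a');
#              SELECT * FROM demo.ice_t;"
```

If the smoke test succeeds, the new table's metadata should appear in Gravitino and its files under `/warehouse`.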
### STEP 5: Stop services
```shell
docker-compose down -v
```
The `-v` flag also removes the Compose-managed volumes (such as the shared warehouse volume), so table data is not kept between runs.

## Configuration

### Keycloak

- Realm: `hive`
- Client: `iceberg-client`
  - Secret: `iceberg-client-secret`
  - Protocol: OpenID Connect
  - Audience: `hive-iceberg`
- The realm is imported from `realm-export.json` in the Keycloak container.
- Port: 8080

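To check that the `hive` realm was imported, you can query Keycloak's standard OpenID Connect discovery document. It lists the token endpoint and the JWKS URL that Hive and Gravitino are configured with. `realm_endpoints` is a hypothetical helper, and the sketch makes two assumptions: the host port mapping 8080, and compact JSON output from Keycloak. It uses `grep` instead of a JSON parser for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print token_endpoint and jwks_uri from the realm's
# OIDC discovery document. KEYCLOAK_URL defaults to the host-mapped port.
realm_endpoints() {
  local base=${KEYCLOAK_URL:-http://localhost:8080}
  curl -s "${base}/realms/hive/.well-known/openid-configuration" \
    | grep -oE '"(token_endpoint|jwks_uri)":"[^"]*"'
}
```

Empty output usually means Keycloak is not up yet or the realm import failed.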
### Gravitino

- HTTP port: 9001
- Catalog backend: JDBC with an H2 database (`/tmp/gravitino_h2_db`)
- Warehouse: `/warehouse` (shared with Hive)
- Iceberg REST catalog backend config:
  ```properties
  # Backend type for the catalog. Here we use JDBC (H2 database) as the metadata store.
  gravitino.iceberg-rest.catalog-backend = jdbc

  # JDBC connection URI for the H2 database storing catalog metadata.
  gravitino.iceberg-rest.uri = jdbc:h2:file:/tmp/gravitino_h2_db;AUTO_SERVER=TRUE

  # JDBC driver class used to connect to the metadata database.
  gravitino.iceberg-rest.jdbc-driver = org.h2.Driver

  # Database username for connecting to the metadata store.
  gravitino.iceberg-rest.jdbc-user = sa

  # Database password for connecting to the metadata store (empty here).
  gravitino.iceberg-rest.jdbc-password = ""

  # Whether to initialize the catalog schema on startup.
  gravitino.iceberg-rest.jdbc-initialize = true

  # --- Warehouse Location (shared folder) ---

  # Path to the Iceberg warehouse directory shared with Hive.
  gravitino.iceberg-rest.warehouse = file:///warehouse
  ```
- OAuth2 config pointing to Keycloak:
  ```properties
  # Enables OAuth2 as the authentication mechanism for Gravitino.
  gravitino.authenticators = oauth

  # URL of the Keycloak realm to request tokens from.
  gravitino.authenticator.oauth.serverUri = http://keycloak:8080/realms/hive

  # Path to the OAuth2 token endpoint on Keycloak.
  gravitino.authenticator.oauth.tokenPath = /protocol/openid-connect/token

  # OAuth2 scopes requested when obtaining a token. Includes "openid" and the custom "catalog" scope.
  gravitino.authenticator.oauth.scope = openid catalog

  # OAuth2 client ID registered in Keycloak.
  gravitino.authenticator.oauth.clientId = iceberg-client

  # OAuth2 client secret associated with the client ID.
  gravitino.authenticator.oauth.clientSecret = iceberg-client-secret

  # Java class used to validate incoming JWT tokens using the JWKS endpoint.
  gravitino.authenticator.oauth.tokenValidatorClass = org.apache.gravitino.server.authentication.JwksTokenValidator

  # URL to fetch the JSON Web Key Set (JWKS) for verifying token signatures.
  gravitino.authenticator.oauth.jwksUri = http://keycloak:8080/realms/hive/protocol/openid-connect/certs

  # Identifier for the OAuth2 provider configuration in Gravitino.
  gravitino.authenticator.oauth.provider = default

  # JWT claim field(s) to extract as the principal/username (here, the 'sub' claim).
  gravitino.authenticator.oauth.principalFields = sub

  # Acceptable clock skew (in seconds) when validating token expiration times.
  gravitino.authenticator.oauth.allowSkewSecs = 60

  # Expected audience claim in the token to ensure it is intended for this service.
  gravitino.authenticator.oauth.serviceAudience = hive-iceberg
  ```

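When Gravitino rejects a request, the token's claims are a good first thing to check. The config above expects the `aud` claim to contain `hive-iceberg` and takes the principal from `sub`. The hypothetical `jwt_payload` helper below decodes a JWT's payload segment for inspection only. It does not verify the signature; Gravitino does that against the JWKS endpoint. The helper assumes GNU coreutils (`base64 -d`).

```shell
#!/usr/bin/env bash
# Hypothetical debugging helper: print the decoded payload (claims) of a JWT.
# Does NOT verify the signature.
jwt_payload() {
  local payload
  # A JWT is header.payload.signature; take the middle segment and map the
  # base64url alphabet ('-', '_') back to standard base64 ('+', '/').
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

For example, `jwt_payload "$ACCESS_TOKEN"` prints the claims of a token taken from the `access_token` field of Keycloak's token response.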
### Hive

- Uses `HiveRESTCatalogClient` to connect to the Iceberg REST catalog (Gravitino).
- Catalog configuration in `hive-site.xml`:
  ```xml
  <property>
    <name>metastore.catalog.default</name>
    <value>ice01</value>
    <description>Sets the default Iceberg catalog for Hive. Here, "ice01" is used.</description>
  </property>

  <property>
    <name>metastore.client.impl</name>
    <value>org.apache.iceberg.hive.client.HiveRESTCatalogClient</value>
    <description>Specifies the client implementation to use for accessing Iceberg via REST.</description>
  </property>

  <property>
    <name>iceberg.catalog.ice01.uri</name>
    <value>http://gravitino:9001/iceberg</value>
    <description>URI of the Iceberg REST server (Gravitino). Hive will send catalog requests here.</description>
  </property>

  <property>
    <name>iceberg.catalog.ice01.type</name>
    <value>rest</value>
    <description>Defines the catalog type as "rest", indicating it uses a REST API backend.</description>
  </property>

  <!-- Iceberg REST Catalog: OAuth2 authentication -->

  <property>
    <name>iceberg.catalog.ice01.rest.auth.type</name>
    <value>oauth2</value>
    <description>Configures Hive to use OAuth2 for authenticating requests to the REST catalog.</description>
  </property>

  <property>
    <name>iceberg.catalog.ice01.oauth2-server-uri</name>
    <value>http://keycloak:8080/realms/hive/protocol/openid-connect/token</value>
    <description>URL of the Keycloak OAuth2 token endpoint used to request access tokens.</description>
  </property>

  <property>
    <name>iceberg.catalog.ice01.credential</name>
    <value>iceberg-client:iceberg-client-secret</value>
    <description>Client credentials (ID and secret) used to authenticate with Keycloak.</description>
  </property>
  ```
- HiveServer2 port: 10000 (mapped to host port 10001 in Docker Compose)

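`HiveRESTCatalogClient` needs a Keycloak token before it can call Gravitino. Sending the same two requests by hand helps tell a Keycloak problem apart from a Gravitino problem. The first is a client-credentials token request to Keycloak, using the ID and secret from the `credential` property. The second is an Iceberg REST call (`GET /v1/namespaces`) to Gravitino with the token as a Bearer header.

This is a sketch, not a reproduction of the client's exact traffic, and it rests on several assumptions. `list_namespaces` is hypothetical. It uses the host-mapped ports rather than `keycloak:8080` and `gravitino:9001`. It requests the `catalog` scope, assuming that matches what Iceberg's REST client asks for by default. It extracts the token with `sed` rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the token + catalog calls, using host-mapped ports.
KEYCLOAK_TOKEN_URL=${KEYCLOAK_TOKEN_URL:-http://localhost:8080/realms/hive/protocol/openid-connect/token}
GRAVITINO_ICEBERG_URL=${GRAVITINO_ICEBERG_URL:-http://localhost:9001/iceberg}

list_namespaces() {
  local response token
  # 1) Client-credentials grant; ID and secret come from the "credential" property.
  response=$(curl -s -X POST "$KEYCLOAK_TOKEN_URL" \
    -d grant_type=client_credentials \
    -d client_id=iceberg-client \
    -d client_secret=iceberg-client-secret \
    -d scope=catalog)
  # Pull access_token out of the (compact) JSON response without requiring jq.
  token=$(printf '%s' "$response" | sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p')
  if [ -z "$token" ]; then
    echo "failed to obtain a token: $response" >&2
    return 1
  fi
  # 2) Iceberg REST call to Gravitino, authenticated with the Bearer token.
  curl -s -H "Authorization: Bearer ${token}" "${GRAVITINO_ICEBERG_URL}/v1/namespaces"
}
```

If step 1 fails, check Keycloak (realm, client, secret). If step 1 succeeds but step 2 returns an authentication error, check Gravitino's OAuth2 settings, especially `serviceAudience`.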
## Networking Notes

- All containers share a custom bridge network, `hive-net`.
- Services reach each other by container name (`hive`, `gravitino`, `keycloak`), which is why the configuration above uses URLs such as `http://keycloak:8080` and `http://gravitino:9001`.
- Ports mapped for host access:
  - Keycloak → 8080
  - Gravitino → 9001
  - HiveServer2 → 10001