Fix/register partitioned tables in glue #523
Open
firewall413 wants to merge 65 commits into duckdb:master from firewall413:fix/register-partitioned-tables-in-glue
Conversation
firewall413 commented on Mar 20, 2025
- Resolved an issue where writing a new partition to Glue for a table with an unchanged schema would create additional Glue schema versions instead of just adding the partition.
- Resolved an issue where, when writing a new partition to Glue, the schema of the file at external_read_location //*.parquet would be picked up instead of the schema of the most recently written partition (year=2024/month=03/day=20/*.parquet).
… will update the Glue schema to the schema of the latest partition (which in turn allows for append-only schema evolution in later partitions, this is supported by Glue)
…sult in multiple schema versions (instead of just adding a partition)
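The fix described above boils down to comparing the new partition's schema against the schema Glue already stores, and only pushing a new schema version when they actually differ. A minimal sketch of that decision logic, assuming columns are represented as (name, type) pairs; the function names (`schema_changed`, `plan_glue_update`) are illustrative, not the PR's actual identifiers:

```python
# Hypothetical sketch of the schema check behind this fix: only push a new
# schema version to Glue when the partition's columns differ from the table's
# current columns; otherwise just register the partition.

def schema_changed(glue_columns, partition_columns):
    """Compare Glue's stored columns against the new partition's columns.

    Each column is a (name, type) pair. Any difference, including columns
    appended at the end (append-only schema evolution, which Glue supports),
    counts as a schema change that warrants a table update.
    """
    return list(glue_columns) != list(partition_columns)


def plan_glue_update(glue_columns, partition_columns):
    """Decide what to do for a new partition: register it as-is, or also
    update the Glue table schema to the latest partition's schema."""
    if schema_changed(glue_columns, partition_columns):
        return "update_table_schema_and_add_partition"
    return "add_partition_only"
```

With an unchanged schema this returns "add_partition_only", so repeated writes no longer accumulate redundant Glue schema versions.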
@firewall413, hey, nice to see you, and thanks for putting this together!
Still working on those failing tests, my bad; I'll finish up after my holidays!
…oint Remove duckdbt entrypoint for now
Add info on the interactive shell for the DuckDB UI to the README
Bumps [dbt-tests-adapter](https://github.com/dbt-labs/dbt-adapters) from 1.11.0 to 1.15.0. - [Release notes](https://github.com/dbt-labs/dbt-adapters/releases) - [Commits](https://github.com/dbt-labs/dbt-adapters/commits) --- updated-dependencies: - dependency-name: dbt-tests-adapter dependency-version: 1.15.0 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]>
…apter-1.15.0 Bump dbt-tests-adapter from 1.11.0 to 1.15.0
Apply project quoting setting to add/remove columns
As per the [DuckDB docs](https://duckdb.org/docs/stable/extensions/postgres.html#managing-multiple-secrets), a secret can be given a name (which is already possible) and this secret can be referenced in the attach statement. This is convenient if you want to e.g. attach multiple postgres databases
Add named secret parameter to attachment class for profiles.yml
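Putting the two changes above together, a profiles.yml might define a named Postgres secret and reference it from an attach entry. This is a sketch only: the `secret` key on the attach entry is the field this change appears to add, and the exact key name is an assumption; host and credential values are placeholders.

```yaml
# Hypothetical profiles.yml fragment: a named secret referenced from an
# ATTACH entry, convenient when attaching multiple Postgres databases.
dev:
  type: duckdb
  path: /tmp/dbt.duckdb
  secrets:
    - type: postgres
      name: my_pg_secret        # named secret (already supported)
      host: localhost
      port: 5432
      database: postgres
      user: postgres
      password: example
  attach:
    - path: ""
      alias: pg_db
      type: postgres
      secret: my_pg_secret      # assumed field name added by this change
```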
…d up all of a sudden
…lis/read-.duckdbrc-file-on-duckdbt-cli-startup Read .duckdbrc file on DuckDBT CLI startup
…failures See if this is related to the mysterious doc test failures that popped up on master
Bumps [freezegun](https://github.com/spulec/freezegun) from 1.5.1 to 1.5.2. - [Release notes](https://github.com/spulec/freezegun/releases) - [Changelog](https://github.com/spulec/freezegun/blob/master/CHANGELOG) - [Commits](spulec/freezegun@1.5.1...1.5.2) --- updated-dependencies: - dependency-name: freezegun dependency-version: 1.5.2 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]>
Enhances the Attachment dataclass to support arbitrary ATTACH statement options via an options dict, enabling users to leverage new DuckDB attachment features without waiting for explicit dbt-duckdb support. Key changes:
- Add optional options field to Attachment dataclass for arbitrary key-value pairs
- Update to_sql method to handle options dict with conflict detection
- Maintain full backward compatibility with existing type, secret, read_only fields
- Add comprehensive tests covering new functionality and edge cases
- Update README with documentation and examples of the new feature
- Remove outdated information about DuckDB 0.7.0 and run hooks from docs
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
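The commit above describes an options dict rendered into the ATTACH statement, with conflict detection against the dedicated fields. A simplified sketch of how that could look (this is not the actual dbt-duckdb implementation; the `RESERVED` set and option rendering are assumptions):

```python
# Illustrative Attachment dataclass whose to_sql() renders arbitrary ATTACH
# options from a dict, rejecting keys that collide with first-class fields.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Attachment:
    path: str
    alias: Optional[str] = None
    read_only: bool = False
    options: Dict[str, Any] = field(default_factory=dict)

    RESERVED = {"read_only"}  # keys handled by dedicated fields

    def to_sql(self) -> str:
        sql = f"ATTACH '{self.path}'"
        if self.alias:
            sql += f" AS {self.alias}"
        opts = []
        if self.read_only:
            opts.append("READ_ONLY")
        for key, value in self.options.items():
            if key.lower() in self.RESERVED:
                raise ValueError(f"Option '{key}' conflicts with a dedicated field")
            # Boolean flags render bare; other values render as KEY value.
            opts.append(key.upper() if value is True else f"{key.upper()} {value}")
        if opts:
            sql += f" ({', '.join(opts)})"
        return sql
```

Usage: `Attachment(path="test.db", alias="t", read_only=True, options={"cache_size": 100}).to_sql()` yields `ATTACH 'test.db' AS t (READ_ONLY, CACHE_SIZE 100)`, while passing `options={"READ_ONLY": True}` raises instead of silently duplicating the dedicated field.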
Bump freezegun from 1.5.1 to 1.5.2
Allow the database field in credentials to match any alias from the attach list, enabling users to specify an attached database as their primary database instead of being restricted to the path-derived database name. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
…y_kv Add support for arbitrary key-value pairs in Attachment ATTACH options
…uckdb#557)
* feat: add conditional cascade removal for ducklake relations with logging
* Remove cascade for testing
* Introduce error for testing
* fix: correct drop_relation macro with proper syntax and cascade handling
* refactor: update duckdb drop_relation macro to handle path object safely
* refactor: simplify ducklake relation detection in drop_relation macro
* refactor: update drop_relation macro to check ducklake path directly
* fix: modify ducklake path detection in drop_relation macro
* The changes look good. Key modifications:
  1. For both `duckdb__drop_relation` and `duckdb__drop_schema` macros:
     - Added a namespace approach to track whether the relation/schema is associated with a ducklake attachment
     - Check if the relation's database matches an attachment's alias
     - Check if the attachment path contains 'ducklake:'
     - If a ducklake attachment is found, drop without cascade; otherwise, drop with cascade
  2. Removed previous hardcoded checks for 'ducklake:' in the database name
  3. Used a more robust method to detect ducklake-related relations/schemas
  The implementation ensures that:
  - Ducklake-attached databases are handled differently
  - The check is performed dynamically based on the profile configuration
  - The existing cascade behavior is preserved for non-ducklake databases
  Commit message: refactor: improve ducklake detection in drop relation and schema macros
* refactor: remove log statements from drop macros
* refactor: extract ducklake detection into reusable helper macro
  - Create is_ducklake_database() helper macro to centralize ducklake detection logic
  - Eliminate code duplication between drop_schema and drop_relation macros
  - Improve maintainability by having single source of truth for ducklake checks
  - Use early return optimization in helper macro for better performance
* refactor: move ducklake detection to adapter class with @available decorator
  - Add is_ducklake() method to DuckDBAdapter class as @available method
  - Remove Jinja macro approach in favor of proper Python implementation
  - Pass full relation object to adapter method for better type safety
  - Improve testability by moving logic to adapter class
  - Follow dbt-duckdb conventions for adapter functionality
  Addresses maintainer feedback to use @available adapter methods for complex logic.
* test: add comprehensive unit tests for is_ducklake adapter method
  - Test all scenarios: no attach config, empty config, ducklake paths, regular paths
  - Test edge cases: None relation, missing database, missing alias, empty path
  - Test mixed attachment configurations with multiple database types
  - All 9 test cases pass, ensuring robust ducklake detection logic
* fix: update tests to use correct ducklake path format
  - Change test paths from "ducklake:s3://..." to "ducklake:sqlite:storage/metadata.sqlite"
  - Use proper ducklake format as shown in real profiles.yml examples
* Revert drop schema
* perf: optimize is_ducklake with efficient set lookup
  - Add _ducklake_dbs set to DuckDBCredentials.__post_init__ for O(1) lookup
  - Simplify is_ducklake method to use set membership instead of O(n) iteration
  - Maintains same functionality while minimizing overhead for non-ducklake projects
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <[email protected]>
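The final optimization in the commit chain above replaces a per-call scan of the attach list with a precomputed set of ducklake database aliases. A minimal sketch, assuming attach entries are dicts with `path` and `alias` keys (the real code builds this set in `DuckDBCredentials.__post_init__`; these standalone function names are illustrative):

```python
# Sketch of the O(1) ducklake lookup: precompute the aliases of all
# attachments whose path uses the "ducklake:" scheme, then answer
# is_ducklake() with a set-membership test instead of an O(n) scan.

def build_ducklake_dbs(attach_entries):
    """Collect aliases of attach entries backed by a ducklake: path."""
    dbs = set()
    for entry in attach_entries:
        path = entry.get("path") or ""
        alias = entry.get("alias")
        if alias and path.startswith("ducklake:"):
            dbs.add(alias)
    return dbs


def is_ducklake(ducklake_dbs, relation_database):
    """O(1) check used by drop_relation to decide whether to skip CASCADE."""
    return relation_database in ducklake_dbs
```

Because the set is built once from the profile, projects with no ducklake attachments pay only an empty-set membership test per drop.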
I believe this fixes the root cause of duckdb#513
## Problem
Currently, when using MotherDuck with dbt-duckdb, a real temp schema and real tables are created to store the staging tables, because temp tables are not yet supported. By default this schema is called `dbt_temp`. When running dbt with 2 or more threads, each incremental model will open a transaction that starts with the statement `CREATE SCHEMA IF NOT EXISTS dbt_temp`. The model that executes first will cause a catalog change with the `CREATE SCHEMA` statement while its transaction is still open, and the rest of the models will fail with a write-write conflict on that statement.
## Solution & Considerations
If the schema `dbt_temp` already exists, the `CREATE SCHEMA` statement is a no-op, and all incremental models will succeed. So ideally, the create schema statement should be run just once at the beginning of a dbt run. But the current implementation allows each model to configure its own temp schema. This means that we won't fully know the name of the schema that needs to be created until processing the model, and that the `CREATE SCHEMA IF NOT EXISTS {temp_schema_name}` statement in `macros/materializations/incremental.sql` needs to be preserved. So the fix addresses only the creation of the default temp schema `dbt_temp`, by running `CREATE SCHEMA IF NOT EXISTS dbt_temp` when initializing the connection.
--- updated-dependencies: - dependency-name: sigstore/gh-action-sigstore-python dependency-version: 3.0.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mypy](https://github.com/python/mypy) from 1.16.0 to 1.16.1. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](python/mypy@v1.16.0...v1.16.1) --- updated-dependencies: - dependency-name: mypy dependency-version: 1.16.1 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…existence of upstream_nodes (duckdb#588)
Bumps [freezegun](https://github.com/spulec/freezegun) from 1.5.2 to 1.5.3. - [Release notes](https://github.com/spulec/freezegun/releases) - [Changelog](https://github.com/spulec/freezegun/blob/master/CHANGELOG) - [Commits](spulec/freezegun@1.5.2...1.5.3) --- updated-dependencies: - dependency-name: freezegun dependency-version: 1.5.3 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
--- updated-dependencies: - dependency-name: pyarrow dependency-version: 21.0.0 dependency-type: direct:development update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add struct column support with BigQuery-style flatten functionality
  - Add fields property to DuckDBColumn to support nested struct columns
  - Implement create_from_dict and create factory methods for column creation
  - Add is_struct method to detect struct data types
  - Add flatten method to recursively flatten nested struct columns into dot-notation columns
  - Update tests to use new factory methods and add comprehensive struct tests
  - Fix return type annotation in get_column_schema_from_query method
* Simplify struct column implementation
  - Remove unnecessary factory methods and auto-conversion logic
  - Use direct constructor calls instead of create() methods
  - Eliminate __post_init__ and create_from_dict complexity
  - Simplify flatten() method to use basic constructor
  - Update tests to use simplified API
* Fix lint error: avoid shadowing 'field' import with loop variable
* feat: Add struct flattening to get_column_schema_from_query
  This commit introduces automatic flattening of STRUCT data types within the `get_column_schema_from_query` method. Previously, when a query returned a STRUCT, it was treated as a single, opaque column. This change recursively unnests STRUCTs, creating a separate column for each field, with names reflecting their nested path (e.g., `struct_column.field_name`). Changes include:
  - A `__post_init__` method was added to the `DuckDBColumn` class to parse STRUCT type definitions and build a nested representation of their fields.
  - The `flatten` method in `DuckDBColumn` was updated to recursively traverse the nested structure and produce a flat list of columns.
  - The `get_column_schema_from_query` method in `DuckDBAdapter` was updated to use the new `DuckDBColumn` functionality.
  - Tests for `DuckDBColumn` were updated to reflect the new parsing logic and to be more concise.
* undo those changes
* rm print in test
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <[email protected]>
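The struct flattening described above can be sketched as a small recursive parser: given a DuckDB-style `STRUCT(...)` type string, emit one dot-notation column per leaf field. This is a simplified illustration, not the adapter's `DuckDBColumn` implementation; it ignores quoted field names and only handles nesting via a parenthesis depth count:

```python
# Sketch of BigQuery-style struct flattening over DuckDB type strings:
# "STRUCT(a INT, b STRUCT(c TEXT))" on column "col" flattens to
# [("col.a", "INT"), ("col.b.c", "TEXT")].

def split_fields(body):
    """Split 'a INT, b STRUCT(c TEXT)' on top-level commas only."""
    parts, depth, cur = [], 0, ""
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        parts.append(cur.strip())
    return parts


def flatten(name, dtype):
    """Recursively flatten STRUCT columns into dot-notation (name, type) pairs."""
    dtype = dtype.strip()
    if dtype.upper().startswith("STRUCT(") and dtype.endswith(")"):
        inner = dtype[len("STRUCT("):-1]
        out = []
        for field_def in split_fields(inner):
            fname, ftype = field_def.split(" ", 1)
            out.extend(flatten(f"{name}.{fname}", ftype))
        return out
    return [(name, dtype)]  # non-struct leaf column
```

Non-struct types pass through untouched, which matches the "opaque column" behavior for everything except STRUCTs.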
--- updated-dependencies: - dependency-name: mypy dependency-version: 1.17.0 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [dbt-tests-adapter](https://github.com/dbt-labs/dbt-adapters) from 1.16.0 to 1.17.0. - [Release notes](https://github.com/dbt-labs/dbt-adapters/releases) - [Commits](https://github.com/dbt-labs/dbt-adapters/commits) --- updated-dependencies: - dependency-name: dbt-tests-adapter dependency-version: 1.17.0 dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mypy](https://github.com/python/mypy) from 1.17.0 to 1.17.1. - [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md) - [Commits](python/mypy@v1.17.0...v1.17.1) --- updated-dependencies: - dependency-name: mypy dependency-version: 1.17.1 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 5. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v4...v5) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [freezegun](https://github.com/spulec/freezegun) from 1.5.3 to 1.5.5. - [Release notes](https://github.com/spulec/freezegun/releases) - [Changelog](https://github.com/spulec/freezegun/blob/master/CHANGELOG) - [Commits](spulec/freezegun@1.5.3...1.5.5) --- updated-dependencies: - dependency-name: freezegun dependency-version: 1.5.5 dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix secrets map/list type handling in DuckDB SQL generation
  Add support for proper DuckDB map and array syntax when converting dict and list types in secret parameters to SQL. Previously, these complex types were incorrectly wrapped in quotes, causing syntax errors like "Parser Error: syntax error at or near 'TYPE'". Changes:
  - Add _format_value() method to handle dict, list, and scalar types
  - Dict values now generate: map {'key': 'value', 'key2': 'value2'}
  - List values now generate: array ['item1', 'item2', 'item3']
  - Scalar values maintain existing behavior with proper quoting
  This fixes issues with ducklake secrets that use metadata_parameters as a map type, enabling proper conversion from YAML to DuckDB SQL.
* Remove redundant test assertions for map/list secrets
  Removed trivial property access assertions in favor of focusing on the meaningful SQL generation verification.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <[email protected]>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…bility (duckdb#612) Replace logger.exception with logger.warning + traceback formatting to preserve stack trace information while using WARNING level instead of ERROR level. This prevents orchestrators like Dagster from treating the log entry as a fatal exception that requires process termination. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <[email protected]>
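The logging change above keeps the stack trace while downgrading the severity. A small sketch of the pattern, assuming a hypothetical `close_quietly` helper (the logger name and function are illustrative):

```python
# Sketch: preserve the traceback but log at WARNING instead of ERROR, so
# orchestrators like Dagster don't treat the entry as a fatal exception.
import logging
import traceback

logger = logging.getLogger("dbt_duckdb.example")


def close_quietly(conn):
    """Close a connection; failures are logged as warnings with a traceback."""
    try:
        conn.close()
    except Exception:
        # logger.exception() would emit at ERROR level; instead, format the
        # current traceback ourselves and emit it at WARNING.
        logger.warning("Failed to close connection:\n%s", traceback.format_exc())
```

The warning still carries the full `Traceback (most recent call last):` block, so debuggability is unchanged; only the level differs.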
…hub.com/firewall413/dbt-duckdb into fix/register-partitioned-tables-in-glue
> cursor.execute("CREATE TABLE tt1 (id int, name text)")
E sqlite3.OperationalError: table tt1 already exists
Would it be possible to have a look at the MotherDuck test? It seems to be blocking this PR.
I'll take a look after this run completes.