-
Notifications
You must be signed in to change notification settings - Fork 0
Schema migrations: Template file blueprints and minimal CrateDB implementation #107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis change introduces schema migration support to the CrateDB Fivetran Destination connector. It adds a new Changes
Sequence DiagramsequenceDiagram
participant Client
participant CrateDBDestinationImpl
participant SchemaMigrationHelper
participant TableMetadataHelper
participant Database
Client->>CrateDBDestinationImpl: Migrate(request, context)
CrateDBDestinationImpl->>CrateDBDestinationImpl: Extract schema & operation type
alt Operation Type Match
CrateDBDestinationImpl->>SchemaMigrationHelper: handle_drop/copy/rename/add/...(op, schema, table)
activate SchemaMigrationHelper
alt Simple In-Memory Op
SchemaMigrationHelper->>TableMetadataHelper: Mutate table metadata
TableMetadataHelper-->>SchemaMigrationHelper: Updated state
else SQL-Generating Op
SchemaMigrationHelper->>Database: Execute DDL/DML
Database-->>SchemaMigrationHelper: Result
SchemaMigrationHelper->>TableMetadataHelper: Sync in-memory state
end
SchemaMigrationHelper-->>CrateDBDestinationImpl: MigrateResponse(success)
deactivate SchemaMigrationHelper
else Unknown Operation
CrateDBDestinationImpl-->>Client: MigrateResponse(unsupported)
end
CrateDBDestinationImpl-->>Client: MigrateResponse
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related issues
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
a60831c to
1ce0df3
Compare
c4949c1 to
9dfa63d
Compare
9dfa63d to
b7d7503
Compare
b7d7503 to
f414b75
Compare
Migrate::*| class CrateDBDestinationImpl(destination_sdk_pb2_grpc.DestinationConnectorServicer): | ||
| table_map: t.Dict[str, FivetranTable] = {} | ||
|
|
||
| def __init__(self): | ||
| self.metadata = sa.MetaData() | ||
| self.engine: sa.Engine = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
table_map must go away, because this is part of the simulation. Subsequent patches will take care of that.
| def Migrate(self, request, context): | ||
| """ | ||
| Example implementation of the new Migrate RPC introduced for schema migration support. | ||
| This method inspects which migration operation (oneof) was requested and logs / handles it. | ||
| For demonstration, all recognized operations return `success`. | ||
| :param request: The migration request contains details of the operation. | ||
| :param context: gRPC context | ||
| Note: This is just for demonstration, so no logic for migration is implemented | ||
| rather different migration methods are just manipulating table_map to simulate | ||
| the migration operations. | ||
| """ | ||
| self._configure_database(request.configuration.get("url")) | ||
| migration_helper = SchemaMigrationHelper(self.engine, CrateDBDestinationImpl.table_map) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dito: table_map must go away. This is part of the simulation.
f414b75 to
7ee9a0f
Compare
16cfbd0 to
b7c57e6
Compare
e4b18d2 to
d2d0766
Compare
surister
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nitpick comments at your decision.
Also a curious question, can Fivetran emit schema migrations operations that we don't support, if so how is it handled?
Currently, in 0.0.x lands, I am just going with the flow of gradually implementing all the code needed to satisfy the Fivetran SDK tester, without thinking about much else. I admit I had the same question on my mind, but put it away since then. Exploring the surface of possible operations and matching them against CrateDB's capabilities absolutely makes sense, and should finally condensate into some pieces of documentation to satisfy readers with the same question on their minds. In general, I think the API and code blueprints provided by Fivetran pretty much outline the surface and provide excellent rails to run the implementation, paired with the SDK tester that runs the trajectory of test case files. In this spirit, I am accepting relevant resources from the fivetran_partner_sdk as reference about relevant topics, but I am also curious if that leaves some gaps. I think a proper quality assurance procedure should take care about those details. Let's have @coderabbitai create a ticket about this to track it into the future? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain@amotl: I'll create an issue to track this important QA and documentation work around schema migration operations support. 🏁 Script executed: Length of output: 64 @amotl: I've created issue #119 to track this important work. The issue outlines the need to explore the full surface of Fivetran schema migration operations, match them against CrateDB's capabilities, identify any gaps, and document the support matrix for users. Your approach of incrementally implementing features to satisfy the SDK tester while deferring comprehensive analysis makes sense for early development. This issue will ensure that the QA and documentation work gets proper attention in future iterations. |
Source: Fivetran Partner SDK repository https://github.com/fivetran/fivetran_partner_sdk/tree/main/examples/destination_connector/python
This satisfies SDK tester's `schema_migrations_input_dml.json`.
The tables `transaction_drop` and `transaction_new` no longer exist
at the end of the workflow: `transaction_drop` has been dropped, and
`transaction_new` was renamed to `transaction_renamed`.
Inquiring them with `DescribeTable` fails with:
Partner Sdk Warning. Message: Table not found: transaction_{drop,new}
The `ALTER TABLE ... ADD COLUMN ... DEFAULT` statement of CrateaDB does not respect the right semantics of the DEFAULT clause yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
tests/test_integration.py (1)
141-152: Tidy up unused fixture argument and capturedoutto satisfy Ruff.The test logic looks fine, but Ruff flags the unused
servicesargument andoutvariable. You can keep the behavior and silence the warnings by marking them as intentionally unused:-@pytest.mark.parametrize("services", ["./tests/data/fivetran_migrations_dml"], indirect=True) -def test_integration_fivetran_migrations_dml(capfd, services): +@pytest.mark.parametrize("services", ["./tests/data/fivetran_migrations_dml"], indirect=True) +def test_integration_fivetran_migrations_dml(capfd, _services): @@ - # Read out stdout and stderr. - out, err = capfd.readouterr() + # Read out stdout and stderr. + _, err = capfd.readouterr()src/cratedb_fivetran_destination/main.py (3)
32-39: Annotatetable_mapas aClassVarto satisfy Ruff and clarify intent.
table_mapis a mutable class-level dictionary shared across instances. Ruff (RUF012) prefers these to be annotated as class variables. You can keep the behavior and make the typing intent explicit:-class CrateDBDestinationImpl(destination_sdk_pb2_grpc.DestinationConnectorServicer): - table_map: t.Dict[str, FivetranTable] = {} +class CrateDBDestinationImpl(destination_sdk_pb2_grpc.DestinationConnectorServicer): + table_map: t.ClassVar[t.Dict[str, FivetranTable]] = {}This also makes it clearer to readers and type checkers that the map is global to the servicer class, not per-instance state.
269-321: Migrate wiring is aligned with the blueprint and minimal-goal approach.The
Migrateimplementation:
- Configures the database from
request.configuration.- Extracts
schema/tableviarequest.details.- Logs the chosen operation and delegates to the corresponding
SchemaMigrationHelper.handle_*method.- Returns an
unsupported=Trueresponse for unknown or missing operations.That matches the intended “minimal implementation + table_map simulation” approach and should be sufficient to satisfy the current SDK tester DML scenarios.
If you ever want to trim some overhead later, you could lazily call
DescribeTableonly for operations that actually needtable_obj(e.g. copy/add) to avoid an extra reflection round-trip on simple drop/rename operations, but that feels optional at current scale.
374-400: Consider reusingschema_name()intable_fullname()for consistency.You introduced
schema_name(request)to handle both regular and migration-style requests, buttable_fullname()still hardcodesrequest.schema_name. To keep schema resolution consistent across helpers and future-proof against additional request shapes, you could delegate:def table_fullname(self, request): """ Return full-qualified table name from request object. """ - table_name = self.table_name(request) - return f'"{request.schema_name}"."{table_name}"' + table_name = self.table_name(request) + schema_name = self.schema_name(request) + return f'"{schema_name}"."{table_name}"'Today this changes behavior only if you ever call
table_fullnamewith a request that usesdetails.schemarather thanschema_name, but it keeps everything in sync with your new helper.src/cratedb_fivetran_destination/schema_migration_helper.py (1)
155-199: CrateDB-specific ADD COLUMN with default workaround looks correct; consider safer quoting for future.The
add_column_with_default_valuebranch correctly implements the documented CrateDB workaround by:
- Adding the new column without a
DEFAULT, then- Optionally backfilling via
UPDATEwhendefault_value is not None.This matches the guidance around avoiding
ALTER TABLE ... ADD COLUMN ... DEFAULTand instead pairingADD COLUMNwith anUPDATEfor existing rows, while still updating the in-memory table metadata.One thing to keep in mind for later hardening: the backfill statement embeds
default_valuedirectly into the SQL string:sql_bag.add( f'UPDATE "{schema}"."{table}" SET "{new_col.name}" = \'{default_value}\';' )If a future migration ever supplies a string default containing quotes or other special characters, this could break the SQL or require careful escaping. You already use a bound-parameter pattern in
handle_update_column_value; reusing a similar approach here (even if that means running the DDL and UPDATE separately instead of via a singleSqlBag) would make this more robust when you move beyond the minimal SDK-tester scope.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
CHANGES.md(1 hunks)src/cratedb_fivetran_destination/main.py(4 hunks)src/cratedb_fivetran_destination/schema_migration_helper.py(1 hunks)tests/data/fivetran_migrations_dml/configuration.json(1 hunks)tests/data/fivetran_migrations_dml/schema_migrations_input_dml.json(1 hunks)tests/test_integration.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/data/fivetran_migrations_dml/schema_migrations_input_dml.json
- tests/data/fivetran_migrations_dml/configuration.json
- CHANGES.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-29T21:01:36.016Z
Learnt from: amotl
Repo: crate/cratedb-fivetran-destination PR: 107
File: src/cratedb_fivetran_destination/schema_migration_helper.py:1-2
Timestamp: 2025-11-29T21:01:36.016Z
Learning: In the crate/cratedb-fivetran-destination repository, there is no ruff.toml file. Ruff configuration is in pyproject.toml, and E501 (line-too-long) is NOT disabled project-wide; it is actually enabled as part of the Pycodestyle rules.
Applied to files:
src/cratedb_fivetran_destination/schema_migration_helper.py
🧬 Code graph analysis (1)
src/cratedb_fivetran_destination/schema_migration_helper.py (2)
src/cratedb_fivetran_destination/model.py (4)
SqlBag(182-198)TypeMap(57-132)execute(196-198)add(192-194)src/cratedb_fivetran_destination/util.py (1)
log_message(10-11)
🪛 Ruff (0.14.7)
src/cratedb_fivetran_destination/schema_migration_helper.py
1-1: Unused noqa directive (non-enabled: E501)
Remove unused noqa directive
(RUF100)
185-185: Unused noqa directive (non-enabled: ERA001)
Remove unused noqa directive
(RUF100)
tests/test_integration.py
142-142: Unused function argument: services
(ARG001)
148-148: Unpacked variable out is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
src/cratedb_fivetran_destination/main.py
33-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🔇 Additional comments (2)
src/cratedb_fivetran_destination/main.py (1)
242-268: DescribeTable’s schema handling for migration requests looks correct.Using
schema_name = self.schema_name(request)andtable_name = self.table_name(request)allowsDescribeTableto work uniformly for both “legacy” requests (withschema_name/table) and Migrate requests that only populatedetails.schema/details.table. This is consistent with howMigratelater callsDescribeTable(request, context)directly.No changes needed here from my side.
src/cratedb_fivetran_destination/schema_migration_helper.py (1)
307-366: TableMetadataHelper utilities are a reasonable way to simulate in-memory table changes.The helper methods (
create_table_copy,remove_column_from_table,remove_history_mode_columns,add_history_mode_columns,add_soft_delete_column) use protobuf cloning and simple list operations oncolumns, which is appropriate for maintaining the simulatedtable_mapstate the blueprint relies on.Given the current goal is to minimally simulate history/soft-delete behavior for the SDK tests, this abstraction looks fine and keeps the migration handlers decently readable.
About
Minimally implement schema migrations, aka.
MigrategRPC endpoint.Details
This satisfies SDK tester's
schema_migrations_input_dml.json. The blueprint for theschema_migration_helper.pywas taken from the Fivetran repository.Review
8aee4fd adds the original blueprint templates provided by Fivetran, and d2b8c23 amends just that amount of code needed to satisfy the new test cases defined by
schema_migrations_input_dml.json. In this spirit, you will find quite a few more "[...] manipulation to simulate [...]" spots in theschema_migration_helper.pyfile. They will be filled out by subsequent patches.Details
The third commit, 7ee9a0f, is related to an issue we observed with one of the SDK tester workflow definition files, see fivetran/fivetran_partner_sdk#154.
The fourth commit, d2d0766, is related to another issue we observed with the
ALTER TABLE ... ADD COLUMN ... DEFAULTstatement of CrateDB, see crate/crate#18783.References