
Conversation

@amotl amotl commented Nov 28, 2025

About

Minimally implement schema migrations, i.e. the Migrate gRPC endpoint.

Details

This satisfies the SDK tester's schema_migrations_input_dml.json. The blueprint for schema_migration_helper.py was taken from the Fivetran repository.

Review

8aee4fd adds the original blueprint templates provided by Fivetran, and d2b8c23 adds just the amount of code needed to satisfy the new test cases defined by schema_migrations_input_dml.json. In this spirit, you will find quite a few more "[...] manipulation to simulate [...]" spots in the schema_migration_helper.py file. They will be filled in by subsequent patches.

Details

$ ag "manipulation to simulate"
src/cratedb_fivetran_destination/schema_migration_helper.py
42:            # table-map manipulation to simulate drop column in history mode, replace with actual logic.
109:            # table-map manipulation to simulate copy table to history mode, replace with actual logic.
166:            # table-map manipulation to simulate add column in history mode, replace with actual logic.
223:            # table-map manipulation to simulate soft delete to live, replace with actual logic.
235:            # table-map manipulation to simulate soft delete to history, replace with actual logic.
248:            # table-map manipulation to simulate history to soft delete, replace with actual logic.
261:            # table-map manipulation to simulate history to live, replace with actual logic.
273:            # table-map manipulation to simulate live to soft delete, replace with actual logic.
285:            # table-map manipulation to simulate live to history, replace with actual logic.
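The simulation these markers refer to boils down to mutating an in-memory map of table metadata instead of issuing real DDL. A minimal sketch of the pattern (FivetranTable and drop_column here are illustrative stand-ins, not the actual helper API):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FivetranTable:
    """Illustrative stand-in for the protobuf table message."""
    name: str
    columns: List[str] = field(default_factory=list)


def drop_column(table_map: Dict[str, FivetranTable], table: str, column: str) -> None:
    """Simulate `ALTER TABLE ... DROP COLUMN` by mutating in-memory metadata only."""
    table_map[table].columns.remove(column)


table_map = {"transaction": FivetranTable("transaction", ["id", "amount", "obsolete"])}
drop_column(table_map, "transaction", "obsolete")
print(table_map["transaction"].columns)  # ['id', 'amount']
```

Subsequent patches would replace this metadata shuffling with real DDL against the database.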

The third commit, 7ee9a0f, is related to an issue we observed with one of the SDK tester workflow definition files, see fivetran/fivetran_partner_sdk#154.

The fourth commit, d2d0766, is related to another issue we observed with the ALTER TABLE ... ADD COLUMN ... DEFAULT statement of CrateDB, see crate/crate#18783.

References

coderabbitai bot commented Nov 28, 2025

Walkthrough

This change introduces schema migration support to the CrateDB Fivetran Destination connector. It adds a new SchemaMigrationHelper class that handles migration operations (drop, copy, rename, add, update column value, and table sync mode migrations), integrates a Migrate RPC handler in the main implementation, and includes test data and integration tests for schema migrations.

Changes

Cohort / File(s) Summary
Documentation
CHANGES.md
Updated changelog noting schema migrations support in the Unreleased section.
Core Migration Implementation
src/cratedb_fivetran_destination/schema_migration_helper.py
New module introducing SchemaMigrationHelper class with handlers for drop, copy, rename, add, update_column_value, and table_sync_mode_migration operations; includes TableMetadataHelper with static methods for manipulating in-memory table metadata; defines three metadata flag constants (FIVETRAN_START, FIVETRAN_END, FIVETRAN_ACTIVE).
Main Implementation Integration
src/cratedb_fivetran_destination/main.py
Added class-level table_map to CrateDBDestinationImpl; introduced Migrate RPC handler that delegates to operation-specific handlers; added static helper methods schema_name(request) and extended table_name(request) to extract details from request objects.
Test Data
tests/data/fivetran_migrations_dml/...
Added test configuration and schema migration input files: configuration.json (database connection settings) and schema_migrations_input_dml.json (migration scenarios covering copy_column, update_column_value, add_column_with_default_value, copy_table, rename_column, rename_table, drop_table operations).
Integration Test
tests/test_integration.py
Added test_integration_fivetran_migrations_dml(capfd, services) test function to verify schema migration DML execution and DescribeTable output.

Sequence Diagram

sequenceDiagram
    participant Client
    participant CrateDBDestinationImpl
    participant SchemaMigrationHelper
    participant TableMetadataHelper
    participant Database

    Client->>CrateDBDestinationImpl: Migrate(request, context)
    CrateDBDestinationImpl->>CrateDBDestinationImpl: Extract schema & operation type
    alt Operation Type Match
        CrateDBDestinationImpl->>SchemaMigrationHelper: handle_drop/copy/rename/add/...(op, schema, table)
        activate SchemaMigrationHelper
        alt Simple In-Memory Op
            SchemaMigrationHelper->>TableMetadataHelper: Mutate table metadata
            TableMetadataHelper-->>SchemaMigrationHelper: Updated state
        else SQL-Generating Op
            SchemaMigrationHelper->>Database: Execute DDL/DML
            Database-->>SchemaMigrationHelper: Result
            SchemaMigrationHelper->>TableMetadataHelper: Sync in-memory state
        end
        SchemaMigrationHelper-->>CrateDBDestinationImpl: MigrateResponse(success)
        deactivate SchemaMigrationHelper
    else Unknown Operation
        CrateDBDestinationImpl-->>Client: MigrateResponse(unsupported)
    end
    CrateDBDestinationImpl-->>Client: MigrateResponse
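The dispatch step in the diagram can be sketched as a plain lookup table; operation names, handler signatures, and the FakeHelper below are illustrative, not the actual protobuf or SchemaMigrationHelper API:

```python
class FakeHelper:
    """Stand-in for SchemaMigrationHelper; records which handler ran."""

    def __init__(self):
        self.calls = []

    def handle_drop(self, schema, table):
        self.calls.append(("drop", schema, table))

    def handle_rename(self, schema, table):
        self.calls.append(("rename", schema, table))


def migrate(operation: str, schema: str, table: str, helper) -> dict:
    """Dispatch a migration operation to its handler, or flag it as unsupported."""
    handlers = {
        "drop": helper.handle_drop,
        "rename": helper.handle_rename,
    }
    handler = handlers.get(operation)
    if handler is None:
        # Unknown operation: report it as unsupported instead of failing hard.
        return {"unsupported": True}
    handler(schema, table)
    return {"success": True}


helper = FakeHelper()
print(migrate("drop", "doc", "transaction", helper))  # {'success': True}
print(migrate("no_such_op", "doc", "transaction", helper))  # {'unsupported': True}
```

The real implementation inspects the protobuf oneof on the request instead of a string, but the shape of the branching is the same.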

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • schema_migration_helper.py: Multiple handler methods with SQL generation, in-memory state mutation, and operation-specific logic paths require careful verification of correctness and edge case handling.
  • main.py: New Migrate handler and helper methods introduce integration points; verify that schema and table extraction from request objects is robust across different request structures.
  • Test data and integration test: Validate that the migration scenarios in JSON files exercise all major code paths and that assertions properly verify end-state consistency.

Suggested reviewers

  • seut
  • surister

Poem

🐰 Hops through the schema with glee,
Migrations now dance wild and free,
Copy, rename, and drop with flair,
Table metadata floating through air,
CrateDB blooms where changes take care!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Title check ✅ Passed — The title accurately describes the main change: adding schema migration template blueprints and a minimal CrateDB implementation of the Migrate gRPC endpoint.
  • Description check ✅ Passed — The description is directly related to the changeset, explaining the purpose of implementing schema migrations and detailing the structure of commits, approach, and future work.
  • Docstring Coverage ✅ Passed — Docstring coverage is 86.96%, which is sufficient. The required threshold is 80.00%.

@amotl amotl force-pushed the migrations-ddl branch 2 times, most recently from a60831c to 1ce0df3 on November 28, 2025 11:40
Base automatically changed from migrations-ddl to main November 28, 2025 11:49
@amotl amotl changed the title from "Add support for Migrate::*" to "Schema migrations: Template file blueprints and minimal CrateDB implementation" on Nov 28, 2025
Comment on lines 32 to 37
class CrateDBDestinationImpl(destination_sdk_pb2_grpc.DestinationConnectorServicer):
table_map: t.Dict[str, FivetranTable] = {}

def __init__(self):
self.metadata = sa.MetaData()
self.engine: sa.Engine = None
@amotl amotl Nov 28, 2025
table_map must go away, because this is part of the simulation. Subsequent patches will take care of that.

Comment on lines +269 to +283
def Migrate(self, request, context):
"""
Example implementation of the new Migrate RPC introduced for schema migration support.
This method inspects which migration operation (oneof) was requested and logs / handles it.
For demonstration, all recognized operations return `success`.
:param request: The migration request contains details of the operation.
:param context: gRPC context
Note: This is just for demonstration, so no logic for migration is implemented
rather different migration methods are just manipulating table_map to simulate
the migration operations.
"""
self._configure_database(request.configuration.get("url"))
migration_helper = SchemaMigrationHelper(self.engine, CrateDBDestinationImpl.table_map)
Ditto: table_map must go away. This is part of the simulation.


@surister surister left a comment
Left some nitpick comments; handle them at your discretion.

Also a curious question: can Fivetran emit schema migration operations that we don't support, and if so, how is that handled?

amotl commented Dec 2, 2025

Also a curious question: can Fivetran emit schema migration operations that we don't support, and if so, how is that handled?

Currently, in 0.0.x land, I am just going with the flow of gradually implementing all the code needed to satisfy the Fivetran SDK tester, without thinking about much else. I admit I had the same question on my mind, but have set it aside since then.

Exploring the surface of possible operations and matching them against CrateDB's capabilities absolutely makes sense, and should eventually condense into documentation for readers with the same question on their minds.

In general, I think the API and code blueprints provided by Fivetran pretty much outline the surface and provide excellent rails for the implementation, paired with the SDK tester that runs through the trajectory of test case files. In this spirit, I am accepting relevant resources from the fivetran_partner_sdk as a reference on relevant topics, but I am also curious whether that leaves some gaps.

I think a proper quality assurance procedure should take care of those details. Let's have @coderabbitai create a ticket about this to track it into the future?

coderabbitai bot commented Dec 2, 2025

🧩 Analysis chain

@amotl: I'll create an issue to track this important QA and documentation work around schema migration operations support.


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to track schema migration operations support documentation and verification

gh issue create \
  --repo crate/cratedb-fivetran-destination \
  --title "Document and verify schema migration operations support matrix" \
  --body "## Context

This issue tracks the need to explore, document, and verify which Fivetran schema migration operations are supported by CrateDB and how unsupported operations are handled.

## Background

During review of PR #107, the question arose: Can Fivetran emit schema migration operations that CrateDB doesn't support, and if so, how should they be handled?

The current 0.0.x implementation focuses on satisfying the Fivetran SDK tester test cases without comprehensive analysis of all possible operations.

## Tasks

- [ ] Explore the complete surface of possible schema migration operations that Fivetran can emit
- [ ] Match these operations against CrateDB's capabilities
- [ ] Identify gaps between Fivetran operations and CrateDB support
- [ ] Document which operations are supported vs. unsupported
- [ ] Document how unsupported operations are currently handled (or should be handled)
- [ ] Create user-facing documentation to clarify supported operations

## References

- PR #107: https://github.com/crate/cratedb-fivetran-destination/pull/107
- Fivetran Partner SDK: https://github.com/fivetran/fivetran_partner_sdk
- Related: GH-86

## Requested by

@amotl in https://github.com/crate/cratedb-fivetran-destination/pull/107" \
  --assignee amotl

Length of output: 64


@amotl: I've created issue #119 to track this important work. The issue outlines the need to explore the full surface of Fivetran schema migration operations, match them against CrateDB's capabilities, identify any gaps, and document the support matrix for users.

Your approach of incrementally implementing features to satisfy the SDK tester while deferring comprehensive analysis makes sense for early development. This issue will ensure that the QA and documentation work gets proper attention in future iterations.

amotl added 5 commits December 4, 2025 20:27
This satisfies SDK tester's `schema_migrations_input_dml.json`.
The tables `transaction_drop` and `transaction_new` no longer exist
at the end of the workflow: `transaction_drop` has been dropped, and
`transaction_new` was renamed to `transaction_renamed`.

Inquiring them with `DescribeTable` fails with:
  Partner Sdk Warning. Message: Table not found: transaction_{drop,new}
The `ALTER TABLE ... ADD COLUMN ... DEFAULT` statement of CrateDB
does not respect the correct semantics of the DEFAULT clause yet.
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 0

🧹 Nitpick comments (5)
tests/test_integration.py (1)

141-152: Tidy up unused fixture argument and captured out to satisfy Ruff.

The test logic looks fine, but Ruff flags the unused services argument and out variable. You can keep the behavior and silence the warnings by marking them as intentionally unused:

-@pytest.mark.parametrize("services", ["./tests/data/fivetran_migrations_dml"], indirect=True)
-def test_integration_fivetran_migrations_dml(capfd, services):
+@pytest.mark.parametrize("services", ["./tests/data/fivetran_migrations_dml"], indirect=True)
+def test_integration_fivetran_migrations_dml(capfd, _services):
@@
-    # Read out stdout and stderr.
-    out, err = capfd.readouterr()
+    # Read out stdout and stderr.
+    _, err = capfd.readouterr()
src/cratedb_fivetran_destination/main.py (3)

32-39: Annotate table_map as a ClassVar to satisfy Ruff and clarify intent.

table_map is a mutable class-level dictionary shared across instances. Ruff (RUF012) prefers these to be annotated as class variables. You can keep the behavior and make the typing intent explicit:

-class CrateDBDestinationImpl(destination_sdk_pb2_grpc.DestinationConnectorServicer):
-    table_map: t.Dict[str, FivetranTable] = {}
+class CrateDBDestinationImpl(destination_sdk_pb2_grpc.DestinationConnectorServicer):
+    table_map: t.ClassVar[t.Dict[str, FivetranTable]] = {}

This also makes it clearer to readers and type checkers that the map is global to the servicer class, not per-instance state.
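To illustrate why the annotation matters: a mutable class attribute is shared by all instances, so a mutation through one instance is visible through every other. A standalone sketch (the Servicer class below is illustrative, not the actual servicer):

```python
import typing as t


class Servicer:
    # Annotated as ClassVar: one shared dict for all instances, by design.
    table_map: t.ClassVar[t.Dict[str, str]] = {}


a, b = Servicer(), Servicer()
# Mutating (not rebinding) the dict via one instance affects all of them,
# because attribute lookup finds the single class-level dict.
a.table_map["t1"] = "live"
print(b.table_map)  # {'t1': 'live'}
```

Rebinding (`a.table_map = {}`) would instead create a per-instance attribute and shadow the class-level one, which is exactly the subtlety the ClassVar annotation flags for readers and type checkers.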


269-321: Migrate wiring is aligned with the blueprint and minimal-goal approach.

The Migrate implementation:

  • Configures the database from request.configuration.
  • Extracts schema/table via request.details.
  • Logs the chosen operation and delegates to the corresponding SchemaMigrationHelper.handle_* method.
  • Returns an unsupported=True response for unknown or missing operations.

That matches the intended “minimal implementation + table_map simulation” approach and should be sufficient to satisfy the current SDK tester DML scenarios.

If you ever want to trim some overhead later, you could lazily call DescribeTable only for operations that actually need table_obj (e.g. copy/add) to avoid an extra reflection round-trip on simple drop/rename operations, but that feels optional at current scale.


374-400: Consider reusing schema_name() in table_fullname() for consistency.

You introduced schema_name(request) to handle both regular and migration-style requests, but table_fullname() still hardcodes request.schema_name. To keep schema resolution consistent across helpers and future-proof against additional request shapes, you could delegate:

     def table_fullname(self, request):
         """
         Return full-qualified table name from request object.
         """
-        table_name = self.table_name(request)
-        return f'"{request.schema_name}"."{table_name}"'
+        table_name = self.table_name(request)
+        schema_name = self.schema_name(request)
+        return f'"{schema_name}"."{table_name}"'

Today this changes behavior only if you ever call table_fullname with a request that uses details.schema rather than schema_name, but it keeps everything in sync with your new helper.

src/cratedb_fivetran_destination/schema_migration_helper.py (1)

155-199: CrateDB-specific ADD COLUMN with default workaround looks correct; consider safer quoting for future.

The add_column_with_default_value branch correctly implements the documented CrateDB workaround by:

  • Adding the new column without a DEFAULT, then
  • Optionally backfilling via UPDATE when default_value is not None.

This matches the guidance around avoiding ALTER TABLE ... ADD COLUMN ... DEFAULT and instead pairing ADD COLUMN with an UPDATE for existing rows, while still updating the in-memory table metadata.

One thing to keep in mind for later hardening: the backfill statement embeds default_value directly into the SQL string:

sql_bag.add(
    f'UPDATE "{schema}"."{table}" SET "{new_col.name}" = \'{default_value}\';'
)

If a future migration ever supplies a string default containing quotes or other special characters, this could break the SQL or require careful escaping. You already use a bound-parameter pattern in handle_update_column_value; reusing a similar approach here (even if that means running the DDL and UPDATE separately instead of via a single SqlBag) would make this more robust when you move beyond the minimal SDK-tester scope.
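A sketch of the suggested bound-parameter backfill, using stdlib sqlite3 as a stand-in for CrateDB (the actual client, dialect, and identifiers differ; table and column names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "doc" ("id" INTEGER)')
conn.execute('INSERT INTO "doc" ("id") VALUES (1), (2)')

# Workaround for ALTER TABLE ... ADD COLUMN ... DEFAULT: add the column
# without a default, then backfill existing rows. Passing the default as
# a bound parameter means quotes in the value cannot break the SQL.
default_value = "it's quoted"
conn.execute('ALTER TABLE "doc" ADD COLUMN "note" TEXT')
conn.execute('UPDATE "doc" SET "note" = ?', (default_value,))

print(conn.execute('SELECT "note" FROM "doc"').fetchall())
# [("it's quoted",), ("it's quoted",)]
```

Note how the embedded single quote in default_value would have broken the f-string variant quoted above, but passes through cleanly as a parameter.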

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4b18d2 and b7530ee.

📒 Files selected for processing (6)
  • CHANGES.md (1 hunks)
  • src/cratedb_fivetran_destination/main.py (4 hunks)
  • src/cratedb_fivetran_destination/schema_migration_helper.py (1 hunks)
  • tests/data/fivetran_migrations_dml/configuration.json (1 hunks)
  • tests/data/fivetran_migrations_dml/schema_migrations_input_dml.json (1 hunks)
  • tests/test_integration.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/data/fivetran_migrations_dml/schema_migrations_input_dml.json
  • tests/data/fivetran_migrations_dml/configuration.json
  • CHANGES.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-29T21:01:36.016Z
Learnt from: amotl
Repo: crate/cratedb-fivetran-destination PR: 107
File: src/cratedb_fivetran_destination/schema_migration_helper.py:1-2
Timestamp: 2025-11-29T21:01:36.016Z
Learning: In the crate/cratedb-fivetran-destination repository, there is no ruff.toml file. Ruff configuration is in pyproject.toml, and E501 (line-too-long) is NOT disabled project-wide; it is actually enabled as part of the Pycodestyle rules.

Applied to files:

  • src/cratedb_fivetran_destination/schema_migration_helper.py
🧬 Code graph analysis (1)
src/cratedb_fivetran_destination/schema_migration_helper.py (2)
src/cratedb_fivetran_destination/model.py (4)
  • SqlBag (182-198)
  • TypeMap (57-132)
  • execute (196-198)
  • add (192-194)
src/cratedb_fivetran_destination/util.py (1)
  • log_message (10-11)
🪛 Ruff (0.14.7)
src/cratedb_fivetran_destination/schema_migration_helper.py

1-1: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


185-185: Unused noqa directive (non-enabled: ERA001)

Remove unused noqa directive

(RUF100)

tests/test_integration.py

142-142: Unused function argument: services

(ARG001)


148-148: Unpacked variable out is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

src/cratedb_fivetran_destination/main.py

33-33: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🔇 Additional comments (2)
src/cratedb_fivetran_destination/main.py (1)

242-268: DescribeTable’s schema handling for migration requests looks correct.

Using schema_name = self.schema_name(request) and table_name = self.table_name(request) allows DescribeTable to work uniformly for both “legacy” requests (with schema_name/table) and Migrate requests that only populate details.schema / details.table. This is consistent with how Migrate later calls DescribeTable(request, context) directly.

No changes needed here from my side.

src/cratedb_fivetran_destination/schema_migration_helper.py (1)

307-366: TableMetadataHelper utilities are a reasonable way to simulate in-memory table changes.

The helper methods (create_table_copy, remove_column_from_table, remove_history_mode_columns, add_history_mode_columns, add_soft_delete_column) use protobuf cloning and simple list operations on columns, which is appropriate for maintaining the simulated table_map state the blueprint relies on.

Given the current goal is to minimally simulate history/soft-delete behavior for the SDK tests, this abstraction looks fine and keeps the migration handlers decently readable.
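The cloning pattern can be sketched with dataclasses standing in for the protobuf messages (Column, Table, and create_table_copy below are illustrative, not the real types):

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


def create_table_copy(source: Table, new_name: str) -> Table:
    """Deep-copy the table metadata so mutations on the copy leave the source intact."""
    clone = copy.deepcopy(source)
    clone.name = new_name
    return clone


src = Table("transaction", [Column("id"), Column("amount")])
dst = create_table_copy(src, "transaction_history")
# Simulate adding a history-mode column on the copy only.
dst.columns.append(Column("_fivetran_active"))
print(len(src.columns), len(dst.columns))  # 2 3
```

Protobuf messages offer the same guarantee via CopyFrom/message cloning; the key point is that the source entry in table_map is never mutated through an alias.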



Development

Successfully merging this pull request may close these issues.

Fivetran-Tester: schema_migrations_input_dml.json fails with "Partner Sdk Warning. Message: Table not found: transaction_drop"

3 participants