Conversation

@rahil-c (Contributor) commented Sep 21, 2025

Describe the issue this Pull Request addresses

Summary and Changelog

Impact

Risk Level

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Sep 21, 2025
@yihua yihua added this to the release-1.1.0 milestone Sep 21, 2025
@rahil-c rahil-c marked this pull request as ready for review September 22, 2025 01:47
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Sep 22, 2025
@rahil-c (Contributor Author) commented Sep 22, 2025

@linliu-code can you take a look?

"Target should have all records from source");

// Phase 3: Upgrade target table from v6 to v9
HoodieDeltaStreamer.Config upgradeTargetConfig = TestHelpers.makeConfigForHudiIncrSrc(
Contributor:

Do we need to consider upgrade to v8, having some commit, and then upgrade to v9?

Contributor Author:

I'm thinking of parameterizing this test, at least for the target table version.

Contributor:

This is an end-to-end functional test, which adds to the overall test run time, so I was trying to be cautious.
Most folks will likely be on either v6 or v9, so we can leave it as is for now.

String finalCheckpoint = finalCommitMetadata.get().getExtraMetadata().get(STREAMER_CHECKPOINT_KEY);

// Checkpoint should have advanced from the pre-upgrade checkpoint
assertNotEquals(checkpointBeforeUpgrade, finalCheckpoint,
Contributor:

Can we check that finalCheckpoint is larger than checkpointBeforeUpgrade?

Contributor Author:

let me add that
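The "larger than" check the reviewer asks for can be sketched in plain Java. This assumes, as the test context suggests, that the streamer checkpoint is a fixed-width Hudi commit-timestamp string (e.g. `yyyyMMddHHmmssSSS`), so lexicographic order matches chronological order; the helper name and sample timestamps are illustrative, not from the PR.

```java
// Sketch of the suggested "checkpoint advanced" assertion. Assumes the
// checkpoint is a fixed-width commit timestamp string, so lexicographic
// comparison matches chronological order. Names are illustrative.
public class CheckpointOrderCheck {

    static void assertCheckpointAdvanced(String before, String after) {
        if (after.compareTo(before) <= 0) {
            throw new AssertionError(
                "Checkpoint did not advance: before=" + before + ", after=" + after);
        }
    }

    public static void main(String[] args) {
        // A later commit time sorts strictly after an earlier one.
        assertCheckpointAdvanced("20250921120000000", "20250922014700123");
        System.out.println("ok");
    }
}
```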

@nsivabalan nsivabalan self-assigned this Sep 23, 2025
@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

.load(targetTablePath)
.count();

assertEquals(sourceRecordCountOriginal, targetRecordCountOriginal,
Contributor:

Since we are doing bulk_insert, it should be easy to check full data equality, right?
Just drop the meta fields from both tables and compare the DataFrames.

Contributor Author:

OK, will do a DataFrame equality comparison to ensure the data is the same.
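The "drop meta fields and compare" idea can be sketched without Spark using plain collections, so the comparison logic stands alone. The `_hoodie_*` column names are Hudi's standard meta fields; the class and method names here are illustrative, not from the PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the review suggestion: drop Hudi's meta columns from each row,
// then compare the two tables as multisets (order-insensitive, duplicates
// counted). Plain collections stand in for Spark DataFrames here.
public class DataEqualityCheck {

    // Hudi's standard meta columns, populated on write.
    static final Set<String> META_COLS = Set.of(
        "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
        "_hoodie_partition_path", "_hoodie_file_name");

    static Map<String, Object> dropMetaFields(Map<String, Object> row) {
        return row.entrySet().stream()
            .filter(e -> !META_COLS.contains(e.getKey()))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    static boolean sameData(List<Map<String, Object>> source,
                            List<Map<String, Object>> target) {
        Map<Map<String, Object>, Long> s = source.stream()
            .map(DataEqualityCheck::dropMetaFields)
            .collect(Collectors.groupingBy(r -> r, Collectors.counting()));
        Map<Map<String, Object>, Long> t = target.stream()
            .map(DataEqualityCheck::dropMetaFields)
            .collect(Collectors.groupingBy(r -> r, Collectors.counting()));
        return s.equals(t);
    }
}
```

In the Spark test itself, the equivalent shape would be dropping the meta columns from both DataFrames and checking `exceptAll` in both directions is empty; this sketch only illustrates the comparison logic.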

*/
@ParameterizedTest
@EnumSource(value = HoodieTableVersion.class, names = {"EIGHT", "NINE"})
public void testIncrementalSourceWithSourceTableUpgrade(HoodieTableVersion targetUpgradeVersion) throws Exception {
Contributor:

Let's rename the arg to sourceTableFinalTableVersion;
it's a bit confusing as of now.

Contributor Author:

will do so

targetMetaClient.reloadActiveTimeline();
assertEquals(HoodieTableVersion.SIX, targetMetaClient.getTableConfig().getTableVersion());

// Verify record counts match between source and target
Contributor:

The test is very monolithic, or rather on the fat side.
Can we decompose it into multiple smaller methods and reuse them?
For example, ingesting 100 records across 3 commits should be a private method.

  • comparing the number of commits in the timeline
  • comparing the table version for a given table
  • comparing data equality across the source and target tables

All of these should be moved to private methods and reused.

.load(targetTablePath)
.count();

assertEquals(sourceRecordCount, targetRecordCount,
Contributor:

Data equality checks, please.

Contributor Author:

will make this a util and call it in several places

.load(targetTablePath)
.count();

assertEquals(sourceRecordCount, targetRecordCount,
Contributor:

Let's validate the full data.
You can cache the source data if need be, to avoid triggering reads against the Hudi table every time.

targetMetaClient, finalTargetInstant);
assertTrue(finalCommitMetadata.isPresent());

// The first time after upgrading, target checkpoint read is still in v1
Contributor:

Sorry, is this expected to be in v1 or v2?
And how are we validating the checkpoint format?

assertEquals(3, targetMetaClient.getActiveTimeline().getCommitsTimeline().countInstants(),
"Target should have 3 commits (as it batches source table commits into target table commits) + upgrade commit");

// Clean up
Contributor:

Can we move the cleanup to a finally block?
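The suggestion can be sketched with a plain try/finally. A `java.nio` temp directory stands in for the DFS target table path, and the names are illustrative; in the actual test the finally block would call `UtilitiesTestBase.Helpers.deleteFileFromDfs(fs, targetTablePath)` instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch of moving test cleanup into a finally block, so the table path is
// deleted even when an assertion fails mid-test. A temp dir stands in for
// the DFS target table path used by the actual test.
public class CleanupInFinally {

    static void deleteRecursively(Path root) throws IOException {
        // Delete deepest entries first so directories are empty when removed.
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path targetTablePath = Files.createTempDirectory("target-table");
        try {
            // ... ingest, upgrade, and run assertions here ...
            Files.createFile(targetTablePath.resolve("commit.marker"));
        } finally {
            deleteRecursively(targetTablePath);
        }
        System.out.println(Files.exists(targetTablePath)); // prints "false"
    }
}
```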

/**
* Test incremental source functionality when source table is upgraded from v6 to v8/v9
* while target table remains at v6. This validates backward compatibility for cross-version
* incremental sync scenarios.
Contributor:

Can we extend the test to also upgrade the target table to v9 as a last step, then add a few more commits to the source table and do one more round of validation?

Contributor Author:

OK, will try making that change.

UtilitiesTestBase.Helpers.deleteFileFromDfs(fs, targetTablePath);
}

@ParameterizedTest
Contributor:

Can we extend the test to also upgrade the source table to v9 as a last step, then add a few more commits to the source table and do one more round of validation?

@yihua yihua removed this from the release-1.1.0 milestone Oct 4, 2025
Labels: size:L PR with lines of changes in (300, 1000]

5 participants