kennethmhc
Contributor

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

Contributor

@bubriks bubriks left a comment

some feedback

self._stream = True
if time_travel_format is None:
if engine.get_type() == "python":
if not online_enabled and not self._stream:
Contributor

Suggested change
if not online_enabled and not self._stream:
if not self._stream:

Does having it online enabled change anything?

Contributor Author

The problem is that, for now, we don't have a way to push data from offline to online, and delta-rs only writes to offline. According to Jim, the main use case for Delta is fast writes/reads from offline only.

@@ -2478,8 +2477,15 @@ def __init__(
else:
# initialized by user
# for python engine we always use stream feature group
if engine.get_type() == "python":
self._stream = True
if time_travel_format is None:
Contributor

What if time_travel_format is HUDI? Won't it fail, since self._stream = True would not be set?
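
The concern can be reproduced in isolation. This is only a minimal sketch: both functions and their names are hypothetical stand-ins, not the actual `__init__` logic, and the HUDI default is an assumption made for illustration. The first variant sets the stream flag only inside the `is None` branch, so an explicit HUDI on the Python engine is left without it; the second derives the flag from the resolved format instead.

```python
from typing import Optional, Tuple


def resolve_stream_nested(
    engine_type: str, time_travel_format: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Hypothetical: stream is only enabled inside the `is None` branch."""
    stream = False
    if time_travel_format is None:
        if engine_type == "python":
            time_travel_format = "HUDI"  # assumed default, for illustration only
            stream = True
    return time_travel_format, stream


def resolve_stream_guarded(
    engine_type: str, time_travel_format: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Hypothetical fix: derive stream from the resolved format."""
    if time_travel_format is None and engine_type == "python":
        time_travel_format = "HUDI"  # assumed default, for illustration only
    # Any HUDI feature group on the Python engine streams, explicit or defaulted.
    stream = engine_type == "python" and time_travel_format == "HUDI"
    return time_travel_format, stream
```

With the nested variant, `resolve_stream_nested("python", "HUDI")` leaves the stream flag off, which is the case asked about here; the guarded variant turns it on for both the explicit and the defaulted HUDI case.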

Comment on lines +288 to +290
if isinstance(engine.get_instance(), engine.spark.Engine):
    spark_session = engine.get_instance()._spark_session
    spark_context = engine.get_instance()._spark_context
Contributor

Does this work if the engine is python?

For delta_vacuum you did:

if isinstance(engine.get_instance(), engine.spark.Engine):
    spark_session = engine.get_instance()._spark_session
    spark_context = engine.get_instance()._spark_context
else:
    spark_session = None
    spark_context = None
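
One way to keep both call sites consistent would be to factor the guard into a small helper. This is only a sketch: `SparkEngine`, `PythonEngine`, and `get_spark_handles` are hypothetical names standing in for the real engine classes, and the string values are placeholders for real session and context objects.

```python
from typing import Any, Optional, Tuple


class SparkEngine:
    """Hypothetical stand-in for the Spark engine; not the real class."""

    def __init__(self) -> None:
        self._spark_session = "session"  # placeholder for a real SparkSession
        self._spark_context = "context"  # placeholder for a real SparkContext


class PythonEngine:
    """Hypothetical stand-in for the Python engine, which has no Spark handles."""


def get_spark_handles(instance: Any) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (spark_session, spark_context), or (None, None) when not on Spark."""
    if isinstance(instance, SparkEngine):
        return instance._spark_session, instance._spark_context
    # Non-Spark engines get explicit None values instead of an AttributeError
    # or unbound local variables further down.
    return None, None
```

Callers could then unpack `spark_session, spark_context = get_spark_handles(engine.get_instance())` in both places, so the Python-engine case is handled the same way everywhere.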

@@ -1487,6 +1488,63 @@ def test_save_dataframe_stream(self, mocker):
assert mock_python_engine_write_dataframe_kafka.call_count == 1
assert mock_python_engine_legacy_save_dataframe.call_count == 0

def test_save_dataframe_delta_time_travel_format(self, mocker):
Contributor

This will need workflow tests. I haven't checked, but I think we should also have a test to make sure that if neither the time travel format nor stream is specified, we pick HUDI and enable stream by default.

Contributor Author

Yeah, we have the workflow test test_feature_pipeline.py:
https://github.com/logicalclocks/loadtest/pull/609


2 participants