[RAPTOR-13895] Implement inline predictor based on DRUM score #1504
Conversation
Force-pushed from a0ae3e4 to ccc9397.
Very nice. Maybe implement that inline.py as a test.
```python
try:
    setup_required_environment_variables(options)
except Exception as exc:
    print(str(exc))
```
Might be a python thing I don't know about (and this probably doesn't really matter that much), but should we send this to stderr?
Oh, never mind. This just moves the code up.
Yeah, I just lifted and shifted it and didn't want to alter it, but it should use logger.exception() instead of this.
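For reference, a minimal sketch of the logger-based variant discussed here, reusing the `setup_required_environment_variables(options)` call from the snippet above (the logger name and message are assumptions, not code from this PR):

```python
import logging

logger = logging.getLogger(__name__)  # hypothetical module-level logger

try:
    setup_required_environment_variables(options)
except Exception:
    # logger.exception() records the message at ERROR level together with
    # the traceback; with default handlers the output goes to stderr,
    # which also addresses the stderr question raised above.
    logger.exception("Failed to set up required environment variables")
```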
@klichukb So I ran a final test on this after fixing up the agents to support either this or the DRUM server based on a flag. It worked great for all agents across all testing vectors. Let me know if I can help add anything to get this PR finished. Thanks!
The Needs Review labels were added based on the following file changes.

Team @datarobot/core-modeling (#predictive-ai) was assigned because of changes in these files:
- custom_model_runner/datarobot_drum/drum/common.py
- custom_model_runner/datarobot_drum/drum/drum.py
- custom_model_runner/datarobot_drum/drum/main.py
- custom_model_runner/datarobot_drum/drum/root_predictors/drum_inline_utils.py
- custom_model_runner/datarobot_drum/drum/root_predictors/generic_predictor.py
- tests/functional/test_drum_inline_utils.py

Team @datarobot/genai-systems (#genai-systems) was assigned because of changes in these files:
- custom_model_runner/datarobot_drum/drum/common.py
- custom_model_runner/datarobot_drum/drum/drum.py
- custom_model_runner/datarobot_drum/drum/main.py
- custom_model_runner/datarobot_drum/drum/root_predictors/drum_inline_utils.py
- custom_model_runner/datarobot_drum/drum/root_predictors/generic_predictor.py
- requirements_test.txt
- tests/fixtures/python3_dummy_chat/README.md
- tests/fixtures/python3_dummy_chat/custom.py
- tests/fixtures/python3_dummy_chat/moderation_config.yaml
- tests/functional/test_drum_inline_utils.py

If you think there are issues with ownership, please discuss with the C&A domain in the #sdtk Slack channel and create a PR to update the DRCODEOWNERS/CODEOWNERS file.
Force-pushed from fc8d423 to 4a3f0d6.
Looks great!
@yakov-g @mjnitz02 I tried an e2e test with moderations to make sure that the inline runner executes the moderations pipeline. It's crazy: it triples the duration of the tests even after bumping the functional test suite from 8G to 16G. Installing moderations adds more than 30 minutes, and I'm not sure why it's so bad. I'm skipping the test to get things in for the release, but I think we have to figure out how to get the test in.
@datarobot/core-modeling folks, I need to get this in before the branch cut for the agentic work; feel free to provide your review and we can address it.
Rationale
The agentic workflow requires running DRUM with the target custom model code in place in a codespace session, in order to test the model inline end to end.
Currently this is done using "drum server" and communicating over the network. While it works, it's quite a heavy and slow test, involving processes, threads, and stdin/stdout piping to get feedback.
The idea is to use the existing "drum score" functionality, which uses an alternative predictor that does not spin up a web server, and to expose that predictor directly.
Provided an input.json and the following example:
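(The original example was not captured in this extract; below is a minimal sketch of what inline usage might look like, assuming a context-manager helper named `drum_inline_predictor` exported from the new `drum_inline_utils.py`. The helper name, signature, and arguments are assumptions, not confirmed by this PR.)

```python
# Hypothetical sketch only: the helper name, signature, and arguments are
# assumptions based on the drum_inline_utils.py module added by this PR.
import json

from datarobot_drum.drum.root_predictors.drum_inline_utils import drum_inline_predictor

with open("input.json") as f:
    payload = json.load(f)

# Assumed behavior: runs the "drum score" code path in-process and yields
# the predictor object directly, without starting a web server.
with drum_inline_predictor(custom_model_dir="model/") as predictor:
    print(predictor.predict(payload))
```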