[RAPTOR-13895] Implement inline predictor based on DRUM score #1504

klichukb · 2025-06-06T16:52:06Z

Rationale

Agentic workflow requires running DRUM with the target custom model code in-place in a codespace session in order to test the model inline end to end.
Currently this is done using "drum server", and communicating over network. While it works, its quite a heavy and slow test, which involves processes, threads and working with stdin/stdout piping to get feedback.

The idea is to use the exiting "drum score" functionality which uses an alternative predictor, that does not spin up a web server, exposing the predictor itself to work with directly.

Provided:
input.json

{
  "model": "gpt-4o-mini",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Tell me a joke about penguins." }
  ],
  "temperature": 0.7,
  "top_p": 1.0,
  "max_tokens": 256,
  "n": 1,
  "stream": false,
  "stop": null
}

And the following example

import json

payload = json.loads(open("input.json", "r").read())
code_dir = (
    '/datarobot-user-models/model_templates/python3_dummy_chat'
)

with drum_inline_predictor(target_type=TargetType.AGENTIC_WORKFLOW.value, custom_model_dir=code_dir,
                           target_name='response') as predictor:
    result = predictor.chat(payload)
    print(result)

yakov-g

Very nice.
Maybe implement that inline.py as a test

nullspoon · 2025-06-12T23:47:26Z

custom_model_runner/datarobot_drum/drum/main.py

+        try:
+            setup_required_environment_variables(options)
+        except Exception as exc:
+            print(str(exc))


Might be a python thing I don't know about (and this probably doesn't really matter that much), but should we send this to stderr?

Oh nevermind. This just moves the code up.

Yeah I just lift-and-shifted it, did not want to alter but it should use logger.exception() instead of this.

mjnitz02 · 2025-06-17T19:56:53Z

@klichukb So I ran a final test on this now after fixing around agents to support this or drum server based on a flag. It worked great for all agents across all testing vectors. Let me know if I can help add stuff to get this PR finished. Thanks!

engprod-2 · 2025-06-20T21:45:53Z

The Needs Review labels were added based on the following file changes.

Team @datarobot/core-modeling (#predictive-ai) was assigned because of changes in files:

custom_model_runner/datarobot_drum/drum/common.py
custom_model_runner/datarobot_drum/drum/drum.py
custom_model_runner/datarobot_drum/drum/main.py
custom_model_runner/datarobot_drum/drum/root_predictors/drum_inline_utils.py
custom_model_runner/datarobot_drum/drum/root_predictors/generic_predictor.py
tests/functional/test_drum_inline_utils.py

Team @datarobot/genai-systems (#genai-systems) was assigned because of changes in files:

custom_model_runner/datarobot_drum/drum/common.py
custom_model_runner/datarobot_drum/drum/drum.py
custom_model_runner/datarobot_drum/drum/main.py
custom_model_runner/datarobot_drum/drum/root_predictors/drum_inline_utils.py
custom_model_runner/datarobot_drum/drum/root_predictors/generic_predictor.py
requirements_test.txt
tests/fixtures/python3_dummy_chat/README.md
tests/fixtures/python3_dummy_chat/custom.py
tests/fixtures/python3_dummy_chat/moderation_config.yaml
tests/functional/test_drum_inline_utils.py

If you think that there are some issues with ownership, please discuss with C&A domain at #sdtk slack channel and create PR to update DRCODEOWNERS\CODEOWNERS file.

yakov-g · 2025-06-20T22:12:54Z

Looks great!

klichukb · 2025-06-22T12:55:24Z

@yakov-g @mjnitz02 I tried an e2e test with moderations to make sure that the inline runner executes the moderations pipeline. Its crazy because it triples the duration of tests even with bumping 8G to 16G for the functional test suite. Installing moderations is adding more than 30 minutes, I'm not sure why this is so bad.

I'm skipping the test to get things in for the release, but I think we have to figure out how to get the test in.

klichukb · 2025-06-22T21:44:54Z

@datarobot/core-modeling folks, I'd need to get in for the branch cut for the agentic work, feel free to provide your review, we can address it.

klichukb force-pushed the yolo/really-yolo branch from a0ae3e4 to ccc9397 Compare June 12, 2025 23:10

yakov-g approved these changes Jun 12, 2025

View reviewed changes

nullspoon approved these changes Jun 12, 2025

View reviewed changes

klichukb changed the title ~~Demo~~ [YOLO] Implement inline predictor based on DRUM score Jun 20, 2025

devexp-slackbot bot added the Close Jira on Merge label Jun 20, 2025

devexp-slackbot bot changed the title ~~[YOLO] Implement inline predictor based on DRUM score~~ [RAPTOR-13895] Implement inline predictor based on DRUM score Jun 20, 2025

klichukb marked this pull request as ready for review June 20, 2025 21:45

klichukb added the 00 - Ready for Review label Jun 20, 2025

engprod-2 bot added Needs Review: Core Modeling Needs Review: GenAI Systems labels Jun 20, 2025

klichukb removed the Needs Review: GenAI Systems label Jun 20, 2025

klichukb added 13 commits June 21, 2025 00:52

Demo

7da3e60

.

39216ad

.

3c39d27

.

1c456e2

.

2433c8b

.

09ad7cc

.

2a08116

.

ed81438

.

df53171

.

6764df1

.

7891edf

.

9ab8ee0

.

4a3f0d6

klichukb force-pushed the yolo/really-yolo branch from fc8d423 to 4a3f0d6 Compare June 20, 2025 21:52

klichukb added 2 commits June 21, 2025 01:20

Add moderations as dep to tests

4f11b26

type annotation

b111417

klichukb added 2 commits June 22, 2025 14:43

Move moderations to a separate file and bump resources for harness

f11d4e3

No moderation test for now, its triples the duration of the test suite

86e9147

klichukb added 2 commits June 22, 2025 15:56

Proper skip

b264370

.

91d260f

klichukb merged commit 121a0c3 into master Jun 22, 2025
30 checks passed

svc-engprod-git1 removed the Needs Review: Core Modeling label Jun 22, 2025

svc-engprod-git1 deleted the yolo/really-yolo branch June 22, 2025 21:45

svc-engprod-git1 removed 00 - Ready for Review Close Jira on Merge labels Jun 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RAPTOR-13895] Implement inline predictor based on DRUM score #1504

[RAPTOR-13895] Implement inline predictor based on DRUM score #1504

Uh oh!

klichukb commented Jun 6, 2025 •

edited

Loading

Uh oh!

yakov-g left a comment

Uh oh!

nullspoon Jun 12, 2025

Uh oh!

nullspoon Jun 12, 2025

Uh oh!

klichukb Jun 20, 2025

Uh oh!

mjnitz02 commented Jun 17, 2025

Uh oh!

engprod-2 bot commented Jun 20, 2025

Uh oh!

yakov-g commented Jun 20, 2025

Uh oh!

klichukb commented Jun 22, 2025

Uh oh!

klichukb commented Jun 22, 2025

Uh oh!

Uh oh!

Uh oh!

[RAPTOR-13895] Implement inline predictor based on DRUM score #1504

[RAPTOR-13895] Implement inline predictor based on DRUM score #1504

Uh oh!

Conversation

klichukb commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rationale

Uh oh!

yakov-g left a comment

Choose a reason for hiding this comment

Uh oh!

nullspoon Jun 12, 2025

Choose a reason for hiding this comment

Uh oh!

nullspoon Jun 12, 2025

Choose a reason for hiding this comment

Uh oh!

klichukb Jun 20, 2025

Choose a reason for hiding this comment

Uh oh!

mjnitz02 commented Jun 17, 2025

Uh oh!

engprod-2 bot commented Jun 20, 2025

Uh oh!

yakov-g commented Jun 20, 2025

Uh oh!

klichukb commented Jun 22, 2025

Uh oh!

klichukb commented Jun 22, 2025

Uh oh!

Uh oh!

Uh oh!

klichukb commented Jun 6, 2025 •

edited

Loading