More general python codegen #1053

Open: wants to merge 3 commits into base: develop

ponyisi
Collaborator

@ponyisi ponyisi commented May 8, 2025

Allow the function signature

run_query(file_path, output_path: Path)

in addition to

run_query(file_path)

so Python transformers can create arbitrary ROOT file output.
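As a rough illustration of the two signatures, here is a minimal sketch. The function names `run_query_returning` and `run_query_writing` are hypothetical (in a real transformer both would be called `run_query`), the returned dictionary stands in for an awkward array, and a plain text file stands in for the ROOT file so the example runs without ROOT installed.

```python
import tempfile
from pathlib import Path


def run_query_returning(file_path):
    """Existing style: return data and let the codegen shim write the output."""
    return {"n_tracks": [3, 1, 4]}  # stand-in for an awkward array


def run_query_writing(file_path, output_path: Path):
    """New style: user code writes the output file itself."""
    # Stand-in for producing a ROOT file; a plain file keeps the sketch runnable.
    Path(output_path).write_text(f"processed {file_path}\n")


# Demo: the new-style function creates the file at the path it is handed.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.root"
    run_query_writing("input.root", out)
    wrote_file = out.exists()
```

In the second form the function returns nothing, so the caller has to rely on the file existing at `output_path` afterwards.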

@BenGalewsky
Contributor

I don't understand how this works. How does the sidecar know to pick up the user-specified file?

@ponyisi
Collaborator Author

ponyisi commented May 9, 2025

@BenGalewsky The sidecar tells the codegen shim what file name to produce. This PR introduces a route where user code can just directly produce this file, instead of returning an awkward array which the codegen shim then writes to that file.
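The two routes described here could be dispatched roughly as in the sketch below. This is not the code in this PR: `dispatch_run_query` and `write_result` are hypothetical names, and checking for a parameter called `output_path` via `inspect.signature` is an assumption about how the shim might detect the new signature.

```python
import inspect
import tempfile
from pathlib import Path


def dispatch_run_query(run_query, file_path, output_path, write_result):
    """Hypothetical shim helper: call run_query using whichever signature it has.

    Returns "user" if user code wrote the file, "shim" if the shim wrote it.
    """
    output_path = Path(output_path)
    params = inspect.signature(run_query).parameters
    if "output_path" in params:
        # New route: user code produces the file the sidecar asked for.
        run_query(file_path, output_path)
        if not output_path.exists():
            raise RuntimeError(f"run_query did not create {output_path}")
        return "user"
    # Existing route: user code returns data and the shim writes it out.
    result = run_query(file_path)
    write_result(result, output_path)
    return "shim"


# Demo with stand-ins for user code and for the shim's writer.
def legacy_query(file_path):
    return [1, 2, 3]


def direct_query(file_path, output_path: Path):
    output_path.write_text("written by user code\n")


def text_writer(result, path):
    path.write_text(repr(result))


with tempfile.TemporaryDirectory() as tmp:
    route_a = dispatch_run_query(legacy_query, "in.root", Path(tmp) / "a.out", text_writer)
    route_b = dispatch_run_query(direct_query, "in.root", Path(tmp) / "b.out", text_writer)
```

Either way the sidecar only needs to look for the file name it requested, which is why it does not have to know which route produced it.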

Collaborator

@gordonwatts gordonwatts left a comment

The if statements here are getting quite complex. I don't think we have any tests for this. Is this getting to the point we need those?

output = generated_transformer.run_query(file_path)

ttime = time.time()
if output_format == 'root-file':

Collaborator

What about parquet, or all the other output formats that are allowed?

Collaborator Author

Well, this is why we should really have a "raw" option for the output format. The most obvious need for this code path is running RDataFrame transformations, where the output is in fact going to be a ROOT file.

Conceptually I guess the big question is whether we want to try to share the Python transformer code between the uproot science image (where a "translate awkward output to root/parquet" step is natural) and the C++ ROOT image (where in the end the user writing the Python code is responsible for the output ... unless we ask them to return an RDF object for snapshotting, or something - note this isn't an entirely implausible route, it's possible to do RDF -> Awkward -> Parquet).

Collaborator

Ah, that is a good point. In my mind I was thinking this was common code.

We have a capabilities matrix, right? So the user can't request an unsupported format? Or if you request parquet from something that doesn't support it, do you get back root? I can't remember how we handle that now (in the old days, we'd just make a best effort to pay attention to that request).

3 participants