More general python codegen #1053
base: develop
I don't understand how this works. How does the sidecar know to pick up the user-specified file?
@BenGalewsky The sidecar tells the codegen shim what file name to produce. This PR introduces a route where user code can directly produce this file, instead of returning an Awkward array which the codegen shim then writes to that file.
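To make the two routes concrete, here is a minimal sketch of how a shim could tell them apart. It is an illustration only, not the code in this PR: `run_transformer`, `write_array`, and the rule "a second positional parameter means the user code writes the file itself" are all assumptions made for the example.

```python
import inspect
from pathlib import Path
from typing import Callable


def run_transformer(run_query: Callable, input_path: str, output_path: str,
                    write_array: Callable[[object, str], None]) -> str:
    """Hypothetical shim: call user code and make sure output_path gets written.

    Assumption for this sketch: user code that accepts two parameters wants
    to write the output file itself; user code with one parameter returns an
    array-like object that the shim writes.
    """
    n_params = len(inspect.signature(run_query).parameters)
    if n_params >= 2:
        # Assumed new route: the user code produces the file directly.
        run_query(input_path, output_path)
        if not Path(output_path).exists():
            raise RuntimeError(f"user code did not produce {output_path}")
        return "user-wrote-file"
    # Existing route: the user code returns data and the shim writes it out.
    result = run_query(input_path)
    write_array(result, output_path)
    return "shim-wrote-file"
```

Either way the file name still comes from the caller (the sidecar, in the real system), so the sidecar only has to look for the path it asked for.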
The if statements here are getting quite complex. I don't think we have any tests for this. Is this getting to the point we need those?
output = generated_transformer.run_query(file_path)

ttime = time.time()
if output_format == 'root-file':
What about parquet, or all the other output formats that are allowed?
Well, this is why we should really have a "raw" option for the output format. The most obvious need for this code path is running RDataFrame transformations, where the output is in fact going to be a ROOT file.
Conceptually, I guess the big question is whether we want to try to share the Python transformer code between the uproot science image (where a "translate awkward output to root/parquet" step is natural) and the C++ ROOT image (where in the end the user writing the Python code is responsible for the output). The alternative there would be to ask them to return an RDF object for snapshotting, or something similar. That isn't an entirely implausible route: it's possible to do RDF -> Awkward -> Parquet.
Ah, that is a good point. In my mind I was thinking this was common code.
We have a capabilities matrix, right? So the user can't request an unsupported format? Or if you request parquet from something that doesn't support it, do you get back root? I can't remember how we handle that now (in the old days, we'd just make a best effort to pay attention to that request).
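For reference, the two behaviours being asked about (reject the request, or fall back to what the image can produce) might look roughly like this. The table contents, the image names, and `resolve_output_format` are invented for the sketch and do not describe what the project currently does.

```python
# Hypothetical capabilities table: which output formats each image can write.
SUPPORTED_FORMATS = {
    "uproot": {"root-file", "parquet"},
    "cpp-root": {"root-file"},
}

# Hypothetical fallback used when a request cannot be honoured.
DEFAULT_FORMAT = {
    "uproot": "root-file",
    "cpp-root": "root-file",
}


def resolve_output_format(image: str, requested: str, strict: bool = True) -> str:
    """Return the format to produce, or raise if strict and unsupported."""
    if requested in SUPPORTED_FORMATS[image]:
        return requested
    if strict:
        # Option 1: refuse requests the image cannot satisfy.
        raise ValueError(f"{image} cannot produce {requested!r}")
    # Option 2: best effort, fall back to the image's default format.
    return DEFAULT_FORMAT[image]
```

With `strict=True` an unsupported request fails early; with `strict=False` it mirrors the older best-effort behaviour mentioned above.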
Allow the function signature
in addition to
so Python transformers can create arbitrary ROOT file output.