feat: more flexible helper for batch reduction #155
Changes to `ess.reflectometry.tools` (module inferred from the test's import of `from_measurements`):

```diff
@@ -8,10 +8,15 @@
 import sciline
 import scipp as sc
 import scipy.optimize as opt
-from orsopy.fileio.orso import OrsoDataset

 from ess.reflectometry import orso
-from ess.reflectometry.types import ReflectivityOverQ
+from ess.reflectometry.types import (
+    Filename,
+    ReducibleData,
+    ReflectivityOverQ,
+    SampleRun,
+)
+from ess.reflectometry.workflow import with_filenames

 _STD_TO_FWHM = sc.scalar(2.0) * sc.sqrt(sc.scalar(2.0) * sc.log(sc.scalar(2.0)))
```
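The module constant `_STD_TO_FWHM` in the hunk above is the standard Gaussian std-to-FWHM conversion factor 2·sqrt(2·ln 2) ≈ 2.3548. A plain-Python sanity check of that factor (using the `math` module instead of scipp scalars):

```python
import math

# _STD_TO_FWHM encodes the Gaussian std-to-FWHM conversion factor 2*sqrt(2*ln 2):
# solving exp(-x**2 / (2*sigma**2)) = 1/2 gives x = sigma*sqrt(2*ln 2),
# and the full width at half maximum is twice that.
std_to_fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0))
print(round(std_to_fwhm, 4))  # 2.3548

# Sanity check: a unit-sigma Gaussian drops to half its peak value
# at half the FWHM away from the center.
half_width = std_to_fwhm / 2.0
print(round(math.exp(-half_width**2 / 2.0), 3))  # 0.5
```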
```diff
@@ -279,18 +284,18 @@ def combine_curves(
     )


-def orso_datasets_from_measurements(
+def from_measurements(
     workflow: sciline.Pipeline,
-    runs: Sequence[Mapping[type, Any]],
+    runs: Sequence[Mapping[type, Any]] | Mapping[Any, Mapping[type, Any]],
+    target: type | Sequence[type] = orso.OrsoIofQDataset,
     *,
-    scale_to_overlap: bool = True,
-) -> list[OrsoDataset]:
-    '''Produces a list of ORSO datasets containing one
-    reflectivity curve for each of the provided runs.
+    scale_to_overlap: bool | tuple[sc.Variable, sc.Variable] = False,
+) -> list | Mapping:
+    '''Produces a list of datasets, one for each of the provided runs.
     Each entry of :code:`runs` is a mapping of parameters and
     values needed to produce the dataset.

-    Optionally, the reflectivity curves can be scaled to overlap in
+    Optionally, the results can be scaled so that the reflectivity curves overlap in
     the regions where they have the same Q-value.

     Parameters
```
```diff
@@ -299,38 +304,81 @@ def orso_datasets_from_measurements(
         The sciline workflow used to compute `ReflectivityOverQ` for each of the runs.

     runs:
-        The sciline parameters to be used for each run
+        The sciline parameters to be used for each run.

+    target:
+        The domain type(s) to compute for each run.
+
     scale_to_overlap:
-        If True the curves will be scaled to overlap.
-        Note that the curve of the first run is unscaled and
-        the rest are scaled to match it.
+        If ``True`` the curves will be scaled to overlap.
+        If a tuple then this argument will be passed as the ``critical_edge_interval``
+        argument to the ``scale_reflectivity_curves_to_overlap`` function.

     Returns
     ---------
     list of the computed ORSO datasets, containing one reflectivity curve each
     '''
-    reflectivity_curves = []
-    for parameters in runs:
-        wf = workflow.copy()
-        for name, value in parameters.items():
-            wf[name] = value
-        reflectivity_curves.append(wf.compute(ReflectivityOverQ))
-
-    scale_factors = (
-        scale_reflectivity_curves_to_overlap([r.hist() for r in reflectivity_curves])[1]
-        if scale_to_overlap
-        else (1,) * len(runs)
-    )
+    names = runs.keys() if hasattr(runs, 'keys') else None
+    runs = runs.values() if hasattr(runs, 'values') else runs
+
+    def init_workflow(workflow, parameters):
+        wf = workflow.copy()
+        for tp, value in parameters.items():
+            if tp is Filename[SampleRun]:
+                continue
+            wf[tp] = value
+
+        if Filename[SampleRun] in parameters:
+            if isinstance(parameters[Filename[SampleRun]], list | tuple):
+                wf = with_filenames(
+                    wf,
+                    SampleRun,
+                    parameters[Filename[SampleRun]],
+                )
+            else:
+                wf[Filename[SampleRun]] = parameters[Filename[SampleRun]]
+        return wf
+
+    if scale_to_overlap:
```
Review comment: As discussed during the standup, I don't really get why we need the helper function, as opposed to just having different workflows. You can then view the graph of what is being done for everything (because with the helper function you can't see it all, only pieces, and they are not easy to get to either if you just installed the package). In any case, we should avoid having a long discussion here on GitHub; this was just a starting point for an in-person discussion.

Reply: It's possible we can implement the same thing using sciline map-reduce. Let us treat the helper in this PR as a reference implementation and try to re-implement it as one big workflow.
```diff
+        results = []
+        for parameters in runs:
+            wf = init_workflow(workflow, parameters)
+            results.append(wf.compute((ReflectivityOverQ, ReducibleData[SampleRun])))
+
+        scale_factors = scale_reflectivity_curves_to_overlap(
+            [r[ReflectivityOverQ].hist() for r in results],
+            critical_edge_interval=scale_to_overlap
+            if isinstance(scale_to_overlap, list | tuple)
+            else None,
+        )[1]

     datasets = []
-    for parameters, curve, scale_factor in zip(
-        runs, reflectivity_curves, scale_factors, strict=True
-    ):
-        wf = workflow.copy()
-        for name, value in parameters.items():
-            wf[name] = value
-        wf[ReflectivityOverQ] = scale_factor * curve
-        dataset = wf.compute(orso.OrsoIofQDataset)
+    for i, parameters in enumerate(runs):
+        wf = init_workflow(workflow, parameters)
+        if scale_to_overlap:
+            # Optimization in case we can avoid re-doing some work:
+            # Check if any of the targets need ReducibleData if
+            # ReflectivityOverQ already exists.
+            # If they don't, we can avoid recomputing ReducibleData.
+            targets = target if hasattr(target, '__len__') else (target,)
```
Review comment: Can there ever be a danger that someone passed a string as ``target``?

Reply: I guess, but like you said, passing a string is illegal anyway and will lead to an error at some point.
```diff
+            if any(
+                _workflow_needs_quantity_A_even_if_quantity_B_is_set(
+                    wf[t], ReducibleData[SampleRun], ReflectivityOverQ
+                )
+                for t in targets
+            ):
+                wf[ReducibleData[SampleRun]] = (
+                    scale_factors[i] * results[i][ReducibleData[SampleRun]]
+                )
+            else:
+                wf[ReflectivityOverQ] = scale_factors[i] * results[i][ReflectivityOverQ]
+
+        dataset = wf.compute(target)
         datasets.append(dataset)
-    return datasets
+    return datasets if names is None else dict(zip(names, datasets, strict=True))
```
Review comment: So here is my proposal.

Step 1: add scaling param to workflow. Add an extra parameter in the base workflow called

Step 2: create
```diff
+
+
+def _workflow_needs_quantity_A_even_if_quantity_B_is_set(workflow, A, B):
+    wf = workflow.copy()
+    wf[B] = 'Not important'
+    return A in wf.underlying_graph
```
New test file:

```python
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2024 Scipp contributors (https://github.com/scipp)
import pytest
import sciline
import scipp as sc
from scipp.testing import assert_allclose

from amor.pipeline_test import amor_pipeline  # noqa: F401
from ess.amor import data
from ess.amor.types import ChopperPhase
from ess.reflectometry.tools import from_measurements
from ess.reflectometry.types import (
    DetectorRotation,
    Filename,
    QBins,
    ReducedReference,
    ReferenceRun,
    ReflectivityOverQ,
    SampleRotation,
    SampleRun,
)

# The files used in the AMOR reduction workflow have some scippnexus warnings
pytestmark = pytest.mark.filterwarnings(
    "ignore:.*Invalid transformation, .*missing attribute 'vector':UserWarning",
)


@pytest.fixture
def pipeline_with_1632_reference(amor_pipeline):  # noqa: F811
    amor_pipeline[ChopperPhase[ReferenceRun]] = sc.scalar(7.5, unit='deg')
    amor_pipeline[ChopperPhase[SampleRun]] = sc.scalar(7.5, unit='deg')
    amor_pipeline[Filename[ReferenceRun]] = data.amor_run('1632')
    amor_pipeline[ReducedReference] = amor_pipeline.compute(ReducedReference)
    return amor_pipeline


@pytestmark
def test_from_measurements_tool_concatenates_event_lists(
    pipeline_with_1632_reference: sciline.Pipeline,
):
    pl = pipeline_with_1632_reference

    run = {
        Filename[SampleRun]: list(map(data.amor_run, (1636, 1639, 1641))),
        QBins: sc.geomspace(
            dim='Q', start=0.062, stop=0.18, num=391, unit='1/angstrom'
        ),
        DetectorRotation[SampleRun]: sc.scalar(0.140167, unit='rad'),
        SampleRotation[SampleRun]: sc.scalar(0.0680678, unit='rad'),
    }
    results = from_measurements(
        pl,
        [run],
        target=ReflectivityOverQ,
        scale_to_overlap=False,
    )

    results2 = []
    for fname in run[Filename[SampleRun]]:
        pl.copy()
        pl[Filename[SampleRun]] = fname
        pl[QBins] = run[QBins]
        pl[DetectorRotation[SampleRun]] = run[DetectorRotation[SampleRun]]
        pl[SampleRotation[SampleRun]] = run[SampleRotation[SampleRun]]
        results2.append(pl.compute(ReflectivityOverQ).hist().data)

    assert_allclose(sum(results2), results[0].hist().data)
```
Review comment: Can we think of a better name? I don't have great suggestions; I thought of something like `batch_reduction`, but it's not super descriptive either...
Reply: Not sure either. I would like to avoid `reduction` in the name because I feel the term is already overloaded in our context, and strictly speaking you don't have to request a "reduced" data set here.
Reply: I asked Copilot, and the suggestion was `compute_reflectivity_datasets`. I guess it's because of the last thing in the docstring (the line about returning ORSO datasets containing one reflectivity curve each), which I guess should be changed if you can request other data than reflectivity curves?
Reply: Yes, that's true, I'll change it. By the way, I asked Nico earlier today and he's thinking about a better name, so we will probably have a suggestion from him soon.
Reply: We discussed some options and came up with `compute_pipeline_targets` or `get_workflow_targets`. They both seem reasonably descriptive to me, but maybe the shorter one is better.
Reply: Sorry to be annoying, but those names seem very similar to Sciline's `pipeline.compute` and `pipeline.get` methods, so it is not very clear what this function brings on top of Sciline's methods.

In addition, if this is a function that is supposed to compute any kind of results/targets, then the `scale_to_overlap` argument seems misplaced, as I'm assuming the scaling could only be applied to a particular type of result (the reflectivity curves). It feels like the scope of the helper is too wide. Can we instead either: …, or …?
Reply: Agreed that the name is very similar to Sciline's `.get` and `.compute` methods; it is a bit confusing. I'm ready to go with the `batch_reduction` name if that is what you prefer.

It's true that the `scale_to_overlap` argument isn't relevant for all results/targets. But to say it's only relevant for reflectivity curves is incorrect: it is relevant for all results that display some sort of intensity from the measurements, normalized or not. Do you mean the data array containing the reflectivity curve, or the ORSO dataset containing the reflectivity curve?

Why do you think it is better to limit the set of targets it applies to? Do you think the current user interface is error prone? In that case, can you provide a concrete example? I think the interface is quite unobtrusive: if the user only wants a reflectivity curve they will only ever pass the `ReflectivityOverQ` target to the function, and don't even have to be aware that it can do anything else.

Isn't this what we considered together and concluded was difficult to do with sciline currently? I'd be happy to consider this if you made a quick POC showing how it could be done and what the user interface would look like.
Reply: I guess what I meant was to either have a function that only computes things where `scale_to_overlap` is relevant, or leave `scale_to_overlap` out and have another function that applies it to the resulting data as a second step.

Another thing I thought of: would it be possible to make the function return some sort of proxy object, which would resemble a dict but have extra methods like `out.scale_to_overlap(*args)` that could be used to apply the scaling easily, still in a one-liner? I guess that's maybe sort of what I meant earlier: can we make the function do most of the `init_workflow` work to prepare the pipelines, and then the object it returns could have additional methods like `.scale_to_overlap()` or `.compute(Target)`, which would allow the user to compute any target for a whole bunch of grouped or ungrouped measurements? So, as opposed to mapping on a pure `sciline.Pipeline`, it would do what your function does: store an intermediate state in a new custom class which allows computing anything from there.
Reply: Making it a two-step process is problematic because in the first step you don't know whether you will want to scale the results, and that might mean you compute results you just throw away because the user was actually interested in the scaled results. If you don't materialize the results in step 1 but instead return some sort of lazy object, then the user has to explicitly materialize the results, for no reason from their perspective, only as a side effect of an optimization in the case when the user wants scaled results.

I agree it's not optimal that the `scale_to_overlap` argument is not applicable to all kinds of targets that can be requested, but I also find it hard to see that it would cause a problem in practice when using the workflow. Complicating the interface and the implementation only to avoid this is not worth it, in my opinion.

I implemented the current version in collaboration with Nico to address the issues he and I experienced when using the workflow. This PR has been so delayed that we've now lost several months of potential feedback and improvements, to add a small helper function in the periphery of the application, and now you want a rewrite with a new experimental proxy-object interface. In my opinion it would be best to just merge this, or a lightly modified version of it, and then if you think there are opportunities for significant gains in UX by doing something different you can implement that and open a PR. If you don't agree, I think it's better that we sit down and sort this out together in person.