Finishing the manipulations.py Rewrite #47

ehewins · 2023-09-17T17:42:57Z

Description

This pull request covers the last part of my summer project: the manipulations.py rewrite. It also includes the new analytical unit tests, from branch 44-new-unit-tests-for-slicer-averagers, except in this branch they point towards the new manipulations module instead of the old one.

The details of the manipulations.py rewrite are as follows:

Parent classes for Cartesian and Polar regions of interest setup have now been added, drastically reducing the amount of repeated code.
The directional averagers (SlabX, Ring, SectorQ, etc) now all use a common DirectionalAverage class for the bulk of their computation. This too reduces repeated code and increases maintainability.
The method of weighting datapoints based on their position relative to the ROI (inside or outside) has been retooled to make it easier to implement fractional binning later down the line.
All the data manipulators require an nbins parameter now, whereas before some needed bin_width instead.
Instead of measuring phi angles from the negative x-axis, they are measured from the positive x-axis and are supplied to the averagers with the same -pi to pi interval used by the slicers themselves.

The main functionality change is how DirectionalAverage considers the weights of the datapoints. In both the new and old manipulations.py these weights are either 0 or 1, though now it should be a lot easier to implement fractional weights. Previously, the weight assigned to a datapoint was calculated on-the-fly as part of a for loop which ran over each datapoint and was also responsible for part of the averaging process. The new methodology separates out the weighting and averaging processes, creating an nxm weights matrix where n is the number of bins the data is sorted into, and m is the number of datapoints. The averaging is then done using matrix multiplication and numpy functions. This new method will certainly be more memory heavy than the old method, and it could be slower for large datasets. My hope is however that the switch from python functions to numpy functions will mitigate this, and the improvement to the quality of results once fractional weights are added should more than justify the resource usage.

Dependencies needed: scipy - needed for new unit tests.
Note that merging this pull request requires that a sister pull request in the sasview repo also be merged: SasView/sasview#2615

Fixes #46 (issue announcing rewrite plan) and fixes #44 (issue about dodgy unit tests) and fixes #43 (wedge averaging had issues with full-circle ROIs).

How Has This Been Tested?

There are two ways this rewrite has been tested. The less rigorous method involved using this branch plus the sasview branch mentioned above to confirm that the plotted results look the same before and after using the same data file, slicer and slicer parameters. The more rigorous method involves the new unit tests. When the file utest_averaging_analytical.py from the branch 44-new-unit-tests-for-slicer-averagers is run, you see that averagers from the previous version of manipulations.py all pass these tests. The version of utest_averaging_analytical.py included in this branch tests the new averagers, and these also pass the tests. Note there are other minor changes made to the tests for compatibility with the new averagers.

Work still to be done

At a superficial level, there are some files which need renaming. The new version of manipulations.py goes by the name new_manipulations.py. Once everyone is satisfied with this new version, it should be renamed to replace the old one. There are references to new_manipulations in utest_averaging_analytical.py on the sasdata side, as well as in Plotter2D.py, AnnulusSlicer.py, BoxSlicer.py, BoxSum.py, WedgeSlicer.py, and SectorSlicer.py on the sasview side. These should also be changed accordingly.

There is also some more serious work to do before this rewrite can truly be considered complete. For the most part we've already reached feature parity with the old manipulations.py, but there are some blind spots which need acknowledging.

In the old version of manipulations.py, the _Sector class had the option to bin the data logarithmically. As far as I can tell this functionality was never called upon by sasview. There may be users who use it in custom scripts however, so it would still be worth implementing here. As a bonus, the feature would now be available to all slicers.
Currently, no attention is paid to the data's resolution function. In the old manipulations.py, CircularAverage and _Sector call a function by the name of get_dq_data. I don't quite understand the maths, but I get the impression from Lucas that it doesn't function as it should. The last thing needed to complete this rewrite would be proper consideration of the dqx_data and dqy_data properties of the supplied Data2D object to compute the dx property of the returned Data1D object. The maths involved here is a little outside my comfort zone, so I'd be happy to delegate this responsibility to someone else.
I'd like to restructure the unit tests file. At the moment the tests are grouped in classes based on the averager they're testing, but if instead they were grouped by the kind of test being performed I could use unittest's setUp and tearDown methods to reduce the amount of repeated code. They also only test the averagers under expected conditions. Although the old tests were no more rigorous, I think it would be good write some tests for edge cases.

Review Checklist (please remove items if they don't apply):

Code has been reviewed
Functionality has been tested
Windows installer (GH artifact) has been tested (installed and worked)
MacOSX installer (GH artifact) has been tested (installed and worked)
User documentation is available and complete (if required)
Developers documentation is available and complete (if required)
The introduced changes comply with SasView license (BSD 3-Clause)

krzywon

Code looks good. Locally, I am seeing two errors during testing related to zero division, but the ci succeeds, so likely a local environment issue. I think replacing the scipy requirement with something already included in the package would be good, but not a show-stopper.

Keeping the existing manipulations file is a good idea now that I've had a minute to think about it, but a deprecation message would be helpful. That is something that can be added later.

sasdata/data_util/new_manipulations.py

requirements.txt

test/sasdataloader/utest_averaging_analytical.py

sasdata/data_util/new_manipulations.py

lucas-wilkins · 2023-09-27T19:23:14Z

My fractional binning stuff definitely needs scipy

…

On Wed, 27 Sept 2023, 19:13 Jeff Krzywon, ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ On sasdata/data_util/new_manipulations.py <#47 (comment)>: Disregard. See my main note. Keeping this is good for backward compatibility. — Reply to this email directly, view it on GitHub <#47 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACKU4SU644FHYZIDH4AOI43X4RUF7ANCNFSM6AAAAAA43YDHVE> . You are receiving this because your review was requested.Message ID: ***@***.***>

wpotrzebowski · 2023-10-10T13:25:31Z

Required for 6.0.0

wpotrzebowski · 2023-10-10T13:27:16Z

@lucas-wilkins will take a look at it

krzywon

I think this is ready now, but I made changes, so I would suggest an alternate review prior to merging.

sasdata/data_util/manipulations.py

lucas-wilkins · 2023-10-24T15:45:51Z

I've not checked the behaviour of the slicers, but I assume that is still all good.

wpotrzebowski · 2023-11-07T14:36:39Z

@smk78 will try to take a look

smk78 · 2023-12-06T12:22:49Z

Testing https://github.com/SasView/sasview/actions/runs/7079272736 on W10/x64:

After loading 2D datasets (one .dat, one .h5) & doing Create New > Right-click > Slicers, I see:

Perform Circular Average:
Nothing happens but you get an error:

ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\Plotter2D.py", line 349, in onCircularAverage File "sas\qtgui\Plotting\Plotter2D.py", line 318, in circularAverage File "sasdata\data_util\averaging.py", line 585, in __call__ File "sasdata\data_util\averaging.py", line 91, in __init__ TypeError: Parameter 'nbins' must be an integer

But files (obviously) have integer numbers of bins.

Sector (Q View):
Sector plot appears. Slicer tool appears.
Slicer tool can be rotated by mouse, but slicer width bars don't move. But, if you click on a box on a slicer width bar the tool stops responding but if you click in the 2D plot the sector plot updates, so I think there is something weird with the updating of the tool on the 2D plot?
Also, there appears to be no feedback to Slicer Parameters box.
Trying to use Slicer Parameters box to control this slicer does nothing and throws errors like this:

ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\SlicerModel.py", line 66, in setParamsFromModelItem File "sas\qtgui\Plotting\Slicers\SectorSlicer.py", line 274, in setParams File "sas\qtgui\Plotting\Slicers\SectorSlicer.py", line 383, in update AttributeError: 'LineInteractor' object has no attribute 'alpha'
ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\SlicerModel.py", line 56, in setParamsFromModelItem File "sas\qtgui\Plotting\Slicers\SectorSlicer.py", line 248, in getParams ValueError: Phi left and phi right are different 0.518482, -0.005688

Annulus (Phi View)
Annulus plot appears. Slicer tool appears.
Slicer tool can be manipulated by mouse. Radius values update in the Slicer Parameters box. Slicer tool can be controlled from the Slicer Parameters box. Results look sensible.
But, I do note that that the two rings of the tool can be dragged through each other, which is a bit confusing.
And at some point this error popped up:

ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\Plotter2D.py", line 349, in onCircularAverage File "sas\qtgui\Plotting\Plotter2D.py", line 318, in circularAverage File "sasdata\data_util\averaging.py", line 585, in __call__ File "sasdata\data_util\averaging.py", line 91, in __init__ TypeError: Parameter 'nbins' must be an integer

Box Sum
Box Sum box appears. Slicer tool appears.
Slicer tool can be manipulated by mouse. Box size values and Box parameters update in the Slicer Parameters box. Until... you try and manipulate the tool from the Box Sum box. Nothing then happens and the tool stops working.
Also, for the slicers that use the Slicer Parameters box there is an Apply button to action changes made in the Slicer Parameters box. The Box Sum box does not have an Apply button.

More to follow...

smk78 · 2023-12-06T13:23:58Z

Box Averaging in Qx or Qy
Box Average plot appears. Slicer tool appears. Slicer tool can be manipulated by mouse. Plot updates. Until... you try and manipulate the tool from the Slicer Parameters box. Nothing then happens and the tool stops working.

ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\SlicerModel.py", line 66, in setParamsFromModelItem File "sas\qtgui\Plotting\Slicers\BoxSlicer.py", line 242, in setParams KeyError: 'qy_max'
ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\SlicerModel.py", line 66, in setParamsFromModelItem File "sas\qtgui\Plotting\Slicers\BoxSlicer.py", line 242, in setParams KeyError: 'qy_max'

Wedge Averaging in Q
Wedge Average plot appears. Slicer tool appears. Slicer tool can be manipulated by mouse. Plot updates. And... with this slicer you can control the tool from the Slicer Parameters box!
Wedge Averaging in Phi
Slicer tool appears but then throws an immediate error:

ERROR: Traceback (most recent call last): File "sas\qtgui\Plotting\Plotter2D.py", line 480, in onWedgeAveragingPhi File "sas\qtgui\Plotting\Plotter2D.py", line 399, in setSlicer File "sas\qtgui\Plotting\Slicers\WedgeSlicer.py", line 355, in __init__ File "sas\qtgui\Plotting\Slicers\WedgeSlicer.py", line 79, in __init__ File "sas\qtgui\Plotting\Slicers\WedgeSlicer.py", line 361, in _post_data File "sas\qtgui\Plotting\Slicers\WedgeSlicer.py", line 177, in _post_data File "sasdata\data_util\averaging.py", line 899, in __call__ ValueError: operands could not be broadcast together with shapes (82,) (100,) (82,)

This tool is also persistent; ie, once it has appeared, if you change the slicer it is not cleared on the plot.

Also, weirdly, the top 14% of the 2D image gets erased with this slicer!

General point
In the right-click context menu the Slicers are called:

But in the Slicer Parameters box they are called:

And the plot labels are:
(unable to generate plot for Circular Average)
SectorQ...
AnnulusPhi...
SlabX...
SlabY...
WedgeQ...
(unable to generate plot for Wedge Averaging in Phi)

I think some consistency of naming would be helpful.

smk78

Have performed functionality review on W10/x64. Unfortunately, there are some issues. Please see comments on this PR.

lucas-wilkins · 2023-12-06T13:32:36Z

Also, weirdly, the top 14% of the 2D image gets erased with this slicer!

Are you sure its not 18%

lucas-wilkins · 2023-12-06T17:15:06Z

Note to @krzywon - I'm working on this now, that's why everything is failing. Will finish fixing soon.

smk78 · 2023-12-07T11:45:14Z

Also, weirdly, the top 14% of the 2D image gets erased with this slicer!

Are you sure its not 18%

I was going by the ticks on the axis! A seventh being ~14%.

…been identical anyway

… since rewrite.

…to match

…thon versions 3.8 and 3.9

…tically relevant

codescene-delta-analysis

Gates Failed
New code is healthy (2 new files with code health below 9.00)
Enforce critical code health rules (1 file with Low Cohesion)

Gates Passed
1 Quality Gates Passed

See analysis details in CodeScene

Reason for failure

New code is healthy	Violations	Code Health Impact
utest_averaging_analytical.py	4 rules	6.29	Suppress
averaging.py	4 rules	6.99	Suppress

Enforce critical code health rules	Violations	Code Health Impact
utest_averaging_analytical.py	1 critical rule	6.29	Suppress

Quality Gate Profile: The Bare Minimum
Want more control? Customize Code Health rules or catch issues early with our IDE extension and CLI tool.

codescene-delta-analysis · 2025-10-27T14:10:44Z

test/sasmanipulations/utest_averaging_analytical.py

+    def test_slabx_init(self):
+        """
+        Test that SlabX's __init__ method does what it's supposed to.
+        """
+        qx_min = 1
+        qx_max = 2
+        qy_min = 3
+        qy_max = 4
+        nbins = 100
+        fold = True
+
+        slab_object = SlabX(qx_min=qx_min, qx_max=qx_max, qy_min=qy_min,
+                            qy_max=qy_max, nbins=nbins, fold=fold)
+
+        self.assertEqual(slab_object.qx_min, qx_min)
+        self.assertEqual(slab_object.qx_max, qx_max)
+        self.assertEqual(slab_object.qy_min, qy_min)
+        self.assertEqual(slab_object.qy_max, qy_max)
+        self.assertEqual(slab_object.nbins, nbins)
+        self.assertEqual(slab_object.fold, fold)


❌ New issue: Code Duplication
The module contains 27 functions with similar structure: BoxavgTests.test_boxavg_init,BoxavgTests.test_boxavg_multiple_detectors,BoxavgTests.test_boxavg_subset_total,BoxavgTests.test_boxavg_total and 23 more functions

_Suppress

codescene-delta-analysis · 2025-10-27T14:10:45Z

test/sasmanipulations/utest_averaging_analytical.py

+    def __init__(self, data2d=None, err_data=None):
+        if data2d is not None:
+            matrix = np.asarray(data2d)
+        else:
+            msg = "Data must be supplied to convert to Data2D"
+            raise ValueError(msg)
+
+        if matrix.ndim != 2:
+            msg = "Supplied array must have 2 dimensions to convert to Data2D"
+            raise ValueError(msg)
+
+        if err_data is not None:
+            err_data = np.asarray(err_data)
+            if err_data.shape != matrix.shape:
+                msg = "Data and errors must have the same shape"
+                raise ValueError(msg)
+
+        # qmax can be any number, 1 just makes things simple.
+        self.qmax = 1
+        qx_bins = np.linspace(start=-1 * self.qmax,
+                              stop=self.qmax,
+                              num=matrix.shape[1],
+                              endpoint=True)
+        qy_bins = np.linspace(start=-1 * self.qmax,
+                              stop=self.qmax,
+                              num=matrix.shape[0],
+                              endpoint=True)
+
+        # Creating arrays in Data2D's preferred format.
+        data2d = matrix.flatten()
+        if err_data is None or np.any(err_data <= 0):
+            # Error data of some kind is needed, so we fabricate some
+            err_data = np.sqrt(np.abs(data2d))  # TODO - use different approach
+        else:
+            err_data = err_data.flatten()
+        qx_data = np.tile(qx_bins, (len(qy_bins), 1)).flatten()
+        qy_data = np.tile(qy_bins, (len(qx_bins), 1)).swapaxes(0, 1).flatten()
+        q_data = np.sqrt(qx_data * qx_data + qy_data * qy_data)
+        mask = np.ones(len(data2d), dtype=bool)
+
+        # Creating a Data2D object to use for testing the averagers.
+        self.data = data_info.Data2D(data=data2d, err_data=err_data,
+                                     qx_data=qx_data, qy_data=qy_data,
+                                     q_data=q_data, mask=mask)


❌ New issue: Complex Method
MatrixToData2D.init has a cyclomatic complexity of 12, threshold = 9

_Suppress

codescene-delta-analysis · 2025-10-27T14:10:45Z