Finishing the manipulations.py Rewrite #47
base: master
Conversation
krzywon left a comment:
Code looks good. Locally, I am seeing two errors during testing related to zero division, but CI succeeds, so it is likely a local environment issue. I think replacing the scipy requirement with something already included in the package would be good, but it's not a show-stopper.
Keeping the existing manipulations file is a good idea now that I've had a minute to think about it, but a deprecation message would be helpful. That is something that can be added later.
My fractional binning stuff definitely needs scipy
…On Wed, 27 Sept 2023, 19:13 Jeff Krzywon wrote:
> On sasdata/data_util/new_manipulations.py (#47 (comment)):
> Disregard. See my main note. Keeping this is good for backward compatibility.
Required for 6.0.0
@lucas-wilkins will take a look at it
krzywon left a comment:
I think this is ready now, but I made changes, so I would suggest someone else review it prior to merging.
I've not checked the behaviour of the slicers, but I assume that is still all good.
@smk78 will try to take a look
Testing https://github.com/SasView/sasview/actions/runs/7079272736 on W10/x64: after loading 2D datasets (one .dat, one .h5) and doing Create New > Right-click > Slicers, I see: [screenshot omitted]

But the files (obviously) have integer numbers of bins.

More to follow...

This tool is also persistent; i.e., once it has appeared, it is not cleared from the plot if you change the slicer. Also, weirdly, the top 14% of the 2D image gets erased with this slicer!

But in the Slicer Parameters box they are called [screenshot omitted], and the plot labels are [screenshot omitted]. I think some consistency of naming would be helpful.
smk78 left a comment:
Have performed functionality review on W10/x64. Unfortunately, there are some issues. Please see comments on this PR.
Are you sure it's not 18%?
Note to @krzywon - I'm working on this now; that's why everything is failing. Will finish fixing soon.
I was going by the ticks on the axis! A seventh being ~14%. |
…been identical anyway
…thon versions 3.8 and 3.9
03c45b5 to cc05ea7
Gates Failed
- New code is healthy (2 new files with code health below 9.00)
- Enforce critical code health rules (1 file with Low Cohesion)

Gates Passed
- 1 Quality Gate Passed

See analysis details in CodeScene

Reason for failure

| New code is healthy | Violations | Code Health Impact |
|---|---|---|
| utest_averaging_analytical.py | 4 rules | 6.29 |
| averaging.py | 4 rules | 6.99 |

| Enforce critical code health rules | Violations | Code Health Impact |
|---|---|---|
| utest_averaging_analytical.py | 1 critical rule | 6.29 |

Quality Gate Profile: The Bare Minimum
```python
def test_slabx_init(self):
    """
    Test that SlabX's __init__ method does what it's supposed to.
    """
    qx_min = 1
    qx_max = 2
    qy_min = 3
    qy_max = 4
    nbins = 100
    fold = True

    slab_object = SlabX(qx_min=qx_min, qx_max=qx_max, qy_min=qy_min,
                        qy_max=qy_max, nbins=nbins, fold=fold)

    self.assertEqual(slab_object.qx_min, qx_min)
    self.assertEqual(slab_object.qx_max, qx_max)
    self.assertEqual(slab_object.qy_min, qy_min)
    self.assertEqual(slab_object.qy_max, qy_max)
    self.assertEqual(slab_object.nbins, nbins)
    self.assertEqual(slab_object.fold, fold)
```
❌ New issue: Code Duplication
The module contains 27 functions with similar structure: BoxavgTests.test_boxavg_init,BoxavgTests.test_boxavg_multiple_detectors,BoxavgTests.test_boxavg_subset_total,BoxavgTests.test_boxavg_total and 23 more functions
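For what it's worth, one way to collapse duplication of this kind is to drive a single test body from a parameter table with `unittest`'s `subTest`. This is only a sketch: the `SlabX`/`SlabY` classes below are minimal stand-ins for illustration, not sasdata's real averagers.

```python
import unittest


class SlabX:
    """Stand-in for the real averager classes; only stores its kwargs."""
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


class SlabY(SlabX):
    pass


class AveragerInitTests(unittest.TestCase):
    # One row per averager: the class and the kwargs its __init__ takes.
    cases = [
        (SlabX, {"qx_min": 1, "qx_max": 2, "qy_min": 3, "qy_max": 4,
                 "nbins": 100, "fold": True}),
        (SlabY, {"qx_min": -1, "qx_max": 1, "qy_min": -2, "qy_max": 2,
                 "nbins": 50, "fold": False}),
    ]

    def test_init_stores_parameters(self):
        for cls, kwargs in self.cases:
            with self.subTest(averager=cls.__name__):
                obj = cls(**kwargs)
                for name, value in kwargs.items():
                    self.assertEqual(getattr(obj, name), value)
```

Adding an averager then means adding one row to the table rather than one near-identical test method.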
```python
def __init__(self, data2d=None, err_data=None):
    if data2d is not None:
        matrix = np.asarray(data2d)
    else:
        msg = "Data must be supplied to convert to Data2D"
        raise ValueError(msg)

    if matrix.ndim != 2:
        msg = "Supplied array must have 2 dimensions to convert to Data2D"
        raise ValueError(msg)

    if err_data is not None:
        err_data = np.asarray(err_data)
        if err_data.shape != matrix.shape:
            msg = "Data and errors must have the same shape"
            raise ValueError(msg)

    # qmax can be any number, 1 just makes things simple.
    self.qmax = 1
    qx_bins = np.linspace(start=-1 * self.qmax,
                          stop=self.qmax,
                          num=matrix.shape[1],
                          endpoint=True)
    qy_bins = np.linspace(start=-1 * self.qmax,
                          stop=self.qmax,
                          num=matrix.shape[0],
                          endpoint=True)

    # Creating arrays in Data2D's preferred format.
    data2d = matrix.flatten()
    if err_data is None or np.any(err_data <= 0):
        # Error data of some kind is needed, so we fabricate some
        err_data = np.sqrt(np.abs(data2d))  # TODO - use different approach
    else:
        err_data = err_data.flatten()
    qx_data = np.tile(qx_bins, (len(qy_bins), 1)).flatten()
    qy_data = np.tile(qy_bins, (len(qx_bins), 1)).swapaxes(0, 1).flatten()
    q_data = np.sqrt(qx_data * qx_data + qy_data * qy_data)
    mask = np.ones(len(data2d), dtype=bool)

    # Creating a Data2D object to use for testing the averagers.
    self.data = data_info.Data2D(data=data2d, err_data=err_data,
                                 qx_data=qx_data, qy_data=qy_data,
                                 q_data=q_data, mask=mask)
```
❌ New issue: Complex Method
MatrixToData2D.init has a cyclomatic complexity of 12, threshold = 9
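For reference, the q-grid construction at the heart of this method can be exercised on its own. This is a standalone sketch of the same `np.linspace`/`np.tile` logic, independent of sasdata's `Data2D`:

```python
import numpy as np

# A small 3x4 detector image; qmax = 1, as in MatrixToData2D.
matrix = np.arange(12, dtype=float).reshape(3, 4)
qmax = 1

# Bin centres along each axis, matching the shape of the matrix.
qx_bins = np.linspace(-qmax, qmax, num=matrix.shape[1], endpoint=True)
qy_bins = np.linspace(-qmax, qmax, num=matrix.shape[0], endpoint=True)

# Flattened per-point coordinates, one entry per matrix element.
data2d = matrix.flatten()
qx_data = np.tile(qx_bins, (len(qy_bins), 1)).flatten()
qy_data = np.tile(qy_bins, (len(qx_bins), 1)).swapaxes(0, 1).flatten()
q_data = np.sqrt(qx_data**2 + qy_data**2)

# Every flattened array describes the same set of points.
assert qx_data.shape == qy_data.shape == q_data.shape == data2d.shape
```

The `tile`/`swapaxes` pairing is what turns the two 1D bin axes into matching per-pixel coordinate arrays; the corners of the grid end up at |Q| = sqrt(2)·qmax.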
```python
def __init__(self, qx_min: float = 0, qx_max: float = 0,
             qy_min: float = 0, qy_max: float = 0) -> None:
    """
    Assign the variables used to label the properties of the Data2D object.
    Also establish the upper and lower bounds defining the ROI.
    The units of these parameters are A^-1.

    :param qx_min: Lower bound of the ROI along the Q_x direction.
    :param qx_max: Upper bound of the ROI along the Q_x direction.
    :param qy_min: Lower bound of the ROI along the Q_y direction.
    :param qy_max: Upper bound of the ROI along the Q_y direction.
    """
    super().__init__()
    self.qx_min = qx_min
    self.qx_max = qx_max
    self.qy_min = qy_min
    self.qy_max = qy_max
```
❌ New issue: Code Duplication
The module contains 14 functions with similar structure: Boxavg.init,Boxsum.init,CartesianROI.init,CircularAverage.call and 10 more functions
```python
def __init__(self,
             major_axis: ArrayLike,
             minor_axis: ArrayLike,
             major_lims: tuple[float, float] | None = None,
             minor_lims: tuple[float, float] | None = None,
             nbins: int = 100):
    """
    Set up direction of averaging, limits on the ROI, & the number of bins.

    :param major_axis: Coordinate data for the axis onto which the 2D data
        is projected.
    :param minor_axis: Coordinate data for the axis perpendicular to the
        major axis.
    :param major_lims: Lower and upper bounds of the ROI along the major
        axis. Given as a 2 element tuple/list.
    :param minor_lims: Lower and upper bounds of the ROI along the minor
        axis. Given as a 2 element tuple/list.
    :param nbins: The number of bins the major axis is divided up into.
    """
    if any(not hasattr(coordinate_data, "__array__") for
           coordinate_data in (major_axis, minor_axis)):
        msg = "Must provide major & minor coordinate arrays for binning."
        raise ValueError(msg)

    if any(lims is not None and len(lims) != 2 for
           lims in (major_lims, minor_lims)):
        msg = "Limits arrays must have 2 elements or be NoneType"
        raise ValueError(msg)

    if not isinstance(nbins, int):
        # TODO: Make classes that depend on this provide ints; it's quite
        # a thing to fix though.
        try:
            nbins = int(nbins)
        except (TypeError, ValueError):
            msg = (f"Parameter 'nbins' must be convertible to an integer "
                   f"via int(), got type {type(nbins)} (={nbins})")
            raise TypeError(msg)

    self.major_axis = np.asarray(major_axis)
    self.minor_axis = np.asarray(minor_axis)
    if self.major_axis.size != self.minor_axis.size:
        msg = "Major and minor axes must have same length"
        raise ValueError(msg)

    # In some cases all values from a given axis are part of the ROI.
    # An alternative approach may be needed for fractional weights.
    if major_lims is None:
        self.major_lims = (self.major_axis.min(), self.major_axis.max())
    else:
        self.major_lims = major_lims
    if minor_lims is None:
        self.minor_lims = (self.minor_axis.min(), self.minor_axis.max())
    else:
        self.minor_lims = minor_lims

    self.nbins = nbins
    # Assume linear spacing for now, but allow for log, Fibonacci, etc.
    # implementations in the future.
    # Add one because these are the bin limits, not the centroids.
    self.bin_limits = np.linspace(self.major_lims[0], self.major_lims[1],
                                  self.nbins + 1)
```
❌ New issue: Complex Method
DirectionalAverage.init has a cyclomatic complexity of 17, threshold = 9
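As a quick illustration of the `nbins + 1` edge convention used at the end of `DirectionalAverage.__init__` (a standalone numpy sketch, not the PR's code):

```python
import numpy as np

nbins = 4
major_lims = (0.0, 1.0)

# nbins + 1 edges delimit exactly nbins bins.
bin_limits = np.linspace(major_lims[0], major_lims[1], nbins + 1)
assert len(bin_limits) == nbins + 1

# Assign sample coordinates to bins; np.digitize returns 1-based
# indices for points falling between the edges, hence the -1.
points = np.array([0.1, 0.3, 0.6, 0.9])
indices = np.digitize(points, bin_limits) - 1
# indices is now [0, 1, 2, 3]: one point per bin.
```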
❌ New issue: Excess Number of Function Arguments
DirectionalAverage.init has 5 arguments, max arguments = 4
```python
def __init__(self, qx_min: float = 0, qx_max: float = 0, qy_min: float = 0,
             qy_max: float = 0, nbins: int = 100, fold: bool = False):
    """
    Set up the ROI boundaries, the binning of the output 1D data, and fold.
    The units of these parameters are A^-1.

    :param qx_min: Lower bound of the ROI along the Q_x direction.
    :param qx_max: Upper bound of the ROI along the Q_x direction.
    :param qy_min: Lower bound of the ROI along the Q_y direction.
    :param qy_max: Upper bound of the ROI along the Q_y direction.
    :param nbins: The number of bins data is sorted into along Q_x.
    :param fold: Whether the two halves of the ROI along Q_x should be
        folded together during averaging.
    """
    super().__init__(qx_min=qx_min, qx_max=qx_max,
                     qy_min=qy_min, qy_max=qy_max)
    self.nbins = nbins
    self.fold = fold
```
❌ New issue: Excess Number of Function Arguments
SlabX.init has 6 arguments, max arguments = 4
```python
def __init__(self, qx_min: float = 0, qx_max: float = 0, qy_min: float = 0,
             qy_max: float = 0, nbins: int = 100, fold: bool = False):
    """
    Set up the ROI boundaries, the binning of the output 1D data, and fold.
    The units of these parameters are A^-1.

    :param qx_min: Lower bound of the ROI along the Q_x direction.
    :param qx_max: Upper bound of the ROI along the Q_x direction.
    :param qy_min: Lower bound of the ROI along the Q_y direction.
    :param qy_max: Upper bound of the ROI along the Q_y direction.
    :param nbins: The number of bins data is sorted into along Q_y.
    :param fold: Whether the two halves of the ROI along Q_y should be
        folded together during averaging.
    """
    super().__init__(qx_min=qx_min, qx_max=qx_max,
                     qy_min=qy_min, qy_max=qy_max)
    self.nbins = nbins
    self.fold = fold
```
❌ New issue: Excess Number of Function Arguments
SlabY.init has 6 arguments, max arguments = 4
```python
def __init__(self, r_min: float, r_max: float, phi_min: float,
             phi_max: float, nbins: int = 100, fold: bool = True) -> None:
    """
    Set up the ROI boundaries, the binning of the output 1D data, and fold.
    The units are A^-1 for radial parameters, and radians for angular ones.

    :param r_min: Lower limit for |Q| values to use during averaging.
    :param r_max: Upper limit for |Q| values to use during averaging.
    :param phi_min: Lower limit for φ values (in the primary ROI).
    :param phi_max: Upper limit for φ values (in the primary ROI).
    :param nbins: The number of bins data is sorted into along the |Q| axis.
    :param fold: Whether the primary and secondary ROIs should be folded
        together during averaging.
    """
    super().__init__(r_min=r_min, r_max=r_max,
                     phi_min=phi_min, phi_max=phi_max)
    self.nbins = nbins
    self.fold = fold
```
❌ New issue: Excess Number of Function Arguments
SectorQ.init has 6 arguments, max arguments = 4
```python
def __init__(self, r_min: float, r_max: float, phi_min: float,
             phi_max: float, nbins: int = 100) -> None:
    """
    Set up the ROI boundaries, and the binning of the output 1D data.
    The units are A^-1 for radial parameters, and radians for angular ones.

    :param r_min: Lower limit for |Q| values to use during averaging.
    :param r_max: Upper limit for |Q| values to use during averaging.
    :param phi_min: Lower limit for φ values (in the primary ROI).
    :param phi_max: Upper limit for φ values (in the primary ROI).
    :param nbins: The number of bins data is sorted into along the |Q| axis.
    """
    super().__init__(r_min=r_min, r_max=r_max,
                     phi_min=phi_min, phi_max=phi_max)
    self.nbins = nbins
```
❌ New issue: Excess Number of Function Arguments
WedgeQ.init has 5 arguments, max arguments = 4
```python
def __init__(self, r_min: float, r_max: float, phi_min: float,
             phi_max: float, nbins: int = 100) -> None:
    """
    Set up the ROI boundaries, and the binning of the output 1D data.
    The units are A^-1 for radial parameters, and radians for angular ones.

    :param r_min: Lower limit for |Q| values to use during averaging.
    :param r_max: Upper limit for |Q| values to use during averaging.
    :param phi_min: Lower limit for φ values to use during averaging.
    :param phi_max: Upper limit for φ values to use during averaging.
    :param nbins: The number of bins data is sorted into along the φ axis.
    """
    super().__init__(r_min=r_min, r_max=r_max,
                     phi_min=phi_min, phi_max=phi_max)
    self.nbins = nbins
```
❌ New issue: Excess Number of Function Arguments
WedgePhi.init has 5 arguments, max arguments = 4


Description

This pull request covers the last part of my summer project: the `manipulations.py` rewrite. It also includes the new analytical unit tests from branch `44-new-unit-tests-for-slicer-averagers`, except in this branch they point towards the new manipulations module instead of the old one.

The details of the `manipulations.py` rewrite are as follows:
- The averagers now rely on a shared `DirectionalAverage` class for the bulk of their computation. This too reduces repeated code and increases maintainability.
- All averagers take an `nbins` parameter now, whereas before some needed `bin_width` instead.

The main functionality change is how `DirectionalAverage` considers the weights of the datapoints. In both the new and old `manipulations.py` these weights are either 0 or 1, though now it should be a lot easier to implement fractional weights. Previously, the weight assigned to a datapoint was calculated on-the-fly as part of a for loop which ran over each datapoint and was also responsible for part of the averaging process. The new methodology separates the weighting and averaging processes, creating an n×m weights matrix, where n is the number of bins the data is sorted into and m is the number of datapoints. The averaging is then done using matrix multiplication and numpy functions. This new method will certainly be more memory-heavy than the old one, and it could be slower for large datasets. My hope, however, is that the switch from Python loops to numpy functions will mitigate this, and the improvement to the quality of results once fractional weights are added should more than justify the resource usage.

Dependencies needed: scipy (needed for the new unit tests).
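The weights-matrix scheme described above can be sketched in a few lines of numpy. The names below are hypothetical, not the PR's code; with 0/1 weights this reproduces a plain binned mean, and fractional weights would drop in unchanged:

```python
import numpy as np

# m datapoints along the major axis, with measured intensities.
coords = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55])
intensity = np.array([1.0, 3.0, 2.0, 4.0, 6.0, 8.0])

# n bins over the ROI; edges as in DirectionalAverage (nbins + 1 limits).
nbins = 3
edges = np.linspace(0.0, 0.6, nbins + 1)

# n x m weights matrix: weights[i, j] = 1 if point j falls in bin i.
# Fractional membership would simply put values between 0 and 1 here.
in_bin = ((coords[None, :] >= edges[:-1, None]) &
          (coords[None, :] < edges[1:, None]))
weights = in_bin.astype(float)

# Weighted sums and counts via matrix multiplication, then the average.
counts = weights.sum(axis=1)
averages = (weights @ intensity) / np.where(counts > 0, counts, 1)
# averages is [2.0, 3.0, 7.0]: the mean intensity in each of the 3 bins.
```

The loop over datapoints is gone entirely; the memory cost is the n×m matrix, which matches the trade-off described in the paragraph above.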
Note that merging this pull request requires that a sister pull request in the sasview repo also be merged: SasView/sasview#2615
Fixes #46 (issue announcing rewrite plan) and fixes #44 (issue about dodgy unit tests) and fixes #43 (wedge averaging had issues with full-circle ROIs).
How Has This Been Tested?

There are two ways this rewrite has been tested. The less rigorous method involved using this branch plus the sasview branch mentioned above to confirm that the plotted results look the same before and after, using the same data file, slicer, and slicer parameters. The more rigorous method involves the new unit tests. When the file `utest_averaging_analytical.py` from the branch `44-new-unit-tests-for-slicer-averagers` is run, you see that the averagers from the previous version of `manipulations.py` all pass these tests. The version of `utest_averaging_analytical.py` included in this branch tests the new averagers, and these also pass. Note there are other minor changes made to the tests for compatibility with the new averagers.

Work still to be done
At a superficial level, there are some files which need renaming. The new version of `manipulations.py` goes by the name `new_manipulations.py`. Once everyone is satisfied with this new version, it should be renamed to replace the old one. There are references to `new_manipulations` in `utest_averaging_analytical.py` on the sasdata side, as well as in `Plotter2D.py`, `AnnulusSlicer.py`, `BoxSlicer.py`, `BoxSum.py`, `WedgeSlicer.py`, and `SectorSlicer.py` on the sasview side. These should also be changed accordingly.

There is also some more serious work to do before this rewrite can truly be considered complete. For the most part we've already reached feature parity with the old `manipulations.py`, but there are some blind spots which need acknowledging:
- In the old `manipulations.py`, the `_Sector` class had the option to bin the data logarithmically. As far as I can tell this functionality was never called upon by sasview. There may be users who rely on it in custom scripts, however, so it would still be worth implementing here. As a bonus, the feature would now be available to all slicers.
- In the old `manipulations.py`, `CircularAverage` and `_Sector` call a function by the name of `get_dq_data`. I don't quite understand the maths, but I get the impression from Lucas that it doesn't function as it should. The last thing needed to complete this rewrite would be proper consideration of the `dqx_data` and `dqy_data` properties of the supplied Data2D object to compute the `dx` property of the returned Data1D object. The maths involved here is a little outside my comfort zone, so I'd be happy to delegate this responsibility to someone else.

Review Checklist (please remove items if they don't apply):