-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Change keep_attrs
default to True
#10726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is indeed a breaking change, which might require a deprecation cycle. Not sure, though, we could also decide that the benefits outweigh the disruption.
should binary operations support (doesn't have to be this PR, though) |
it's difficult to do a deprecation cycle here — we would need to do something like "give a warning anytime we run an operation that drops an attrs" — could be quite noisy. some discussion on this in #3891. not impossible though...
|
BREAKING CHANGE: Change keep_attrs default from False to True This changes the default behavior of xarray operations to preserve attributes by default, which better aligns with user expectations and scientific workflows where metadata preservation is critical. Migration guide: - To restore previous behavior globally: xr.set_options(keep_attrs=False) - To restore for specific operations: use keep_attrs=False parameter - Alternative: use .drop_attrs() method after operations Closes pydata#3891, pydata#4510, pydata#9920
e19c755
to
8453961
Compare
failures in pydata#10726 + `main`
Integrated coordinate preservation feature from main with our keep_attrs changes.
…alse The merge incorrectly preserved coordinate attributes even when keep_attrs=False. Now coordinates have their attrs cleared when keep_attrs=False, consistent with data variables.
- When keep_attrs=True: restore attrs from original coords (func may have dropped them) - When keep_attrs=False: clear all attrs - More efficient than previous implementation
Group attribute operations by keep_attrs value for cleaner, more readable code with identical functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks on the right track, though Claude seems to be inserting a lot of specific comments about "now" that will get stale very quickly and should be removed.
Per Stefan's review, remove 'now' from comments that describe behavior changes, as these become stale over time. Replace with timeless descriptions that simply state the current behavior.
for more information, see https://pre-commit.ci
- Remove PR-specific comment from variable.py that only makes sense in context - Remove redundant comment from test_computation.py - Clarify comment in test_dataarray.py about argmin preserving attrs
yes agree, thanks for checking now removed! is everyone OK to move forward? please raise if not! |
I wonder if a better default would be "drop_conflicts" would resolve @Illviljan's concerns about binary arithmetic with mismatched units, which was likely part of the original motivation for the default The main limitation of |
Instead of only preserving the left operand's attributes, binary operations now combine attributes from both operands using the drop_conflicts strategy: - Matching attributes (same key, same value) are kept - Conflicting attributes (same key, different values) are dropped - Non-conflicting attributes from both operands are preserved This provides more intuitive behavior when combining data with partially overlapping metadata.
For backward compatibility, when one operand has no attributes (None), keep the left operand's attributes instead of merging. This maintains the existing test expectations while still providing drop_conflicts behavior when both operands have attributes.
for more information, see https://pre-commit.ci
When one operand has no attributes (either None or empty dict), the result should have no attributes. This maintains backward compatibility and fixes test_1d_math failures. The issue was compounded by a side effect in the attrs property getter that mutates _attrs from None to {} when accessed.
When merging attrs with drop_conflicts, objects whose __eq__ returns non-bool values (e.g., numpy arrays) would raise ValueError. Now these are caught and treated as non-equivalent, causing the attribute to be dropped rather than raising an error. Fixes regression from pydata#10726
Use attrs property directly instead of checking _attrs, since the property normalizes None to {}. This simplifies the logic while maintaining the same behavior.
Variable uses None for no attrs, Dataset uses {} for no attrs. Updated comments to make this distinction clear.
Previously we were dropping all attrs if either operand had no attrs. Now we properly merge attrs and only drop conflicting ones, which is what drop_conflicts should do: - If one has {"a": 1} and other has {}, result is {"a": 1} - If one has {"a": 1} and other has {"a": 2}, result is {} - If one has {"a": 1} and other has {"b": 2}, result is {"a": 1, "b": 2} Updated tests to reflect this correct behavior.
Variable constructor already normalizes empty dict to None internally, so the explicit conversion is redundant.
great, #10741 for that change! edit: and merged |
* Fix drop_conflicts to handle non-bool returns from __eq__ When merging attrs with drop_conflicts, objects whose __eq__ returns non-bool values (e.g., numpy arrays) would raise ValueError. Now these are caught and treated as non-equivalent, causing the attribute to be dropped rather than raising an error. Fixes regression from #10726 * Add changelog entry * Suppress DeprecationWarning for ambiguous array truth values The CI was failing because NumPy raises a DeprecationWarning before ValueError when evaluating ambiguous truth values. Since we're properly handling the ValueError, we suppress the warning in this specific context. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move DeprecationWarning suppression to equivalent function Rather than suppressing the warning in merge_attrs, handle it at the source in equivalent() where the 'or' operation happens. This is cleaner and includes a clear comment about when the suppression can be removed (when minimum numpy version >= 2.0). The warning only occurs in numpy < 2.0; numpy 2.0+ raises ValueError directly, which we already handle properly in merge_attrs. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move DeprecationWarning suppression from library to test The warning only occurs in our specific test case with custom objects that return numpy arrays from __eq__. This is a very edge case scenario, so it's more appropriate to suppress the warning in the test rather than in the library code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Simplify drop_conflicts implementation Based on code review, simplified the logic to: - Process attributes in a single loop instead of nested operations - Eliminate intermediate dictionary creation - Make the control flow clearer and more efficient * Update whats-new.rst Co-authored-by: Justus Magin <[email protected]> * Update whats-new.rst Co-authored-by: Justus Magin <[email protected]> * Implement equivalent_attrs function as suggested by @keewis - Create dedicated equivalent_attrs function that wraps equivalent() - Returns False when ValueError is raised (e.g., for numpy arrays) - Simplifies merge_attrs logic by encapsulating error handling - Makes future behavior changes easier * Add pathological test cases suggested by @shoyer - Test NumPy object arrays with nested arrays - Test xarray.Dataset objects as attributes - Test Pandas Series as attributes - Update equivalent_attrs to catch TypeError in addition to ValueError - Ensure equivalent_attrs always returns a boolean (not Dataset/array-like) These cases all properly get dropped when they can't be reliably compared. * Use truthiness in equivalent_attrs for consistency with Python - Choose truthiness over strict boolean checking for consistency with Python's standard 'if a == b:' behavior - Accept edge cases with non-standard __eq__ methods as a deliberate tradeoff - Add comprehensive tests documenting the behavior and edge cases - Add clear documentation explaining the design choice and its implications This approach is more permissive and works better with numpy scalars and similar types, while still catching truly ambiguous cases (ValueError/TypeError). * Implement @shoyer's stricter equivalent() function - Rewrite equivalent() to reject non-boolean comparison results - Accept numpy bool scalars (np.bool_) but reject other non-bool types - Simplify equivalent_attrs() since equivalent() now handles non-bool cases - Update tests to reflect stricter behavior with non-standard __eq__ methods This makes comparisons more predictable by rejecting ambiguous cases like Dataset comparisons, custom objects with weird __eq__, etc. The tradeoff is being less permissive than Python's standard 'if a == b:' behavior. * Address @shoyer's review comments - Fix comment to say 'nested numpy arrays' instead of 'numpy arrays' - Use del instead of pop() to avoid creating unused return value - Split long test methods into smaller, focused test methods - Move all imports (warnings, pandas) to top of file Co-authored-by: Claude <[email protected]> * Address @shoyer's review comments - Split long test methods into smaller, focused test cases: - test_merge_attrs_drop_conflicts_ambiguous_array_returns split to add test_merge_attrs_drop_conflicts_all_true_array - test_merge_attrs_drop_conflicts_non_boolean_eq_returns split into test_merge_attrs_drop_conflicts_eq_returns_string and test_merge_attrs_drop_conflicts_eq_returns_number - Ran formatter which removed a blank line Co-authored-by: Claude <[email protected]> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Magin <[email protected]> Co-authored-by: Claude <[email protected]>
as discussed in #3891
Done with lots of help from Claude Code:
This change modifies xarray's default behavior to preserve attributes (
attrs
) across all operations, including computational, binary, and data manipulation functions. Previously, attributes were dropped by default unlesskeep_attrs=True
was explicitly set. This new default aligns xarray with common scientific workflows where metadata preservation is crucial.The
keep_attrs
option now defaults toTrue
for most operations. For binary operations, attributes are preserved from the left-hand operand.This is a breaking change
whats-new.rst
api.rst