Change `keep_attrs` default to `True` #10726

max-sixty · 2025-09-09T20:07:21Z

as discussed in #3891

Done with lots of help from Claude Code:

This change modifies xarray's default behavior to preserve attributes (attrs) across all operations, including computational, binary, and data manipulation functions. Previously, attributes were dropped by default unless keep_attrs=True was explicitly set. This new default aligns xarray with common scientific workflows where metadata preservation is crucial.

The keep_attrs option now defaults to True for most operations. For binary operations, attributes are preserved from the left-hand operand.

This is a breaking change

Closes Keep attrs by default? (keep_attrs) #3891
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

keewis · 2025-09-09T20:50:40Z

This is indeed a breaking change, which might require a deprecation cycle. Not sure, though, we could also decide that the benefits outweigh the disruption.

For binary operations, attributes are preserved from the left-hand operand.

should binary operations support combine_attrs strategy names / a function? apply_ufunc already supports that for keep_attrs (added in #5041), so it wouldn't be anything new for functions using it.

(doesn't have to be this PR, though)

max-sixty · 2025-09-09T21:33:09Z

This is indeed a breaking change, which might require a deprecation cycle. Not sure, though, we could also decide that the benefits outweigh the disruption.

it's difficult to do a deprecation cycle here — we would need to do something like "give a warning anytime we run an operation that drops an attrs" — could be quite noisy. some discussion on this in #3891. not impossible though...

I think it would probably be OK to start propagating more attrs by default as a breaking change. There's no easy way to roll this out incrementally, and I doubt too many users are relying upon metadata disappearing when they do xarray operations, given the somewhat inconsistent state of the current rules.

BREAKING CHANGE: Change keep_attrs default from False to True This changes the default behavior of xarray operations to preserve attributes by default, which better aligns with user expectations and scientific workflows where metadata preservation is critical. Migration guide: - To restore previous behavior globally: xr.set_options(keep_attrs=False) - To restore for specific operations: use keep_attrs=False parameter - Alternative: use .drop_attrs() method after operations Closes pydata#3891, pydata#4510, pydata#9920

failures in pydata#10726 + `main`

Integrated coordinate preservation feature from main with our keep_attrs changes.

…alse The merge incorrectly preserved coordinate attributes even when keep_attrs=False. Now coordinates have their attrs cleared when keep_attrs=False, consistent with data variables.

- When keep_attrs=True: restore attrs from original coords (func may have dropped them) - When keep_attrs=False: clear all attrs - More efficient than previous implementation

Group attribute operations by keep_attrs value for cleaner, more readable code with identical functionality.

shoyer

Generally looks on the right track, though Claude seems to be inserting a lot of specific comments about "now" that will get stale very quickly and should be removed.

xarray/core/variable.py

xarray/tests/test_computation.py

xarray/tests/test_dataarray.py

Per Stefan's review, remove 'now' from comments that describe behavior changes, as these become stale over time. Replace with timeless descriptions that simply state the current behavior.

for more information, see https://pre-commit.ci

- Remove PR-specific comment from variable.py that only makes sense in context - Remove redundant comment from test_computation.py - Clarify comment in test_dataarray.py about argmin preserving attrs

max-sixty · 2025-09-12T22:23:49Z

Generally looks on the right track, though Claude seems to be inserting a lot of specific comments about "now" that will get stale very quickly and should be removed.

yes agree, thanks for checking

now removed!

is everyone OK to move forward? please raise if not!

doc/whats-new.rst

shoyer · 2025-09-12T23:55:04Z

I wonder if a better default would be combine_attrs="drop_conflicts" from xarray.merge? Currently keep_attrs=True means combine_attrs="override".

"drop_conflicts" would resolve @Illviljan's concerns about binary arithmetic with mismatched units, which was likely part of the original motivation for the default keep_attrs=False.

The main limitation of drop_conflicts is that it assumes that every attribute value can be cheaply compared for equality, which may not be true (e.g., if someone puts a dask.array in an attrs). But I suppose we can have a back-up policy (either "drop" or "override") for rare cases where __eq__ does not return a boolean.

Instead of only preserving the left operand's attributes, binary operations now combine attributes from both operands using the drop_conflicts strategy: - Matching attributes (same key, same value) are kept - Conflicting attributes (same key, different values) are dropped - Non-conflicting attributes from both operands are preserved This provides more intuitive behavior when combining data with partially overlapping metadata.

For backward compatibility, when one operand has no attributes (None), keep the left operand's attributes instead of merging. This maintains the existing test expectations while still providing drop_conflicts behavior when both operands have attributes.

for more information, see https://pre-commit.ci

When one operand has no attributes (either None or empty dict), the result should have no attributes. This maintains backward compatibility and fixes test_1d_math failures. The issue was compounded by a side effect in the attrs property getter that mutates _attrs from None to {} when accessed.

When merging attrs with drop_conflicts, objects whose __eq__ returns non-bool values (e.g., numpy arrays) would raise ValueError. Now these are caught and treated as non-equivalent, causing the attribute to be dropped rather than raising an error. Fixes regression from pydata#10726

Use attrs property directly instead of checking _attrs, since the property normalizes None to {}. This simplifies the logic while maintaining the same behavior.

Variable uses None for no attrs, Dataset uses {} for no attrs. Updated comments to make this distinction clear.

Previously we were dropping all attrs if either operand had no attrs. Now we properly merge attrs and only drop conflicting ones, which is what drop_conflicts should do: - If one has {"a": 1} and other has {}, result is {"a": 1} - If one has {"a": 1} and other has {"a": 2}, result is {} - If one has {"a": 1} and other has {"b": 2}, result is {"a": 1, "b": 2} Updated tests to reflect this correct behavior.

Variable constructor already normalizes empty dict to None internally, so the explicit conversion is redundant.

max-sixty · 2025-09-14T04:26:46Z

The main limitation of drop_conflicts is that it assumes that every attribute value can be cheaply compared for equality, which may not be true (e.g., if someone puts a dask.array in an attrs). But I suppose we can have a back-up policy (either "drop" or "override") for rare cases where __eq__ does not return a boolean.

great, #10741 for that change!

edit: and merged

@keewis

* Fix drop_conflicts to handle non-bool returns from __eq__ When merging attrs with drop_conflicts, objects whose __eq__ returns non-bool values (e.g., numpy arrays) would raise ValueError. Now these are caught and treated as non-equivalent, causing the attribute to be dropped rather than raising an error. Fixes regression from #10726 * Add changelog entry * Suppress DeprecationWarning for ambiguous array truth values The CI was failing because NumPy raises a DeprecationWarning before ValueError when evaluating ambiguous truth values. Since we're properly handling the ValueError, we suppress the warning in this specific context. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move DeprecationWarning suppression to equivalent function Rather than suppressing the warning in merge_attrs, handle it at the source in equivalent() where the 'or' operation happens. This is cleaner and includes a clear comment about when the suppression can be removed (when minimum numpy version >= 2.0). The warning only occurs in numpy < 2.0; numpy 2.0+ raises ValueError directly, which we already handle properly in merge_attrs. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move DeprecationWarning suppression from library to test The warning only occurs in our specific test case with custom objects that return numpy arrays from __eq__. This is a very edge case scenario, so it's more appropriate to suppress the warning in the test rather than in the library code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Simplify drop_conflicts implementation Based on code review, simplified the logic to: - Process attributes in a single loop instead of nested operations - Eliminate intermediate dictionary creation - Make the control flow clearer and more efficient * Update whats-new.rst Co-authored-by: Justus Magin <[email protected]> * Update whats-new.rst Co-authored-by: Justus Magin <[email protected]> * Implement equivalent_attrs function as suggested by @keewis - Create dedicated equivalent_attrs function that wraps equivalent() - Returns False when ValueError is raised (e.g., for numpy arrays) - Simplifies merge_attrs logic by encapsulating error handling - Makes future behavior changes easier * Add pathological test cases suggested by @shoyer - Test NumPy object arrays with nested arrays - Test xarray.Dataset objects as attributes - Test Pandas Series as attributes - Update equivalent_attrs to catch TypeError in addition to ValueError - Ensure equivalent_attrs always returns a boolean (not Dataset/array-like) These cases all properly get dropped when they can't be reliably compared. * Use truthiness in equivalent_attrs for consistency with Python - Choose truthiness over strict boolean checking for consistency with Python's standard 'if a == b:' behavior - Accept edge cases with non-standard __eq__ methods as a deliberate tradeoff - Add comprehensive tests documenting the behavior and edge cases - Add clear documentation explaining the design choice and its implications This approach is more permissive and works better with numpy scalars and similar types, while still catching truly ambiguous cases (ValueError/TypeError). * Implement @shoyer's stricter equivalent() function - Rewrite equivalent() to reject non-boolean comparison results - Accept numpy bool scalars (np.bool_) but reject other non-bool types - Simplify equivalent_attrs() since equivalent() now handles non-bool cases - Update tests to reflect stricter behavior with non-standard __eq__ methods This makes comparisons more predictable by rejecting ambiguous cases like Dataset comparisons, custom objects with weird __eq__, etc. The tradeoff is being less permissive than Python's standard 'if a == b:' behavior. * Address @shoyer's review comments - Fix comment to say 'nested numpy arrays' instead of 'numpy arrays' - Use del instead of pop() to avoid creating unused return value - Split long test methods into smaller, focused test methods - Move all imports (warnings, pandas) to top of file Co-authored-by: Claude <[email protected]> * Address @shoyer's review comments - Split long test methods into smaller, focused test cases: - test_merge_attrs_drop_conflicts_ambiguous_array_returns split to add test_merge_attrs_drop_conflicts_all_true_array - test_merge_attrs_drop_conflicts_non_boolean_eq_returns split into test_merge_attrs_drop_conflicts_eq_returns_string and test_merge_attrs_drop_conflicts_eq_returns_number - Ran formatter which removed a blank line Co-authored-by: Claude <[email protected]> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Justus Magin <[email protected]> Co-authored-by: Claude <[email protected]>

github-actions bot added the topic-DataTree Related to the implementation of a DataTree class label Sep 9, 2025

max-sixty force-pushed the keep-attrs branch from e19c755 to 8453961 Compare September 9, 2025 23:53

max-sixty added a commit to max-sixty/xarray that referenced this pull request Sep 10, 2025

xfail windows matplotlib xdist tests

ae1305f

failures in pydata#10726 + `main`

max-sixty mentioned this pull request Sep 10, 2025

xfail windows matplotlib xdist tests #10729

Open

Merge branch 'main' into keep-attrs

fa49f34

max-sixty requested a review from dcherian September 10, 2025 17:26

max-sixty mentioned this pull request Sep 10, 2025

propagate attrs on coords in Dataset.map #10602

Merged

3 tasks

max-sixty added 4 commits September 10, 2025 12:06

Merge main and resolve conflicts in Dataset.map

9c224ec

Integrated coordinate preservation feature from main with our keep_attrs changes.

Fix Dataset.map to properly handle coordinate attrs when keep_attrs=F…

ab77f0c

…alse The merge incorrectly preserved coordinate attributes even when keep_attrs=False. Now coordinates have their attrs cleared when keep_attrs=False, consistent with data variables.

Optimize Dataset.map coordinate attribute handling

2988fe0

- When keep_attrs=True: restore attrs from original coords (func may have dropped them) - When keep_attrs=False: clear all attrs - More efficient than previous implementation

Simplify Dataset.map attribute handling code

f7f3c5b

Group attribute operations by keep_attrs value for cleaner, more readable code with identical functionality.

shoyer reviewed Sep 12, 2025

View reviewed changes

xarray/core/variable.py Outdated Show resolved Hide resolved

xarray/tests/test_computation.py Outdated Show resolved Hide resolved

xarray/tests/test_computation.py Outdated Show resolved Hide resolved

xarray/tests/test_dataarray.py Outdated Show resolved Hide resolved

max-sixty and others added 4 commits September 12, 2025 14:36

Remove temporal 'now' references from comments

08a4d43

Per Stefan's review, remove 'now' from comments that describe behavior changes, as these become stale over time. Replace with timeless descriptions that simply state the current behavior.

[pre-commit.ci] auto fixes from pre-commit.com hooks

f2e5f56

for more information, see https://pre-commit.ci

Merge branch 'main' into keep-attrs

87266a2

Address remaining review comments from Stefan

8858170

- Remove PR-specific comment from variable.py that only makes sense in context - Remove redundant comment from test_computation.py - Clarify comment in test_dataarray.py about argmin preserving attrs

max-sixty added plan to merge Final call for comments and removed topic-DataTree Related to the implementation of a DataTree class labels Sep 12, 2025

Illviljan reviewed Sep 12, 2025

View reviewed changes

doc/whats-new.rst Outdated Show resolved Hide resolved

shoyer added needs discussion and removed plan to merge Final call for comments labels Sep 13, 2025

github-actions bot added the topic-DataTree Related to the implementation of a DataTree class label Sep 13, 2025

max-sixty and others added 2 commits September 13, 2025 16:28

[pre-commit.ci] auto fixes from pre-commit.com hooks

f8ba047

for more information, see https://pre-commit.ci

max-sixty mentioned this pull request Sep 14, 2025

Allow drop_conflicts to handle non-bool returns from __eq__ #10741

Merged

max-sixty added 4 commits September 13, 2025 19:27

Simplify binary ops attrs handling

602b169

Use attrs property directly instead of checking _attrs, since the property normalizes None to {}. This simplifies the logic while maintaining the same behavior.

Clarify comments about attrs handling differences

4f003fb

Variable uses None for no attrs, Dataset uses {} for no attrs. Updated comments to make this distinction clear.

Remove unnecessary conversion of {} to None

b93090f

Variable constructor already normalizes empty dict to None internally, so the explicit conversion is redundant.

Merge branch 'main' into keep-attrs

97bdb29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Change `keep_attrs` default to `True` #10726

Change `keep_attrs` default to `True` #10726

Uh oh!

max-sixty commented Sep 9, 2025 •

edited

Loading

Uh oh!

keewis commented Sep 9, 2025

Uh oh!

max-sixty commented Sep 9, 2025

Uh oh!

shoyer left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

max-sixty commented Sep 12, 2025

Uh oh!

Uh oh!

shoyer commented Sep 12, 2025

Uh oh!

max-sixty commented Sep 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Change keep_attrs default to True #10726

Are you sure you want to change the base?

Change keep_attrs default to True #10726

Uh oh!

Conversation

max-sixty commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keewis commented Sep 9, 2025

Uh oh!

max-sixty commented Sep 9, 2025

Uh oh!

shoyer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

max-sixty commented Sep 12, 2025

Uh oh!

Uh oh!

shoyer commented Sep 12, 2025

Uh oh!

max-sixty commented Sep 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Change `keep_attrs` default to `True` #10726

Change `keep_attrs` default to `True` #10726

max-sixty commented Sep 9, 2025 •

edited

Loading

max-sixty commented Sep 14, 2025 •

edited

Loading