Allow `drop_conflicts` to handle non-bool returns from `eq` #10741

max-sixty · 2025-09-14T00:18:53Z

From Claude:

When merging attributes with combine_attrs="drop_conflicts", objects whose __eq__ method returns non-bool values (e.g., numpy arrays) would raise ValueError due to ambiguous truth values. This PR fixes the issue by catching the ValueError and treating such comparisons as non-equivalent, causing the attribute to be dropped.

This fixes a regression introduced in #10726 where the merge logic started using direct equality comparisons that can fail for certain attribute types.

What's changed:

Added try/except around equivalent() call in merge_attrs for the drop_conflicts case
Added comprehensive test coverage for various non-bool __eq__ scenarios

Test coverage includes:

Objects returning single-element numpy arrays (unambiguous)
Objects returning empty numpy arrays (ambiguous)
Objects returning multi-element numpy arrays (ambiguous)
Regular numpy arrays as attribute values

When merging attrs with drop_conflicts, objects whose __eq__ returns non-bool values (e.g., numpy arrays) would raise ValueError. Now these are caught and treated as non-equivalent, causing the attribute to be dropped rather than raising an error. Fixes regression from pydata#10726

The CI was failing because NumPy raises a DeprecationWarning before ValueError when evaluating ambiguous truth values. Since we're properly handling the ValueError, we suppress the warning in this specific context.

for more information, see https://pre-commit.ci

Rather than suppressing the warning in merge_attrs, handle it at the source in equivalent() where the 'or' operation happens. This is cleaner and includes a clear comment about when the suppression can be removed (when minimum numpy version >= 2.0). The warning only occurs in numpy < 2.0; numpy 2.0+ raises ValueError directly, which we already handle properly in merge_attrs.

for more information, see https://pre-commit.ci

The warning only occurs in our specific test case with custom objects that return numpy arrays from __eq__. This is a very edge case scenario, so it's more appropriate to suppress the warning in the test rather than in the library code.

for more information, see https://pre-commit.ci

Based on code review, simplified the logic to: - Process attributes in a single loop instead of nested operations - Eliminate intermediate dictionary creation - Make the control flow clearer and more efficient

xarray/structure/merge.py

doc/whats-new.rst

Co-authored-by: Justus Magin <[email protected]>

- Create dedicated equivalent_attrs function that wraps equivalent() - Returns False when ValueError is raised (e.g., for numpy arrays) - Simplifies merge_attrs logic by encapsulating error handling - Makes future behavior changes easier

shoyer · 2025-09-15T01:47:30Z

Let me give a few give a few concrete examples of pathological inputs to be sure we handle for attribute equivalence:

NumPy object arrays, e.g.,

x = np.array([None], dtype=object)
x[0] = np.arange(3)
y = np.array([None], dtype=object)
y[0] = np.arange(10, 13)

xarray.Dataset objects (which actually raises TypeError!):

x = xarray.Dataset({'foo': 1})
y = xarray.Dataset({'bar': 1})

Pandas series:

x = pd.Series([1, 2])
y = pd.Series([3, 4])

I think we should re-write equivalent to make it a little safer to use. My suggestion:

def equivalent(first: object, second: object) -> bool:
    if first is second:
        return True

    if isinstance(first, np.ndarray) and isinstance(second, np.ndarray):
        return np.array_equal(first, second, equal_nan=True)

    if isinstance(first, list) and isinstance(second, list):
        return list_equiv(first, second)

    result = first == second
    if not isinstance(result, bool):  # unhandled array-like values
        return False

    return result or (pd.isna(first) and pd.isna(second))  # valid for scalars


def equivalent_attrs(first: object, second: object) -> bool:
    try:
        return equivalent(first, second)
    except ValueError, TypeError:
        return False

Potentially we could make this is a little more aggressive by returning False if first and second have different types.

- Test NumPy object arrays with nested arrays - Test xarray.Dataset objects as attributes - Test Pandas Series as attributes - Update equivalent_attrs to catch TypeError in addition to ValueError - Ensure equivalent_attrs always returns a boolean (not Dataset/array-like) These cases all properly get dropped when they can't be reliably compared.

- Choose truthiness over strict boolean checking for consistency with Python's standard 'if a == b:' behavior - Accept edge cases with non-standard __eq__ methods as a deliberate tradeoff - Add comprehensive tests documenting the behavior and edge cases - Add clear documentation explaining the design choice and its implications This approach is more permissive and works better with numpy scalars and similar types, while still catching truly ambiguous cases (ValueError/TypeError).

- Rewrite equivalent() to reject non-boolean comparison results - Accept numpy bool scalars (np.bool_) but reject other non-bool types - Simplify equivalent_attrs() since equivalent() now handles non-bool cases - Update tests to reflect stricter behavior with non-standard __eq__ methods This makes comparisons more predictable by rejecting ambiguous cases like Dataset comparisons, custom objects with weird __eq__, etc. The tradeoff is being less permissive than Python's standard 'if a == b:' behavior.

max-sixty · 2025-09-15T17:49:24Z

implemented that approach @shoyer . so quite small & simple code changes; lots of tests...

xarray/tests/test_merge.py

xarray/structure/merge.py

- Fix comment to say 'nested numpy arrays' instead of 'numpy arrays' - Use del instead of pop() to avoid creating unused return value - Split long test methods into smaller, focused test methods - Move all imports (warnings, pandas) to top of file Co-authored-by: Claude <[email protected]>

shoyer

Thanks @max-sixty (+ Claude!)

xarray/tests/test_merge.py

- Split long test methods into smaller, focused test cases: - test_merge_attrs_drop_conflicts_ambiguous_array_returns split to add test_merge_attrs_drop_conflicts_all_true_array - test_merge_attrs_drop_conflicts_non_boolean_eq_returns split into test_merge_attrs_drop_conflicts_eq_returns_string and test_merge_attrs_drop_conflicts_eq_returns_number - Ran formatter which removed a blank line Co-authored-by: Claude <[email protected]>

max-sixty · 2025-09-16T04:34:30Z

Thanks @max-sixty (+ Claude!)

yw!

was an interesting case study — I give it a 6/10.

one wrinkle is that our standards are higher than most! and these models are great at doing median code, but even with me overseeing them, it still took a few iterations to get to the standard we wanted...

max-sixty and others added 9 commits September 13, 2025 17:17

Add changelog entry

ed18a13

Suppress DeprecationWarning for ambiguous array truth values

0fad4c8

The CI was failing because NumPy raises a DeprecationWarning before ValueError when evaluating ambiguous truth values. Since we're properly handling the ValueError, we suppress the warning in this specific context.

[pre-commit.ci] auto fixes from pre-commit.com hooks

7adaa40

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

116a8f0

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

be07f64

for more information, see https://pre-commit.ci

Simplify drop_conflicts implementation

ce92f94

Based on code review, simplified the logic to: - Process attributes in a single loop instead of nested operations - Eliminate intermediate dictionary creation - Make the control flow clearer and more efficient

max-sixty mentioned this pull request Sep 14, 2025

Change keep_attrs default to True #10726

Open

4 tasks

keewis reviewed Sep 14, 2025

View reviewed changes

xarray/structure/merge.py Outdated Show resolved Hide resolved

keewis reviewed Sep 14, 2025

View reviewed changes

doc/whats-new.rst Outdated Show resolved Hide resolved

keewis reviewed Sep 14, 2025

View reviewed changes

doc/whats-new.rst Show resolved Hide resolved

max-sixty and others added 2 commits September 14, 2025 08:08

Update whats-new.rst

c259f93

Co-authored-by: Justus Magin <[email protected]>

Update whats-new.rst

2b85c43

Co-authored-by: Justus Magin <[email protected]>

max-sixty added the plan to merge Final call for comments label Sep 14, 2025

max-sixty added 2 commits September 15, 2025 07:48

max-sixty changed the title ~~Fix drop_conflicts to handle non-bool returns from __eq__~~ Adjust drop_conflicts to handle non-bool returns from __eq__ Sep 15, 2025

max-sixty changed the title ~~Adjust drop_conflicts to handle non-bool returns from __eq__~~ Allow drop_conflicts to handle non-bool returns from __eq__ Sep 15, 2025

max-sixty changed the title ~~Allow drop_conflicts to handle non-bool returns from __eq__~~ Allow drop_conflicts to handle non-bool returns from __eq__ Sep 15, 2025

max-sixty force-pushed the merge-attrs-drop-eq branch from 42e5c9d to 5557193 Compare September 15, 2025 17:15

Merge branch 'main' into merge-attrs-drop-eq

4c7cb9a

shoyer reviewed Sep 15, 2025

View reviewed changes

shoyer approved these changes Sep 16, 2025

View reviewed changes

xarray/tests/test_merge.py Show resolved Hide resolved

xarray/tests/test_merge.py Show resolved Hide resolved

xarray/tests/test_merge.py Show resolved Hide resolved

max-sixty merged commit 8694818 into pydata:main Sep 16, 2025
37 checks passed

max-sixty deleted the merge-attrs-drop-eq branch September 16, 2025 04:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Allow `drop_conflicts` to handle non-bool returns from `eq` #10741

Allow `drop_conflicts` to handle non-bool returns from `eq` #10741

max-sixty commented Sep 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shoyer commented Sep 15, 2025 •

edited

Loading

Uh oh!

max-sixty commented Sep 15, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shoyer left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

max-sixty commented Sep 16, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Allow drop_conflicts to handle non-bool returns from __eq__ #10741

Allow drop_conflicts to handle non-bool returns from __eq__ #10741

Conversation

max-sixty commented Sep 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's changed:

Test coverage includes:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shoyer commented Sep 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

max-sixty commented Sep 15, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shoyer left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

max-sixty commented Sep 16, 2025

Uh oh!

Uh oh!

Uh oh!

Allow `drop_conflicts` to handle non-bool returns from `eq` #10741

Allow `drop_conflicts` to handle non-bool returns from `eq` #10741

max-sixty commented Sep 14, 2025 •

edited

Loading

shoyer commented Sep 15, 2025 •

edited

Loading