Skip to content

Conversation

semohr
Copy link
Contributor

@semohr semohr commented Oct 2, 2025

Description

Fixes an issue where empty metadata caused a crash in the import process.

Seems like the issue was not catched earlier, since the typehints were
not propagated correctly in the metadata_plugin.item_candidates function.

Might have also been prevented with #5965 too.

  • Changelog.
  • Tests.

closes #6060

Disclaimer

I opted to allow None for artist andtitle searches as this makes it clear for the plugins that these fields
might be empty. See more context here. This increases complexity for plugins a bit as they need to handle empty artists or titles gracefully.

We sadly need to still support cases where no metadata is given e.g. for the chroma plugin. Otherwise I would have just enforced that artist and title are always given.

The fix does seem a bit off to me. There may be a better way to approach this. Feedback and other ideas highly welcome!

semohr added 2 commits October 2, 2025 15:59
Seems like the issue was not catched earlier, since the typehints were
not propagated correctly in the item_candidates function.
@semohr semohr requested a review from a team as a code owner October 2, 2025 14:07
@Copilot Copilot AI review requested due to automatic review settings October 2, 2025 14:07
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a crash in the import process when files have empty metadata by adding proper null checks and handling in the MusicBrainz plugin and related metadata processing functions.

  • Added early return for empty filters in the _search_api method
  • Updated item_candidates functions to properly handle None values for artist and title
  • Enhanced search fallback to use filename stem when title is missing

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
beetsplug/musicbrainz.py Added null checks for empty filters and optional parameters for artist/title
beets/metadata_plugins.py Updated function signatures to handle None values and fixed typo
beets/autotag/match.py Added fallback to filename stem and empty string normalization
test/plugins/test_musicbrainz.py Added parametrized tests for various null/empty metadata scenarios
docs/changelog.rst Added changelog entry for the bug fix

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 807 to 809
filters = {
k: _v for k, v in filters.items() if (_v := v.lower().strip())
}
Copy link
Preview

Copilot AI Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dictionary comprehension will fail if any value in the filters dict is None, since None doesn't have a .lower() method. The early return on line 804-805 should check if filters is empty or if any values are None before this line is reached.

Copilot uses AI. Check for mistakes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filters is always a dict[str,str] here!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is confused now that Nones have entered the game - let's keep them out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think taking them out and just putting a remark in the docstrings should be good enough. A test also should help for the long term 👍

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `test/plugins/test_musicbrainz.py:1035-1040` </location>
<code_context>
+            ("Artist", "Title", 1),
+            (None, "Title", 1),
+            ("Artist", None, 1),
+            (None, None, 0),
+        ],
+    )
+    def test_item_candidates(
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty strings as artist/title.

Please include test cases with empty strings ("", " ") for artist and/or title to verify this conversion logic is properly tested.

```suggestion
        [
            ("Artist", "Title", 1),
            (None, "Title", 1),
            ("Artist", None, 1),
            (None, None, 0),
            ("", "Title", 1),
            ("Artist", "", 1),
            ("", "", 0),
            (" ", "Title", 1),
            ("Artist", " ", 1),
            (" ", " ", 0),
            (None, "", 0),
            ("", None, 0),
            (None, " ", 0),
            (" ", None, 0),
        ],
```
</issue_to_address>

### Comment 2
<location> `test/plugins/test_musicbrainz.py:1055` </location>
<code_context>
         )

-        candidates = list(mb.item_candidates(Item(), "hello", "there"))
+        candidates = list(mb.item_candidates(Item(), artist, title))

-        assert len(candidates) == 1
</code_context>

<issue_to_address>
**suggestion (testing):** Missing test for error handling when plugin returns unexpected data.

Please add a test where the mocked plugin returns malformed or incomplete data to verify the function handles it without crashing.
</issue_to_address>

### Comment 3
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 4
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.

    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None

    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)

    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)

    # Search terms.
    search_artist = search_artist or item.artist
    search_title = search_title or item.title or item.filepath.stem
    log.debug("Item search terms: {} - {}", search_artist, search_title)

    # Replace empty string with None
    if isinstance(search_artist, str) and search_artist.strip() == "":
        search_artist = None
    if isinstance(search_title, str) and search_title.strip() == "":
        search_title = None

    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item, search_artist, search_title
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)

    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>


The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@semohr semohr marked this pull request as draft October 2, 2025 14:13
Copy link

codecov bot commented Oct 2, 2025

Codecov Report

❌ Patch coverage is 25.00000% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.64%. Comparing base (e12020a) to head (ec7fbf5).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
beetsplug/discogs.py 0.00% 11 Missing ⚠️
beetsplug/beatport.py 0.00% 8 Missing ⚠️
beets/metadata_plugins.py 12.50% 7 Missing ⚠️
beets/autotag/match.py 20.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6065      +/-   ##
==========================================
- Coverage   66.72%   66.64%   -0.08%     
==========================================
  Files         119      119              
  Lines       18143    18175      +32     
  Branches     3076     3090      +14     
==========================================
+ Hits        12106    12113       +7     
- Misses       5378     5401      +23     
- Partials      659      661       +2     
Files with missing lines Coverage Δ
beets/ui/commands.py 74.85% <ø> (ø)
beetsplug/musicbrainz.py 72.65% <100.00%> (+0.50%) ⬆️
beets/autotag/match.py 79.22% <20.00%> (-2.12%) ⬇️
beets/metadata_plugins.py 77.37% <12.50%> (-2.33%) ⬇️
beetsplug/beatport.py 42.68% <0.00%> (-1.22%) ⬇️
beetsplug/discogs.py 65.54% <0.00%> (-1.89%) ⬇️
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@semohr semohr marked this pull request as ready for review October 2, 2025 14:36
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `test/plugins/test_musicbrainz.py:1057-1058` </location>
<code_context>

-        assert len(candidates) == 1
-        assert candidates[0].track_id == self.RECORDING["id"]
+        assert len(candidates) == expected_count
+        if expected_count == 1:
+            assert candidates[0].track_id == self.RECORDING["id"]

</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test for empty string values ("") for artist and title.

Tests currently check for None but not for empty strings. Since empty strings are normalized to None, please add test cases for empty string inputs to verify this behavior.
</issue_to_address>

### Comment 2
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 3
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
    item: Item,
    search_artist: str | None = None,
    search_title: str | None = None,
    search_ids: list[str] | None = None,
) -> Proposal:
    """Find metadata for a single track. Return a `Proposal` consisting
    of `TrackMatch` objects.

    `search_artist` and `search_title` may be used to override the item
    metadata in the search query. `search_ids` may be used for restricting the
    search to a list of metadata backend IDs.
    """
    # Holds candidates found so far: keys are MBIDs; values are
    # (distance, TrackInfo) pairs.
    candidates = {}
    rec: Recommendation | None = None

    # First, try matching by the external source ID.
    trackids = search_ids or [t for t in [item.mb_trackid] if t]
    if trackids:
        for trackid in trackids:
            log.debug("Searching for track ID: {}", trackid)
            if info := metadata_plugins.track_for_id(trackid):
                dist = track_distance(item, info, incl_artist=True)
                candidates[info.track_id] = hooks.TrackMatch(dist, info)
                # If this is a good match, then don't keep searching.
                rec = _recommendation(_sort_candidates(candidates.values()))
                if (
                    rec == Recommendation.strong
                    and not config["import"]["timid"]
                ):
                    log.debug("Track ID match.")
                    return Proposal(_sort_candidates(candidates.values()), rec)

    # If we're searching by ID, don't proceed.
    if search_ids:
        if candidates:
            assert rec is not None
            return Proposal(_sort_candidates(candidates.values()), rec)
        else:
            return Proposal([], Recommendation.none)

    # Search terms.
    search_artist = search_artist or item.artist
    search_title = search_title or item.title or item.filepath.stem
    log.debug("Item search terms: {} - {}", search_artist, search_title)

    # Replace empty string with None
    if isinstance(search_artist, str) and search_artist.strip() == "":
        search_artist = None
    if isinstance(search_title, str) and search_title.strip() == "":
        search_title = None

    # Get and evaluate candidate metadata.
    for track_info in metadata_plugins.item_candidates(
        item, search_artist, search_title
    ):
        dist = track_distance(item, track_info, incl_artist=True)
        candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)

    # Sort by distance and return with recommendation.
    log.debug("Found {} candidates.", len(candidates))
    candidates_sorted = _sort_candidates(candidates.values())
    rec = _recommendation(candidates_sorted)
    return Proposal(candidates_sorted, rec)

</code_context>

<issue_to_address>
**issue (code-quality):** We've found these issues:

- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))

<br/><details><summary>Explanation</summary>


The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

- Reduce the function length by extracting pieces of functionality out into
  their own functions. This is the most important thing you can do - ideally a
  function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
  sits together within the function rather than being scattered.</details>
</issue_to_address>

### Comment 4
<location> `beetsplug/discogs.py:190-193` </location>
<code_context>
    def candidates(
        self,
        items: Sequence[Item],
        artist: str | None,
        album: str | None,
        va_likely: bool,
    ) -> Iterable[AlbumInfo]:
        query = ""
        if artist is not None:
            query += artist
        if album is not None:
            query += f" {album}"

        if va_likely:
            query = album or ""

        query = query.strip()
        if not query:
            return []

        return self.get_albums(query)

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return [] if not query else self.get_albums(query)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@snejus
Copy link
Member

snejus commented Oct 3, 2025

Seems like the issue was not catched earlier, since the typehints were
not propagated correctly in the metadata_plugin.item_candidates function.

I may be missing something, but it seems to me they have been correct? Both candidates functions always received strings - including empty ones.

Copy link
Member

@snejus snejus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the practical benefit of replacing empty strings by None? I outlined a couple of issues with this approach.

I'm a bit confused about the 'why' here, especially so because this only applies to tracks/singletons and not albums

# Search terms.
search_artist = search_artist or item.artist
search_title = search_title or item.title
search_title = search_title or item.title or item.filepath.stem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have this here - if users want to use file names, they can enable the fromfilename plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't really hurt here and makes this one edgecase a bit better. Some search results are better than none imo.
But don't get me wrong I agree that people are better of using the fromfilename plugin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It changes the behaviour for external plugins that expect this string to be either the search string or the title of the item.

Comment on lines +403 to +404
if artist is not None:
filters["artist"] = artist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how the introduction of None brings more clarity - this now says 'any string is fine, including an empty one'.

On the other hand, if one is aware of tag_item implementation, they know that None just stands for an empty string.

Thus, I think, we should continue using strings and simply handle if not artist etc. if needed.

Copy link
Contributor Author

@semohr semohr Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, using empty strings is the easier way. Will change it in the next commit.

To give you some context how we got here: I just fixed the mypy issues that followed from typing the args and kwargs in the item_candidates function.

The main issue with the types stems from shadowing not working as expected for search_artists/titles. Every other function down the line expected the search strings to be str | None. The fix is straight forward, just didn't see this initially and thought item.title can return a None too 🤦

if artist is not None:
filters["artist"] = artist

if title is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big change in behaviour - every metadata source handles absence of title fine when artist is given.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, both linked issues indicate that the issue is limited to musicbrainz only and only when both artist and title are empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will probably change it in the next commit, but I don't think it is a big change. At least from the typehint perspective this is how it is supposed to be imo. We use NotRequired for both the title and artist filter.

Copy link
Contributor Author

@semohr semohr Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, both linked issues indicate that the issue is limited to musicbrainz only and only when both artist and title are empty.

Maybe this is also a good reason for creating a proper general tests suit for all metadata plugins. I will see if I can think about how to approach this properly.

self, item: Item, artist: str | None, title: str | None
) -> Iterable[TrackInfo]:
query = f"{artist} {title}"
query = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the same str types and use the following instead

query = " ".join(map(str.strip, [artist, title]))

) -> Iterable[beets.autotag.hooks.TrackInfo]:
criteria = {"artist": artist, "recording": title, "alias": title}
criteria: dict[str, str] = {}
if artist is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is redundant - _search_api filters out empty values consistently for both singletons and albums. No change is needed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly not. I tested it: If you for example input {"artist":None, "title":"something"} the _search_api will crash. But the logic should not matter if we properly support empty strings.

Copy link
Member

@snejus snejus Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies - forgot to clarify that my comment applies only if we handle strings only 😅

Comment on lines 807 to 809
filters = {
k: _v for k, v in filters.items() if (_v := v.lower().strip())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot is confused now that Nones have entered the game - let's keep them out.


filters = {
k: _v for k, v in filters.items() if (_v := v.lower().strip())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you've reverted to handling empty strings, the only adjustment that this module needs is adding an early return here:

if not filters:
    return []

@semohr
Copy link
Contributor Author

semohr commented Oct 3, 2025

Seems like the issue was not catched earlier, since the typehints were
not propagated correctly in the metadata_plugin.item_candidates function.

I may be missing something, but it seems to me they have been correct? Both candidates functions always received strings - including empty ones.

see #6065 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

beet import -s fails with "at least one query term is required"
2 participants