-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Empty metadata fix #6065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Empty metadata fix #6065
Conversation
Seems like the issue was not catched earlier, since the typehints were not propagated correctly in the item_candidates function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a crash in the import process when files have empty metadata by adding proper null checks and handling in the MusicBrainz plugin and related metadata processing functions.
- Added early return for empty filters in the
_search_api
method - Updated
item_candidates
functions to properly handle None values for artist and title - Enhanced search fallback to use filename stem when title is missing
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
beetsplug/musicbrainz.py | Added null checks for empty filters and optional parameters for artist/title |
beets/metadata_plugins.py | Updated function signatures to handle None values and fixed typo |
beets/autotag/match.py | Added fallback to filename stem and empty string normalization |
test/plugins/test_musicbrainz.py | Added parametrized tests for various null/empty metadata scenarios |
docs/changelog.rst | Added changelog entry for the bug fix |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
filters = { | ||
k: _v for k, v in filters.items() if (_v := v.lower().strip()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dictionary comprehension will fail if any value in the filters dict is None, since None doesn't have a .lower()
method. The early return on line 804-805 should check if filters is empty or if any values are None before this line is reached.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filters is always a dict[str,str]
here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot is confused now that None
s have entered the game - let's keep them out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I think taking them out and just putting a remark in the docstrings should be good enough. A test also should help for the long term 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `test/plugins/test_musicbrainz.py:1035-1040` </location>
<code_context>
+ ("Artist", "Title", 1),
+ (None, "Title", 1),
+ ("Artist", None, 1),
+ (None, None, 0),
+ ],
+ )
+ def test_item_candidates(
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test case for empty strings as artist/title.
Please include test cases with empty strings ("", " ") for artist and/or title to verify this conversion logic is properly tested.
```suggestion
[
("Artist", "Title", 1),
(None, "Title", 1),
("Artist", None, 1),
(None, None, 0),
("", "Title", 1),
("Artist", "", 1),
("", "", 0),
(" ", "Title", 1),
("Artist", " ", 1),
(" ", " ", 0),
(None, "", 0),
("", None, 0),
(None, " ", 0),
(" ", None, 0),
],
```
</issue_to_address>
### Comment 2
<location> `test/plugins/test_musicbrainz.py:1055` </location>
<code_context>
)
- candidates = list(mb.item_candidates(Item(), "hello", "there"))
+ candidates = list(mb.item_candidates(Item(), artist, title))
- assert len(candidates) == 1
</code_context>
<issue_to_address>
**suggestion (testing):** Missing test for error handling when plugin returns unexpected data.
Please add a test where the mocked plugin returns malformed or incomplete data to verify the function handles it without crashing.
</issue_to_address>
### Comment 3
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 4
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
item: Item,
search_artist: str | None = None,
search_title: str | None = None,
search_ids: list[str] | None = None,
) -> Proposal:
"""Find metadata for a single track. Return a `Proposal` consisting
of `TrackMatch` objects.
`search_artist` and `search_title` may be used to override the item
metadata in the search query. `search_ids` may be used for restricting the
search to a list of metadata backend IDs.
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
rec: Recommendation | None = None
# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
# If we're searching by ID, don't proceed.
if search_ids:
if candidates:
assert rec is not None
return Proposal(_sort_candidates(candidates.values()), rec)
else:
return Proposal([], Recommendation.none)
# Search terms.
search_artist = search_artist or item.artist
search_title = search_title or item.title or item.filepath.stem
log.debug("Item search terms: {} - {}", search_artist, search_title)
# Replace empty string with None
if isinstance(search_artist, str) and search_artist.strip() == "":
search_artist = None
if isinstance(search_title, str) and search_title.strip() == "":
search_title = None
# Get and evaluate candidate metadata.
for track_info in metadata_plugins.item_candidates(
item, search_artist, search_title
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6065 +/- ##
==========================================
- Coverage 66.72% 66.64% -0.08%
==========================================
Files 119 119
Lines 18143 18175 +32
Branches 3076 3090 +14
==========================================
+ Hits 12106 12113 +7
- Misses 5378 5401 +23
- Partials 659 661 +2
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `test/plugins/test_musicbrainz.py:1057-1058` </location>
<code_context>
- assert len(candidates) == 1
- assert candidates[0].track_id == self.RECORDING["id"]
+ assert len(candidates) == expected_count
+ if expected_count == 1:
+ assert candidates[0].track_id == self.RECORDING["id"]
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for empty string values ("") for artist and title.
Tests currently check for None but not for empty strings. Since empty strings are normalized to None, please add test cases for empty string inputs to verify this behavior.
</issue_to_address>
### Comment 2
<location> `test/plugins/test_musicbrainz.py:1058-1059` </location>
<code_context>
</code_context>
<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))
<details><summary>Explanation</summary>Avoid complex code, like conditionals, in test functions.
Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals
Some ways to fix this:
* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.
> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>
### Comment 3
<location> `beets/autotag/match.py:324` </location>
<code_context>
def tag_item(
item: Item,
search_artist: str | None = None,
search_title: str | None = None,
search_ids: list[str] | None = None,
) -> Proposal:
"""Find metadata for a single track. Return a `Proposal` consisting
of `TrackMatch` objects.
`search_artist` and `search_title` may be used to override the item
metadata in the search query. `search_ids` may be used for restricting the
search to a list of metadata backend IDs.
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
rec: Recommendation | None = None
# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
# If we're searching by ID, don't proceed.
if search_ids:
if candidates:
assert rec is not None
return Proposal(_sort_candidates(candidates.values()), rec)
else:
return Proposal([], Recommendation.none)
# Search terms.
search_artist = search_artist or item.artist
search_title = search_title or item.title or item.filepath.stem
log.debug("Item search terms: {} - {}", search_artist, search_title)
# Replace empty string with None
if isinstance(search_artist, str) and search_artist.strip() == "":
search_artist = None
if isinstance(search_title, str) and search_title.strip() == "":
search_title = None
# Get and evaluate candidate metadata.
for track_info in metadata_plugins.item_candidates(
item, search_artist, search_title
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
- Low code quality found in tag\_item - 24% ([`low-code-quality`](https://docs.sourcery.ai/Reference/Default-Rules/comments/low-code-quality/))
<br/><details><summary>Explanation</summary>
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into
their own functions. This is the most important thing you can do - ideally a
function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts
sits together within the function rather than being scattered.</details>
</issue_to_address>
### Comment 4
<location> `beetsplug/discogs.py:190-193` </location>
<code_context>
def candidates(
self,
items: Sequence[Item],
artist: str | None,
album: str | None,
va_likely: bool,
) -> Iterable[AlbumInfo]:
query = ""
if artist is not None:
query += artist
if album is not None:
query += f" {album}"
if va_likely:
query = album or ""
query = query.strip()
if not query:
return []
return self.get_albums(query)
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return [] if not query else self.get_albums(query)
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
I may be missing something, but it seems to me they have been correct? Both candidates functions always received strings - including empty ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the practical benefit of replacing empty strings by None
? I outlined a couple of issues with this approach.
I'm a bit confused about the 'why' here, especially so because this only applies to tracks/singletons and not albums
# Search terms. | ||
search_artist = search_artist or item.artist | ||
search_title = search_title or item.title | ||
search_title = search_title or item.title or item.filepath.stem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't have this here - if users want to use file names, they can enable the fromfilename
plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it doesn't really hurt here and makes this one edgecase a bit better. Some search results are better than none imo.
But don't get me wrong I agree that people are better of using the fromfilename
plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It changes the behaviour for external plugins that expect this string to be either the search string or the title
of the item.
if artist is not None: | ||
filters["artist"] = artist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how the introduction of None
brings more clarity - this now says 'any string is fine, including an empty one'.
On the other hand, if one is aware of tag_item
implementation, they know that None
just stands for an empty string.
Thus, I think, we should continue using strings and simply handle if not artist
etc. if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, using empty strings is the easier way. Will change it in the next commit.
To give you some context how we got here: I just fixed the mypy issues that followed from typing the args and kwargs in the item_candidates
function.
The main issue with the types stems from shadowing not working as expected for search_artists/titles
. Every other function down the line expected the search strings to be str | None
. The fix is straight forward, just didn't see this initially and thought item.title
can return a None
too 🤦
if artist is not None: | ||
filters["artist"] = artist | ||
|
||
if title is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big change in behaviour - every metadata source handles absence of title
fine when artist
is given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, both linked issues indicate that the issue is limited to musicbrainz
only and only when both artist
and title
are empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will probably change it in the next commit, but I don't think it is a big change. At least from the typehint perspective this is how it is supposed to be imo. We use NotRequired
for both the title
and artist
filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, both linked issues indicate that the issue is limited to musicbrainz only and only when both artist and title are empty.
Maybe this is also a good reason for creating a proper general tests suit for all metadata
plugins. I will see if I can think about how to approach this properly.
self, item: Item, artist: str | None, title: str | None | ||
) -> Iterable[TrackInfo]: | ||
query = f"{artist} {title}" | ||
query = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the same str
types and use the following instead
query = " ".join(map(str.strip, [artist, title]))
) -> Iterable[beets.autotag.hooks.TrackInfo]: | ||
criteria = {"artist": artist, "recording": title, "alias": title} | ||
criteria: dict[str, str] = {} | ||
if artist is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is redundant - _search_api
filters out empty values consistently for both singletons and albums. No change is needed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly not. I tested it: If you for example input {"artist":None, "title":"something"}
the _search_api
will crash. But the logic should not matter if we properly support empty strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies - forgot to clarify that my comment applies only if we handle strings only 😅
filters = { | ||
k: _v for k, v in filters.items() if (_v := v.lower().strip()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot is confused now that None
s have entered the game - let's keep them out.
|
||
filters = { | ||
k: _v for k, v in filters.items() if (_v := v.lower().strip()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you've reverted to handling empty strings, the only adjustment that this module needs is adding an early return here:
if not filters:
return []
see #6065 (comment) |
Description
Fixes an issue where empty metadata caused a crash in the import process.
Seems like the issue was not catched earlier, since the typehints were
not propagated correctly in the
metadata_plugin.item_candidates
function.Might have also been prevented with #5965 too.
closes #6060
Disclaimer
I opted to allow
None
forartist
andtitle
searches as this makes it clear for the plugins that these fieldsmight be empty. See more context here. This increases complexity for plugins a bit as they need to handle empty artists or titles gracefully.
We sadly need to still support cases where no metadata is given e.g. for the chroma plugin. Otherwise I would have just enforced that artist and title are always given.
The fix does seem a bit off to me. There may be a better way to approach this. Feedback and other ideas highly welcome!