Add support for creating indices on arbitrary database tables #5926
Conversation
Pull Request Overview

This PR adds support for creating database indices on arbitrary tables to improve query performance. The implementation introduces a new `Index` type and applies it to create an index on the `album_id` column of the `items` table.

- Introduces a new `Index` type for defining database indices
- Adds infrastructure to automatically create indices during table setup
- Creates an index on `album_id` for the `items` table to speed up album-based queries
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
File | Description
---|---
docs/changelog.rst | Documents the new index creation for album_id queries
beets/library/models.py | Adds index definition to the Item model
beets/dbcore/types.py | Defines the new Index type
beets/dbcore/db.py | Implements index creation logic in database setup
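From the description above, the new `Index` type presumably pairs an index name with the columns it covers. A minimal standalone sketch of such a value object (the field names match the test snippets later in this conversation, but the method name `create_sql` and the index name `items_by_album` are illustrative, not necessarily what the PR uses):

```python
from dataclasses import dataclass


# Hypothetical sketch of an Index value object; the actual class in
# beets/dbcore may differ in detail.
@dataclass(frozen=True)
class Index:
    name: str
    columns: tuple[str, ...]

    def create_sql(self, table: str) -> str:
        # Render an idempotent CREATE INDEX statement for the given table.
        cols = ", ".join(self.columns)
        return f"CREATE INDEX IF NOT EXISTS {self.name} ON {table} ({cols});"


idx = Index(name="items_by_album", columns=("album_id",))
print(idx.create_sql("items"))
# → CREATE INDEX IF NOT EXISTS items_by_album ON items (album_id);
```

`frozen=True` makes the object hashable for free, which matters for the set-based comparisons tested further down.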
Force-pushed from d51728f to 14b90b4.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master    #5926      +/-   ##
==========================================
+ Coverage   66.48%   66.51%   +0.03%
==========================================
  Files         117      117
  Lines       18128    18148      +20
  Branches     3071     3072       +1
==========================================
+ Hits        12052    12072      +20
  Misses       5422     5422
  Partials      654      654
```
… function. Moved `Index` into `db.py` from `types.py`.
```python
columns = [row[2] for row in rows]
return cls(name, columns)

def __hash__(self) -> int:
```
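For context, the `row[2]` in the snippet above lines up with SQLite's `PRAGMA index_info` output, whose rows are `(seqno, cid, name)`, so index 2 is the column name. A standalone check with the stdlib `sqlite3` module (the table and index names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER)")
conn.execute("CREATE INDEX items_by_album ON items (album_id)")

# PRAGMA index_info rows are (seqno, cid, name); position 2 is the column name.
rows = conn.execute("PRAGMA index_info(items_by_album)").fetchall()
columns = [row[2] for row in rows]
print(columns)  # → ['album_id']
```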
A neat way to reduce the comparison complexity! We just may want to define it above the other methods.
Is the index that important? I always try to order by importance top down.
I just stumbled upon this function in the same module:

```python
def _make_attribute_table(self, flex_table: str):
    """Create a table and associated index for flexible attributes
    for the given entity (if they don't exist).
    """
    with self.transaction() as tx:
        tx.script(
            """
            CREATE TABLE IF NOT EXISTS {0} (
                id INTEGER PRIMARY KEY,
                entity_id INTEGER,
                key TEXT,
                value TEXT,
                UNIQUE(entity_id, key) ON CONFLICT REPLACE);
            CREATE INDEX IF NOT EXISTS {0}_by_entity
                ON {0} (entity_id);
            """.format(flex_table)
        )
```

I didn't realise we've already been creating indices for attribute tables! As you mentioned, consistency is important, and it seems that we now use two methods to create indices. Looking at the simplicity of the above command, I'm now questioning whether we really need to depend on the entire complexity of checking which indices we already have in the database. Can't we get away with the following instead?

```diff
diff --git a/beets/dbcore/db.py b/beets/dbcore/db.py
index 2e0df3fec..a54dabd7f 100755
--- a/beets/dbcore/db.py
+++ b/beets/dbcore/db.py
@@ -1159,10 +1159,11 @@ def load_extension
     # Schema setup and migration.

-    def _make_table(self, table: str, fields: Mapping[str, types.Type]):
+    def _make_table(self, model_cls: Model):
         """Set up the schema of the database. `fields` is a mapping
         from field names to `Type`s. Columns are added if necessary.
         """
+        table, fields = model_cls._table, model_cls._fields
         # Get current schema.
         with self.transaction() as tx:
             rows = tx.query("PRAGMA table_info(%s)" % table)
@@ -1192,6 +1193,14 @@ def _make_table
                 table, name, typ.sql
             )

+        create_idx_cmds = [
+            f"CREATE INDEX IF NOT EXISTS {i.name} ON {table} ({', '.join(i.columns)});"
+            for i in model_cls._indices
+        ]
+
+        setup_sql += "\n".join(create_idx_cmds)
+
         with self.transaction() as tx:
             tx.script(setup_sql)
```
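The simplicity argument above rests on `CREATE INDEX IF NOT EXISTS` being safely re-runnable, so no existing-index bookkeeping is needed. A quick standalone check with the stdlib `sqlite3` module (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER)")

# Running the statement twice is harmless thanks to IF NOT EXISTS.
for _ in range(2):
    conn.execute("CREATE INDEX IF NOT EXISTS items_by_album ON items (album_id)")

# PRAGMA index_list rows are (seq, name, unique, origin, partial);
# the index shows up exactly once despite the repeated statement.
rows = conn.execute("PRAGMA index_list(items)").fetchall()
print([row[1] for row in rows])  # → ['items_by_album']
```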
I have seen that when creating the indices. For me the whole set of attribute tables seems a bit inconsistent in the first place: they feel like a bolt-on rather than a first-class part of the schema, more like a key/value store glued on top of the ORM than something that plays nicely with the rest of the core tables. Long term, I think they’d probably benefit from a bigger refactor to align with the rest of the core layer. That’s also why I originally ignored it 🙃

From what I understand, the core was originally meant to function like a lightweight ORM, with an eye toward adding more tables and features to beets in the future. If we only care about functionality, we could simplify things quite a bit in that layer. Personally, though, I’m fine with a bit of added complexity here since it fits the original ORM idea. The advantage of this approach is that it gives us a path to evolve the schema more cleanly, e.g. adding or migrating indices in a structured way, which isn’t really possible with your proposal: once created, indices stay unless we add additional logic.

For me this comes down to whether we want to stick with the ORM-like abstraction in the core or move toward a leaner, purely functional approach. I don’t have a huge preference either way. I do plan to work on some schema migration in the future, which might benefit from the ORM-style complexity in general, but you’re right that it’s not urgent right now. How do you see this in the bigger picture?
See this blog post for more context regarding flexible attributes: https://beets.io/blog/flexattr.html. Certain implementation details can potentially be optimised, but the core concept is going to stay, I think.
I don't think I see how this relates to the bigger picture. I'm aware of the requirement: we need an index on `album_id` in the `items` table. And I want to see the simplest solution that satisfies this requirement but doesn't create any tech debt and is easy to maintain.
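For what it's worth, whether SQLite actually uses such an index for album lookups can be checked with `EXPLAIN QUERY PLAN`; the schema below is a simplified stand-in for the real `items` table, and the index name is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER, title TEXT)"
)
conn.execute("CREATE INDEX IF NOT EXISTS items_by_album ON items (album_id)")

# With the index in place, an album_id lookup becomes an index search
# rather than a full table scan; the plan's detail column should mention
# something like "SEARCH items USING INDEX items_by_album".
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE album_id = ?", (1,)
).fetchall()
print(plan)
```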
I don’t disagree with having an attribute table in general; I think you got that wrong. They’re useful as a key–value store. My concern is just that they shouldn’t be treated differently from “normal” tables in the core. A table is a table, regardless of its content. Especially now that we’ve also introduced indices on the albums table, the differences are even smaller. Treating them more uniformly might allow us to simplify some logic quite a bit.
I’m just trying to explore improvements here. For example, we could reuse the index data object on the attribute tables (and similarly in the migration function), or apply the same, or a similar, approach for field migrations in the future. That said, this doesn’t really fit into this PR, so I’m fine taking it one step at a time. If you prefer not to change existing code, that’s fine, but I like to maintain a codebase that I feel confident and happy working with. Adding features is much easier when the groundwork is set properly.
If the absolute main goal is just to optimize …
I'm largely fine with the changes, but since this affects one of the key pieces of the code base, I think we'd benefit from hearing what @wisp3rwind thinks about it as well.
Two small comments; feel free to ignore either of them if you don't agree.
I don't think there's much I can say about the design here; I don't have any real experience in dealing with databases. Thus, I also don't have anything like a clear vision on how `dbcore` should evolve.

In any case, this PR seems to deal with indices in a very similar way to how the existing code deals with columns, so I don't think it additionally limits our options for evolving the database code.
```python
def test_index_creation(self, db, sample_index):
    """Test creating an index and checking its existence."""
    with db.transaction() as tx:
        sample_index.recreate(tx, "test")
    indexes = (
        db._connection().execute("PRAGMA index_list(test)").fetchall()
    )
    assert any(sample_index.name in index for index in indexes)
```
Does this test really add much value compared to `test_from_db`?
```python
different_index = Index(name="other_index", columns=("other_field",))
index_set.add(different_index)

assert len(index_set) == 2  # Should recognize distinct index
```
To be really thorough with testing the hash, we might also test indices that differ in just one component, like:

```python
Index(name="sample_index", columns=("field_one",))
Index(name="sample_index", columns=("field_two",))
Index(name="sample_index", columns=("field_one", "field_two"))
Index(name="other_index", columns=("field_one",))
```
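The four cases above can be exercised directly with a set. Assuming an `Index` that hashes over both `name` and `columns` (sketched here as a frozen dataclass, whereas the PR defines `__hash__` by hand), any single-component difference keeps the entries distinct:

```python
from dataclasses import dataclass


# Sketch of an Index value object; frozen=True derives __eq__ and __hash__
# from the fields, which should match a hand-written hash over (name, columns).
@dataclass(frozen=True)
class Index:
    name: str
    columns: tuple[str, ...]


index_set = {
    Index(name="sample_index", columns=("field_one",)),
    Index(name="sample_index", columns=("field_two",)),
    Index(name="sample_index", columns=("field_one", "field_two")),
    Index(name="other_index", columns=("field_one",)),
}

# Each index differs in at least one component, so none collide.
print(len(index_set))  # → 4
```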
Creating indexes turned out to be relatively straightforward!
While we can’t remove them yet, that doesn’t seem necessary for now. Interestingly, much of the infrastructure for database additions was already in place.
- Introduces a new `Index` type, which can be used to create an index on any defined table. 🎉
- The `items` table now automatically registers an index on `album_id`.

Closes #5809