Skip to content

Conversation

semohr
Copy link
Contributor

@semohr semohr commented Aug 10, 2025

Creating indexes turned out to be relatively straightforward!
While we can’t remove them yet, that doesn’t seem necessary for now. Interestingly, much of the infrastructure for database additions was already in place.

  • This PR introduces a new type, Index, which can be used to create an index on any defined table. 🎉
  • The items table now automatically registers an index on album_id

Closes #5809

@Copilot Copilot AI review requested due to automatic review settings August 10, 2025 21:25
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for creating database indices on arbitrary tables to improve query performance. The implementation introduces a new Index type and applies it to create an index on the album_id column of the items table.

  • Introduces a new Index type for defining database indices
  • Adds infrastructure to automatically create indices during table setup
  • Creates an index on album_id for the items table to speed up album-based queries

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
docs/changelog.rst Documents the new index creation for album_id queries
beets/library/models.py Adds index definition to the Item model
beets/dbcore/types.py Defines the new Index type
beets/dbcore/db.py Implements index creation logic in database setup

@semohr semohr force-pushed the indices branch 2 times, most recently from d51728f to 14b90b4 Compare August 10, 2025 21:29
@semohr semohr changed the title "Add support for creating indices on arbitrary database tables Add support for creating indices on arbitrary database tables Aug 10, 2025
Copy link

codecov bot commented Aug 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.51%. Comparing base (fcc9341) to head (cc94872).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5926      +/-   ##
==========================================
+ Coverage   66.48%   66.51%   +0.03%     
==========================================
  Files         117      117              
  Lines       18128    18148      +20     
  Branches     3071     3072       +1     
==========================================
+ Hits        12052    12072      +20     
  Misses       5422     5422              
  Partials      654      654              
Files with missing lines Coverage Δ
beets/dbcore/__init__.py 100.00% <100.00%> (ø)
beets/dbcore/db.py 94.83% <100.00%> (+0.18%) ⬆️
beets/library/models.py 87.20% <100.00%> (+0.02%) ⬆️
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

columns = [row[2] for row in rows]
return cls(name, columns)

def __hash__(self) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A neat way to reduce the comparison complexity! We just may want to define it above the other methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the index that important? I always try to order by importance top down.

@snejus
Copy link
Member

snejus commented Aug 19, 2025

I just stumbled upon this function in the same module:

    def _make_attribute_table(self, flex_table: str):
        """Create a table and associated index for flexible attributes
        for the given entity (if they don't exist).
        """
        with self.transaction() as tx:
            tx.script(
                """
                CREATE TABLE IF NOT EXISTS {0} (
                    id INTEGER PRIMARY KEY,
                    entity_id INTEGER,
                    key TEXT,
                    value TEXT,
                    UNIQUE(entity_id, key) ON CONFLICT REPLACE);
                CREATE INDEX IF NOT EXISTS {0}_by_entity
                    ON {0} (entity_id);
                """.format(flex_table)
            )

I didn't realise we've already been creating indices for attributes tables!

As you mentioned, consistency is important, and it seems that now we use two methods to create indices. Looking at the simplicity of the above command, I'm now questioning whether we really need to depend on the entire complexity of checking what indices we already have in the database. Can't we get away with the following instead?

diff --git a/beets/dbcore/db.py b/beets/dbcore/db.py
index 2e0df3fec..a54dabd7f 100755
--- a/beets/dbcore/db.py
+++ b/beets/dbcore/db.py
@@ -1159,10 +1159,11 @@ def load_extension
 
     # Schema setup and migration.
 
-    def _make_table(self, table: str, fields: Mapping[str, types.Type]):
+    def _make_table(self, model_cls: Model):
         """Set up the schema of the database. `fields` is a mapping
         from field names to `Type`s. Columns are added if necessary.
         """
+        table, fields = model_cls._table, model_cls._fields
         # Get current schema.
         with self.transaction() as tx:
             rows = tx.query("PRAGMA table_info(%s)" % table)
@@ -1192,6 +1193,14 @@ def _make_table
                     table, name, typ.sql
                 )
 
+        create_idx_cmds = [
+            f"CREATE INDEX IF NOT EXISTS {i.name} ON {table} ({', '.join(i.columns)});"
+            for i in model_cls._indices
+        ]
+
+        setup_sql += "\n".join(create_idx_cmds)
+
         with self.transaction() as tx:
             tx.script(setup_sql)
 

@semohr
Copy link
Contributor Author

semohr commented Aug 19, 2025

I didn't realise we've already been creating indices for attributes tables!

I have seen that when creating the indices. For me the whole set of attribute tables seem a bit inconsistent in the first place as they feel like a bolt-on rather than a first-class part of the schema, more like a key/value store glued on top of the ORM than something that plays nicely with the rest of the core tables. Long term, I think they’d probably benefit from a bigger refactor to align with the rest of the core layer. That’s also why I originally ignored it 🙃


From what I understand, the core was originally meant to function like a lightweight ORM, with an eye toward adding more tables and features to beets in the future. If we only care about functionality, we could simplify things quite a bit in that layer. Personally, though, I’m fine with a bit of added complexity here since it fits with the original ORM idea.

The advantage of this approach is that it gives us a path to evolve the schema more cleanly, e.g. adding or migrating indices in a structured way, which isn’t really possible with your proposal. Once created they stay if we don't add additional logic. For me this comes down to whether we want to stick with the ORM-like abstraction in the core, or move toward a leaner, purely functional approach.

I don’t have a huge preference either way. I do plan to work on some schema migration in the future, which might benefit from the ORM-style complexity in general, but you’re right that it’s not urgent right now. How do you see this in the bigger picture?

@snejus
Copy link
Member

snejus commented Aug 19, 2025

See this blog post for more context regarding flexible attributes: https://beets.io/blog/flexattr.html. Certain implementation details can potentially be optimised, but the core concept is going to stay, I think.

How do you see this in the bigger picture?

I don't think I see how this relates to the bigger picture. I'm aware of the requirement:

we need to index items.album_id to optimise the lookup performance.

And I want to see the simplest solution that satisfies this requirement but doesn't create any tech debt and is easy to maintain.

@semohr
Copy link
Contributor Author

semohr commented Aug 19, 2025

See this blog post for more context regarding flexible attributes: https://beets.io/blog/flexattr.html. Certain implementation details can potentially be optimised, but the core concept is going to stay, I think.

I don’t disagree with having an attribute table in general I think you got that wrong, they’re useful as a key–value store. My concern is just that they shouldn’t be treated differently from “normal” tables in the core. A table is a table, regardless of its content. Especially now that we’ve also introduced indices on the albums table, the differences are even smaller. Treating them more uniformly might allow us to simplify some logic quite a bit.

I don't think I see how this relates to the bigger picture. I'm aware of the requirement:

I’m just trying to explore improvements here. For example, we could reuse the index data object on the attribute tables (and similarly in the migration function), or apply the same, or a similar, approach for field migrations in the future. That said, this doesn’t really fit into this PR, so I’m fine taking it one step at a time. If you prefer not to change existing code, that’s fine, but I like to maintain a codebase that I feel confident and happy working with. Adding features is much easier when the groundwork is set properly.

And I want to see the simplest solution that satisfies this requirement but doesn't create any tech debt and is easy to maintain

If the absolute main goal is just to optimize items.album_id lookup performance, then sure, we could hardcode the migration. However, this would reduce flexibility, make testing and reuse harder, and increase maintenance effort, essentially creating technical debt over time. There’s definitely a trade-off here.

@snejus snejus requested a review from wisp3rwind August 24, 2025 16:21
@snejus
Copy link
Member

snejus commented Aug 24, 2025

I'm largely fine with the changes but since this affects one of the key pieces of the code base I think we'd benefit from hearing what @wisp3rwind thinks about it as well.

Copy link
Member

@wisp3rwind wisp3rwind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments; feel free to ignore either of them if you don't agree.

I don't think there's much I can say about the design here; I don't have any real experience in dealing with databases. Thus, I also don't have anything like a clear vision on how dbcore should evolve.

In any case, this PR seems to deal with indices in very similar way as the existing codes does with columns, so I don't think it additionally limits our options in how to evolve the database code.

Comment on lines +805 to +812
def test_index_creation(self, db, sample_index):
"""Test creating an index and checking its existence."""
with db.transaction() as tx:
sample_index.recreate(tx, "test")
indexes = (
db._connection().execute("PRAGMA index_list(test)").fetchall()
)
assert any(sample_index.name in index for index in indexes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test really add much value compared to test_from_db?

Comment on lines +828 to +831
different_index = Index(name="other_index", columns=("other_field",))
index_set.add(different_index)

assert len(index_set) == 2 # Should recognize distinct index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be really thorough with testing the hash, we might also test indices with differences in just one component like.

Index(name="sample_index", columns=("field_one",))
Index(name="sample_index", columns=("field_two",))
Index(name="sample_index", columns=("field_one", "field_two"))
Index(name="other_index", columns=("field_one",))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing Indexes in SQLite Database Causing Performance Bottlenecks
3 participants