Add support for creating indices on arbitrary database tables #5926
Conversation
Pull Request Overview

This PR adds support for creating database indices on arbitrary tables to improve query performance. The implementation introduces a new `Index` type and applies it to create an index on the `album_id` column of the `items` table.

- Introduces a new `Index` type for defining database indices
- Adds infrastructure to automatically create indices during table setup
- Creates an index on `album_id` for the `items` table to speed up album-based queries
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
File | Description
---|---
docs/changelog.rst | Documents the new index creation for album_id queries
beets/library/models.py | Adds index definition to the Item model
beets/dbcore/types.py | Defines the new Index type
beets/dbcore/db.py | Implements index creation logic in database setup
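From the description above, the new `Index` type presumably pairs an index name with the columns it covers. A minimal standalone sketch of such a value object (the field names match the test snippets later in this conversation, but the method name `create_sql` and the index name `items_by_album` are illustrative, not necessarily what the PR uses):

```python
from dataclasses import dataclass


# Hypothetical sketch of an Index value object; the actual class in
# beets/dbcore may differ in detail.
@dataclass(frozen=True)
class Index:
    name: str
    columns: tuple[str, ...]

    def create_sql(self, table: str) -> str:
        # Render an idempotent CREATE INDEX statement for the given table.
        cols = ", ".join(self.columns)
        return f"CREATE INDEX IF NOT EXISTS {self.name} ON {table} ({cols});"


idx = Index(name="items_by_album", columns=("album_id",))
print(idx.create_sql("items"))
# → CREATE INDEX IF NOT EXISTS items_by_album ON items (album_id);
```

`frozen=True` makes the object hashable for free, which matters for the set-based comparisons tested further down.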
Force-pushed from d51728f to 14b90b4.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master    #5926      +/-   ##
==========================================
+ Coverage   66.48%   66.51%   +0.03%
==========================================
  Files         117      117
  Lines       18128    18148      +20
  Branches     3071     3072       +1
==========================================
+ Hits        12052    12072      +20
  Misses       5422     5422
  Partials      654      654
```
… function. Moved `Index` into `db.py` from `types.py`.
```python
columns = [row[2] for row in rows]
return cls(name, columns)

def __hash__(self) -> int:
```
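For context, the `row[2]` in the snippet above lines up with SQLite's `PRAGMA index_info` output, whose rows are `(seqno, cid, name)`, so index 2 is the column name. A standalone check with the stdlib `sqlite3` module (the table and index names here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER)")
conn.execute("CREATE INDEX items_by_album ON items (album_id)")

# PRAGMA index_info rows are (seqno, cid, name); position 2 is the column name.
rows = conn.execute("PRAGMA index_info(items_by_album)").fetchall()
columns = [row[2] for row in rows]
print(columns)  # → ['album_id']
```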
A neat way to reduce the comparison complexity! We just may want to define it above the other methods.
Is the index that important? I always try to order by importance top down.
I just stumbled upon this function in the same module:

```python
def _make_attribute_table(self, flex_table: str):
    """Create a table and associated index for flexible attributes
    for the given entity (if they don't exist).
    """
    with self.transaction() as tx:
        tx.script(
            """
            CREATE TABLE IF NOT EXISTS {0} (
                id INTEGER PRIMARY KEY,
                entity_id INTEGER,
                key TEXT,
                value TEXT,
                UNIQUE(entity_id, key) ON CONFLICT REPLACE);
            CREATE INDEX IF NOT EXISTS {0}_by_entity
                ON {0} (entity_id);
            """.format(flex_table)
        )
```

I didn't realise we've already been creating indices for attribute tables! As you mentioned, consistency is important, and it seems that we now use two methods to create indices. Looking at the simplicity of the above command, I'm now questioning whether we really need to depend on the entire complexity of checking which indices we already have in the database. Can't we get away with the following instead?

```diff
diff --git a/beets/dbcore/db.py b/beets/dbcore/db.py
index 2e0df3fec..a54dabd7f 100755
--- a/beets/dbcore/db.py
+++ b/beets/dbcore/db.py
@@ -1159,10 +1159,11 @@ def load_extension
     # Schema setup and migration.

-    def _make_table(self, table: str, fields: Mapping[str, types.Type]):
+    def _make_table(self, model_cls: Model):
         """Set up the schema of the database. `fields` is a mapping
         from field names to `Type`s. Columns are added if necessary.
         """
+        table, fields = model_cls._table, model_cls._fields
         # Get current schema.
         with self.transaction() as tx:
             rows = tx.query("PRAGMA table_info(%s)" % table)
@@ -1192,6 +1193,14 @@ def _make_table
                 table, name, typ.sql
             )

+        create_idx_cmds = [
+            f"CREATE INDEX IF NOT EXISTS {i.name} ON {table} ({', '.join(i.columns)});"
+            for i in model_cls._indices
+        ]
+
+        setup_sql += "\n".join(create_idx_cmds)
+
         with self.transaction() as tx:
             tx.script(setup_sql)
```
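The simplicity argument above rests on `CREATE INDEX IF NOT EXISTS` being safely re-runnable, so no existing-index bookkeeping is needed. A quick standalone check with the stdlib `sqlite3` module (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER)")

# Running the statement twice is harmless thanks to IF NOT EXISTS.
for _ in range(2):
    conn.execute("CREATE INDEX IF NOT EXISTS items_by_album ON items (album_id)")

# PRAGMA index_list rows are (seq, name, unique, origin, partial);
# the index shows up exactly once despite the repeated statement.
rows = conn.execute("PRAGMA index_list(items)").fetchall()
print([row[1] for row in rows])  # → ['items_by_album']
```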
I have seen that when creating the indices. For me the whole set of attribute tables seems a bit inconsistent in the first place: they feel like a bolt-on rather than a first-class part of the schema, more like a key/value store glued on top of the ORM than something that plays nicely with the rest of the core tables. Long term, I think they’d probably benefit from a bigger refactor to align with the rest of the core layer. That’s also why I originally ignored it 🙃

From what I understand, the core was originally meant to function like a lightweight ORM, with an eye toward adding more tables and features to beets in the future. If we only care about functionality, we could simplify things quite a bit in that layer. Personally, though, I’m fine with a bit of added complexity here since it fits the original ORM idea. The advantage of this approach is that it gives us a path to evolve the schema more cleanly, e.g. adding or migrating indices in a structured way, which isn’t really possible with your proposal: once created, indices stay unless we add additional logic.

For me this comes down to whether we want to stick with the ORM-like abstraction in the core or move toward a leaner, purely functional approach. I don’t have a huge preference either way. I do plan to work on some schema migration in the future, which might benefit from the ORM-style complexity in general, but you’re right that it’s not urgent right now. How do you see this in the bigger picture?
See this blog post for more context regarding flexible attributes: https://beets.io/blog/flexattr.html. Certain implementation details can potentially be optimised, but the core concept is going to stay, I think.
I don't think I see how this relates to the bigger picture. I'm aware of the requirement: we need an index on `album_id` in the `items` table. And I want to see the simplest solution that satisfies this requirement but doesn't create any tech debt and is easy to maintain.
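For what it's worth, whether SQLite actually uses such an index for album lookups can be checked with `EXPLAIN QUERY PLAN`; the schema below is a simplified stand-in for the real `items` table, and the index name is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, album_id INTEGER, title TEXT)"
)
conn.execute("CREATE INDEX IF NOT EXISTS items_by_album ON items (album_id)")

# With the index in place, an album_id lookup becomes an index search
# rather than a full table scan; the plan's detail column should mention
# something like "SEARCH items USING INDEX items_by_album".
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE album_id = ?", (1,)
).fetchall()
print(plan)
```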
I don’t disagree with having an attribute table in general; I think you got that wrong. They’re useful as a key–value store. My concern is just that they shouldn’t be treated differently from “normal” tables in the core. A table is a table, regardless of its content. Especially now that we’ve also introduced indices on the albums table, the differences are even smaller. Treating them more uniformly might allow us to simplify some logic quite a bit.
I’m just trying to explore improvements here. For example, we could reuse the index data object on the attribute tables (and similarly in the migration function), or apply the same, or a similar, approach for field migrations in the future. That said, this doesn’t really fit into this PR, so I’m fine taking it one step at a time. If you prefer not to change existing code, that’s fine, but I like to maintain a codebase that I feel confident and happy working with. Adding features is much easier when the groundwork is set properly.
If the absolute main goal is just to optimize …
I'm largely fine with the changes, but since this affects one of the key pieces of the code base, I think we'd benefit from hearing what @wisp3rwind thinks about it as well.
Two small comments; feel free to ignore either of them if you don't agree.
I don't think there's much I can say about the design here; I don't have any real experience in dealing with databases. Thus, I also don't have anything like a clear vision on how `dbcore` should evolve.

In any case, this PR seems to deal with indices in a very similar way to how the existing code deals with columns, so I don't think it additionally limits our options for evolving the database code.
```python
def test_index_creation(self, db, sample_index):
    """Test creating an index and checking its existence."""
    with db.transaction() as tx:
        sample_index.recreate(tx, "test")
    indexes = (
        db._connection().execute("PRAGMA index_list(test)").fetchall()
    )
    assert any(sample_index.name in index for index in indexes)
```
Does this test really add much value compared to `test_from_db`?
```python
different_index = Index(name="other_index", columns=("other_field",))
index_set.add(different_index)

assert len(index_set) == 2  # Should recognize distinct index
```
To be really thorough with testing the hash, we might also test indices that differ in just one component, like:

```python
Index(name="sample_index", columns=("field_one",))
Index(name="sample_index", columns=("field_two",))
Index(name="sample_index", columns=("field_one", "field_two"))
Index(name="other_index", columns=("field_one",))
```
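The four cases above can be exercised directly with a set. Assuming an `Index` that hashes over both `name` and `columns` (sketched here as a frozen dataclass, whereas the PR defines `__hash__` by hand), any single-component difference keeps the entries distinct:

```python
from dataclasses import dataclass


# Sketch of an Index value object; frozen=True derives __eq__ and __hash__
# from the fields, which should match a hand-written hash over (name, columns).
@dataclass(frozen=True)
class Index:
    name: str
    columns: tuple[str, ...]


index_set = {
    Index(name="sample_index", columns=("field_one",)),
    Index(name="sample_index", columns=("field_two",)),
    Index(name="sample_index", columns=("field_one", "field_two")),
    Index(name="other_index", columns=("field_one",)),
}

# Each index differs in at least one component, so none collide.
print(len(index_set))  # → 4
```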
Creating indexes turned out to be relatively straightforward!
While we can’t remove them yet, that doesn’t seem necessary for now. Interestingly, much of the infrastructure for database additions was already in place.
- Introduces a new `Index` type, which can be used to create an index on any defined table. 🎉
- The `items` table now automatically registers an index on `album_id`.

Closes #5809