Feature: "watched_columns" speedup in more 10000 times for millions requests #606
Conversation
@toluaina Good morning Sir. Waiting for your review ☀️. I hope this feature will be useful for high-load systems.
Apologies for the delay. I needed some time to review this (I've recently spent quite a bit of time fixing bugs related to exclusions). Excluding the watched columns in this way has some implications: for example, with through tables you wouldn't receive updates and could miss records. Your approach gave me an idea, so I've started a feature branch to explore it. Could you take a look and share your thoughts?
Good evening! Thank you for the reply.
We skip only the UPDATE operation. In common cases, m2m tables use INSERT for creating links and DELETE for removing them. So even if such a table contains some specific info like:
Or
We can set
Maybe I missed something, but in our cases everything works fine with through tables.
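To make the rule concrete, here is a rough sketch of what I mean (names are illustrative, not the actual code in this PR):

```python
# Illustrative sketch only; function and argument names are hypothetical.
def should_sync(op: str, old_row: dict, new_row: dict, watched_columns: list) -> bool:
    """Decide whether a change event needs re-indexing."""
    if op in ("INSERT", "DELETE"):
        # Through (m2m) tables create links with INSERT and remove them
        # with DELETE, so those operations always sync.
        return True
    if not watched_columns:
        # No watched_columns configured: keep the previous behavior.
        return True
    # UPDATE: re-index only if a watched column actually changed.
    return any(old_row.get(c) != new_row.get(c) for c in watched_columns)
```

So a repeated UPDATE that writes the same values to a through table is skipped, while INSERT and DELETE still propagate.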
Sure 👍🏻
@toluaina Good evening. I looked at your idea – it's very unusual. I didn't think something like this could be done with practically pure SQL. On the one hand, it's very cool and could work faster; on the other hand, processing the logic in Python feels more familiar to me.
"Otherwise, foreign keys will be skipped unless the user explicitly specifies them." In my case, with the Python-side skipping, the NEW and OLD messages always contain the PK, the FKs, and watched_columns (if set).
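For illustration, a hypothetical payload for an UPDATE on an m2m table (field names are assumptions, not the exact trigger format):

```python
payload = {
    "tg_op": "UPDATE",
    "table": "book_author",  # hypothetical through table
    # OLD and NEW always carry the PK, the FKs, and watched_columns (if set):
    "old": {"id": 1, "book_isbn": "001", "author_id": 7, "note": "draft"},
    "new": {"id": 1, "book_isbn": "001", "author_id": 7, "note": "final"},
}
```

Here, if note is in watched_columns the event is re-indexed; if only unwatched columns changed, it is skipped.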
First of all, thank you for this amazing project — it has been incredibly useful! 🙌
In our high-load system, we handle millions of updates per day, and during peak times Redis queues can contain hundreds of thousands of payloads.
A significant portion of these updates turned out to be redundant — especially in two cases:
• Many-to-many or relation tables frequently updated with the same data (no real value change).
• Large tables where only a few columns are relevant for the OpenSearch document, but updates to any column currently trigger a full document recalculation.
I introduced a simple improvement that helps avoid unnecessary re-indexing and can significantly boost performance in similar high-throughput environments:
• You can now define a watched_columns list inside the document model.
Only updates to these columns will trigger a document refresh in OpenSearch (see the example after this list).
• Previously, if a table had 20 columns but only 2 of them were defined in columns, any update to the remaining 18 still caused a full reindex — even when the relevant data didn’t change.
• With watched_columns, this overhead is eliminated.
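For example, a schema node could look like this (shown as a Python literal; the exact placement of watched_columns alongside columns is illustrative):

```python
# One node from schema.json, written as a Python literal for illustration.
node = {
    "table": "book",
    # Columns that end up in the OpenSearch document:
    "columns": ["isbn", "title", "description"],
    # Only UPDATEs that change one of these columns trigger re-indexing.
    # Omitting the key keeps the previous behavior (every UPDATE re-indexes).
    "watched_columns": ["isbn", "title", "description"],
}
```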
In our setup, this optimization reduced redundant updates by several orders of magnitude — even under peak load, the Redis queue rarely exceeds 10 messages.
It works especially well for frequent re-updates in M2M tables.
Backward compatibility
• Default behavior remains unchanged.
• If watched_columns is not defined, all updates will continue to trigger document recalculations as before.
Tests
• Added.
Docs
• README.rst: updated.
• HISTORY.rst: updated.