Skip to content

Conversation

@TerionGVS5
Copy link

First of all, thank you for this amazing project — it has been incredibly useful! 🙌

In our high-load system, we handle millions of updates per day, and during peak times Redis queues can contain hundreds of thousands of payloads.
A significant portion of these updates turned out to be redundant — especially in two cases:
• Many-to-many or relation tables frequently updated with the same data (no real value change).
• Large tables where only a few columns are relevant for the OpenSearch document, but updates to any column currently trigger a full document recalculation.

I introduced a simple improvement that helps avoid unnecessary re-indexing and can significantly boost performance in similar high-throughput environments:
• You can now define a watched_columns list inside the document model.
Only updates to these columns will trigger a document refresh in OpenSearch.
• Previously, if a table had 20 columns but only 2 of them were defined in columns, any update to the remaining 18 still caused a full reindex — even when the relevant data didn’t change.
• With watched_columns, this overhead is eliminated.

In our setup, this optimization reduced redundant updates by several orders of magnitude — even under peak load, the Redis queue rarely exceeds 10 messages.
It works especially well for frequent re-updates in M2M tables.

Backward compatibility
• Default behavior remains unchanged.
• If watched_columns is not defined, all updates will continue to trigger document recalculations as before.

Tests
• Added.

Docs
• README.rst: updated.
• HISTORY.rst: updated.

@TerionGVS5 TerionGVS5 changed the title Feature/watched columns speedup Feature: "watched_columns" speedup in more 10000 times for millions requests Oct 9, 2025
@TerionGVS5
Copy link
Author

@toluaina Good morning Sir. Waiting for your review ☀️, hope this feature will be useful for high load systems.

@toluaina
Copy link
Owner

Apologies for the delay. I needed some time to review this. (I've spent a bit of time in recent times fixing bugs with exclusions). Excluding the watched columns in this way has some implications. For example, with through tables you wouldn’t receive updates and could miss records.

Your approach gave me an idea, so I’ve started a feature branch to explore it. Could you take a look and share your thoughts?

@TerionGVS5
Copy link
Author

Good evening! Thank you for answering.

For example, with through tables you wouldn’t receive updates and could miss records.

We are skipping only on UPDATE operation. In common cases m2m tables for creating use INSERT and for removing DELETE. So even this table will contain any specific info like:

user_id (FK) role_id (FK) expires_at
1 2 23-12-2025
1 3 23-12-2025
2 3 20-12-2025

Or

id (PK) user_id (FK) role_id (FK) expires_at
1 1 2 23-12-2025
2 1 3 23-12-2025
3 2 3 20-12-2025

We can set watched_columns with ["user_id", "role_id", "expires_at"] and then we will push update to redis, pull it and update opensearch document only on real update with data changing. Or we can not set watched_columns and all will be work as before.

Maybe I missed some, but in our cases all okey with through tables.

Could you take a look and share your thoughts?

Sure 👍🏻

@TerionGVS5
Copy link
Author

@toluaina Good evening. I looked at your idea – it's very unusual. I didn't think something like this could be done with practically pure SQL. On the one hand, it's very cool and could work faster, but on the other hand, processing logic in Python feels more familiar to me.
It also seems that in the current SQL version, if the user has not specified anything in the columns, then any update will be ignored/skipped, which may be a little non-obvious to the average user.
plus something like this comes to mind:

WHERE n.key = ANY(_columns || _foreign_keys || _primary_keys)

Otherwise, foreign keys will be skipped unless the user explicitly specifies them.
although of course I could be wrong.

in my case with python way skipping, message NEW and OLD always PK, FK, and watched_columns (if set).
Therefore, the user doesn't need to specify primary or foreign keys in this attribute; it's enough to specify additional fields that we look at during processing. Less thinking for the user means less likelihood of unexpected behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants