Feature: "watched_columns" speedup in more 10000 times for millions requests #606

TerionGVS5 · 2025-10-05T08:17:03Z

First of all, thank you for this amazing project — it has been incredibly useful! 🙌

In our high-load system, we handle millions of updates per day, and during peak times Redis queues can contain hundreds of thousands of payloads.
A significant portion of these updates turned out to be redundant — especially in two cases:
• Many-to-many or relation tables frequently updated with the same data (no real value change).
• Large tables where only a few columns are relevant for the OpenSearch document, but updates to any column currently trigger a full document recalculation.

I introduced a simple improvement that helps avoid unnecessary re-indexing and can significantly boost performance in similar high-throughput environments:
• You can now define a watched_columns list inside the document model.
Only updates to these columns will trigger a document refresh in OpenSearch.
• Previously, if a table had 20 columns but only 2 of them were listed in the document's columns, an update to any of the other 18 still caused a full reindex, even though the relevant data had not changed.
• With watched_columns, this overhead is eliminated.
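Below is a minimal Python sketch of the rule described above. It assumes each UPDATE event carries the old and new row as dicts. The helper name should_refresh and the payload shape are hypothetical and are not taken from the project's code. An unset watched_columns (None) falls back to refreshing on every update.

```python
from typing import Any, Iterable, Mapping, Optional


def should_refresh(
    old: Mapping[str, Any],
    new: Mapping[str, Any],
    watched_columns: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if an UPDATE should trigger a document refresh.

    Hypothetical helper: old/new are the row values before and after the
    UPDATE. If watched_columns is None (not configured), every update
    triggers a refresh, matching the previous default behavior.
    """
    if watched_columns is None:
        return True
    # Refresh only if at least one watched column actually changed value.
    return any(old.get(col) != new.get(col) for col in watched_columns)


# A 20-column row where only "title" and "price" feed the search document.
old_row = {f"col{i}": i for i in range(18)} | {"title": "Lamp", "price": 10}
new_row = dict(old_row, col5=999)  # an unrelated column changed

print(should_refresh(old_row, new_row, ["title", "price"]))  # False: skipped
print(should_refresh(old_row, new_row))                       # True: default
```

Treating None as "watch everything" keeps the default behavior unchanged. An explicit list turns changes to unrelated columns into no-ops.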

In our setup, this optimization reduced redundant updates by several orders of magnitude — even under peak load, the Redis queue rarely exceeds 10 messages.
It works especially well for frequent re-updates in M2M tables.

Backward compatibility
• Default behavior remains unchanged.
• If watched_columns is not defined, all updates will continue to trigger document recalculations as before.

Tests
• Added.

Docs
• README.rst: updated.
• HISTORY.rst: updated.

TerionGVS5 · 2025-10-09T06:16:11Z

@toluaina Good morning, sir. Waiting for your review ☀️; I hope this feature will be useful for high-load systems.

toluaina · 2025-10-23T16:10:45Z

Apologies for the delay; I needed some time to review this (I've recently spent a fair amount of time fixing bugs related to exclusions). Excluding updates by watched columns in this way has some implications. For example, with through tables you wouldn’t receive updates and could miss records.

Your approach gave me an idea, so I’ve started a feature branch to explore it. Could you take a look and share your thoughts?

TerionGVS5 · 2025-10-23T16:38:37Z

Good evening! Thank you for answering.

> For example, with through tables you wouldn’t receive updates and could miss records.

We skip only UPDATE operations. In the common case, M2M tables use INSERT to create a link and DELETE to remove it. This still holds even if such a table contains additional data, for example:

| user_id (FK) | role_id (FK) | expires_at |
|---|---|---|
| 1 | 2 | 23-12-2025 |
| 1 | 3 | 23-12-2025 |
| 2 | 3 | 20-12-2025 |

Or, with a surrogate primary key:

| id (PK) | user_id (FK) | role_id (FK) | expires_at |
|---|---|---|---|
| 1 | 1 | 2 | 23-12-2025 |
| 2 | 1 | 3 | 23-12-2025 |
| 3 | 2 | 3 | 20-12-2025 |

We can set watched_columns to ["user_id", "role_id", "expires_at"]. An update is then pushed to Redis, pulled, and applied to the OpenSearch document only when the data actually changes. Alternatively, we can leave watched_columns unset and everything works as before.
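The following sketch illustrates the through-table argument. It assumes a change event exposes an operation name (modeled on PostgreSQL's TG_OP values) plus old/new rows. The names needs_sync and WATCHED are hypothetical. Only UPDATE events are filtered, so INSERT and DELETE, which create and remove links, always go through.

```python
from typing import Any, Mapping, Optional, Sequence

# Hypothetical watched_columns for the through table above.
WATCHED = ("user_id", "role_id", "expires_at")


def needs_sync(
    tg_op: str,
    old: Optional[Mapping[str, Any]],
    new: Optional[Mapping[str, Any]],
    watched: Optional[Sequence[str]] = WATCHED,
) -> bool:
    """Decide whether a change event on the through table reaches OpenSearch.

    Only UPDATE events are filtered; INSERT and DELETE always pass, so new
    and removed links are never missed. watched=None means "not configured".
    """
    if tg_op != "UPDATE" or watched is None:
        return True
    # For UPDATE, both old and new rows are assumed to be present.
    return any(old[col] != new[col] for col in watched)


row = {"user_id": 1, "role_id": 2, "expires_at": "23-12-2025"}

print(needs_sync("INSERT", None, row))                                # True
print(needs_sync("UPDATE", row, dict(row)))                           # False
print(needs_sync("UPDATE", row, dict(row, expires_at="31-12-2025")))  # True
print(needs_sync("DELETE", row, None))                                # True
```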

Maybe I missed something, but in our use cases everything works fine with through tables.

> Could you take a look and share your thoughts?

Sure 👍🏻

TerionGVS5 · 2025-10-24T14:12:26Z

@toluaina Good evening. I looked at your idea: it's very unusual. I didn't think something like this could be done in almost pure SQL. On the one hand, it's very cool and could be faster; on the other hand, keeping the processing logic in Python feels more familiar to me.
It also seems that in the current SQL version, if the user has not specified any columns, every update will be skipped, which may not be obvious to the average user.
Also, something like this comes to mind:

WHERE n.key = ANY(_columns || _foreign_keys || _primary_keys)

Otherwise, changes to foreign keys will be skipped unless the user explicitly lists them, although of course I could be wrong.
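Here is a rough Python model of what such a filter does. It assumes n.key ranges over the keys of the changed row and the three arrays hold column names. The actual SQL in the feature branch is not shown in this thread. The names passes_filter and changed_keys are hypothetical and serve only to illustrate the concern about empty columns and omitted foreign keys.

```python
from typing import Any, Mapping, Sequence


def changed_keys(old: Mapping[str, Any], new: Mapping[str, Any]) -> set[str]:
    """Keys whose value differs between the OLD and NEW row."""
    return {key for key in new if old.get(key) != new.get(key)}


def passes_filter(
    old: Mapping[str, Any],
    new: Mapping[str, Any],
    columns: Sequence[str],
    foreign_keys: Sequence[str] = (),
    primary_keys: Sequence[str] = (),
) -> bool:
    """Analogue of WHERE n.key = ANY(_columns || _foreign_keys || _primary_keys).

    An update survives only if one of its changed keys is in the
    concatenated list of column names.
    """
    allowed = set(columns) | set(foreign_keys) | set(primary_keys)
    return bool(changed_keys(old, new) & allowed)


old = {"id": 1, "user_id": 1, "role_id": 2, "note": "a"}
new = dict(old, role_id=3)  # only a foreign key changed

print(passes_filter(old, new, columns=["note"]))  # False: FK change lost
print(passes_filter(old, new, columns=["note"], foreign_keys=["user_id", "role_id"]))  # True
print(passes_filter(old, new, columns=[]))  # False: empty columns skip everything
```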

In my Python-based approach, the comparison between the NEW and OLD values in the message always includes the primary-key and foreign-key columns, plus watched_columns (if set).
Therefore, the user doesn't need to list primary or foreign keys in this attribute; it's enough to specify the additional fields to watch during processing. Less thinking for the user means less chance of unexpected behavior.
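The following sketch shows the Python-side approach under the same hypothetical row-dict assumption. PK and FK columns are added to the comparison set automatically, so watched_columns only needs the extra fields. The names comparison_columns and is_relevant_update are illustrative. This models only the case where watched_columns is set.

```python
from typing import Any, Iterable, Mapping


def comparison_columns(
    primary_keys: Iterable[str],
    foreign_keys: Iterable[str],
    watched_columns: Iterable[str] = (),
) -> set[str]:
    """PK and FK columns are always compared; watched_columns only adds to them."""
    return set(primary_keys) | set(foreign_keys) | set(watched_columns)


def is_relevant_update(
    old: Mapping[str, Any],
    new: Mapping[str, Any],
    cols: set[str],
) -> bool:
    # Relevant if any compared column has a different value in NEW than in OLD.
    return any(old.get(c) != new.get(c) for c in cols)


cols = comparison_columns(["id"], ["user_id", "role_id"], ["expires_at"])
old = {"id": 1, "user_id": 1, "role_id": 2, "expires_at": "23-12-2025", "updated_by": 7}

print(sorted(cols))  # ['expires_at', 'id', 'role_id', 'user_id']
print(is_relevant_update(old, dict(old, role_id=3), cols))    # True: FK changed
print(is_relevant_update(old, dict(old, updated_by=8), cols))  # False: unwatched column
```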

TerionGVS5 added 5 commits October 4, 2025 18:16

• 458a780 add watched_columns feature
• d671853 update existing tests
• e9c806d add new tests for watched_columns
• f1220de add doc for watched_columns
• 2148ef8 Merge branch 'main' into feature/watched_columns_speedup

TerionGVS5 changed the title from 'Feature/watched columns speedup' to 'Feature: "watched_columns" speedup in more 10000 times for millions requests' on Oct 9, 2025.

TerionGVS5 added 3 commits October 9, 2025 23:26

• e958a14 return spaces back README.md
• 14518b7 return spaces back README.md
• 3fb54f6 return spaces back README.rst
