Task deduplication #143
-
I'm interested in the use case for this. When would a task be duplicated? Are you imagining it being caused by a bug, or just by an impatient user (i.e. they click "download report" twice and thus get two emails)? I suspect the way to handle this is to check whether a similar task is already in the queue and reuse that one instead. You should be storing the task ID somewhere outside the queue anyway, so that check should be reasonably simple.
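Here is a minimal sketch of the "reuse the queued task" idea. It assumes the caller supplies a key that decides which tasks count as similar. `InMemoryQueue`, `enqueue_or_reuse` and `complete` are hypothetical names, and a real backend would keep the key-to-ID mapping in persistent storage rather than in a dict:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryQueue:
    """Toy queue that hands back a pending task's ID instead of enqueuing a duplicate.

    The similarity check is a caller-supplied key (hypothetical); a real backend
    would persist this mapping outside the queue itself.
    """

    pending: dict[str, str] = field(default_factory=dict)  # key -> task ID
    tasks: dict[str, tuple] = field(default_factory=dict)  # task ID -> arguments

    def enqueue_or_reuse(self, key: str, *args) -> str:
        # If a similar task is already waiting, return its ID and enqueue nothing.
        existing = self.pending.get(key)
        if existing is not None:
            return existing
        task_id = str(uuid.uuid4())
        self.pending[key] = task_id
        self.tasks[task_id] = args
        return task_id

    def complete(self, task_id: str) -> None:
        # Once the task has run, forget its key so a later request can enqueue again.
        self.tasks.pop(task_id, None)
        for key, tid in list(self.pending.items()):
            if tid == task_id:
                del self.pending[key]


queue = InMemoryQueue()
first = queue.enqueue_or_reuse("report:user-42", "user-42")
second = queue.enqueue_or_reuse("report:user-42", "user-42")  # impatient second click
# first == second, and only one task is stored
```

Clearing the key in `complete` matters: without it, one finished report would block every future report for the same user.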
-
Duplicated tasks can happen just as you said: they get triggered twice one way or another. One of my projects serves images and videos in different resolutions. If a specific resolution does not exist, I return the closest available one, but also trigger a task to do the conversion for the exact requested resolution. The user hits reload before the conversion finishes -> duplicate task. The workaround (as I see it) is, of course, to double-check manually, either in the task or before triggering it, whether something is already queued, or to keep some kind of bookkeeping. But I really think a task or message queue is much easier to use if it's fire-and-forget.
Yeah. It should be some kind of ID (which could be a hash of your own choosing) that determines which tasks are "the same". Google Cloud Tasks, for example, does it like this, see https://cloud.google.com/tasks/docs/reference/rest/v2beta3/projects.locations.queues.tasks/create#body.request_body.FIELDS.task. I also like to use task deduplication for debouncing. I have a project where data is supposed to be synchronized to another system 5 minutes after the last change happened. I think a task queue that can deduplicate/debounce is simpler than the other solutions.
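Here is a rough sketch of both ideas. It assumes the ID is a SHA-256 hash of the task name plus its keyword arguments, and that debouncing means "run `delay` after the most recent enqueue". `dedup_key`, `DebounceQueue` and the `sync_record` task are hypothetical, and this is not how Cloud Tasks implements named tasks:

```python
import hashlib
import json
from datetime import datetime, timedelta


def dedup_key(task_name: str, **kwargs) -> str:
    # Hash the task name plus its JSON-serialisable arguments; sort_keys makes
    # the key independent of keyword-argument order.
    payload = json.dumps({"task": task_name, "kwargs": kwargs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DebounceQueue:
    """Toy queue holding at most one scheduled run per dedup key."""

    def __init__(self, delay: timedelta) -> None:
        self.delay = delay
        self.run_after: dict[str, datetime] = {}
        self.kwargs: dict[str, dict] = {}

    def enqueue(self, task_name: str, now: datetime, **kwargs) -> str:
        key = dedup_key(task_name, **kwargs)
        # Overwrite any earlier schedule: the task fires `delay` after the LAST change.
        self.run_after[key] = now + self.delay
        self.kwargs[key] = kwargs
        return key

    def pop_due(self, now: datetime) -> list[dict]:
        # Remove and return the arguments of every task whose window has elapsed.
        ready = []
        for key, when in list(self.run_after.items()):
            if when <= now:
                del self.run_after[key]
                ready.append(self.kwargs.pop(key))
        return ready


t0 = datetime(2024, 1, 1, 12, 0)
queue = DebounceQueue(delay=timedelta(minutes=5))
queue.enqueue("sync_record", t0, record_id=7)
queue.enqueue("sync_record", t0 + timedelta(minutes=3), record_id=7)  # another change
early = queue.pop_due(t0 + timedelta(minutes=6))  # [] because the window restarted at minute 3
late = queue.pop_due(t0 + timedelta(minutes=8))   # [{'record_id': 7}]
```

Because the key is derived from the arguments, two changes to different records get different keys and are debounced independently.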
-
One of the features I think the ideal task queue needs is task deduplication.
My use cases are usually that I either want to just "drop" the second duplicate task if the same task is already queued, or that I want to "replace" the first task with the second, so basically a debounce.
As with deferred/run_after tasks, I expect that some backends will not be able to support this, but for the default DB backend it should be fairly straightforward.
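For a DB-backed queue, one way to get both behaviours is a unique constraint on a dedup key combined with SQLite's upsert syntax (SQLite 3.24+): `ON CONFLICT DO NOTHING` gives "drop", and `ON CONFLICT DO UPDATE` gives "replace". This is a hypothetical schema using the standard-library `sqlite3` module, not the schema of any existing backend. It assumes workers delete a row when they claim it, so finished tasks don't block new ones:

```python
import json
import sqlite3

# Hypothetical schema: one row per *queued* task; a worker deletes the row when
# it picks the task up, so a finished task never blocks a new one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS queued_task (
    id INTEGER PRIMARY KEY,
    dedup_key TEXT NOT NULL UNIQUE,
    args TEXT NOT NULL,
    run_after REAL NOT NULL
)
"""

# "Drop": if a task with this key is already queued, silently ignore the new one.
DROP_SQL = """
INSERT INTO queued_task (dedup_key, args, run_after) VALUES (?, ?, ?)
ON CONFLICT (dedup_key) DO NOTHING
"""

# "Replace": keep the existing row (and its id) but overwrite args and run_after,
# which is what a debounce needs.
REPLACE_SQL = """
INSERT INTO queued_task (dedup_key, args, run_after) VALUES (?, ?, ?)
ON CONFLICT (dedup_key) DO UPDATE SET args = excluded.args,
                                      run_after = excluded.run_after
"""


def enqueue(conn: sqlite3.Connection, key: str, args: dict, run_after: float,
            policy: str = "drop") -> None:
    sql = {"drop": DROP_SQL, "replace": REPLACE_SQL}[policy]
    with conn:  # commits on success, rolls back on error
        conn.execute(sql, (key, json.dumps(args), run_after))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
enqueue(conn, "convert:42:720p", {"video": 42}, run_after=0.0)
enqueue(conn, "convert:42:720p", {"video": 42}, run_after=5.0)  # dropped
enqueue(conn, "sync:7", {"record": 7}, run_after=300.0, policy="replace")
enqueue(conn, "sync:7", {"record": 7}, run_after=480.0, policy="replace")  # pushed back
rows = conn.execute(
    "SELECT dedup_key, run_after FROM queued_task ORDER BY dedup_key").fetchall()
print(rows)  # [('convert:42:720p', 0.0), ('sync:7', 480.0)]
```

The upsert form is used instead of `INSERT OR REPLACE` because the latter deletes the conflicting row and inserts a new one, which can give the task a different `id`.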
Does this make sense to others here?