-
I think you are mixing parsing with scheduling. Parsing is done by the dag file processor (which can be embedded in the scheduler or run standalone). For parsing, you can use a standalone dag processor (the only option in Airflow 3, actually) and see if it helps; I believe 2.11 had an option to run a separate dag file processor per directory. Also, 6000 Dags does not seem like many, and it might simply be that you have slow parsing. Any network traffic or database access at the top level of your Dag files is a no-no. Optimising parsing time is a good idea and a best practice for Airflow; look at "Best practices" in the Airflow docs for various optimisation techniques.

If it is about scheduling, then multiple schedulers already auto-shard the work internally (using SKIP LOCKED), so I am not sure that is your issue. In either case, after you explore standalone dag file processors, you might want to look at recent devlist discussions (you will find links in the "Community" tab of the Airflow website). There are some starvation scenarios being discussed there, and it might be one of those, so you might chime in on the discussion.
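To make the parsing best practice concrete, here is a minimal sketch of keeping network access out of top-level Dag code so the dag file processor only pays for imports and DAG construction when it parses the file. The dag id, URL, and callable name are made up for illustration, not taken from the thread:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# BAD (avoid): anything at module level runs on every parse of this file, e.g.
#   config = requests.get("https://example.com/config").json()


def fetch_config_and_run(**context):
    # GOOD: network/database access happens only at task execution time,
    # not while the dag file processor is parsing the file.
    import requests  # local import also keeps parse time down

    config = requests.get("https://example.com/config", timeout=10).json()
    print(config)


with DAG(
    dag_id="parse_friendly_example",  # hypothetical dag id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="fetch_config_and_run",
        python_callable=fetch_config_and_run,
    )
```

The same idea applies to `Variable.get()`, connection lookups, and file/S3 listings: defer them into the callable (or into Jinja templating) so 6000 files can be parsed without hitting external systems.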
-
@salimuddin07 - please stop posting these verbose, AI-generated answers that duplicate answers already given. They add noise. If you want to add value with generated answers, please do so (reviewing them before posting), but duplicating explanations is wrong and forces people to spend extra effort reading the same things formatted differently.
-
Hi there. I use 2.11.0 with the celery executor. I have reached 6000 dags on my cluster, and my scheduler loop now takes an enormous amount of time, slowing down task scheduling/turnaround. I have run multiple schedulers and played with the max dagrun settings and sort mode, but I feel it's time to shard the work each scheduler gets. Would saving dags to independent folders externally and pointing different schedulers at separate folders help reduce the burden of parsing tons of dags on every single scheduler loop? I don't want to deploy extra airflow clusters; I'm explicitly interested in having one big one.