-
I think you are mixing parsing with scheduling. Parsing is done by the dag file processor (which can be embedded in the scheduler or run standalone). For parsing, you can use a standalone dag processor (the only option in Airflow 3, actually) and see if it helps; I believe 2.11 had an option to run a separate dag file processor per directory. Also, 6000 Dags does not seem like many, and it might simply be that you have slow parsing. Any network traffic or database access at the top level of your Dag files is a no-no. Optimising parsing time is a good idea and a best practice for Airflow; look at "Best practices" in the Airflow docs for various optimisation techniques.

If it is about scheduling, then multiple schedulers already auto-shard the work internally (using SKIP LOCKED), so I am not sure that is your issue. In either case, after you explore standalone dag file processors, you might want to look at recent devlist discussions (you will find links in the "Community" tab of the Airflow website). There are some starvation scenarios being discussed there, and it might be one of those, so you might chime in on the discussion.
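To make the parsing best practice concrete, here is a minimal sketch of keeping network access out of top-level Dag code so the dag file processor only pays for imports and DAG construction when it parses the file. The dag id, URL, and callable name are made up for illustration, not taken from the thread:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# BAD (avoid): anything at module level runs on every parse of this file, e.g.
#   config = requests.get("https://example.com/config").json()


def fetch_config_and_run(**context):
    # GOOD: network/database access happens only at task execution time,
    # not while the dag file processor is parsing the file.
    import requests  # local import also keeps parse time down

    config = requests.get("https://example.com/config", timeout=10).json()
    print(config)


with DAG(
    dag_id="parse_friendly_example",  # hypothetical dag id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="fetch_config_and_run",
        python_callable=fetch_config_and_run,
    )
```

The same idea applies to `Variable.get()`, connection lookups, and file/S3 listings: defer them into the callable (or into Jinja templating) so 6000 files can be parsed without hitting external systems.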
-
@salimuddin07 - please stop posting these verbose, AI-generated answers that duplicate answers already given. They add noise. If you want to add value with generated answers, please do so (reviewing them before posting), but duplicating explanations is wrong and forces people to spend extra effort reading the same things formatted differently.
-
Hi there. I use 2.11.0 with the celery executor. I have reached 6000 dags on my cluster, and my scheduler loop now takes an enormous amount of time, slowing down task scheduling/turnaround. I have run multiple schedulers and played with the max dagrun settings and sort mode, but I feel it's time to shard the work each scheduler gets. Would saving dags to independent folders externally and pointing different schedulers at separate folders help reduce the burden of parsing tons of dags on every single scheduler loop? I don't want to deploy extra airflow clusters; I'm explicitly interested in having one big one.