-
Notifications
You must be signed in to change notification settings - Fork 100
Add Parent Queue Quota Checking to Scheduler. #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @shchennu 👋
Thanks for suggesting this change, I want to better understand the motivation here, more specifically what is the gap with IsJobOverQueueCapacity? From what I see it is called before PrePredicateFn so there is probably something that you identified as a gap in the implementation?
Another question, what is the problem with job allocation above parent's queue quota? If the job is preemptible it can be allowed.
Hi @romanbaron 👋 Thank you for your questions! Let me address them:
The changes ensure that:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
I left a few comments, mostly about style.
I do have a question about the logic: how will this handle elastic jobs who doesn't have to schedule all pods to run? The job has a minimal number of pods that it needs to run, and you don't know how many will actually fit in the cluster / queue limits.
| // OnSessionOpen is called when a new scheduling session begins. It registers | ||
| // the early quota checking function that prevents jobs from being considered | ||
| // for scheduling if they would exceed their parent queues' quotas. | ||
| func (cp *CapacityPolicy) OnSessionOpen(ssn *framework.Session) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing something, but I don't think this is called by the changes in the current PR because this is not a plugin that is registered.
|
|
||
| // capacityCheckFn is a function type that checks if a job's requested resources | ||
| // exceed capacity limits. It returns a SchedulableResult indicating whether the | ||
| // job can be scheduled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a style comment, because this is a new project and all, but we don't have many obvious comments here (the comment really doesn't add anything that is not written in the next line)
| // Only check parent queues, not the job's direct queue | ||
| currentQueueID := queue.ParentQueue | ||
|
|
||
| for currentQueueID != "" { | ||
| parentQueue, found := ssn.Queues[currentQueueID] | ||
| if !found { | ||
| break | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This together with line 239 can be one line of for with initialization, check and step
| } | ||
| } | ||
|
|
||
| // Check GPU quota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part seems duplicated 3 times, there are functions to avoid it in the code base. you can loop on the resource type and the . look at proportion/proportion.go for example.
| // Check GPU quota | ||
| if parentQueue.Resources.GPU.Quota > 0 && jobResources.GPUs() > float64(parentQueue.Resources.GPU.Quota) { | ||
| errorMsg := fmt.Sprintf( | ||
| "parent queue '%s' quota has reached the allowable limit of GPUs. "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A queue can go over the quota, but not the limit. I would change the whole thing to address the limit.
| // queue quotas, while non-preemptible jobs must strictly adhere to quotas. | ||
| func (cp *CapacityPolicy) checkParentQueueQuotas(job *podgroup_info.PodGroupInfo, ssn *framework.Session) error { | ||
| // Skip quota checks for preemptible jobs | ||
| if job.Priority == constants.PriorityTrainNumber { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aim to separate the job priority and preemptibility soon, and the Train priority is not the only one that is currently preemptible.
We do have several scalars to define the resource of a queue: quota, limit and over quota weight. We should look at limit here and not quota and then it will be correct for all jobs.
| // Register early quota checks | ||
| ssn.AddPrePredicateFn(func(task *pod_info.PodInfo, job *podgroup_info.PodGroupInfo) error { | ||
| // Only check for the first pending pod to avoid duplicate checks | ||
| firstPending := getFirstPendingPod(job) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the fact that the job needs scheduling (and then predicates are running on it) is not enough to understand that it has pending pods?
8b59cf5 to
301550c
Compare
Thanks for the review! I've addressed all the comments:
All tests are passing. Let me know if you'd like any further improvements! |
@shchennu I looked at |
|
@shchennu - all hierarchy levels are validated in these two places: IsJobOverQueueCapacity calls these two functions here: KAI-Scheduler/pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go Line 35 in 2a0bacd
Doesn't it cover it? |
|
@shchennu - can you please update on your thoughts regarding the questions above? |
Add Parent Queue Quota and Limit Checking to KAI Scheduler
Description
This PR enhances the KAI Scheduler with comprehensive parent queue resource management, implementing both quota and limit checking at parent queue levels. The changes improve scheduler efficiency by preventing unnecessary scheduling attempts and provide better resource management for different job types.
Changes
checkParentQueueLimitsmethod toCapacityPolicyto check both limits and quotas at parent queue levelsgetFirstPendingPodhelper function to identify the first pending pod in a jobOnSessionOpento include early validation of resource limitsImplementation Details
Test Coverage
Files Changed
pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.gopkg/scheduler/plugins/proportion/capacity_policy/parent_queue_test.goTesting
All tests are passing:
Impact