Add Parent Queue Quota Checking to Scheduler. #112

shchennu · 2025-05-01T18:27:11Z

Add Parent Queue Quota and Limit Checking to KAI Scheduler

Description

This PR adds quota and limit checking at the parent queue levels of the KAI Scheduler. Jobs that would exceed a parent queue's limit or quota are rejected during early validation, which avoids unnecessary scheduling attempts. Preemptible and elastic jobs are handled separately from other job types.

Changes

- Added checkParentQueueLimits method to CapacityPolicy to check both limits and quotas at parent queue levels
- Added getFirstPendingPod helper function to identify the first pending pod in a job
- Modified OnSessionOpen to include early validation of resource limits
- Added comprehensive test cases for parent queue limit checking
- Added support for elastic jobs with minimum required resources
- Separated job priority from preemptibility handling
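As an illustration of the getFirstPendingPod helper listed above, here is a minimal sketch. The `Pod`, `Job`, and `PodStatus` types are simplified stand-ins invented for this example, not the scheduler's real types, and "first" is assumed to mean first in slice order.

```go
package main

import "fmt"

// PodStatus, Pod, and Job are hypothetical, simplified types for this sketch.
type PodStatus string

const (
	Pending PodStatus = "Pending"
	Running PodStatus = "Running"
)

type Pod struct {
	Name   string
	Status PodStatus
}

type Job struct {
	Name string
	Pods []*Pod
}

// getFirstPendingPod returns the first pod in slice order whose status is
// Pending, or nil when the job is nil or has no pending pods.
func getFirstPendingPod(job *Job) *Pod {
	if job == nil {
		return nil
	}
	for _, p := range job.Pods {
		if p != nil && p.Status == Pending {
			return p
		}
	}
	return nil
}

func main() {
	job := &Job{
		Name: "demo",
		Pods: []*Pod{
			{Name: "worker-0", Status: Running},
			{Name: "worker-1", Status: Pending},
			{Name: "worker-2", Status: Pending},
		},
	}
	if p := getFirstPendingPod(job); p != nil {
		fmt.Println("first pending pod:", p.Name)
	}
}
```

Returning nil for a job without pending pods lets the caller skip the early validation instead of handling an error.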

Implementation Details

- Resource checks are performed for GPU, CPU, and Memory resources
- Both limits and quotas are checked, using the more restrictive value
- Checks are done at each parent queue level in the hierarchy
- Early validation prevents unnecessary scheduling attempts
- Proper error messages indicate which resource limit/quota was exceeded
- Special handling for preemptible jobs (PriorityInferenceNumber)
- Support for elastic jobs with minimum resource requirements
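The hierarchy walk described above can be sketched roughly as follows. This is not the code in this PR: the `Queue` and `Resources` types, the `Unlimited` sentinel, and the function names are assumptions made for the example. In particular, treating a value of zero as "no capacity" is a choice of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// Unlimited marks a limit or quota that is not set (an assumption of this sketch).
const Unlimited = -1.0

// Resources holds the three resource types that are checked.
type Resources struct {
	GPU, CPU, Memory float64
}

// NoCap is a convenience value with every resource unlimited.
var NoCap = Resources{GPU: Unlimited, CPU: Unlimited, Memory: Unlimited}

// Queue is a hypothetical, simplified queue node with a link to its parent.
type Queue struct {
	Name      string
	Parent    *Queue
	Limit     Resources
	Quota     Resources
	Allocated Resources
}

// effectiveCeiling returns the more restrictive of limit and quota,
// ignoring whichever one is Unlimited.
func effectiveCeiling(limit, quota float64) float64 {
	switch {
	case limit == Unlimited && quota == Unlimited:
		return math.Inf(1)
	case limit == Unlimited:
		return quota
	case quota == Unlimited:
		return limit
	default:
		return math.Min(limit, quota)
	}
}

// checkParentQueues walks from the leaf's parent up to the root and returns
// an error naming the first queue and resource whose ceiling would be exceeded.
func checkParentQueues(leaf *Queue, req Resources) error {
	if leaf == nil {
		return nil
	}
	for q := leaf.Parent; q != nil; q = q.Parent {
		checks := []struct {
			resource                           string
			requested, allocated, limit, quota float64
		}{
			{"GPU", req.GPU, q.Allocated.GPU, q.Limit.GPU, q.Quota.GPU},
			{"CPU", req.CPU, q.Allocated.CPU, q.Limit.CPU, q.Quota.CPU},
			{"Memory", req.Memory, q.Allocated.Memory, q.Limit.Memory, q.Quota.Memory},
		}
		for _, c := range checks {
			ceiling := effectiveCeiling(c.limit, c.quota)
			if c.allocated+c.requested > ceiling {
				return fmt.Errorf("parent queue %q: %s request %.2f plus allocated %.2f exceeds %.2f",
					q.Name, c.resource, c.requested, c.allocated, ceiling)
			}
		}
	}
	return nil
}

func main() {
	root := &Queue{Name: "root", Limit: NoCap,
		Quota:     Resources{GPU: 8, CPU: Unlimited, Memory: Unlimited},
		Allocated: Resources{GPU: 6}}
	dept := &Queue{Name: "dept", Parent: root, Limit: NoCap, Quota: NoCap}
	team := &Queue{Name: "team", Parent: dept, Limit: NoCap, Quota: NoCap}

	// The request fits in team and dept but not under root's GPU quota.
	if err := checkParentQueues(team, Resources{GPU: 4}); err != nil {
		fmt.Println("rejected early:", err)
	}
}
```

Stopping at the first violation keeps the error message specific to one queue and one resource, which matches the goal of reporting which limit or quota was exceeded.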

Test Coverage

- Basic limit/quota checks for each resource type (GPU, CPU, Memory)
- Multi-level queue hierarchy checks
- Elastic job handling with minimum required resources
- Preemptible vs non-preemptible job behavior
- Edge cases (zero limits/quotas, missing values)
- Error message formatting
- Helper function validation
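For the elastic-job case listed above, the following is one possible reading of "minimum required resources", shown for GPUs only. The `ElasticJob` and `PodRequest` types, the `MinAvailable` field, and the rule itself (count only the pending pods still needed to reach the minimum) are assumptions made for this sketch, not a description of the code in this PR.

```go
package main

import "fmt"

// PodRequest and ElasticJob are hypothetical types for this sketch.
type PodRequest struct {
	Name    string
	GPU     float64
	Pending bool
}

type ElasticJob struct {
	Name         string
	MinAvailable int // minimal number of pods the job needs in order to run
	Pods         []PodRequest
}

// minRequiredGPU sums the GPU requests of only those pending pods that are
// still needed to reach MinAvailable. Pods beyond the minimum are ignored,
// because an elastic job can run without them.
func minRequiredGPU(job ElasticJob) float64 {
	running := 0
	for _, p := range job.Pods {
		if !p.Pending {
			running++
		}
	}
	needed := job.MinAvailable - running
	total := 0.0
	for _, p := range job.Pods {
		if needed <= 0 {
			break
		}
		if p.Pending {
			total += p.GPU
			needed--
		}
	}
	return total
}

func main() {
	job := ElasticJob{
		Name:         "elastic-train",
		MinAvailable: 2,
		Pods: []PodRequest{
			{Name: "w0", GPU: 1, Pending: true},
			{Name: "w1", GPU: 1, Pending: true},
			{Name: "w2", GPU: 1, Pending: true},
			{Name: "w3", GPU: 1, Pending: true},
		},
	}
	fmt.Println("GPUs to validate against parent queues:", minRequiredGPU(job))
}
```

Under this reading, only the minimum is validated against the parent queues; whether the remaining pods fit is left to later scheduling cycles.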

Files Changed

- Modified: pkg/scheduler/plugins/proportion/capacity_policy/capacity_policy.go
- Modified: pkg/scheduler/plugins/proportion/capacity_policy/parent_queue_test.go

Testing

All tests are passing:

- 37 specs from the original test suite
- 10 new test cases for parent queue limit checking
- 3 test cases for the helper function

Impact

- Improves scheduler efficiency by preventing unnecessary scheduling attempts
- Provides clear error messages when limits/quotas are exceeded
- Better resource management for different job types (preemptible, elastic)

romanbaron

Hi @shchennu 👋
Thanks for suggesting this change. I want to better understand the motivation here; more specifically, what is the gap with IsJobOverQueueCapacity? From what I see, it is called before PrePredicateFn, so there is probably something that you identified as a gap in the implementation?

Another question: what is the problem with job allocation above the parent queue's quota? If the job is preemptible, it can be allowed.

shchennu · 2025-05-04T17:16:21Z

> Hi @shchennu 👋 Thanks for suggesting this change. I want to better understand the motivation here; more specifically, what is the gap with IsJobOverQueueCapacity? From what I see, it is called before PrePredicateFn, so there is probably something that you identified as a gap in the implementation?
>
> Another question: what is the problem with job allocation above the parent queue's quota? If the job is preemptible, it can be allowed.

Hi @romanbaron 👋

Thank you for your questions! Let me address them:

Gap with IsJobOverQueueCapacity:
- IsJobOverQueueCapacity only checks the immediate queue's capacity
- Our new checkParentQueueQuotas function checks ALL parent queues up the hierarchy
- This is important because a job might fit in its immediate queue but exceed parent queue quotas
- By checking in PrePredicateFn, we fail fast and avoid unnecessary scheduling attempts

Preemptible Jobs and Parent Queue Quotas:
- You're absolutely right! Preemptible jobs should be allowed to exceed parent queue quotas
- I've updated the implementation to handle this:
  - Added a check for PriorityTrainNumber at the start of checkParentQueueQuotas
  - Preemptible jobs now skip quota checks entirely
  - Non-preemptible jobs still maintain strict quota enforcement
- This change is reflected in the test cases:
  - preemptible_job_can_exceed_parent_queue_GPU_quota
  - non-preemptible_job_cannot_exceed_parent_queue_GPU_quota

The changes ensure that:

- Preemptible jobs can utilize resources beyond parent queue quotas
- Non-preemptible jobs maintain strict quota enforcement
- We still get early validation for jobs that would exceed quotas
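A minimal sketch of the behavior summarized above, for GPUs only. The types are simplified stand-ins, the numeric value of `PriorityTrainNumber` is a placeholder, and the rule "priority at or below PriorityTrainNumber means preemptible" is an assumption of this example. The actual comparison in the scheduler may differ.

```go
package main

import "fmt"

// PriorityTrainNumber is a placeholder value for this sketch; the real
// constant is defined by the scheduler.
const PriorityTrainNumber = 50

// Job and Queue are hypothetical, simplified types.
type Job struct {
	Name         string
	Priority     int
	RequestedGPU float64
}

type Queue struct {
	Name         string
	Parent       *Queue
	GPUQuota     float64
	GPUAllocated float64
}

// isPreemptible encodes the assumption that jobs at or below the train
// priority are preemptible.
func isPreemptible(job *Job) bool {
	return job.Priority <= PriorityTrainNumber
}

// checkParentQueueQuotas skips all quota checks for preemptible jobs and
// enforces the GPU quota of every parent queue for non-preemptible jobs.
func checkParentQueueQuotas(job *Job, leaf *Queue) error {
	if isPreemptible(job) {
		return nil // preemptible jobs may run over quota
	}
	for q := leaf.Parent; q != nil; q = q.Parent {
		if q.GPUAllocated+job.RequestedGPU > q.GPUQuota {
			return fmt.Errorf("job %q exceeds GPU quota of parent queue %q", job.Name, q.Name)
		}
	}
	return nil
}

func main() {
	root := &Queue{Name: "root", GPUQuota: 4, GPUAllocated: 4}
	team := &Queue{Name: "team", Parent: root, GPUQuota: 4}

	train := &Job{Name: "train", Priority: PriorityTrainNumber, RequestedGPU: 2}
	build := &Job{Name: "build", Priority: PriorityTrainNumber + 50, RequestedGPU: 2}

	fmt.Println("train:", checkParentQueueQuotas(train, team))
	fmt.Println("build:", checkParentQueueQuotas(build, team))
}
```

Placing the preemptibility check first means preemptible jobs never pay for the hierarchy walk, while non-preemptible jobs still fail fast on the first parent queue that is over quota.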

enoodle

Hi,
I left a few comments, mostly about style.

I do have a question about the logic: how will this handle elastic jobs that don't have to schedule all of their pods to run? Such a job has a minimal number of pods that it needs to run, and you don't know how many will actually fit in the cluster / queue limits.

enoodle · 2025-05-04T20:56:34Z