Description
Describe the bug
When a workload starts shortly after the DSA plugin has started, the workload can fail to start with "UnexpectedAdmissionError".
To Reproduce
while true; do make e2e-dsa; done
Expected behavior
e2e-dsa works on a good system without issues or crashes.
Screenshots
pod "dpdk" failed with status:
<v1.PodStatus>:
message: 'Pod was rejected: Allocate failed due to no healthy devices present; cannot
allocate unhealthy devices dsa.intel.com/wq-user-dedicated, which is unexpected'
phase: Failed
qosClass: Burstable
reason: UnexpectedAdmissionError
startTime: "2025-07-02T09:47:31Z"
{
System (please complete the following information):
- OS version: Ubuntu 24.04
- Kernel version: 6.8
- Device plugins version: 0.32.1
- Hardware info: GNR
Additional context
Since DSA and IAA use the same base structure, I suspect IAA also suffers from this, but I haven't seen it happen there.
I tried adding a time.Sleep(2 * time.Second) here: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/test/e2e/dsa/dsa.go#L94
With that change I was able to run e2e-dsa 20 times in a row. I suspect this is a timing issue in either the API server or the plugin. The plugin doesn't seem to crash, but I haven't seen logs from it so far.