
DSA: unhealthy state leads to AdmissionError #2082

@tkatila

Description

Describe the bug
When a workload is created shortly after the DSA plugin starts, the pod can be rejected with "UnexpectedAdmissionError".

To Reproduce

while true; do make e2e-dsa; done

Expected behavior
On a healthy system, e2e-dsa should pass every time without failures or crashes.

Screenshots

      pod "dpdk" failed with status: 
          <v1.PodStatus>: 
              message: 'Pod was rejected: Allocate failed due to no healthy devices present; cannot
                allocate unhealthy devices dsa.intel.com/wq-user-dedicated, which is unexpected'
              phase: Failed
              qosClass: Burstable
              reason: UnexpectedAdmissionError
              startTime: "2025-07-02T09:47:31Z"

System

  • OS version: Ubuntu 24.04
  • Kernel version: 6.8
  • Device plugins version: 0.32.1
  • Hardware info: Intel Granite Rapids (GNR)

Additional context
Since the DSA and IAA plugins share the same base structure, I suspect IAA is affected as well, although I haven't observed the failure there.

I tried adding a time.Sleep(time.Second * 2) here: https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/test/e2e/dsa/dsa.go#L94. With that delay, e2e-dsa passed 20 times in a row. I suspect a timing issue in either the API server or the plugin. The plugin does not appear to crash, but I haven't inspected its logs yet.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
