# Prototyping CPU DRA device abstraction / DRA-based CPU allocation

## Background

This prototype patch set bolts a DRA allocation frontend on top of the existing
topology-aware resource policy plugin. The main intentions of this patch set
are to

- provide something practical to play around with for the [feasibility study](https://docs.google.com/document/d/1Tb_dC60YVCBr7cNYWuVLddUUTMcNoIt3zjd5-8rgug0/edit?tab=t.0#heading=h.iutbebngx80e) of enabling DRA-based CPU allocation,
- allow (relatively) easy experimentation with how to expose CPUs as DRA
  devices (IOW, test various CPU DRA attributes), and
- allow testing how DRA-based CPU allocation (using non-trivial CEL expressions)
  would scale with cluster and cluster node size.

## Notes

This patched NRI plugin, especially in its current state and form, is
*not a proposal* for a first real DRA-based CPU driver.

## Prerequisites for Testing

To test this out in a cluster, make sure you have

1. DRA enabled in your cluster. One way to ensure this is to bootstrap your
cluster using a kubeadm configuration with the following bits set:

```yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
...
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
    - name: feature-gates
      value: DynamicResourceAllocation=true,DRADeviceTaints=true,DRAAdminAccess=true,DRAPrioritizedList=true,DRAPartitionableDevices=true,DRAResourceClaimDeviceStatus=true
    - name: runtime-config
      value: resource.k8s.io/v1beta2=true,resource.k8s.io/v1beta1=true,resource.k8s.io/v1alpha3=true
controllerManager:
  extraArgs:
    - name: feature-gates
      value: DynamicResourceAllocation=true,DRADeviceTaints=true
scheduler:
  extraArgs:
    - name: feature-gates
      value: DynamicResourceAllocation=true,DRADeviceTaints=true,DRAAdminAccess=true,DRAPrioritizedList=true,DRAPartitionableDevices=true
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```

2. CDI enabled in your runtime configuration
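
For containerd, CDI support is configured in the CRI plugin section of the
runtime configuration. A minimal sketch, assuming containerd 1.7 or newer and
the default configuration path (recent CRI-O versions have CDI support enabled
out of the box):

```toml
# /etc/containerd/config.toml (fragment)
version = 2

[plugins."io.containerd.grpc.v1.cri"]
  # Enable injection of CDI devices into containers.
  enable_cdi = true
  # Directories scanned for CDI spec files.
  cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
```

Restart containerd after changing the configuration.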

## Installation and Testing

Once you have your cluster properly set up, you can pull this into your
cluster for testing with something like this:

```bash
helm install --devel -n kube-system test \
    oci://ghcr.io/klihub/nri-plugins/helm-charts/nri-resource-policy-topology-aware \
    --version v0.9-dra-driver-unstable \
    --set image.pullPolicy=Always \
    --set extraEnv.OVERRIDE_SYS_ATOM_CPUS='2-5' \
    --set extraEnv.OVERRIDE_SYS_CORE_CPUS='0\,1\,6-15'
```

Once the NRI plugin+DRA driver is up and running, you should see some CPUs
exposed as DRA devices. You can check the resource slices with the following
command:

```bash
[kli@n4c16-fedora-40-cloud-base-containerd ~]# kubectl get resourceslices
NAME                                                     NODE                                    DRIVER       POOL    AGE
n4c16-fedora-40-cloud-base-containerd-native.cpu-jxfkj   n4c16-fedora-40-cloud-base-containerd   native.cpu   pool0   4d2h
```

And the exposed devices like this:

```bash
[kli@n4c16-fedora-40-cloud-base-containerd ~]# kubectl get resourceslices -oyaml | less
apiVersion: v1
items:
- apiVersion: resource.k8s.io/v1beta2
  kind: ResourceSlice
  metadata:
    creationTimestamp: "2025-06-10T06:01:54Z"
    generateName: n4c16-fedora-40-cloud-base-containerd-native.cpu-
    generation: 1
    name: n4c16-fedora-40-cloud-base-containerd-native.cpu-jxfkj
    ownerReferences:
    - apiVersion: v1
      controller: true
      kind: Node
      name: n4c16-fedora-40-cloud-base-containerd
      uid: 90a99f1f-c1ca-4bea-8dbd-3cc821f744b1
    resourceVersion: "871388"
    uid: 4639d31f-e508-4b0a-8378-867f6c1c7cb1
  spec:
    devices:
    - attributes:
        cache0ID:
          int: 0
        cache1ID:
          int: 8
        cache2ID:
          int: 16
        cache3ID:
          int: 24
        cluster:
          int: 0
        core:
          int: 0
        coreType:
          string: P-core
        die:
          int: 0
        isolated:
          bool: false
        localMemory:
          int: 0
        package:
          int: 0
      name: cpu1
    - attributes:
        cache0ID:
          int: 1
        cache1ID:
          int: 9
        cache2ID:
          int: 17
        cache3ID:
          int: 24
        cluster:
          int: 2
        core:
          int: 1
        coreType:
          string: E-core
        die:
          int: 0
        isolated:
          bool: false
        localMemory:
          int: 0
        package:
          int: 0
      name: cpu2
    - attributes:
        cache0ID:
          int: 1
        cache1ID:
          int: 9
        cache2ID:
          int: 17
        cache3ID:
          int: 24
        cluster:
          int: 2
        core:
...
```
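
The attributes above are what CEL selectors in resource claims can match on.
As a sketch, a hypothetical request fragment combining two of the attributes
shown, to select E-cores sharing the L3 cache with ID 24, could look like this
(the attribute names are taken from the resource slice output above):

```yaml
# Hypothetical request fragment, not part of the manifests below.
requests:
- name: cpu
  deviceClassName: native.cpu
  count: 2
  selectors:
  - cel:
      expression: >-
        device.attributes["native.cpu"].coreType == "E-core" &&
        device.attributes["native.cpu"].cache3ID == 24
```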

If everything looks fine and you do have CPUs available as DRA devices, you
can test DRA-based CPU allocation with something like the manifest below,
which allocates a single P-core for the container.

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: any-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: p-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
        count: 1
        selectors:
        - cel:
            expression: device.attributes["native.cpu"].coreType == "P-core"
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: e-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
        count: 1
        selectors:
        - cel:
            expression: device.attributes["native.cpu"].coreType == "E-core"
---
apiVersion: v1
kind: Pod
metadata:
  name: pcore-test
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - trap 'exit 0' TERM; sleep 3600 & wait
    resources:
      requests:
        cpu: 1
        memory: 100M
      limits:
        cpu: 1
        memory: 100M
      claims:
      - name: claim-pcores
  resourceClaims:
  - name: claim-pcores
    resourceClaimTemplateName: p-cores
  terminationGracePeriodSeconds: 1
```
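
To see what was actually allocated, you can inspect the kernel's view of the
allowed CPUs from inside the container; the same `grep` works against any local
process, too. The `kubectl exec` invocation assumes the `pcore-test`/`ctr0`
names from the example above:

```shell
# Inside the test pod:
#   kubectl exec pcore-test -c ctr0 -- grep Cpus_allowed_list /proc/self/status
#
# The same check run locally shows the cpuset of the current shell:
grep Cpus_allowed_list /proc/self/status
```

With the `p-cores` claim above satisfied, the list reported inside the
container should contain a CPU that the resource slice marks as a P-core.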

If you want to try a mixed native CPU + DRA-based allocation, try
increasing the CPU request and limit in the pod's spec to 1500m CPUs
and see what happens.
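
For example, the container resources in the pod spec above would then become:

```yaml
    resources:
      requests:
        cpu: 1500m
        memory: 100M
      limits:
        cpu: 1500m
        memory: 100M
```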

## Playing Around with CPU Abstractions

If you want to play around with this (for instance, to modify the exposed CPU
abstraction), the easiest way is to

1. [fork](https://github.com/containers/nri-plugins/fork) the [main NRI Reference Plugins](https://github.com/containers/nri-plugins) repo,
2. enable GitHub Actions in your personal fork,
3. make any changes you want (for instance, to alter the CPU abstraction, take a look at [cpu.DRA()](https://github.com/klihub/nri-plugins/blob/test/build/dra-driver/pkg/sysfs/dra.go)),
4. push your changes to ssh://git@github.com/$YOUR_FORK/nri-plugins/refs/heads/test/build/dra-driver,
5. wait for the image and Helm chart publishing actions to succeed, and
6. once done, pull the result into your cluster with something like `helm install --devel -n kube-system test oci://ghcr.io/$YOUR_GITHUB_USERID/nri-plugins/helm-charts/nri-resource-policy-topology-aware --version v0.9-dra-driver-unstable`