
@klihub commented on Jun 14, 2025

This prototype patch set bolts a DRA allocation frontend on top of the existing topology-aware resource policy plugin. The main intentions of this patch set are to

  • provide something practical to play around with for the feasibility study of enabling DRA-based CPU allocation,
  • allow (relatively) easy experimentation with how to expose CPUs as DRA devices (in other words, test various CPU DRA attributes), and
  • allow testing how DRA-based CPU allocation (using non-trivial CEL expressions) would scale with cluster size and cluster node size.
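To make the second point more concrete: in DRA, each allocatable unit is published as a named device carrying typed attributes, and CEL selectors match on those attributes. The Go sketch below shows one possible way of turning a logical CPU into such a device. The `CPU` and `Device` types, the `cpuN` naming scheme, and the attribute set are hypothetical illustrations, not the prototype's actual `cpu.DRA()` output; only the `coreType` attribute name comes from the example claims in this PR, and the driver is assumed to be named `native.cpu`, as those claims' CEL expressions suggest.

```go
package main

import "fmt"

// CPU is a hypothetical, simplified description of one logical CPU,
// as a node agent might discover it from sysfs.
type CPU struct {
	ID       int
	Package  int
	Die      int
	NUMANode int
	CoreType string // "P-core" or "E-core" on hybrid parts
}

// Device is a hypothetical stand-in for one entry in a DRA ResourceSlice:
// a unique device name plus the attributes CEL selectors can match on.
// Unqualified attribute names belong to the driver's own domain, so with a
// driver named "native.cpu" a selector reads them as
// device.attributes["native.cpu"].coreType.
type Device struct {
	Name       string
	Attributes map[string]any
}

// toDevice exposes one logical CPU as one DRA device. Which attributes to
// publish is exactly what the prototype is meant to let you experiment with.
func toDevice(c CPU) Device {
	return Device{
		Name: fmt.Sprintf("cpu%d", c.ID),
		Attributes: map[string]any{
			"coreType": c.CoreType,
			"package":  c.Package,
			"die":      c.Die,
			"numaNode": c.NUMANode,
		},
	}
}

func main() {
	for _, c := range []CPU{
		{ID: 0, CoreType: "P-core"},
		{ID: 16, CoreType: "E-core"},
	} {
		d := toDevice(c)
		// fmt prints map keys in sorted order, so the output is stable.
		fmt.Println(d.Name, d.Attributes)
	}
}
```

Publishing one device per logical CPU keeps each device trivially small; the cost is that a multi-CPU request has to select many devices, which is one of the scaling questions the prototype is meant to probe.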

Note:
This patched NRI plugin, especially in its current state and form, is not meant as a proposal for a first real DRA-based CPU driver.

If you want to play around with this (for instance, to modify the exposed CPU abstraction), the easiest way is to

  1. fork the main NRI Reference Plugins repository,
  2. enable GitHub Actions in your personal fork,
  3. make any changes you want (for instance, to alter the CPU abstraction, take a look at `cpu.DRA()`),
  4. push your changes to `ssh://git@github.com/$YOUR_FORK/nri-plugins`, branch `test/build/dra-driver` (`refs/heads/test/build/dra-driver`),
  5. wait for the image and Helm chart publishing actions to succeed,
  6. once they are done, pull the result into your cluster with something like `helm install --devel -n kube-system test oci://ghcr.io/$YOUR_GITHUB_USERID/nri-plugins/helm-charts/nri-resource-policy-topology-aware --version v0.9-dra-driver-unstable`.

You can then check whether things work with something like this:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: any-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: p-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
        selectors:
          - cel:
              expression: device.attributes["native.cpu"].coreType == "P-core"
        count: 1
---
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: e-cores
spec:
  spec:
    devices:
      requests:
      - name: cpu
        deviceClassName: native.cpu
        selectors:
          - cel:
              expression: device.attributes["native.cpu"].coreType == "E-core"
        count: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: pcore-test
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
      - /bin/sh
      - -c
      - trap 'exit 0' TERM; sleep 3600 & wait
    resources:
      requests:
        cpu: 1
        memory: 100M
      limits:
        cpu: 1
        memory: 100M
      claims:
      - name: claim-pcores
  resourceClaims:
  - name: claim-pcores
    resourceClaimTemplateName: p-cores
  terminationGracePeriodSeconds: 1
```
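Conceptually, each request above asks the scheduler for `count` devices of class `native.cpu` that pass the CEL selector (with `count` defaulting to one when omitted, as in `any-cores`). The Go sketch below mimics that filtering for a single request, using a plain string comparison as a stand-in for the CEL expression. The `device` type and `allocate` function are hypothetical, and the real scheduler's allocator is considerably more involved (for instance, it has to satisfy all requests of a claim together).

```go
package main

import "fmt"

// device is a hypothetical stand-in for one published CPU device; attrs
// holds its attributes in the "native.cpu" domain.
type device struct {
	name  string
	attrs map[string]string
}

// allocate mimics a single request with a selector and an exact count:
// only devices whose coreType equals want are eligible, and exactly count
// of them must be found. It returns nil if the request cannot be met.
// This is a plain-Go stand-in for the CEL selector, not the scheduler's
// actual allocator.
func allocate(devs []device, want string, count int) []string {
	if count <= 0 {
		return nil
	}
	var picked []string
	for _, d := range devs {
		if d.attrs["coreType"] != want {
			continue // the selector rejects this device
		}
		picked = append(picked, d.name)
		if len(picked) == count {
			return picked
		}
	}
	return nil // not enough matching devices
}

func main() {
	devs := []device{
		{"cpu0", map[string]string{"coreType": "P-core"}},
		{"cpu1", map[string]string{"coreType": "P-core"}},
		{"cpu16", map[string]string{"coreType": "E-core"}},
	}
	fmt.Println(allocate(devs, "P-core", 1)) // [cpu0]
	fmt.Println(allocate(devs, "E-core", 2)) // [] (only one E-core)
}
```

Every candidate device has to be checked against the selector, which is why the cost of non-trivial CEL expressions grows with the number of devices each node publishes.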

@klihub force-pushed the test/build/dra-driver branch 3 times, most recently from d10467e to 0f3a301 (June 14, 2025 14:08)
@klihub changed the title from "[prototype]: bolt a test DRA driver on top of the topology-aware policy plugin." to "[proto]: bolt a DRA driver frontend on the topology-aware policy." (Jun 14, 2025)
@klihub force-pushed the test/build/dra-driver branch 3 times, most recently from 66c2519 to 8527808 (June 14, 2025 15:47)
@klihub force-pushed the test/build/dra-driver branch from 8527808 to f96ea65 (June 23, 2025 06:36)
@klihub force-pushed the test/build/dra-driver branch 2 times, most recently from 776684c to 7ce62a1 (August 4, 2025 09:39)
klihub added 20 commits August 11, 2025 13:11
A dear child has many names...

Signed-off-by: Krisztian Litkey <[email protected]>
AsYaml can be used to produce YAML-formatted log blocks with
something like this:

  logger.DebugBlock(" <my-obj> ", "%s", log.AsYaml(myObj))

Signed-off-by: Krisztian Litkey <[email protected]>
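The commit does not show how `AsYaml` is implemented. One plausible design, sketched here under that assumption, is a small wrapper implementing `fmt.Stringer`, so that marshaling only happens when the logger actually formats the `%s` argument (and can be skipped entirely if the logger drops disabled debug output before formatting). Because the Go standard library has no YAML encoder, the sketch uses indented JSON as a stand-in (JSON is, for practical purposes, valid YAML 1.2); a real implementation would presumably use a YAML library such as `sigs.k8s.io/yaml`. The `yamlFormatter` type is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// yamlFormatter is a hypothetical lazy formatter: the wrapped object is
// only marshaled when String() is called, i.e. when the log message is
// actually formatted.
type yamlFormatter struct {
	obj any
}

// String implements fmt.Stringer. Indented JSON stands in for YAML here,
// since the standard library lacks a YAML marshaler.
func (f yamlFormatter) String() string {
	out, err := json.MarshalIndent(f.obj, "", "  ")
	if err != nil {
		return fmt.Sprintf("<marshal error: %v>", err)
	}
	return string(out)
}

// AsYaml wraps obj for deferred formatting (hypothetical signature,
// modeled on the usage shown in the commit message).
func AsYaml(obj any) fmt.Stringer {
	return yamlFormatter{obj: obj}
}

func main() {
	myObj := map[string]int{"cpus": 4}
	fmt.Printf("%s\n", AsYaml(myObj))
}
```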
Split out K8s client setup code from agent to make it more generally
available to any kind of plugin, not just resource management ones.

Signed-off-by: Krisztian Litkey <[email protected]>
Allow setting the content types a client accepts and the
type it uses on the wire. Provide constants for JSON and
protobuf content types.

Signed-off-by: Krisztian Litkey <[email protected]>
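As a rough illustration of what this commit enables, the sketch below models the two relevant knobs with a hypothetical functional option. The `clientConfig` type mirrors the `AcceptContentTypes` and `ContentType` fields of client-go's `rest.ContentConfig`, and the constant values match `runtime.ContentTypeJSON` and `runtime.ContentTypeProtobuf` in `k8s.io/apimachinery`; the option and constructor names are assumptions, not the PR's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Content type strings used by the Kubernetes API server; the values
// match runtime.ContentTypeJSON and runtime.ContentTypeProtobuf in
// k8s.io/apimachinery.
const (
	ContentTypeJSON     = "application/json"
	ContentTypeProtobuf = "application/vnd.kubernetes.protobuf"
)

// clientConfig mirrors the two relevant fields of client-go's
// rest.ContentConfig.
type clientConfig struct {
	AcceptContentTypes string // comma-separated, most preferred first
	ContentType        string // encoding used on the wire for requests
}

// Option is a hypothetical functional option for client setup.
type Option func(*clientConfig)

// WithContentTypes sets the wire type and the accepted types. If no
// accepted types are given, only the wire type is accepted.
func WithContentTypes(wire string, accept ...string) Option {
	return func(c *clientConfig) {
		c.ContentType = wire
		if len(accept) == 0 {
			c.AcceptContentTypes = wire
			return
		}
		c.AcceptContentTypes = strings.Join(accept, ",")
	}
}

// newClientConfig applies options on top of a JSON-only default.
func newClientConfig(opts ...Option) clientConfig {
	cfg := clientConfig{AcceptContentTypes: ContentTypeJSON, ContentType: ContentTypeJSON}
	for _, o := range opts {
		o(&cfg)
	}
	return cfg
}

func main() {
	// Prefer protobuf, but also accept JSON: custom resources are only
	// served as JSON, so a protobuf-only client could not read them.
	cfg := newClientConfig(WithContentTypes(ContentTypeProtobuf, ContentTypeProtobuf, ContentTypeJSON))
	fmt.Println(cfg.ContentType)
	fmt.Println(cfg.AcceptContentTypes)
}
```

Protobuf is typically cheaper to decode than JSON for built-in types such as ResourceSlices and ResourceClaims, which is a plausible reason to make the content type configurable at all.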
Split out K8s watch wrapper setup code from agent to make it
generally available to any kind of plugin, not just resource
management ones.

Signed-off-by: Krisztian Litkey <[email protected]>
Add the necessary RBAC rules (access resource slices and claims)
and kubelet host mounts (plugin and plugin registry directories)
to the topology-aware policy Helm chart.

Signed-off-by: Krisztian Litkey <[email protected]>
Sort caches by level, kind, and id to enumerate caches globally.

Signed-off-by: Krisztian Litkey <[email protected]>
Add an interface for caching non-policy-specific data, similar to
but simpler than {Get,Set}PolicyData().

Signed-off-by: Krisztian Litkey <[email protected]>
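The commit message only names the idea. As an illustration of what a simple key/value counterpart to {Get,Set}PolicyData() could look like, here is a hypothetical Go sketch. The `DataCache` interface, its method names, the `getAs` helper, and the example key are all assumptions; the real interface may, for instance, also persist data, which this in-memory version does not.

```go
package main

import (
	"fmt"
	"sync"
)

// DataCache is a hypothetical interface for keeping arbitrary,
// non-policy-specific data by key.
type DataCache interface {
	SetData(key string, value any)
	GetData(key string) (any, bool)
	DeleteData(key string)
}

// memCache is a minimal, concurrency-safe in-memory implementation.
type memCache struct {
	mu   sync.RWMutex
	data map[string]any
}

func newMemCache() *memCache {
	return &memCache{data: make(map[string]any)}
}

func (c *memCache) SetData(key string, value any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

func (c *memCache) GetData(key string) (any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *memCache) DeleteData(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.data, key)
}

// getAs is a typed convenience wrapper around GetData; it reports false
// both for missing keys and for values of a different type.
func getAs[T any](c DataCache, key string) (T, bool) {
	v, ok := c.GetData(key)
	if !ok {
		var zero T
		return zero, false
	}
	t, ok := v.(T)
	return t, ok
}

func main() {
	var c DataCache = newMemCache()
	c.SetData("dra/claims", []string{"cpu0", "cpu1"})
	claims, ok := getAs[[]string](c, "dra/claims")
	fmt.Println(claims, ok) // [cpu0 cpu1] true
	_, ok = getAs[int](c, "dra/claims")
	fmt.Println(ok) // false: the stored value is not an int
}
```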
Signed-off-by: Krisztian Litkey <[email protected]>
Bolt DRA-claimed CPU allocation on topology aware policy.

Signed-off-by: Krisztian Litkey <[email protected]>
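How the prototype maps allocated devices back to CPUs is not described here. Assuming devices are named after CPU IDs (the hypothetical `cpuN` scheme), one step such a frontend needs is turning the device names listed in a claim's `status.allocation.devices.results` into the kernel's CPU list format (for example `3-5,9`), which can then be used as a container's cpuset. A minimal sketch, with `cpusetFromDevices` as an illustrative name:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// cpusetFromDevices converts allocated device names (hypothetically of
// the form "cpuN") into the kernel's CPU list format, e.g. "3-5,9".
func cpusetFromDevices(names []string) (string, error) {
	ids := make([]int, 0, len(names))
	for _, n := range names {
		if !strings.HasPrefix(n, "cpu") {
			return "", fmt.Errorf("unexpected device name %q", n)
		}
		id, err := strconv.Atoi(strings.TrimPrefix(n, "cpu"))
		if err != nil || id < 0 {
			return "", fmt.Errorf("unexpected device name %q", n)
		}
		ids = append(ids, id)
	}
	sort.Ints(ids)

	var parts []string
	for i := 0; i < len(ids); {
		// Extend the run while IDs are consecutive (or repeated).
		j := i
		for j+1 < len(ids) && ids[j+1] <= ids[j]+1 {
			j++
		}
		if ids[i] == ids[j] {
			parts = append(parts, strconv.Itoa(ids[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", ids[i], ids[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ","), nil
}

func main() {
	set, err := cpusetFromDevices([]string{"cpu5", "cpu3", "cpu4", "cpu9"})
	fmt.Println(set, err) // 3-5,9 <nil>
}
```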
@klihub force-pushed the test/build/dra-driver branch from 7ce62a1 to a3b4047 (August 11, 2025 10:12)