Install Drivers and Allocate Devices with DRA

FEATURE STATE: Kubernetes v1.32 [beta] (enabled by default: false)

This tutorial shows you how to install DRA drivers in your cluster and how to use them in conjunction with the DRA APIs to allocate devices to Pods. This page is intended for cluster administrators.

Dynamic Resource Allocation (DRA) is a Kubernetes feature that lets a cluster manage the availability and allocation of hardware resources to satisfy Pod-based claims for hardware requirements and preferences (see the DRA Concept page for more background). To support this, Kubernetes built-in components (the kube-scheduler, kubelet, and kube-controller-manager) and third-party components (called DRA drivers) share responsibility for advertising, allocating, preparing, mounting, health-checking, unpreparing, and cleaning up resources throughout the Pod lifecycle. These components share information through a set of DRA-specific APIs in the resource.k8s.io API group, including DeviceClasses, ResourceSlices, and ResourceClaims, as well as new fields in the Pod spec itself.

Objectives

  • Deploy an example DRA driver
  • Deploy a Pod requesting a hardware claim using DRA APIs
  • Delete a Pod that has a claim

Before you begin

Your cluster should support RBAC. You can try this tutorial with a cluster using a different authorization mechanism, but in that case you will have to adapt the steps around defining roles and permissions.

You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube or use one of the Kubernetes playgrounds.

This tutorial has been tested with Linux nodes, though it may also work with other types of nodes.

Your Kubernetes server must be at or later than version v1.32.

To check the version, enter kubectl version.

Your cluster also must be configured to use the Dynamic Resource Allocation feature. To enable the DRA feature, you must enable the following feature gates and API groups:

  1. Enable the DynamicResourceAllocation feature gate on all of the following components:

    • kube-apiserver
    • kube-controller-manager
    • kube-scheduler
    • kubelet
  2. Enable the following API groups:

    • resource.k8s.io/v1beta1: required for DRA to function.
    • resource.k8s.io/v1beta2: optional for DRA itself, with recommended improvements to the user experience. The ResourceClaim manifest in this tutorial uses this version.

    For more information, see Enabling or disabling API groups.
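One way to confirm that the API groups are being served is to filter the output of kubectl api-versions. The following is a minimal sketch, assuming a POSIX shell; the function name dra_api_versions is made up for illustration and is not part of kubectl.

```shell
# Hypothetical helper: keep only the resource.k8s.io entries from a list of
# API versions supplied on stdin (one "group/version" per line).
dra_api_versions() {
  grep '^resource\.k8s\.io/' || true   # "|| true": no match is not an error here
}

# Against a live cluster you would run:
#   kubectl api-versions | dra_api_versions
```

If resource.k8s.io/v1beta1 does not appear in the output, revisit the API server configuration before continuing.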

Explore the DRA initial state

With no driver installed or Pod claims yet to satisfy, you can observe the initial state of a cluster with DRA enabled.

  1. Get a list of DeviceClasses:

    kubectl get deviceclasses
    

    The output is similar to this:

    No resources found
    

    If you set up a new blank cluster for this tutorial, it's normal to find that there are no DeviceClasses. Learn more about DeviceClasses here.

  2. Get a list of ResourceSlices:

    kubectl get resourceslices
    

    The output is similar to this:

    No resources found
    

    If you set up a new blank cluster for this tutorial, it's normal to find that there are no ResourceSlices advertised. Learn more about ResourceSlices here.

  3. View ResourceClaims and ResourceClaimTemplates:

    kubectl get resourceclaims -A
    kubectl get resourceclaimtemplates -A
    

    The output is similar to this:

    No resources found
    No resources found
    

    If you set up a new blank cluster for this tutorial, it's normal to find that there are no ResourceClaims or ResourceClaimTemplates as you, the user, have not created any. Learn more about ResourceClaims and ResourceClaimTemplates here.

At this point, you have confirmed that DRA is enabled and configured properly in the cluster, and that no DRA drivers have advertised any resources to the DRA APIs yet.
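The three checks above can also be run in a single pass. This is only a sketch: the variable DRA_KINDS and the function show_dra_state are illustrative names, and the loop assumes kubectl is already configured for your cluster.

```shell
# The four DRA resource kinds inspected in the steps above.
DRA_KINDS="deviceclasses resourceslices resourceclaims resourceclaimtemplates"

# Hypothetical helper: list every DRA kind in one pass.
# -A covers all namespaces for the namespaced kinds and has no effect
# on the cluster-scoped ones.
show_dra_state() {
  for kind in $DRA_KINDS; do
    echo "== $kind =="
    kubectl get "$kind" -A
  done
}

# Against a live cluster:
#   show_dra_state
```

Running the same helper again after the driver is installed gives a quick before-and-after comparison.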

Install an example DRA driver

DRA drivers are third-party applications that run on each node of your cluster to interface with the hardware of that node and Kubernetes' built-in DRA components. The installation procedure depends on the driver you choose, but the driver is likely deployed as a DaemonSet to all or a selection of the nodes (using selectors or similar mechanisms) in your cluster.

Check your driver's documentation for specific installation instructions, which may include a Helm chart, a set of manifests, or other deployment tooling.

This tutorial uses an example driver which can be found in the kubernetes-sigs/dra-example-driver repository to demonstrate driver installation.

Prepare your cluster for driver installation

To make it easier to clean up later, create a namespace called dra-tutorial in your cluster.

In a production environment, you would likely be using a previously released or qualified image from the driver vendor or your own organization, and your nodes would need to have access to the image registry where the driver image is hosted. In this tutorial, you will use a publicly released image of the dra-example-driver to simulate access to a DRA driver image.

  1. Create the namespace:

    kubectl create namespace dra-tutorial 
    
  2. Confirm your nodes have access to the image by running the following from within one of your cluster's nodes:

    docker pull registry.k8s.io/dra-example-driver/dra-example-driver:v0.1.0
    

Deploy the DRA driver components

For this tutorial, you will install the critical example resource driver components individually with kubectl.

  1. Create the DeviceClass representing the device types this DRA driver supports:

    apiVersion: resource.k8s.io/v1beta1
    kind: DeviceClass
    metadata:
      name: gpu.example.com
    spec:
      selectors:
      - cel: 
          expression: "device.driver == 'gpu.example.com'"

    kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/deviceclass.yaml
    
  2. Create the ServiceAccount, ClusterRole and ClusterRoleBinding that will be used by the driver to gain permissions to interact with the Kubernetes API on this cluster:

    1. Create the ServiceAccount:

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: dra-example-driver-service-account
        namespace: dra-tutorial
        labels:
          app.kubernetes.io/name: dra-example-driver
          app.kubernetes.io/instance: dra-example-driver

      kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/serviceaccount.yaml
      
    2. Create the ClusterRole:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: dra-example-driver-role
      rules:
      - apiGroups: ["resource.k8s.io"]
        resources: ["resourceclaims"]
        verbs: ["get"]
      - apiGroups: [""]
        resources: ["nodes"]
        verbs: ["get"]
      - apiGroups: ["resource.k8s.io"]
        resources: ["resourceslices"]
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

      kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/clusterrole.yaml
      
    3. Create the ClusterRoleBinding:

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: dra-example-driver-role-binding
      subjects:
      - kind: ServiceAccount
        name: dra-example-driver-service-account
        namespace: dra-tutorial
      roleRef:
        kind: ClusterRole
        name: dra-example-driver-role
        apiGroup: rbac.authorization.k8s.io

      kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/clusterrolebinding.yaml
      
  3. Create a PriorityClass for the DRA driver. The DRA driver component is responsible for important lifecycle operations for Pods with claims, so you don't want it to be preempted. Learn more about pod priority and preemption here. Learn more about good practices when maintaining a DRA driver here.

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: dra-driver-high-priority
    value: 1000000
    globalDefault: false
    description: "This priority class should be used for DRA driver pods only."

    kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/priorityclass.yaml
    
  4. Deploy the actual DRA driver as a DaemonSet configured to run the example driver binary with the permissions provisioned above.

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: dra-example-driver-kubeletplugin
      namespace: dra-tutorial
      labels:
        app.kubernetes.io/name: dra-example-driver
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: dra-example-driver
      updateStrategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            app.kubernetes.io/name: dra-example-driver
        spec:
          priorityClassName: dra-driver-high-priority
          serviceAccountName: dra-example-driver-service-account
          securityContext:
            {}
          containers:
          - name: plugin
            securityContext:
              privileged: true
            image: registry.k8s.io/dra-example-driver/dra-example-driver:v0.1.0
            imagePullPolicy: IfNotPresent
            command: ["dra-example-kubeletplugin"]
            resources:
              {}
            # Production drivers should always implement a liveness probe
            # For the tutorial we simply omit it
            # livenessProbe:
            #   grpc:
            #     port: 51515
            #     service: liveness
            #   failureThreshold: 3
            #   periodSeconds: 10
            env:
            - name: CDI_ROOT
              value: /var/run/cdi
            - name: KUBELET_REGISTRAR_DIRECTORY_PATH
              value: "/var/lib/kubelet/plugins_registry"
            - name: KUBELET_PLUGINS_DIRECTORY_PATH
              value: "/var/lib/kubelet/plugins"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # Simulated number of devices the example driver will pretend to have.
            - name: NUM_DEVICES
              value: "9"
            - name: HEALTHCHECK_PORT
              value: "51515"
            volumeMounts:
            - name: plugins-registry
              mountPath: "/var/lib/kubelet/plugins_registry"
            - name: plugins
              mountPath: "/var/lib/kubelet/plugins"
            - name: cdi
              mountPath: /var/run/cdi
          volumes:
          - name: plugins-registry
            hostPath:
              path: "/var/lib/kubelet/plugins_registry"
          - name: plugins
            hostPath:
              path: "/var/lib/kubelet/plugins"
          - name: cdi
            hostPath:
              path: /var/run/cdi

    kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/daemonset.yaml
    

    The DaemonSet is configured with the volume mounts necessary to interact with the underlying Container Device Interface (CDI) directory, and to expose its socket to kubelet via the kubelet plugins directory.
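The RBAC objects created in step 2 can be sanity-checked by impersonating the driver's ServiceAccount with kubectl auth can-i. The sketch below assumes that your own user is allowed to impersonate ServiceAccounts; the helper names sa_user and driver_can are hypothetical.

```shell
# Hypothetical helper: build the user name that Kubernetes assigns to a
# ServiceAccount, in the form system:serviceaccount:<namespace>:<name>.
sa_user() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Hypothetical helper: ask the API server whether the driver's ServiceAccount
# may perform verb $1 on resource $2.
driver_can() {
  kubectl auth can-i "$1" "$2" \
    --as="$(sa_user dra-tutorial dra-example-driver-service-account)"
}

# Against a live cluster:
#   driver_can list resourceslices.resource.k8s.io
#   driver_can delete resourceclaims.resource.k8s.io
```

Given the ClusterRole above, and assuming no other bindings apply to the ServiceAccount, the first call should be allowed and the second denied, because the role grants only get on resourceclaims.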

Verify the DRA driver installation

  1. Observe the Pods of the DRA driver DaemonSet across all worker nodes:

    kubectl get pod -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
    

    The output is similar to this:

    NAME                                     READY   STATUS    RESTARTS   AGE
    dra-example-driver-kubeletplugin-4sk2x   1/1     Running   0          13s
    dra-example-driver-kubeletplugin-cttr2   1/1     Running   0          13s
    
  2. The initial responsibility of each node's local DRA driver is to tell the cluster which devices are available to Pods on that node by publishing their metadata to the ResourceSlices API. You can check that API to see that each node with a driver is advertising the device class it represents.

    Check for available ResourceSlices:

    kubectl get resourceslices
    

    The output is similar to this:

    NAME                                 NODE           DRIVER            POOL           AGE
    kind-worker-gpu.example.com-k69gd    kind-worker    gpu.example.com   kind-worker    19s
    kind-worker2-gpu.example.com-qdgpn   kind-worker2   gpu.example.com   kind-worker2   19s
    

At this point, you have successfully installed the example DRA driver, and confirmed its initial configuration. You're now ready to use DRA to schedule Pods.
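To compare the nodes that advertise devices against the nodes running the driver, you can pull the NODE column out of the ResourceSlice listing. This is a rough sketch that parses the default table output shown above; nodes_for_driver is an illustrative name, and the column positions are an assumption that breaks if a slice has an empty NODE column.

```shell
# Hypothetical helper: read `kubectl get resourceslices` table output on stdin
# and print the NODE column for rows whose DRIVER column matches $1.
# Assumes the column order NAME NODE DRIVER POOL AGE.
nodes_for_driver() {
  awk -v driver="$1" 'NR > 1 && $3 == driver { print $2 }' | sort -u
}

# Against a live cluster:
#   kubectl get resourceslices | nodes_for_driver gpu.example.com
```

Each node that runs a driver Pod should appear once in the result.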

Claim resources and deploy a Pod

To request resources using DRA, you create ResourceClaims or ResourceClaimTemplates that define the resources that your Pods need. In the example driver, a memory capacity attribute is exposed for mock GPU devices. This section shows you how to use Common Expression Language to express your requirements in a ResourceClaim, select that ResourceClaim in a Pod specification, and observe the resource allocation.

This tutorial showcases only one basic example of a DRA ResourceClaim. Read Dynamic Resource Allocation to learn more about ResourceClaims.

Create the ResourceClaim

The Pod manifest itself will include a reference to its relevant ResourceClaim object, which you will create now. Whatever the claim, the deviceClassName is a required field, narrowing down the scope of the request to a specific device class. The request itself can include a Common Expression Language expression that references attributes that may be advertised by the driver managing that device class.

In this example, you will create a request for any GPU advertising at least 10Gi of memory capacity. The attribute exposing capacity from the example driver takes the form device.capacity['gpu.example.com'].memory. Note also that the name of the claim is set to some-gpu.

apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: some-gpu
  namespace: dra-tutorial
spec:
  devices:
    requests:
    - name: some-gpu
      exactly:
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            expression: "device.capacity['gpu.example.com'].memory.compareTo(quantity('10Gi')) >= 0"

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/example/resourceclaim.yaml
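If you want to experiment with other thresholds, the manifest can be generated from a small template. The sketch below is illustrative only: render_claim, the claim name big-gpu, and the 40Gi value are all hypothetical, and whether any device satisfies a different threshold depends on what your driver advertises.

```shell
# Hypothetical helper: print a ResourceClaim manifest like the one above,
# with the claim name ($1) and the minimum memory quantity ($2) filled in.
render_claim() {
  cat <<EOF
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: $1
  namespace: dra-tutorial
spec:
  devices:
    requests:
    - name: $1
      exactly:
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            expression: "device.capacity['gpu.example.com'].memory.compareTo(quantity('$2')) >= 0"
EOF
}

# Preview the manifest, or pipe it to kubectl on a live cluster:
#   render_claim big-gpu 40Gi
#   render_claim big-gpu 40Gi | kubectl apply --server-side -f -
```

The compareTo(...) >= 0 form keeps the same "at least this much memory" meaning as the tutorial's claim; only the quantity changes.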

Create the Pod that references that ResourceClaim

Below is the Pod manifest referencing the ResourceClaim you just made, some-gpu, in the spec.resourceClaims.resourceClaimName field. The local name for that claim, gpu, is then used in the spec.containers.resources.claims.name field to allocate the claim to the Pod's underlying container.

apiVersion: v1
kind: Pod
metadata:
  name: pod0
  namespace: dra-tutorial
  labels:
    app: pod
spec:
  containers:
  - name: ctr0
    image: ubuntu:24.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: some-gpu

kubectl apply --server-side -f http://k8s.io/examples/dra/driver-install/example/pod.yaml
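Scheduling and device preparation take a moment, so it can help to wait for the Pod before inspecting it. A minimal sketch using kubectl wait follows; the wrapper name wait_for_pod_ready and the 60s default timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: block until Pod $1 in namespace $2 reports the Ready
# condition, or fail once the timeout ($3, default 60s) expires.
wait_for_pod_ready() {
  kubectl wait --for=condition=Ready "pod/$1" -n "$2" --timeout="${3:-60s}"
}

# Against a live cluster:
#   wait_for_pod_ready pod0 dra-tutorial
```

If the wait times out, kubectl describe pod pod0 -n dra-tutorial shows the scheduling events, which is where an unsatisfiable claim would surface.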

Explore the DRA state

The cluster now tries to schedule that Pod to a node where Kubernetes can satisfy the ResourceClaim. In this tutorial, the DRA driver is deployed on all nodes and advertises mock GPUs on each of them, all with enough capacity to satisfy the Pod's claim. The Pod may therefore be scheduled to any node, and any of the mock GPUs on that node may be allocated.

The mock GPU driver injects environment variables into each container it is allocated to. These variables indicate which GPUs a real resource driver would have injected and how they would have been configured, so you can check them to see how the system has handled the Pod.

  1. Confirm the pod has deployed:

    kubectl get pod pod0 -n dra-tutorial
    

    The output is similar to this:

    NAME   READY   STATUS    RESTARTS   AGE
    pod0   1/1     Running   0          9s
    
  2. Observe the pod logs which report the name of the mock GPU allocated:

    kubectl logs pod0 -c ctr0 -n dra-tutorial | grep -E "GPU_DEVICE_[0-9]+=" | grep -v "RESOURCE_CLAIM"
    

    The output is similar to this:

    declare -x GPU_DEVICE_4="gpu-4"
    
  3. Observe the ResourceClaim object:

    You can observe the ResourceClaim more closely. First, check that its state is allocated and reserved.

    kubectl get resourceclaims -n dra-tutorial
    

    The output is similar to this:

    NAME       STATE                AGE
    some-gpu   allocated,reserved   34s
    

    Looking deeper at the some-gpu ResourceClaim, you can see that the status stanza includes information about the device that has been allocated and the Pod for which it has been reserved:

    kubectl get resourceclaim some-gpu -n dra-tutorial -o yaml
    

    The output is similar to this:

    apiVersion: resource.k8s.io/v1beta2
    kind: ResourceClaim
    metadata:
      creationTimestamp: "2025-07-29T05:11:52Z"
      finalizers:
      - resource.kubernetes.io/delete-protection
      name: some-gpu
      namespace: dra-tutorial
      resourceVersion: "58357"
      uid: 79e1e8d8-7e53-4362-aad1-eca97678339e
    spec:
      devices:
        requests:
        - exactly:
            allocationMode: ExactCount
            count: 1
            deviceClassName: gpu.example.com
            selectors:
            - cel:
                expression: device.capacity['gpu.example.com'].memory.compareTo(quantity('10Gi'))
                  >= 0
          name: some-gpu
    status:
      allocation:
        devices:
          results:
          - adminAccess: null
            device: gpu-4
            driver: gpu.example.com
            pool: kind-worker
            request: some-gpu
        nodeSelector:
          nodeSelectorTerms:
          - matchFields:
            - key: metadata.name
              operator: In
              values:
              - kind-worker
      reservedFor:
      - name: pod0
        resource: pods
        uid: fa55b59b-d28d-4f7d-9e5b-ef4c8476dff5

  4. Observe the driver by checking the logs of the Pods backing the driver DaemonSet:

    kubectl logs -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
    

    The output is similar to this:

    I0729 05:11:52.679057       1 driver.go:84] NodePrepareResource is called: number of claims: 1
    I0729 05:11:52.684450       1 driver.go:112] Returning newly prepared devices for claim '79e1e8d8-7e53-4362-aad1-eca97678339e': [&Device{RequestNames:[some-gpu],PoolName:kind-worker,DeviceName:gpu-4,CDIDeviceIDs:[k8s.gpu.example.com/gpu=common k8s.gpu.example.com/gpu=79e1e8d8-7e53-4362-aad1-eca97678339e-gpu-4],}]
    

You have now successfully deployed a Pod with a DRA-based claim, seen it scheduled to an appropriate node, and seen the associated DRA APIs updated to reflect its status.
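The device name reported in the Pod logs can be compared against the ResourceClaim status. As a sketch, the helper below extracts the device names from the declare -x lines shown earlier; allocated_devices is a made-up name, and the pattern assumes the log format printed by the tutorial Pod's export command.

```shell
# Hypothetical helper: read container log lines on stdin and print the value
# of each GPU_DEVICE_<n> variable. Variables with a longer name, such as
# GPU_DEVICE_<n>_RESOURCE_CLAIM, do not match and are skipped.
allocated_devices() {
  sed -n 's/^declare -x GPU_DEVICE_[0-9][0-9]*="\([^"]*\)"$/\1/p'
}

# Against a live cluster, the two commands below should name the same device:
#   kubectl logs pod0 -c ctr0 -n dra-tutorial | allocated_devices
#   kubectl get resourceclaim some-gpu -n dra-tutorial \
#     -o jsonpath='{.status.allocation.devices.results[*].device}'
```

The jsonpath expression follows the status.allocation.devices.results field path visible in the YAML output above.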

Remove the Pod with a claim

When a Pod with a claim is deleted, the DRA driver deallocates the resource so it can be available for future scheduling. You can observe this by deleting the Pod and checking that the state of the ResourceClaim changes.

Delete the pod using the resource claim

  1. Delete the pod directly:

    kubectl delete pod pod0 -n dra-tutorial
    

    The output is similar to this:

    pod "pod0" deleted
    

Observe the DRA state

The driver will deallocate the hardware and update the corresponding ResourceClaim resource that previously held the association.

  1. Check the ResourceClaim is now pending:

    kubectl get resourceclaims -n dra-tutorial
    

    The output is similar to this:

    NAME       STATE     AGE
    some-gpu   pending   76s
    
  2. Observe the driver logs and see that it processed unpreparing the device for this claim:

    kubectl logs -l app.kubernetes.io/name=dra-example-driver -n dra-tutorial
    

    The output is similar to this:

    I0729 05:13:02.144623       1 driver.go:117] NodeUnPrepareResource is called: number of claims: 1
    

You have now deleted a Pod that had a claim, and observed that the driver took action to unprepare the underlying hardware resource and update the DRA APIs to reflect that the resource is available again for future scheduling.
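The state change may not be instantaneous, so a script that depends on it can poll the claim. The following sketch reads the STATE column from the table output shown above; claim_state is an illustrative name, and the parsing assumes the default NAME STATE AGE column order.

```shell
# Hypothetical helper: read `kubectl get resourceclaims` table output on stdin
# and print the STATE column for the claim named $1.
claim_state() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# Against a live cluster, wait until the claim is released:
#   until [ "$(kubectl get resourceclaims -n dra-tutorial | claim_state some-gpu)" = "pending" ]; do
#     sleep 1
#   done
```

In a real script you would likely add an upper bound on the number of retries so the loop cannot spin forever.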

Cleaning up

To clean up the resources, delete the namespace for the tutorial, which removes the ResourceClaims, driver components, and ServiceAccount. Then delete the cluster-level DeviceClass, PriorityClass, and RBAC resources.

kubectl delete namespace dra-tutorial
kubectl delete deviceclass gpu.example.com
kubectl delete priorityclass dra-driver-high-priority
kubectl delete clusterrole dra-example-driver-role
kubectl delete clusterrolebinding dra-example-driver-role-binding
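If you expect to repeat the tutorial, the cleanup can be wrapped in a re-runnable function. This is a sketch rather than an official script: cleanup_dra_tutorial is a hypothetical name, and it also removes the cluster-scoped PriorityClass created during driver installation.

```shell
# Hypothetical helper: delete everything this tutorial created.
# --ignore-not-found lets the function succeed when run more than once.
cleanup_dra_tutorial() {
  kubectl delete namespace dra-tutorial --ignore-not-found
  kubectl delete deviceclass gpu.example.com --ignore-not-found
  kubectl delete clusterrolebinding dra-example-driver-role-binding --ignore-not-found
  kubectl delete clusterrole dra-example-driver-role --ignore-not-found
  kubectl delete priorityclass dra-driver-high-priority --ignore-not-found
}

# Against a live cluster:
#   cleanup_dra_tutorial
```

The namespace is deleted first so that the driver Pods stop before the RBAC objects and PriorityClass they reference are removed.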

Last modified August 13, 2025 at 7:53 PM PST: Use dedicated priorityclass (098eb145b9)