// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · advanced · 10 min read · GA v1.34

DRA — Kubernetes finally schedules GPUs like it means it.

TL;DR: Dynamic Resource Allocation (DRA) is the GA-since-v1.34 way Kubernetes allocates specialized hardware. Instead of an opaque count (nvidia.com/gpu: 1), a pod files a ResourceClaim describing the device it needs (by model, memory, topology), and the scheduler matches it against ResourceSlices published by vendor drivers. It brings fractional sharing, prioritized alternatives, and device health to core Kubernetes.

What it is

DRA is a framework in core Kubernetes for requesting and allocating specialized resources (GPUs, FPGAs, NICs) with rich constraints. It graduated to GA in Kubernetes v1.34 (Aug 2025), with the stable resource.k8s.io/v1 API enabled by default. In the AI Native landscape it's the future-facing piece of AI Native Infra › Accelerator and SuperPod: the allocation model the other tools in that category are converging on.

Why it exists

The old device-plugin model exposes hardware as a countable integer with no vocabulary for "an H100 with ≥40 GB, NVLink-connected to its peer." You can't express attributes, can't share fractionally in a first-class way, and can't ask for alternatives. DRA replaces counting with claiming: declarative, attribute-aware requests that the scheduler resolves against inventory published by vendor drivers.
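For contrast, this is roughly what the count-based request looks like in a Pod manifest. It is a minimal sketch: the pod name, container name, and image are hypothetical placeholders, and nvidia.com/gpu is the extended resource name quoted in this article.

```yaml
# Device-plugin style: the request is a bare integer on an extended resource.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-gpu-pod                              # hypothetical name
spec:
  containers:
    - name: app                                     # hypothetical container
      image: registry.example.com/trainer:latest    # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1   # "one GPU"; no field here for model, memory, or topology
```

Anything beyond the count has to be approximated outside the resource request, typically with node labels and selectors.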

The objects

Object | Role
DeviceClass | A category of devices plus the selectors that define it (e.g. "NVIDIA GPU"). Cluster-scoped, admin-defined.
ResourceClaim | A pod's request for device(s) matching constraints. The thing that gets allocated.
ResourceClaimTemplate | Stamps out a per-pod ResourceClaim (like a PVC template).
ResourceSlice | Driver-published inventory of the devices available in a pool; what the scheduler matches against.

[Diagram: Pod + ResourceClaim ("H100, 40GB"); scheduler (matches claim); DeviceClass (admin); ResourceSlice (driver inventory)]

Fig 1 — A claim describes the need; the scheduler matches it against driver-published slices.
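To make the admin-defined and driver-published objects concrete, here is a hedged sketch written against resource.k8s.io/v1. The driver name gpu.example.com, the object and node names, and the attribute and capacity keys (model, memory) are all hypothetical; real drivers define their own. In practice the driver creates ResourceSlices itself, so the second document is shown only to illustrate the shape.

```yaml
# DeviceClass: cluster-scoped, created by an admin.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu                        # hypothetical class name
spec:
  selectors:
    - cel:
        # Select every device published by the (hypothetical) driver.
        expression: 'device.driver == "gpu.example.com"'
---
# ResourceSlice: normally written by the driver, not by hand.
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: node-a-gpu-example-com             # hypothetical
spec:
  driver: gpu.example.com
  nodeName: node-a                         # devices in this slice are local to node-a
  pool:
    name: node-a
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-0
      attributes:
        model:
          string: "H100"                   # hypothetical attribute key and value
      capacity:
        memory:
          value: 80Gi                      # hypothetical capacity key
```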

What v1.34 brought

  • Consumable Capacity (alpha in v1.34): first-class fractional sharing. Allocate, say, 10 GiB of a 40 GiB GPU safely across pods/namespaces.
  • Prioritized devices: list acceptable alternatives in order (one H100, else two mid-range GPUs); the scheduler tries them in turn.
  • Device health status: a device's health surfaces in Pod status (for DRA and device-plugin devices), so failures are diagnosable.
  • Vendor drivers: NVIDIA donated its DRA GPU driver and Google its TPU driver to the community.
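The prioritized-alternatives item could look roughly like this in a claim. This is a sketch only: it assumes the firstAvailable request form of resource.k8s.io/v1 with the prioritized-list feature enabled on the cluster, and the claim name and the DeviceClass names example-gpu-large and example-gpu-mid are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: large-or-two-mid                 # hypothetical
spec:
  devices:
    requests:
      - name: gpu
        firstAvailable:                  # subrequests are tried in the order listed
          - name: one-large
            deviceClassName: example-gpu-large   # hypothetical class
            allocationMode: ExactCount
            count: 1
          - name: two-mid
            deviceClassName: example-gpu-mid     # hypothetical class
            allocationMode: ExactCount
            count: 2
```

The workload has to tolerate either outcome, since the claim only records which subrequest was satisfied after scheduling.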

Quick start

On v1.34+ the API is on by default; you install a vendor DRA driver, then a pod references a claim built from a DeviceClass:

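# Pod fragment: references a ResourceClaimTemplate; the scheduler allocates, the driver prepares the device
spec:
  containers:
    - name: app                # hypothetical container name
      resources:
        claims:
          - name: gpu          # must match resourceClaims[].name below
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: single-gpu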

The ResourceClaimTemplate selects a DeviceClass (e.g. NVIDIA GPUs) and adds constraints; the driver publishes ResourceSlices the scheduler matches.
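A template named single-gpu, as referenced by the pod fragment above, might be sketched as follows. The DeviceClass name example-gpu, the driver domain gpu.example.com, and the memory capacity key are assumptions for illustration; check your driver's documentation for the attributes and capacities it actually publishes.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:                                    # the ResourceClaim spec stamped out per pod
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: example-gpu   # hypothetical class
            selectors:
              - cel:
                  # Hypothetical capacity key; keeps only devices with at least 40Gi of memory.
                  expression: >-
                    device.capacity["gpu.example.com"].memory.compareTo(quantity("40Gi")) >= 0
```

Each pod that references the template gets its own generated ResourceClaim, which is deleted along with the pod.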

When to use, when to skip

Use it on modern clusters (v1.34+) where you need attribute-aware allocation, fractional sharing, topology, or multi-vendor accelerators — it's the strategic direction and what new tooling targets. NVIDIA/Google driver donations signal broad consensus.

Hold off if you're on older Kubernetes, your platform/cloud hasn't shipped DRA drivers yet, or the simple device plugin already meets your needs. Migration is gradual — the device plugin still works.

Heads up: DRA needs a vendor driver that publishes ResourceSlices. The framework alone does nothing without one, and managed clouds roll out support on their own timelines. Check your distro/cloud before designing around it.

DRA vs the old way

Model | Request looks like | Note
DRA | ResourceClaim with attributes/constraints | Rich, fractional, GA v1.34
Device Plugin | Opaque count: nvidia.com/gpu: 1 | Simple, no attributes
HAMi | Resource fields for slices | Software sharing; integrates with DRA

Verified against kubernetes.io DRA docs, May 2026. GA as of v1.34 (resource.k8s.io/v1).

© cvam — written in plaintext, served warm