TL;DR: Kubernetes has no built-in notion of a GPU. The device plugin framework is the kubelet API that lets a vendor advertise hardware as a schedulable resource such as nvidia.com/gpu. NVIDIA's k8s-device-plugin implements it and can oversubscribe a GPU through time-slicing or MPS, which share a GPU without full isolation. It's the foundational layer the GPU Operator builds on, and the allocation model that DRA is set to replace.
What it is
The device plugin framework is a Kubernetes interface for advertising resources other than CPU and memory (GPUs, NICs, FPGAs) to the kubelet so the scheduler can allocate them. The NVIDIA k8s-device-plugin is the GPU implementation: a DaemonSet that discovers the GPUs on each node and exposes them as the extended resource nvidia.com/gpu. In the AI Native landscape it's the bedrock piece of the Accelerator and SuperPod category under AI Native Infra.
Why it exists
The scheduler natively understands only built-in resources such as CPU and memory. Without a device plugin, a GPU on a node is invisible: pods can't request it, and the scheduler can't place GPU work. The plugin closes that gap by registering the device type with the kubelet and reporting how many devices are available, turning hardware into a first-class, requestable resource.
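To make the "invisible GPU" point concrete, here is a toy version of the scheduler's resource-fit check. It's a simplified model, not the kube-scheduler's actual NodeResourcesFit code: the `fits` function, the integer units, and the example numbers are assumptions for illustration. The key behavior is that a resource a node never advertised counts as zero available.

```python
from typing import Dict

# Simplified model (assumption): quantities are plain integers, with CPU in
# millicores, memory in MiB, and GPUs as a device count.
Resources = Dict[str, int]


def fits(allocatable: Resources, in_use: Resources, requests: Resources) -> bool:
    """Return True if the node still has room for every resource the pod requests.

    A resource the node never advertised counts as 0 available, which is why
    a GPU with no device plugin behind it can never satisfy a request.
    """
    for name, wanted in requests.items():
        free = allocatable.get(name, 0) - in_use.get(name, 0)
        if wanted > free:
            return False
    return True


# The same hardware, with and without a plugin advertising the GPU.
no_plugin = {"cpu": 8000, "memory": 32768}
with_plugin = {"cpu": 8000, "memory": 32768, "nvidia.com/gpu": 1}
gpu_pod = {"cpu": 1000, "memory": 4096, "nvidia.com/gpu": 1}

print(fits(no_plugin, {}, gpu_pod))                       # False
print(fits(with_plugin, {}, gpu_pod))                     # True
print(fits(with_plugin, {"nvidia.com/gpu": 1}, gpu_pod))  # False: the one GPU is taken
```

The GPU itself is identical in both cases; only whether something advertised it changes the outcome.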
How it works
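The plugin serves a gRPC service on a Unix socket under /var/lib/kubelet/device-plugins/ and registers its resource name with the kubelet. The kubelet then calls ListAndWatch to learn which GPUs the plugin found and whether they are healthy. When a pod that needs GPUs is admitted, the kubelet picks devices and calls Allocate, which returns what the container runtime needs to expose them: environment variables, mounts, and device nodes. The kubelet reports the device count in the node's capacity and allocatable, and the scheduler then treats nvidia.com/gpu like any other countable (extended) resource in pod requests.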
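The register, list, and allocate sequence above can be sketched as an in-process toy. This is not the real gRPC API: `ToyGpuPlugin`, `ToyKubelet`, and their methods are hypothetical names that only mirror the shape of the exchange, and the device IDs are made up. The one concrete detail borrowed from the NVIDIA stack is the `NVIDIA_VISIBLE_DEVICES` variable that tells the container runtime which GPUs to expose.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Toy, in-process model of the kubelet <-> device plugin exchange. The real
# protocol is gRPC over Unix sockets; these classes only mirror its shape,
# and every name here is hypothetical.


@dataclass
class Device:
    id: str
    healthy: bool = True


@dataclass
class ContainerAllocation:
    envs: Dict[str, str]


class ToyGpuPlugin:
    resource_name = "nvidia.com/gpu"

    def __init__(self, gpu_ids: List[str]):
        self.devices = [Device(gpu_id) for gpu_id in gpu_ids]

    def list_and_watch(self) -> List[Device]:
        # The real ListAndWatch streams updates; one snapshot suffices here.
        return list(self.devices)

    def allocate(self, device_ids: List[str]) -> ContainerAllocation:
        # Tell the container runtime which GPUs to expose to the container.
        return ContainerAllocation(envs={"NVIDIA_VISIBLE_DEVICES": ",".join(device_ids)})


class ToyKubelet:
    def __init__(self) -> None:
        self.plugins: Dict[str, ToyGpuPlugin] = {}
        self.in_use: Dict[str, Set[str]] = {}

    def register(self, plugin: ToyGpuPlugin) -> None:
        # Step 1: the plugin announces its resource name.
        self.plugins[plugin.resource_name] = plugin
        self.in_use[plugin.resource_name] = set()

    def allocatable(self) -> Dict[str, int]:
        # Step 2: healthy devices become the count reported in node status.
        return {
            name: sum(d.healthy for d in plugin.list_and_watch())
            for name, plugin in self.plugins.items()
        }

    def admit(self, resource: str, count: int) -> ContainerAllocation:
        # Step 3: pick free healthy devices, then ask the plugin how to expose them.
        plugin = self.plugins[resource]
        free = [
            d.id for d in plugin.list_and_watch()
            if d.healthy and d.id not in self.in_use[resource]
        ]
        if len(free) < count:
            raise RuntimeError(f"only {len(free)} free {resource}, need {count}")
        chosen = free[:count]
        self.in_use[resource].update(chosen)
        return plugin.allocate(chosen)


kubelet = ToyKubelet()
kubelet.register(ToyGpuPlugin(["GPU-aaa", "GPU-bbb"]))
print(kubelet.allocatable())                    # {'nvidia.com/gpu': 2}
print(kubelet.admit("nvidia.com/gpu", 1).envs)  # {'NVIDIA_VISIBLE_DEVICES': 'GPU-aaa'}
```

Note that the toy's `allocatable` does not shrink as GPUs are handed out, matching how node allocatable reports total usable devices while the scheduler tracks what is already claimed.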
Fig 1 — Plugin → kubelet → scheduler: hardware becomes a requestable resource.
Oversubscription: time-slicing & MPS
By default one GPU = one allocatable unit. The NVIDIA plugin can oversubscribe it by advertising replicas:
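- Time-slicing: declare N replicas per GPU; each pod gets one replica, and the GPU time-slices between their processes. Simple, but there is no isolation: pods share the GPU's memory and fault domain, and a pod that requests several replicas is not guaranteed proportionally more compute.
- MPS (Multi-Process Service): processes from different pods run kernels concurrently on the same GPU, and the plugin divides memory and compute evenly among the replicas. That contains noisy neighbors better than raw time-slicing and suits many small workloads that each underuse a GPU, but clients still share a fault domain.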
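The accounting side of replicas is plain multiplication, which a small sketch makes visible (in the plugin's config, both time-slicing and MPS take a replica count). The `"<gpu>::<n>"` replica-ID format, the in-order hand-out, and the GPU names are assumptions for illustration; the real plugin may choose replicas differently.

```python
from collections import Counter
from typing import List


def expand_replicas(gpu_ids: List[str], replicas: int) -> List[str]:
    """Advertise every physical GPU `replicas` times.

    The "<gpu>::<n>" replica-ID format is an assumption for this sketch.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if replicas == 1:
        return list(gpu_ids)
    return [f"{gpu}::{n}" for gpu in gpu_ids for n in range(replicas)]


def physical_gpu(replica_id: str) -> str:
    # Strip the replica suffix to recover the underlying card.
    return replica_id.split("::", 1)[0]


advertised = expand_replicas(["GPU-a", "GPU-b"], replicas=4)
print(len(advertised))  # 8, so the node reports nvidia.com/gpu: 8

# Five single-replica pods, handed replicas in order (a simplification).
print(dict(Counter(physical_gpu(r) for r in advertised[:5])))  # {'GPU-a': 4, 'GPU-b': 1}

# A pod asking for 2 replicas can get two slices of the same card.
print({physical_gpu(r) for r in advertised[:2]})  # {'GPU-a'}
```

The last line is the practical caveat: asking for more replicas buys more scheduler tokens, not necessarily more GPU.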
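Enabling replicas also changes the node labels published by GPU Feature Discovery: nvidia.com/gpu.replicas records the replica count, and nvidia.com/gpu.product gets a -SHARED suffix (for example, Tesla-T4-SHARED). Node selectors on these labels let you target shared or whole GPUs.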
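Targeting shared versus whole GPUs then comes down to nodeSelector matching on those labels. In this sketch, `sharing_labels` is a hypothetical helper that produces only the two labels mentioned above, and it assumes unshared nodes report a replica count of 1; `matches` follows nodeSelector's exact key/value semantics.

```python
from typing import Dict


def sharing_labels(product: str, replicas: int) -> Dict[str, str]:
    """Hypothetical helper: the two node labels that change when replicas are enabled.

    Assumes a replicas value of 1 is published for unshared GPUs.
    """
    shared = replicas > 1
    return {
        "nvidia.com/gpu.product": f"{product}-SHARED" if shared else product,
        "nvidia.com/gpu.replicas": str(replicas),
    }


def matches(node_labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    # nodeSelector semantics: every key must be present with exactly that value.
    return all(node_labels.get(key) == value for key, value in node_selector.items())


shared_node = sharing_labels("Tesla-T4", replicas=4)
whole_node = sharing_labels("Tesla-T4", replicas=1)
want_whole = {"nvidia.com/gpu.product": "Tesla-T4"}
want_shared = {"nvidia.com/gpu.product": "Tesla-T4-SHARED"}

print(matches(whole_node, want_whole), matches(shared_node, want_whole))    # True False
print(matches(shared_node, want_shared), matches(whole_node, want_shared))  # True False
```

Because the suffix changes the product value itself, a selector written for whole GPUs never lands on a shared node by accident.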
Quick start
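If you run the GPU Operator, the device plugin is already installed; you rarely deploy it alone. Standalone, it expects the NVIDIA driver and the NVIDIA Container Toolkit to already be set up in the container runtime on the GPU nodes, and it ships as a DaemonSet:

```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.0/deployments/static/nvidia-device-plugin.yml
# See the advertised GPUs in the node's Capacity and Allocatable
kubectl describe node <gpu-node> | grep nvidia.com/gpu
```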
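Across many nodes, grepping `describe` output gets tedious. A small sketch, assuming you parse the JSON from `kubectl get nodes -o json`, reads each node's allocatable `nvidia.com/gpu` directly. The function names and the trimmed sample input are made up for illustration; only `live_gpus_per_node` touches a real cluster, and it is not called here.

```python
import json
import subprocess
from typing import Dict


def gpus_per_node(nodes: dict, resource: str = "nvidia.com/gpu") -> Dict[str, int]:
    """Map node name to allocatable count, from `kubectl get nodes -o json` output."""
    result: Dict[str, int] = {}
    for node in nodes.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        if resource in allocatable:
            # Extended resources are whole numbers, serialized as strings such as "8".
            result[node["metadata"]["name"]] = int(allocatable[resource])
    return result


def live_gpus_per_node() -> Dict[str, int]:
    # Needs kubectl on PATH and a kubeconfig pointing at the cluster.
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return gpus_per_node(json.loads(out))


# Made-up, trimmed input in the shape kubectl returns.
sample = {"items": [
    {"metadata": {"name": "gpu-node-1"},
     "status": {"allocatable": {"cpu": "15", "nvidia.com/gpu": "8"}}},
    {"metadata": {"name": "cpu-node-1"},
     "status": {"allocatable": {"cpu": "15"}}},
]}
print(gpus_per_node(sample))  # {'gpu-node-1': 8}
```

Nodes where the plugin isn't running simply don't appear, which is a quick way to spot a DaemonSet pod that failed to start.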
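A pod then requests a GPU under resources.limits as usual, e.g. nvidia.com/gpu: 1. Extended resources can't be overcommitted, so if you also set requests, they must equal the limits. Time-slicing is enabled by giving the plugin a small config, usually through a ConfigMap, that sets the replica count.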
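Both pieces can be generated as plain data. This sketch builds a minimal Pod manifest and a time-slicing config and prints them as JSON, which kubectl accepts for manifests; the config follows the `version` / `sharing.timeSlicing.resources` shape from NVIDIA's docs, and since JSON is a subset of YAML it should read the same, but check the plugin docs before relying on that. The helper names, the pod name, the image placeholder, and the "at least 2 replicas" rule are assumptions of this sketch. How the ConfigMap is wired into the plugin depends on whether you deploy via the Helm chart or the GPU Operator.

```python
import json
from typing import Optional


def gpu_pod(name: str, image: str, gpus: int = 1, requests: Optional[int] = None) -> dict:
    """Build a minimal Pod manifest asking for `gpus` nvidia.com/gpu (hypothetical helper)."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    resources = {"limits": {"nvidia.com/gpu": gpus}}
    if requests is not None:
        # Extended resources can't be overcommitted: requests must equal limits.
        if requests != gpus:
            raise ValueError("nvidia.com/gpu requests must equal limits")
        resources["requests"] = {"nvidia.com/gpu": requests}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{"name": "main", "image": image, "resources": resources}],
        },
    }


def time_slicing_config(replicas: int) -> dict:
    """Plugin config advertising each GPU `replicas` times via time-slicing."""
    if replicas < 2:
        # This sketch's own rule: fewer than 2 replicas shares nothing.
        raise ValueError("use at least 2 replicas to share a GPU")
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [{"name": "nvidia.com/gpu", "replicas": replicas}],
            },
        },
    }


print(json.dumps(gpu_pod("cuda-test", "<your-cuda-image>"), indent=2))
print(json.dumps(time_slicing_config(4), indent=2))
```

Setting only limits is the common case, because Kubernetes uses the limit as the request for extended resources.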
When to use, when to skip
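Use it: on a GPU cluster you effectively always do, because it's the mechanism that makes GPUs schedulable. The real decisions are how you run it (usually bundled in the GPU Operator) and whether to enable time-slicing or MPS to share GPUs cheaply among light workloads.
Look beyond time-slicing when you need real isolation or quotas, because it provides neither: reach for MIG (hardware-partitioned GPU instances, which the NVIDIA plugin can also advertise) or HAMi (per-pod GPU memory and compute limits enforced in software). Longer term, the device plugin model is being superseded by DRA (GA in Kubernetes v1.34), which supports richer, constraint-based allocation.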
Where it sits
| Layer | Role | Note |
|---|---|---|
| Device Plugin | Advertise + allocate GPUs to the scheduler | The foundation |
| GPU Operator | Installs the plugin + driver + monitoring | Bundles it |
| HAMi / MIG | Isolated sharing | Beyond time-slicing |
| DRA | Constraint-based native allocation | The successor |
References
- Device plugin framework — the kubelet API.
- NVIDIA/k8s-device-plugin — source + config.
- Time-slicing GPUs — oversubscription setup.
Extra reads
- Improving GPU utilization — NVIDIA on sharing.
- DIY GPU sharing — time-slicing, MIG, workarounds.
- Advanced device plugin usage — troubleshooting.
Verified against kubernetes.io and the NVIDIA k8s-device-plugin docs, May 2026.