TL;DR — HAMi is CNCF GPU-virtualization middleware: it lets multiple pods share one GPU with precise memory and compute slicing and hard runtime isolation — no app changes. It covers many vendors (NVIDIA, Ascend, Cambricon, Hygon…), adds binpack/spread/topology-aware scheduling and dynamic MIG, and plugs into Kubernetes, DRA, and CDI.
What it is
HAMi — Heterogeneous AI Computing Virtualization Middleware (formerly k8s-vGPU-scheduler) — is open-source cloud-native middleware that brings sharing, isolation, and device-aware scheduling of accelerators to Kubernetes. It's a CNCF Sandbox project. In the AI Native landscape it's in AI Native Infra › Accelerator and SuperPod.
Why it exists
By default, a pod that requests a GPU gets the whole device, even if it uses only 10% of it. On a cluster of expensive H100s, that is enormous waste. MIG helps, but only on certain cards and only in coarse, fixed partition sizes. HAMi adds software GPU virtualization: hand a pod, say, 2 GB of memory and 30% of a GPU's compute, with hard limits so a noisy neighbor can't blow past its slice, all without touching application code.
Fig 1 — Many isolated slices on one physical GPU instead of one pod hogging it.
How it works
HAMi is built from four parts: a mutating webhook (intercepts pods requesting GPU resources), a scheduler extender (device-aware placement), device plugins (advertise virtual devices), and in-container virtualization components (enforce memory and compute limits at runtime). You request a fraction through resource fields in the pod spec, and HAMi handles slicing and isolation transparently.
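To make the first and last stages of that pipeline concrete, here is a minimal Python sketch. It is an illustration only and not HAMi's code: the function and class names are hypothetical, the scheduler name `hami-scheduler` is an assumption to check against your install, and the real runtime enforcement happens inside the container at the GPU API level rather than in Python. The resource names are the ones used in the Quick start section.

```python
from dataclasses import dataclass

# Resource names taken from the Quick start example in this article.
GPU_RESOURCES = ("nvidia.com/gpu", "nvidia.com/gpumem", "nvidia.com/gpucores")
# Assumption: the scheduler name the webhook assigns; verify in your cluster.
HAMI_SCHEDULER = "hami-scheduler"


def requests_gpu(pod: dict) -> bool:
    """Return True if any container lists a GPU resource in its limits."""
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if any(name in limits for name in GPU_RESOURCES):
            return True
    return False


def mutate(pod: dict) -> dict:
    """Webhook step (sketch): route GPU pods to the device-aware scheduler."""
    if requests_gpu(pod):
        pod["spec"]["schedulerName"] = HAMI_SCHEDULER
    return pod


@dataclass
class MemorySlice:
    """Runtime step (sketch): a hard cap on device memory for one container."""
    limit_mib: int
    used_mib: int = 0

    def allocate(self, size_mib: int) -> bool:
        # Refuse the request outright instead of letting it exceed the slice.
        if self.used_mib + size_mib > self.limit_mib:
            return False
        self.used_mib += size_mib
        return True
```

A pod without GPU limits passes through `mutate` unchanged, which is why CPU-only workloads are unaffected. The `MemorySlice` check shows what "hard limit" means in practice: an allocation that would exceed the cap fails, rather than borrowing from a neighbor.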
Key features
- Memory & compute slicing — allocate a GPU's memory and core quota precisely, with hard isolation.
- Multi-vendor — NVIDIA, Ascend, Cambricon, Hygon, Iluvatar, MetaX, Moore Threads, and more in one workflow.
- Scheduling policies — binpack (consolidate), spread (reduce contention), topology-aware, and dynamic MIG.
- Ecosystem fit — works with Kubernetes APIs, DRA, and CDI; complements the NVIDIA GPU Operator.
- No app changes — slicing is transparent to the workload.
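The difference between binpack and spread can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HAMi's scoring logic: it considers only free memory per GPU, and the function name and GPU labels are hypothetical.

```python
from typing import Dict, Optional


def pick_gpu(free_mib: Dict[str, int], request_mib: int, policy: str) -> Optional[str]:
    """Pick a GPU for a memory request; return None if nothing fits."""
    candidates = {gpu: free for gpu, free in free_mib.items() if free >= request_mib}
    if not candidates:
        return None
    if policy == "binpack":
        # Least free memory that still fits: fill busy GPUs first.
        return min(candidates, key=candidates.get)
    if policy == "spread":
        # Most free memory: keep load even across GPUs.
        return max(candidates, key=candidates.get)
    raise ValueError(f"unknown policy: {policy}")


free = {"gpu0": 3000, "gpu1": 20000, "gpu2": 1000}
print(pick_gpu(free, 2000, "binpack"))  # gpu0
print(pick_gpu(free, 2000, "spread"))   # gpu1
```

Binpack leaves `gpu1` mostly empty for a later large job, while spread trades that headroom for less contention on each card.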
Quick start
Install via Helm (on a cluster that already has GPU drivers, e.g. via the GPU Operator):

    helm repo add hami-charts https://project-hami.github.io/HAMi/
    helm install hami hami-charts/hami -n kube-system

Then request a fraction in the pod spec:

    resources:
      limits:
        nvidia.com/gpu: 1        # one shared GPU
        nvidia.com/gpumem: 2000  # cap at 2000 MiB
        nvidia.com/gpucores: 30  # cap at 30% compute

Multiple such pods now pack onto one physical GPU, each held to its slice.
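How many such pods fit on one GPU depends on whichever limit runs out first. The back-of-the-envelope Python below is a sketch under two assumptions: compute shares on one GPU add up to at most 100%, and the card exposes a hypothetical 81920 MiB of memory. It ignores any per-GPU cap on shared pods that your HAMi configuration may impose.

```python
def max_pods_per_gpu(gpu_mem_mib: int, pod_mem_mib: int, pod_cores_pct: int) -> int:
    """Upper bound on identical pods per GPU, limited by memory or compute."""
    by_memory = gpu_mem_mib // pod_mem_mib
    by_compute = 100 // pod_cores_pct  # assumes shares sum to at most 100%
    return min(by_memory, by_compute)


# Limits from the pod spec above: 2000 MiB and 30% compute.
print(max_pods_per_gpu(81920, 2000, 30))  # 3  (compute is the bottleneck)
# Same memory, but only 10% compute per pod.
print(max_pods_per_gpu(81920, 2000, 10))  # 10
```

With the example limits, compute and not memory decides the packing density, so lowering `nvidia.com/gpucores` is what lets more light workloads share a card.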
When to use, when to skip
Use it when many small workloads (notebooks, light inference, dev) waste full GPUs, when you have mixed-vendor accelerators to manage uniformly, or when you need finer or more flexible sharing than MIG alone. It directly attacks GPU under-utilization.
Skip it when each workload genuinely needs a whole GPU (large training), or when MIG's hardware-level partitioning already meets your isolation needs. HAMi sits alongside the GPU Operator (which provides drivers), not instead of it.
vs / alongside
| Approach | Sharing model | Note |
|---|---|---|
| HAMi | Software memory/compute slicing, multi-vendor | Flexible, fine-grained |
| MIG (via GPU Operator) | Hardware partitions on MIG-capable cards such as A100/H100 | Strong isolation, but coarse and card-limited |
| Time-slicing | Oversubscription, no isolation | Simplest, least safe |
| DRA | Native claim-based allocation | The emerging standard layer |
References
- project-hami.io — project home.
- What is HAMi / docs — concepts & setup.
- Project-HAMi/HAMi — source (CNCF).
- CNCF project page — maturity + landscape.
Extra reads
- How HAMi solves the GPU utilization crisis — the case for sharing.
- Heterogeneous accelerator sharing — multi-vendor in practice.
- Setting up HAMi (Crusoe/L40s) — a real deployment.
Verified against the official HAMi docs (project-hami.io) and CNCF sources, May 2026.