// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · intermediate · 10 min read · CNCF

HAMi — slice one GPU across many pods, with hard isolation.

TL;DR: HAMi is a CNCF GPU-virtualization middleware. It lets multiple pods share one GPU with precise memory and compute slicing and hard runtime isolation, with no app changes. It covers many vendors (NVIDIA, Ascend, Cambricon, Hygon…), adds binpack/spread/topology-aware scheduling and dynamic MIG, and plugs into Kubernetes, DRA (Dynamic Resource Allocation), and CDI (Container Device Interface).

What it is

HAMi — Heterogeneous AI Computing Virtualization Middleware (formerly k8s-vGPU-scheduler) — is open-source cloud-native middleware that brings sharing, isolation, and device-aware scheduling of accelerators to Kubernetes. It's a CNCF Sandbox project. In the AI Native landscape it's in AI Native Infra › Accelerator and SuperPod.

Why it exists

By default, a pod that requests a GPU gets the whole GPU, even if it uses only 10% of it. On a cluster of expensive H100s, that is enormous waste. MIG (Multi-Instance GPU) helps, but only on certain cards and only with coarse, fixed partitions. HAMi adds software GPU virtualization: you can give a pod, say, 2 GB of memory and 30% of a GPU's compute, with hard limits so a noisy neighbor can't exceed its slice, all without touching application code.

[Figure: without HAMi, pod A owns the whole GPU and the remaining capacity sits idle; with HAMi, pods A, B, and C each run in an isolated slice of the same GPU.]

Fig 1 — Many isolated slices on one physical GPU instead of one pod hogging it.

How it works

HAMi is built from four components:

  • Mutating webhook: intercepts pods that request GPU resources.
  • Scheduler extender: makes device-aware placement decisions.
  • Device plugins: advertise virtual devices.
  • In-container virtualization components: enforce memory and compute limits at runtime.

You request a fraction through resource fields, and HAMi handles slicing and isolation transparently.
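To make the webhook's role concrete, here is a minimal Python sketch of one thing such a webhook can do: notice that a pod asks for GPU resources and steer it to the device-aware scheduler by patching `spec.schedulerName`. The resource names come from the quick-start example; the scheduler name `hami-scheduler` and the functions `requests_gpu` and `mutate` are assumptions for illustration, not HAMi's actual code.

```python
"""Hypothetical sketch of a GPU-aware mutating webhook's decision logic.

Illustration only, not HAMi's implementation."""

# Resource names from the quick-start example.
GPU_RESOURCES = ("nvidia.com/gpu", "nvidia.com/gpumem", "nvidia.com/gpucores")
# Assumed scheduler name; check your HAMi installation for the real one.
HAMI_SCHEDULER = "hami-scheduler"


def requests_gpu(pod: dict) -> bool:
    """Return True if any container lists a GPU resource in its limits."""
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if any(name in limits for name in GPU_RESOURCES):
            return True
    return False


def mutate(pod: dict) -> list:
    """Build a JSON Patch (RFC 6902); an empty list leaves the pod unchanged."""
    if not requests_gpu(pod):
        return []
    # "replace" requires the field to already exist; "add" creates it.
    op = "replace" if "schedulerName" in pod["spec"] else "add"
    return [{"op": op, "path": "/spec/schedulerName", "value": HAMI_SCHEDULER}]


if __name__ == "__main__":
    pod = {"spec": {"containers": [
        {"name": "cuda", "resources": {"limits": {"nvidia.com/gpu": 1}}}]}}
    print(mutate(pod))
```

In a real admission webhook, this patch would be base64-encoded into the AdmissionReview response; that wiring is omitted to keep the decision logic visible.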

Key features

  • Memory & compute slicing — allocate a GPU's memory and core quota precisely, with hard isolation.
  • Multi-vendor — NVIDIA, Ascend, Cambricon, Hygon, Iluvatar, MetaX, Moore Threads, and more in one workflow.
  • Scheduling policies — binpack (consolidate), spread (reduce contention), topology-aware, and dynamic MIG.
  • Ecosystem fit — works with Kubernetes APIs, DRA, and CDI; complements the NVIDIA GPU Operator.
  • No app changes — slicing is transparent to the workload.
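The binpack and spread policies can be illustrated with a toy placement function. It assumes a hypothetical `Gpu` record that tracks allocated memory (MiB) and compute (percent), and it scores candidates only by memory already allocated; HAMi's real scheduler also weighs cores, topology, and vendor-specific rules, so treat this as a sketch of the idea rather than its algorithm.

```python
"""Toy binpack-vs-spread placement for one GPU-slice request.

Hypothetical data model for illustration; not HAMi's scheduler."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    mem_total: int       # device memory in MiB
    mem_used: int = 0    # MiB already promised to slices
    cores_used: int = 0  # percent of compute already promised

    def fits(self, mem: int, cores: int) -> bool:
        """A slice fits only if both memory and compute headroom cover it."""
        return (self.mem_used + mem <= self.mem_total
                and self.cores_used + cores <= 100)

    def usage(self) -> float:
        """Fraction of device memory already allocated."""
        return self.mem_used / self.mem_total


def place(gpus: list[Gpu], mem: int, cores: int, policy: str) -> Gpu | None:
    """Pick a GPU for the slice under `policy` and record the allocation."""
    candidates = [g for g in gpus if g.fits(mem, cores)]
    if not candidates:
        return None  # no GPU has room for this slice
    if policy == "binpack":    # consolidate onto the busiest GPU that fits
        choice = max(candidates, key=Gpu.usage)
    elif policy == "spread":   # pick the idlest GPU to reduce contention
        choice = min(candidates, key=Gpu.usage)
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    choice.mem_used += mem
    choice.cores_used += cores
    return choice


if __name__ == "__main__":
    def fleet() -> list[Gpu]:
        return [Gpu("gpu0", 16384, mem_used=8000, cores_used=50), Gpu("gpu1", 16384)]

    print(place(fleet(), 2000, 30, "binpack").name)  # gpu0
    print(place(fleet(), 2000, 30, "spread").name)   # gpu1
```

Binpack keeps whole GPUs free for later large requests; spread trades that away to lower the chance that co-located pods contend for the same card.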

Quick start

Install via Helm on a cluster that already has GPU drivers (for example, via the GPU Operator), and label each GPU node you want HAMi to manage:

```shell
kubectl label nodes <node-name> gpu=on
helm repo add hami-charts https://project-hami.github.io/HAMi/
helm install hami hami-charts/hami -n kube-system
```

Then request a fraction in the container's resources:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1            # one shared (virtual) GPU
    nvidia.com/gpumem: 2000      # cap device memory at 2000 MiB
    nvidia.com/gpucores: 30      # cap at 30% of the GPU's compute
```

Multiple such pods now pack onto one physical GPU, each held to its slice.
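How many such pods actually share one card is simple arithmetic: the count is capped by whichever resource runs out first. Here is a minimal Python sketch, assuming a slice is admitted only when both the remaining memory and the remaining compute percentage cover it. The function name `max_slices` and the card sizes are illustrative assumptions.

```python
def max_slices(gpu_mem_mib: int, slice_mem_mib: int, slice_cores_pct: int) -> int:
    """Identical slices that fit on one GPU, limited by memory and by compute."""
    if slice_mem_mib <= 0 or not 0 < slice_cores_pct <= 100:
        raise ValueError("slice sizes must be positive; cores at most 100%")
    by_memory = gpu_mem_mib // slice_mem_mib
    by_compute = 100 // slice_cores_pct  # core quotas are percentages of one GPU
    return min(by_memory, by_compute)


if __name__ == "__main__":
    # The quick-start slice (2000 MiB, 30%) on an assumed 16 GiB card.
    print(max_slices(16384, 2000, 30))  # 3: compute caps it, memory alone allows 8
    # Same slice on an assumed 80 GiB card: memory allows 40, compute still says 3.
    print(max_slices(81920, 2000, 30))  # 3
    # A 10% core cap lets memory become the limit on the 16 GiB card.
    print(max_slices(16384, 2000, 10))  # 8
```

A generous `gpucores` value therefore limits packing density even when memory is plentiful. Real deployments may impose further caps, such as a configurable limit on tasks per GPU.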

When to use, when to skip

Use it when many small workloads (notebooks, light inference, dev) waste full GPUs, when you have mixed-vendor accelerators to manage uniformly, or when you need finer or more flexible sharing than MIG alone. It directly attacks GPU under-utilization.

Skip it when each workload genuinely needs a whole GPU (large training), or when MIG's hardware-level partitioning already meets your isolation needs. HAMi sits alongside the GPU Operator (which provides drivers), not instead of it.

Heads up: Software isolation is strong but not identical to MIG's hardware isolation, so for strict multi-tenant security boundaries, weigh MIG against HAMi slicing. Sharing also trades some isolation for utilization: size slices so co-located pods don't starve each other.

vs / alongside

| Approach | Sharing model | Note |
|---|---|---|
| HAMi | Software memory/compute slicing, multi-vendor | Flexible, fine-grained |
| MIG (via GPU Operator) | Hardware partitions on A100/H100 | Strong isolation; coarse and card-limited |
| Time-slicing | Oversubscription, no isolation | Simplest, least safe |
| DRA | Native claim-based allocation | The emerging standard layer |

Verified against the official HAMi docs (project-hami.io) and CNCF sources, May 2026.

© cvam — written in plaintext, served warm