// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · intermediate · 10 min read · SIG

Kueue — who gets the GPUs, and when.

orchestration ai-native kueue quota kubernetes

TL;DR — Kueue is a Kubernetes-native job queueing system. It doesn't place pods (the scheduler does that) — it decides when a whole job is allowed to start, based on quotas and fairness across teams. ClusterQueue defines a resource pool, LocalQueue is a team's entry point, and cohorts let queues borrow each other's idle GPUs. It's an official Kubernetes SIG project.

What it is

Kueue is a job-queueing and quota-management layer that sits above the Kubernetes scheduler. A job submitted to Kueue waits in a queue until its quota is available; only then does Kueue admit it and let the normal scheduler place the pods. It's maintained by kubernetes-sigs — as close to "official" as add-ons get. In the AI Native landscape it's in AI Native Infra › Orchestration and Scheduling.

Why it exists

On a shared GPU cluster the real question isn't "where does this pod go" but "whose job runs now, and how much can each team consume." Without that control, whoever submits first grabs all the GPUs and everyone else starves. Kueue adds the missing governance: per-team quotas, fair sharing, priorities, and borrowing — so an expensive cluster stays both fully used and fairly used.

How it works — the objects

Object          Role
ResourceFlavor  A flavor of hardware (e.g. A100 vs H100 vs spot), so quota can target specific accelerators.
ClusterQueue    Cluster-scoped resource pool: how much CPU/mem/GPU of each flavor is available, plus policies.
LocalQueue      Namespaced pointer a team submits to; routes their jobs to a ClusterQueue.
Cohort          A group of ClusterQueues that can borrow one another's unused quota.
Workload        Kueue's internal record of a queued job awaiting admission.
[Figure: team A job → LocalQueue (ns A); team B job → LocalQueue (ns B); both → ClusterQueues (cohort: borrow, quota, fair-share) → scheduler places pods]

Fig 1 — Teams submit to LocalQueues; Kueue admits jobs by quota, then the scheduler places them.
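
The objects above can be wired together roughly as follows. This is a minimal sketch, not a reference configuration: it assumes the kueue.x-k8s.io/v1beta1 API version (check which version your installed release serves, since field names can differ between versions) and GPUs exposed as nvidia.com/gpu. Every name (default-flavor, team-a-cq, team-b-cq, ml-cohort, the team-a and team-b namespaces) and every quota number is hypothetical.

```yaml
# Hypothetical two-team setup; names and quotas are illustrative only.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor            # a single flavor covering all nodes
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: ml-cohort               # queues sharing a cohort can borrow idle quota
  namespaceSelector: {}           # empty selector: accept any namespace
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 8           # team A's own share
        borrowingLimit: 4         # at most 4 extra GPUs borrowed from the cohort
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-cq
spec:
  cohort: ml-cohort
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-lq
  namespace: team-a               # the namespace team A submits jobs from
spec:
  clusterQueue: team-a-cq
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-b-lq
  namespace: team-b
spec:
  clusterQueue: team-b-cq
```

Only GPU quota is covered here to keep the sketch short; a real ClusterQueue would usually list cpu and memory in coveredResources as well, because a workload requesting a resource the queue does not cover cannot be admitted through it.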

Key features

  • Quotas & fair sharing — per-team caps with weighted fairness across the cohort.
  • Borrowing & preemption — idle quota is lent to busy queues, then reclaimed (preempting borrowed jobs) when the owner needs it.
  • Partial admission & dynamic reclaim — run a job at reduced parallelism if full quota isn't free; release quota as pods finish.
  • ResourceFlavor fungibility — fall back across hardware types (e.g. spill to spot or a different GPU).
  • Broad job support — BatchJob, Kubeflow training jobs, RayJob, RayCluster, JobSet, plain Pods and Pod groups.
  • MultiKueue — dispatch jobs across multiple clusters (a 2026 focus area).
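
Borrowing, reclaim, and flavor fallback from the list above are configured on the ClusterQueue. The fragment below is a sketch under the same assumptions as any v1beta1 manifest (the field names and enum values shown are from that API version and may differ in the version your release serves); the queue, cohort, and flavor names (team-a-cq, ml-cohort, on-demand-gpu, spot-gpu) and the quotas are hypothetical, and both ResourceFlavors would need to exist separately.

```yaml
# Hypothetical ClusterQueue showing preemption and flavor fallback policies.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: ml-cohort
  namespaceSelector: {}
  preemption:
    reclaimWithinCohort: Any            # reclaim lent quota by preempting borrowers
    withinClusterQueue: LowerPriority   # inside this queue, preempt only lower-priority workloads
  flavorFungibility:
    whenCanBorrow: TryNextFlavor        # try the next flavor before borrowing in this one
    whenCanPreempt: TryNextFlavor       # try the next flavor before preempting in this one
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:                            # flavors are tried in the order listed
    - name: on-demand-gpu
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 8
        borrowingLimit: 4               # cap on GPUs borrowed from the cohort
    - name: spot-gpu
      resources:
      - name: nvidia.com/gpu
        nominalQuota: 16
```

With these settings a job that does not fit in on-demand-gpu quota would be considered for spot-gpu before Kueue borrows or preempts; flipping either policy changes that preference, so treat the values as a starting point to tune.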

Quick start

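Install the controller (pin a specific release rather than latest for anything beyond a trial), then create a ResourceFlavor, a ClusterQueue, and a LocalQueue, and submit jobs against the LocalQueue: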

kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/latest/download/manifests.yaml

kubectl get pods -n kueue-system        # controller up
kubectl get clusterqueues               # after you create one

A job opts in by labeling itself with its LocalQueue; Kueue suspends it until quota frees up, then releases it to the scheduler:

metadata:
  labels:
    kueue.x-k8s.io/queue-name: team-a-lq
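
For context, the label sits on an otherwise ordinary Job. The manifest below is a minimal sketch: the Job name, the team-a namespace, the busybox image, and the resource requests are all hypothetical, and it assumes a LocalQueue named team-a-lq already exists in that namespace and that GPUs are exposed as nvidia.com/gpu.

```yaml
# Hypothetical Job that opts in to Kueue via the queue-name label.
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-train
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-lq   # the LocalQueue to submit to
spec:
  suspend: true               # created suspended; Kueue unsuspends it on admission
  parallelism: 2
  completions: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.36   # placeholder image, stands in for a training container
        command: ["sh", "-c", "echo training; sleep 30"]
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
          limits:
            nvidia.com/gpu: 1 # quota is counted against the queue's GPU allowance
```

After applying it, kubectl get workloads -n team-a should list the Workload that Kueue created for the Job, which is a quick way to see whether it is still queued or has been admitted.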

When to use, when to skip

Use it when multiple teams share a GPU/TPU cluster and you need quotas, fairness, and borrowing — the classic "platform team rations scarce accelerators" problem. It layers cleanly on the existing scheduler, so adoption is low-risk.

Pair, don't pick. Kueue handles admission; it doesn't gang-schedule placement. For distributed training you often run Kueue with Volcano (Kueue for quota, Volcano for gang scheduling). On a single-tenant cluster with no quota needs, you may not need Kueue at all.

Heads up: Kueue decides when jobs run, not where pods land; that's still the scheduler's job. If you expected it to do topology-aware gang placement on its own, you want Volcano (or both together).

vs the alternatives

Tool      Best for                                      Trade-off
Kueue     Job queueing, quota, fair-share, borrowing    Admission only; not a placement scheduler
Volcano   Gang scheduling + rich placement policies     A full scheduler to run
YuniKorn  Unified batch+service scheduling with queues  Replaces the scheduler

References

Verified against the official Kueue docs (kueue.sigs.k8s.io), May 2026.

© cvam — written in plaintext, served warm