// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · intermediate · 10 min read · operator

KubeRay — run Ray on Kubernetes without babysitting it.

TL;DR — Ray is the distributed-compute engine behind a lot of modern AI (training, tuning, batch + online inference). KubeRay is the Kubernetes operator that runs it: three CRDs — RayCluster, RayJob, RayService — turn "stand up a Ray cluster, run my job, serve my model" into declarative YAML, with autoscaling, fault tolerance, and zero-downtime upgrades handled for you.

What it is

Ray is an open-source framework for distributed Python: it spreads tasks and actors across a cluster and ships libraries for training (Ray Train), tuning (Ray Tune), and serving (Ray Serve). KubeRay is the official operator that makes Ray a first-class Kubernetes citizen: you describe what you want as custom resources and the operator reconciles the Ray cluster for you. In the AI Native landscape it sits under AI Native Infra › Orchestration and Scheduling.

Why it exists

Running Ray by hand on Kubernetes means wiring head/worker pods, services, autoscaling, and failure recovery yourself — and redoing it every time. KubeRay encodes all that operational knowledge into an operator, so a Ray cluster becomes as easy to declare as a Deployment, and integrates with the rest of the K8s ecosystem (schedulers, monitoring, ingress).

The three CRDs

CRD | What it manages
RayCluster | The full lifecycle of a Ray cluster: head + worker groups, autoscaling, fault tolerance. The base everything else builds on.
RayJob | Creates a RayCluster, submits a job when it's ready, and (optionally) tears the cluster down when the job finishes. Ideal for training/batch runs.
RayService | A RayCluster + a Ray Serve deployment graph, with high availability and zero-downtime upgrades. For online/LLM inference.
[Figure: the KubeRay operator reconciles RayJob (train/batch), RayCluster (compute), and RayService (serve) onto head + worker pods that are autoscaled, GPU-capable, and fault tolerant.]

Fig 1 — One operator reconciles Ray clusters for jobs, compute, and serving.
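
To make the base CRD concrete, here is a minimal sketch of a RayCluster manifest, written to a local file so it can be inspected before applying. The resource name, group name, Ray version, image tag, and resource sizes are all illustrative assumptions, not values from this article; check them against the KubeRay release you run.

```shell
set -eu

# Write a minimal RayCluster manifest: one head pod plus one worker group.
# All names, versions, and sizes below are assumptions for illustration.
cat > raycluster-sketch.yaml <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: sketch-cluster            # hypothetical name
spec:
  rayVersion: "2.41.0"            # assumed version; keep in sync with the image tag
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.41.0
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
  workerGroupSpecs:
    - groupName: workers          # hypothetical group name
      replicas: 1
      minReplicas: 1
      maxReplicas: 3
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.41.0
              resources:
                limits:
                  cpu: "1"
                  memory: 2Gi
EOF

echo "Wrote raycluster-sketch.yaml; apply it with: kubectl apply -f raycluster-sketch.yaml"
```

RayJob and RayService embed this same cluster shape (under rayClusterSpec and rayClusterConfig respectively), which is why RayCluster is described as the base the other two build on.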

Ecosystem integration

KubeRay slots into the rest of the stack rather than reinventing it: pair it with a batch scheduler (Volcano, YuniKorn) or queueing (Kueue) for gang scheduling and quota; export metrics to Prometheus/Grafana and profile with py-spy; front RayService with an ingress or gateway. Since v1.3 the kubectl ray plugin smooths common workflows.
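
As one example of this kind of integration, the sketch below labels a RayJob for a Kueue queue. It assumes Kueue is installed with its RayJob integration enabled and that a LocalQueue named team-queue exists; that queue name, the job name, the Ray version, and the image tag are hypothetical. The label key kueue.x-k8s.io/queue-name is the one Kueue uses for queue selection.

```shell
set -eu

# RayJob routed through a Kueue LocalQueue via a metadata label.
# "team-queue" is a hypothetical LocalQueue; create your own in Kueue first.
cat > rayjob-kueue-sketch.yaml <<'EOF'
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: sketch-queued-job                   # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: team-queue   # hypothetical LocalQueue
spec:
  entrypoint: 'python -c "import ray; ray.init(); print(ray.cluster_resources())"'
  shutdownAfterJobFinishes: true            # reclaim the cluster when the job ends
  rayClusterSpec:
    rayVersion: "2.41.0"                    # assumed version
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0
              resources:
                requests:
                  cpu: "1"
                  memory: 2Gi
    workerGroupSpecs:
      - groupName: workers
        replicas: 2
        minReplicas: 2
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.41.0
                resources:
                  requests:
                    cpu: "1"
                    memory: 2Gi
EOF

echo "Wrote rayjob-kueue-sketch.yaml; apply it with: kubectl apply -f rayjob-kueue-sketch.yaml"
```

Explicit resource requests matter here because a quota system can only account for what the pods declare.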

Quick start

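Install the operator with Helm, then create a RayCluster (or apply a RayJob/RayService manifest). The kubectl ray plugin used below is installed separately from the operator:

# Add the KubeRay Helm repo and refresh the local chart index
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Install the operator into its own namespace
helm install kuberay-operator kuberay/kuberay-operator -n ray-system --create-namespace

# Create a RayCluster with the kubectl ray plugin (KubeRay v1.3+), then list clusters
kubectl ray create cluster my-cluster
kubectl get rayclusters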

For a one-shot training run you'd apply a RayJob with shutdownAfterJobFinishes: true so the cluster is created, used, and reclaimed automatically.
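
A minimal sketch of such a RayJob follows. The job name, Ray version, image tag, and TTL are assumptions, and the entrypoint is a stand-in one-liner rather than a real training script.

```shell
set -eu

# One-shot RayJob: KubeRay creates the cluster, submits the entrypoint,
# and deletes the cluster once the job finishes.
cat > rayjob-sketch.yaml <<'EOF'
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: sketch-train-job          # hypothetical name
spec:
  # Stand-in entrypoint; replace with your training command.
  entrypoint: 'python -c "import ray; ray.init(); print(ray.cluster_resources())"'
  shutdownAfterJobFinishes: true  # tear the RayCluster down after the job
  ttlSecondsAfterFinished: 60     # illustrative grace period before cleanup
  rayClusterSpec:
    rayVersion: "2.41.0"          # assumed version
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0
    workerGroupSpecs:
      - groupName: workers
        replicas: 2
        minReplicas: 2
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.41.0
EOF

echo "Wrote rayjob-sketch.yaml; apply it with: kubectl apply -f rayjob-sketch.yaml"
echo "Then watch progress with: kubectl get rayjob sketch-train-job"
```

Leaving shutdownAfterJobFinishes unset keeps the cluster around after the job, which can help with debugging but also keeps its resources allocated.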

When to use, when to skip

Use it when your AI workload is already (or should be) a Ray app — distributed training, hyperparameter sweeps, large batch inference, or Ray Serve for online/LLM inference — and you run on Kubernetes. RayService's zero-downtime upgrades make it a strong serving base.
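
For the serving case, a RayService pairs a Serve config with a cluster spec. The sketch below is illustrative only: the service name, application name, Ray version, and image tag are assumptions, and the import_path points at a hypothetical module (demo.app:deployment) that would have to be importable inside the image.

```shell
set -eu

# RayService sketch: serveConfigV2 holds the Ray Serve application config,
# rayClusterConfig holds the same cluster shape a RayCluster would use.
# "demo.app:deployment" is a hypothetical module path, not a real package.
cat > rayservice-sketch.yaml <<'EOF'
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: sketch-service
spec:
  serveConfigV2: |
    applications:
      - name: demo_app
        import_path: demo.app:deployment
        route_prefix: /
  rayClusterConfig:
    rayVersion: "2.41.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0
    workerGroupSpecs:
      - groupName: serve-workers
        replicas: 1
        minReplicas: 1
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.41.0
EOF

echo "Wrote rayservice-sketch.yaml; apply it with: kubectl apply -f rayservice-sketch.yaml"
```

Editing the manifest and re-applying it is how an upgrade would be triggered; the operator handles the switchover.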

Skip it if you're not using Ray — a plain Deployment or a dedicated serving runtime like KServe may be simpler. And remember KubeRay manages Ray clusters; it leans on Volcano/Kueue for cluster-wide gang scheduling and quota.

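Heads up: KubeRay autoscaling is its own thing (Ray's autoscaler driving worker groups), distinct from the cluster autoscaler and HPA. Ray's autoscaler changes worker-group replicas (pods); the cluster autoscaler adds nodes to fit those pods. Know which layer is acting when capacity changes. Also keep Ray versions consistent: the Ray version in your images should match the rayVersion you declare, and both should be a version your KubeRay release supports.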
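
Both points can be sketched together: a RayCluster with Ray's in-tree autoscaler enabled, followed by a small local check that every image tag starts with the declared rayVersion. The cluster name, version, image tag, replica bounds, and idle timeout are assumptions, and the check is a plain text comparison on the manifest, not an official KubeRay validation.

```shell
set -eu

manifest=raycluster-autoscale-sketch.yaml

# RayCluster with Ray's autoscaler enabled; it scales worker pods between
# minReplicas and maxReplicas. Node scaling is a separate layer.
cat > "$manifest" <<'EOF'
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: sketch-autoscaling        # hypothetical name
spec:
  rayVersion: "2.41.0"            # assumed version
  enableInTreeAutoscaling: true
  autoscalerOptions:
    idleTimeoutSeconds: 60        # illustrative value
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.41.0
  workerGroupSpecs:
    - groupName: workers
      replicas: 0
      minReplicas: 0
      maxReplicas: 5
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.41.0
EOF

# Read the declared rayVersion, stripping the surrounding quotes.
ray_version="$(awk '$1 == "rayVersion:" { gsub(/"/, "", $2); print $2 }' "$manifest")"

# Compare it with the tag of every container image in the manifest.
mismatches=0
for image in $(awk '$1 == "image:" { print $2 }' "$manifest"); do
  tag="${image##*:}"
  case "$tag" in
    "$ray_version"*) ;;   # tag begins with the declared version
    *) echo "mismatch: $image does not match rayVersion $ray_version"
       mismatches=$((mismatches + 1)) ;;
  esac
done
echo "rayVersion=$ray_version mismatched_images=$mismatches"
```

A check like this is cheap to run in CI before a manifest reaches the cluster.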

vs the alternatives

Tool | Best for | Trade-off
KubeRay | Running Ray apps (train/tune/serve) on K8s | Only if you're on Ray
KServe | Standardized model serving/inference | Not general distributed compute
Kubeflow | Broader ML platform / pipelines | Heavier, more components
Volcano / Kueue | Scheduling/quota under KubeRay | Complementary, not replacements

References

Verified against the official Ray/KubeRay docs (docs.ray.io), May 2026.

© cvam — written in plaintext, served warm