TL;DR — Ray is the distributed-compute engine behind a lot of modern AI (training, tuning, batch + online inference). KubeRay is the Kubernetes operator that runs it: three CRDs — RayCluster, RayJob, RayService — turn "stand up a Ray cluster, run my job, serve my model" into declarative YAML, with autoscaling, fault tolerance, and zero-downtime upgrades handled for you.
What it is
Ray is an open-source framework for distributed Python: it spreads tasks and actors across a cluster and ships libraries for training (Ray Train), tuning (Ray Tune), and serving (Ray Serve). KubeRay is the official operator that makes Ray a first-class Kubernetes citizen: you describe what you want as custom resources and the operator reconciles the Ray cluster for you. In the AI Native landscape it sits under AI Native Infra › Orchestration and Scheduling.
Why it exists
Running Ray by hand on Kubernetes means wiring head/worker pods, services, autoscaling, and failure recovery yourself, and redoing that work every time you need a cluster. KubeRay encodes that operational knowledge in an operator, so a Ray cluster becomes as easy to declare as a Deployment and integrates with the rest of the Kubernetes ecosystem (schedulers, monitoring, ingress).
The three CRDs
| CRD | What it manages |
|---|---|
| RayCluster | The full lifecycle of a Ray cluster: head + worker groups, autoscaling, fault tolerance. The base everything else builds on. |
| RayJob | Creates a RayCluster, submits a job when it's ready, and (optionally) tears the cluster down when the job finishes. Ideal for training/batch runs. |
| RayService | A RayCluster + a Ray Serve deployment graph, with high availability and zero-downtime upgrades. For online/LLM inference. |
Fig 1 — One operator reconciles Ray clusters for jobs, compute, and serving.
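To make the RayCluster row concrete, here is a minimal sketch of a manifest with one head pod and one autoscaling worker group. The cluster name, group name, Ray version, image tag, and resource sizes are illustrative assumptions rather than values from this article, so check them against the KubeRay and Ray releases you actually run.

```yaml
# Minimal RayCluster sketch. Names, versions, and sizes are assumptions.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: demo-cluster              # hypothetical name
spec:
  rayVersion: "2.41.0"            # assumed; should match the image tag below
  enableInTreeAutoscaling: true   # let the Ray autoscaler resize worker groups
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.41.0
            resources:
              requests: {cpu: "1", memory: 2Gi}
              limits: {cpu: "1", memory: 2Gi}
  workerGroupSpecs:
    - groupName: cpu-workers      # hypothetical group name
      replicas: 1                 # starting size
      minReplicas: 0              # autoscaler lower bound
      maxReplicas: 3              # autoscaler upper bound
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.41.0
              resources:
                requests: {cpu: "1", memory: 2Gi}
                limits: {cpu: "1", memory: 2Gi}
```

Saved to a file and applied with `kubectl apply -f`, a manifest along these lines asks the operator to create the head and worker pods and keep them reconciled.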
Ecosystem integration
KubeRay slots into the rest of the stack rather than reinventing it: pair it with a batch scheduler (Volcano, YuniKorn) or queueing (Kueue) for gang scheduling and quota; export metrics to Prometheus/Grafana and profile with py-spy; front RayService with an ingress or gateway. Since v1.3 the kubectl ray plugin smooths common workflows.
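As one illustration of the queueing pairing, Kueue selects workloads through a queue-name label on the resource. The fragment below is a sketch that assumes Kueue is installed with its RayJob integration enabled and that a LocalQueue called `team-a-queue` (a hypothetical name) exists in the job's namespace.

```yaml
# Fragment of a RayJob's metadata, not a complete manifest.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: queued-job                            # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue   # hypothetical LocalQueue
```

Under that setup, Kueue is expected to hold the job until quota admits it, at which point KubeRay creates the cluster as usual.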
Quick start
Install the operator with Helm, then apply a RayCluster (or RayJob/RayService):
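```shell
# Add the KubeRay Helm repo and install the operator into its own namespace
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator -n ray-system --create-namespace
# Create a cluster with the kubectl ray plugin (v1.3+), then list RayClusters
kubectl ray create cluster my-cluster
kubectl get rayclusters
```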
For a one-shot training run you'd apply a RayJob with shutdownAfterJobFinishes: true so the cluster is created, used, and reclaimed automatically.
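A one-shot run of that kind might be declared roughly as follows. This is a sketch under stated assumptions: the job name, the entrypoint script path, the Ray version and image tag, the worker count, and the TTL are all hypothetical, and the script would have to exist in the image you use.

```yaml
# RayJob sketch: create a cluster, run one job, then reclaim the cluster.
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: train-once                             # hypothetical name
spec:
  entrypoint: python /home/ray/app/train.py    # hypothetical script path
  shutdownAfterJobFinishes: true               # delete the cluster when the job ends
  ttlSecondsAfterFinished: 300                 # assumed grace period before cleanup
  rayClusterSpec:
    rayVersion: "2.41.0"                       # assumed; match the image tag
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0
    workerGroupSpecs:
      - groupName: workers
        replicas: 2
        minReplicas: 2
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.41.0
```

The `rayClusterSpec` block takes the same shape as a RayCluster's `spec`, which is why RayJob can be read as a RayCluster plus a job submission.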
When to use, when to skip
Use it when your AI workload is already (or should be) a Ray app — distributed training, hyperparameter sweeps, large batch inference, or Ray Serve for online/LLM inference — and you run on Kubernetes. RayService's zero-downtime upgrades make it a strong serving base.
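For the serving case, a RayService manifest could look roughly like the sketch below. The service name, application name, `import_path`, Ray version, and image tag are hypothetical placeholders, and the image is assumed to already contain the Serve application code.

```yaml
# RayService sketch: a Ray Serve config plus the cluster that hosts it.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: demo-service                 # hypothetical name
spec:
  serveConfigV2: |
    applications:
      - name: demo_app
        # hypothetical module:variable that points at a Serve application
        import_path: my_package.serve_app:app
        route_prefix: /
  rayClusterConfig:
    rayVersion: "2.41.0"             # assumed; match the image tag
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.41.0
              ports:
                - containerPort: 8000   # Ray Serve's default HTTP port
                  name: serve
```

As far as the KubeRay docs describe it, edits to `serveConfigV2` are applied to the running cluster, while edits to `rayClusterConfig` bring up a new cluster and shift traffic once it is ready, which is where the zero-downtime upgrade comes from.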
Skip it if you're not using Ray — a plain Deployment or a dedicated serving runtime like KServe may be simpler. And remember KubeRay manages Ray clusters; it leans on Volcano/Kueue for cluster-wide gang scheduling and quota.
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| KubeRay | Running Ray apps (train/tune/serve) on K8s | Only pays off if your workload is built on Ray |
| KServe | Standardized model serving/inference | Not general distributed compute |
| Kubeflow | Broader ML platform / pipelines | Heavier, more components |
| Volcano / Kueue | Scheduling/quota under KubeRay | Complementary, not replacements |
References
- Ray on Kubernetes (KubeRay) — official docs.
- Getting started — install + first cluster.
- RayJob quickstart — run a job end to end.
- ray-project/kuberay — source + examples.
Extra reads
- Ray on GKE overview — managed perspective.
- KubeRay on ACK — distributed AI architecture.
- Ray Serve — the serving layer behind RayService.
Verified against the official Ray/KubeRay docs (docs.ray.io), May 2026.