TL;DR: Ray is an AI compute engine, made up of a distributed runtime (Ray Core) plus a set of AI libraries (Data, Train, Tune, RLlib, Serve). Add one decorator, @ray.remote, call your Python functions with .remote(), and they run in parallel across a cluster, no distributed-systems PhD required. It's the engine under a lot of modern training and inference pipelines; KubeRay is how you run it on Kubernetes.
What it is
Ray is an open-source unified framework for scaling AI and Python applications. At the bottom is Ray Core, a general-purpose distributed runtime built on two primitives: tasks (stateless functions) and actors (stateful workers). On top sit ML libraries that use that runtime for specific jobs. In the AI Native landscape it's the heart of AI Native Infra › Workload Runtime.
Why it exists
Scaling Python normally means rewriting it for MPI, Spark, or bespoke multiprocessing — and becoming a distributed-systems expert in the process. Ray's pitch: keep writing ordinary Python, mark what should run remotely, and let Ray handle scheduling, data movement, and fault tolerance. The same code runs on your laptop and on a 1000-node cluster.
The core model
One decorator and one call cover most of it: @ray.remote turns a function into a parallel task or a class into an actor; .remote() launches it and returns a future (an object reference) that you resolve with ray.get():
import ray
ray.init()
@ray.remote
def square(x): return x * x
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures)) # [0, 1, 4, 9] — computed in parallel
That's the whole mental model: decorate, call .remote(), ray.get(). Everything else is libraries built on it.
The AI libraries
| Library | For |
|---|---|
| Ray Data | Scalable, framework-agnostic data loading + transformation across train/tune/predict. |
| Ray Train | Distributed training & fine-tuning (PyTorch, etc.). |
| Ray Tune | Hyperparameter search at scale. |
| RLlib | Reinforcement learning. |
| Ray Serve | Scalable online/LLM inference APIs. |
Fig 1 — AI libraries on a common Core, portable across laptop, cluster, and Kubernetes.
How you run it
Locally it's just pip install and ray.init(). For real scale you start a Ray cluster (a head node + workers) on VMs or cloud, or — most commonly in the AI Native world — on Kubernetes through the KubeRay operator, which manages the cluster lifecycle, autoscaling, and serving for you. Managed flavors exist on Vertex AI and Databricks too.
Quick start
pip install "ray[default]" # add [train],[serve],[data],[tune],[rllib] as needed
ray start --head # start a local cluster head
ray status # see resources
From there, decorate functions with @ray.remote and scale out, or import a library (from ray import serve / train / tune) for the higher-level workflows.
When to use, when to skip
Use it when Python workloads outgrow one machine (distributed training, big hyperparameter sweeps, large batch inference, RL, or scalable model serving) and you want one engine across all of them. It's especially strong for AI patterns, where it is reportedly far faster than Spark, though the gap depends on the workload.
Skip it for single-node work that fits comfortably in memory, or pure SQL/ETL where a data-warehouse engine fits better. If you only need to serve a model, a dedicated runtime like KServe or vLLM may be simpler than standing up Ray Serve.
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| Ray | Unified distributed Python for AI (train→tune→serve) | Another runtime to learn/operate |
| Spark | Large-scale SQL/ETL, classic big data | Reportedly slower for AI task patterns |
| KServe | Standardized model serving only | Not general compute |
| Kubeflow | End-to-end ML platform/pipelines | Heavier, more pieces |
References
- Ray overview — official docs.
- Ray Core walkthrough — tasks & actors.
- ray-project/ray — source.
- Ray Serve — online/LLM inference.
Extra reads
- Ray (OSDI '18 paper) — the original design.
- Ray on Vertex AI — managed on GCP.
- Ray on Databricks — managed alongside Spark.
Verified against the official Ray docs (docs.ray.io), May 2026.