// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · intermediate · 10 min read · 2.x

Ray — scale Python from your laptop to a thousand GPUs.

TL;DR: Ray is an AI compute engine: a distributed runtime (Ray Core) plus a set of AI libraries (Data, Train, Tune, RLlib, Serve). Add the @ray.remote decorator, call .remote(), and your Python functions run in parallel across a cluster, no distributed-systems PhD required. It's the engine under a lot of modern training and inference pipelines; KubeRay is how you run it on Kubernetes.

What it is

Ray is an open-source unified framework for scaling AI and Python applications. At the bottom is Ray Core, a general-purpose distributed runtime built on two compute primitives, tasks (stateless functions) and actors (stateful workers), which exchange data as objects held in a distributed object store. On top sit ML libraries that use that runtime for specific jobs. In the AI Native landscape it's the heart of AI Native Infra › Workload Runtime.

Why it exists

Scaling Python normally means rewriting it for MPI, Spark, or bespoke multiprocessing — and becoming a distributed-systems expert in the process. Ray's pitch: keep writing ordinary Python, mark what should run remotely, and let Ray handle scheduling, data movement, and fault tolerance. The same code runs on your laptop and on a 1000-node cluster.

The core model

One decorator and one method call cover most of it: @ray.remote turns a function into a parallel task or a class into an actor; .remote() launches it and returns a future (an ObjectRef) that you resolve with ray.get():

import ray

ray.init()  # start (or connect to) a Ray runtime

@ray.remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(4)]  # launches 4 tasks, returns immediately
print(ray.get(futures))     # [0, 1, 4, 9], computed in parallel

That's the whole mental model: decorate, call .remote(), ray.get(). Everything else is libraries built on it.

The AI libraries

| Library | For |
| --- | --- |
| Ray Data | Scalable, framework-agnostic data loading + transformation across train/tune/predict. |
| Ray Train | Distributed training & fine-tuning (PyTorch, etc.). |
| Ray Tune | Hyperparameter search at scale. |
| RLlib | Reinforcement learning. |
| Ray Serve | Scalable online/LLM inference APIs. |

[Figure: layered diagram. Top: the AI libraries (Data, Train, Tune, RLlib, Serve). Middle: Ray Core, tasks + actors (distributed runtime). Bottom: laptop · VM cluster · Kubernetes (via KubeRay) · cloud.]

Fig 1 — AI libraries on a common Core, portable across laptop, cluster, and Kubernetes.

How you run it

Locally it's just pip install and ray.init(). For real scale you start a Ray cluster (a head node + workers) on VMs or cloud, or — most commonly in the AI Native world — on Kubernetes through the KubeRay operator, which manages the cluster lifecycle, autoscaling, and serving for you. Managed flavors exist on Vertex AI and Databricks too.

Quick start

pip install "ray[default]"          # add [train],[serve],[data],[tune],[rllib] as needed
ray start --head                    # start a local cluster head
ray status                          # see resources

From there, decorate functions with @ray.remote and scale out, or import a library (from ray import serve / train / tune) for the higher-level workflows.

When to use, when to skip

Use it when Python workloads outgrow one machine (distributed training, big hyperparameter sweeps, large batch inference, RL, or scalable model serving) and you want one engine across all of them. It's especially strong for AI patterns and is reportedly far faster than Spark for those, though the margin depends on the workload, so benchmark your own before committing.

Skip it for single-node work that fits comfortably in memory, or pure SQL/ETL where a data-warehouse engine fits better. If you only need to serve a model, a dedicated runtime like KServe or vLLM may be simpler than standing up Ray Serve.

Heads up: Don't confuse Ray with KubeRay. Ray is the framework; KubeRay is the Kubernetes operator that runs it. For gang scheduling and quota on a shared cluster, KubeRay relies on integrations with schedulers such as Volcano or Kueue.

vs the alternatives

| Tool | Best for | Trade-off |
| --- | --- | --- |
| Ray | Unified distributed Python for AI (train→tune→serve) | Another runtime to learn/operate |
| Spark | Large-scale SQL/ETL, classic big data | Slower for AI task patterns |
| KServe | Standardized model serving only | Not general compute |
| Kubeflow | End-to-end ML platform/pipelines | Heavier, more pieces |

Verified against the official Ray docs (docs.ray.io), May 2026.

© cvam — written in plaintext, served warm