TL;DR — Kubeflow isn't one tool — it's a composable AI platform: a family of Kubernetes-native projects covering the whole ML lifecycle. Pipelines for workflows, Trainer for distributed training, Katib for AutoML/tuning, Notebooks for dev, Model Registry for artifacts, and KServe for serving. Deploy the full reference platform, or pick the pieces you need.
What it is
Kubeflow is the foundation of tools for building AI platforms on Kubernetes. It's modular and composable: each project stands alone, or you deploy the whole reference platform for an end-to-end MLOps stack. In the AI Native landscape it's in AI Native Infra › Workload Runtime — the broadest entry there, because it's really a platform that ties the others together.
Why it exists
The ML lifecycle (explore in a notebook, build a pipeline, train distributed, tune hyperparameters, register the model, serve it) spans at least six separate concerns. Without a platform, each team wires its own glue on Kubernetes. Kubeflow provides Kubernetes-native, interoperable building blocks for each stage, so the lifecycle runs as one coherent system that is portable across clouds.
The components
| Project | What it does |
|---|---|
| Notebooks / Workspaces | Interactive dev environments on the cluster. |
| Pipelines (KFP) | Build & run portable, scalable ML workflows (DAGs). |
| Trainer | Orchestrate distributed training/fine-tuning (PyTorch, etc.). |
| Katib | AutoML — hyperparameter tuning, early stopping, neural architecture search. |
| Model Registry / Hub | Catalog & version models and artifacts. |
| KServe | Model serving (its own project; see the KServe page). |
| Spark Operator | Run Spark data jobs in the same platform. |
Fig 1 — The lifecycle as composable Kubeflow projects, glued by Pipelines.
How it fits together
The connective tissue is Pipelines: you author a DAG where steps call Trainer for distributed training, Katib for tuning, then push the result to the Model Registry and deploy via KServe — all as one reproducible, parameterized workflow. Notebooks are where you prototype before promoting code into a pipeline.
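To make the DAG idea concrete, here is a minimal plain-Python sketch of a parameterized workflow whose steps run in dependency order. It deliberately does not use the KFP SDK or any Kubeflow API: the step functions, the `ctx` dictionary, and the names `model_name` and `learning_rates` are all hypothetical stand-ins, and running tuning before training is an assumption made for the example.

```python
from graphlib import TopologicalSorter


def tune(ctx):
    # Stand-in for a Katib experiment: pick the candidate closest to a
    # made-up optimum of 0.01 (a fake objective, for illustration only).
    candidates = ctx["params"]["learning_rates"]
    ctx["best_lr"] = min(candidates, key=lambda lr: abs(lr - 0.01))


def train(ctx):
    # Stand-in for a Trainer job that uses the tuned hyperparameter.
    ctx["model"] = {"name": ctx["params"]["model_name"], "lr": ctx["best_lr"]}


def register(ctx):
    # Stand-in for recording the trained model as a version in Model Registry.
    ctx["registered"] = f'{ctx["model"]["name"]}:v1'


def deploy(ctx):
    # Stand-in for creating a KServe endpoint for the registered version.
    ctx["endpoint"] = f'serving/{ctx["registered"]}'


STEPS = {"tune": tune, "train": train, "register": register, "deploy": deploy}

# Each key maps to the set of steps that must finish before it can run.
DEPENDENCIES = {
    "tune": set(),
    "train": {"tune"},
    "register": {"train"},
    "deploy": {"register"},
}


def run_pipeline(params):
    """Run every step in dependency order and return the shared context."""
    ctx = {"params": params, "order": []}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        STEPS[name](ctx)
        ctx["order"].append(name)
    return ctx


result = run_pipeline({"model_name": "demo", "learning_rates": [0.1, 0.01, 0.001]})
print(result["order"])
print(result["endpoint"])
```

In a real pipeline each step would run as its own container and pass artifacts rather than share an in-memory dictionary, but the shape is the same: declared dependencies, explicit parameters, and one reproducible run from tuning through deployment.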
Quick start
There are two paths: install the full platform from the manifests repo, or install individual components. To try the full platform:
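```shell
# Full reference platform (needs a real cluster).
# If the first apply fails while CRDs are still registering, re-running it is usually enough.
kubectl apply -k "github.com/kubeflow/manifests/example?ref=master"

# Wait for everything to come up.
kubectl get pods -n kubeflow
```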
Prefer a single component (e.g. just Pipelines or Katib)? Each project ships its own install — you don't have to take the whole platform.
When to use, when to skip
Use it when an org needs a shared, end-to-end MLOps platform on Kubernetes: multiple teams, reproducible pipelines, governance over models, all in one portable stack. It is among the most complete open-source MLOps reference platforms.
Skip it if you need one capability, not a platform — grab the single component (or a focused tool) instead of standing up all of Kubeflow. If your compute is Ray-centric, Ray + KServe may cover you with less surface area.
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| Kubeflow | Full end-to-end MLOps platform on K8s | Heavy; many moving parts |
| Ray | Distributed compute for train/tune/serve | Not a pipeline/governance platform |
| KServe | Just serving (also a Kubeflow component) | Serving only |
| Managed (Vertex/SageMaker) | Hands-off MLOps | Cloud lock-in |
Verified against the official Kubeflow docs (kubeflow.org), May 2026.