// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · intermediate · 11 min read · platform

Kubeflow — the ML platform you assemble from parts.

TL;DR — Kubeflow isn't one tool — it's a composable AI platform: a family of Kubernetes-native projects covering the whole ML lifecycle. Pipelines for workflows, Trainer for distributed training, Katib for AutoML/tuning, Notebooks for dev, Model Registry for artifacts, and KServe for serving. Deploy the full reference platform, or pick the pieces you need.

What it is

Kubeflow is a foundation of tools for building AI platforms on Kubernetes. It's modular and composable: each project can run on its own, or you can deploy the whole reference platform for an end-to-end MLOps stack. In the AI Native landscape it sits under AI Native Infra › Workload Runtime, and it's the broadest entry there because it's really a platform that ties the other pieces together.

Why it exists

The ML lifecycle (explore in a notebook, build a pipeline, run distributed training, tune hyperparameters, register the model, serve it) spans many separate concerns. Without a platform, each team writes its own glue on Kubernetes. Kubeflow provides Kubernetes-native, interoperable building blocks for every stage, so the lifecycle runs as one coherent system that stays portable across clouds.

The components

Project | Stage
Notebooks / Workspaces | Interactive dev environments on the cluster.
Pipelines (KFP) | Build & run portable, scalable ML workflows (DAGs).
Trainer | Orchestrate distributed training/fine-tuning (PyTorch, etc.).
Katib | AutoML: hyperparameter tuning, early stopping, neural architecture search.
Model Registry / Hub | Catalog & version models and artifacts.
KServe | Model serving (its own project; see the KServe page).
Spark Operator | Run Spark data jobs in the same platform.

Fig 1: The lifecycle as composable Kubeflow projects, glued by Pipelines (Notebook → Pipeline → Trainer → Katib → Registry → Serve).
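To make Katib's role concrete, here is a toy, in-process sketch of the loop it automates: random search over a learning rate, plus a simplified median-stopping rule that abandons a trial once it does worse than the median of earlier trials at the same step. Everything here is hypothetical: `toy_loss` stands in for a real training run, and `tune` is not Katib's API. Katib itself drives this loop through an Experiment resource and runs each trial as a workload on the cluster.

```python
import math
import random
import statistics


def toy_loss(lr: float, step: int) -> float:
    """Hypothetical stand-in for a trial's validation loss at a given step:
    it falls as steps accumulate and is lowest when lr is close to 1e-2."""
    return abs(math.log10(lr) + 2.0) + 1.0 / (step + 1)


def tune(num_trials: int = 20, max_steps: int = 10, seed: int = 0):
    """Random search over lr with a simplified median-stopping rule.

    Returns (best, results): each result is (lr, last_loss, stopped_early),
    and best is the completed trial with the lowest final loss.
    """
    rng = random.Random(seed)                 # fixed seed -> reproducible search
    history: dict[int, list[float]] = {}      # step -> losses earlier trials reported at that step
    results = []
    for _ in range(num_trials):
        lr = 10 ** rng.uniform(-4, 0)         # sample log-uniformly between 1e-4 and 1
        stopped = False
        for step in range(max_steps):
            loss = toy_loss(lr, step)
            seen = history.setdefault(step, [])
            # Stop early once at least 3 earlier trials reached this step
            # and this trial's loss is worse (higher) than their median.
            if len(seen) >= 3 and loss > statistics.median(seen):
                stopped = True
            seen.append(loss)
            if stopped:
                break
        results.append((lr, loss, stopped))
    # The first three trials can never be stopped, so this list is never empty.
    completed = [r for r in results if not r[2]]
    best = min(completed, key=lambda r: r[1])
    return best, results


if __name__ == "__main__":
    best, results = tune()
    n_stopped = sum(r[2] for r in results)
    print(f"best lr={best[0]:.4g}, loss={best[1]:.3f}, stopped early: {n_stopped}/{len(results)}")
```

The stopped trials are the compute Katib saves: they never run their remaining steps. In Katib, the search algorithm and the early-stopping algorithm are configuration choices on the Experiment rather than code you write.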

How it fits together

The connective tissue is Pipelines: you author a DAG where steps call Trainer for distributed training, Katib for tuning, then push the result to the Model Registry and deploy via KServe — all as one reproducible, parameterized workflow. Notebooks are where you prototype before promoting code into a pipeline.
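The Pipelines idea in miniature: a sketch, assuming plain Python rather than the KFP SDK, of a parameterized DAG whose steps run in dependency order. The step functions (`prep_data`, `tune`, `train`, `register`, `deploy`) and the `registry://` URI are hypothetical placeholders for calls into Katib, Trainer, the Model Registry and KServe. Here tuning runs first and feeds the final training step, which is one common arrangement. Real pipelines are written with the `kfp` SDK and executed by the Pipelines backend on the cluster.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Params = dict[str, Any]
Step = Callable[[Params, dict[str, Any]], dict[str, Any]]


# Hypothetical step functions: each receives the run parameters plus its upstream outputs.
def prep_data(params: Params, up: dict) -> dict:
    return {"rows": params["rows"]}


def tune(params: Params, up: dict) -> dict:
    # Placeholder for a Katib experiment: just pick the smallest candidate.
    return {"lr": min(params["lr_candidates"])}


def train(params: Params, up: dict) -> dict:
    # Placeholder for a Trainer job that uses the tuned learning rate.
    return {"model": f"model(lr={up['tune']['lr']}, rows={up['prep_data']['rows']})"}


def register(params: Params, up: dict) -> dict:
    # Placeholder for a Model Registry entry; the URI scheme is made up.
    return {"uri": f"registry://{params['name']}/v1", "model": up["train"]["model"]}


def deploy(params: Params, up: dict) -> dict:
    # Placeholder for a KServe deployment.
    return {"endpoint": f"serve:{up['register']['uri']}"}


# step name -> (step function, names of the steps it depends on)
PIPELINE: dict[str, tuple[Step, list[str]]] = {
    "prep_data": (prep_data, []),
    "tune": (tune, ["prep_data"]),
    "train": (train, ["prep_data", "tune"]),
    "register": (register, ["train"]),
    "deploy": (deploy, ["register"]),
}


def run(pipeline: dict[str, tuple[Step, list[str]]], params: Params) -> dict[str, dict]:
    """Execute steps in dependency order and return every step's output."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    outputs: dict[str, dict] = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        outputs[name] = func(params, {d: outputs[d] for d in deps})
    return outputs


if __name__ == "__main__":
    out = run(PIPELINE, {"rows": 1000, "lr_candidates": [0.1, 0.01], "name": "churn"})
    print(out["deploy"]["endpoint"])  # serve:registry://churn/v1
```

Because every step reads only the run parameters and its upstream outputs, re-running with the same parameters gives the same result, which is the reproducibility the paragraph describes. The real system adds containerized steps, artifact tracking, and step caching on top.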

Quick start

The simplest path is the manifests repo (full platform) or installing individual components. For a kick-the-tires platform:

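# full reference platform (needs a real cluster)
# pin a release tag instead of master for anything beyond a trial;
# if the first apply errors while CRDs register, re-run it until it succeeds
kubectl apply -k "github.com/kubeflow/manifests/example?ref=master"
kubectl get pods -n kubeflow            # wait for everything to come up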
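Rather than eyeballing `kubectl get pods`, you could feed its JSON output to a small checker. This is a sketch under assumptions: it reads the standard Pod fields (`status.phase`, `status.containerStatuses[].ready`) from `kubectl get pods -n kubeflow -o json`, treats completed (`Succeeded`) pods as done, and the function names are hypothetical. The full platform also runs pods outside the `kubeflow` namespace (for example `istio-system` and `cert-manager`), so you may want to check those namespaces too.

```python
import json
import sys


def not_ready(pod_list: dict) -> list[str]:
    """Return names of pods that are not done coming up.

    A pod counts as done if it is Succeeded (e.g. a finished Job pod) or
    Running with every container reporting ready.
    """
    waiting = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if phase != "Running" or not all_ready:
            waiting.append(pod["metadata"]["name"])
    return waiting


def main() -> int:
    """Read `kubectl get pods -o json` output on stdin; return 1 while pods are still coming up."""
    waiting = not_ready(json.load(sys.stdin))
    print("all pods ready" if not waiting else "waiting on: " + ", ".join(waiting))
    return 1 if waiting else 0
```

Saved as a script that ends with `raise SystemExit(main())` (a file name such as `check_pods.py` is hypothetical), it can run as `kubectl get pods -n kubeflow -o json | python check_pods.py`. A built-in alternative is `kubectl wait --for=condition=Ready pods --all -n kubeflow --timeout=600s`, but pods from completed Jobs never become Ready, which is why the sketch treats `Succeeded` separately.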

Prefer a single component (e.g. just Pipelines or Katib)? Each project ships its own install — you don't have to take the whole platform.

When to use, when to skip

Use it when an org needs a shared, end-to-end MLOps platform on Kubernetes: multiple teams, reproducible pipelines, and governance over models, all in one portable stack. It's among the most complete open-source MLOps reference platforms.

Skip it if you need one capability, not a platform — grab the single component (or a focused tool) instead of standing up all of Kubeflow. If your compute is Ray-centric, Ray + KServe may cover you with less surface area.

Heads up: the full platform is heavy, with many controllers, an auth/ingress layer, and real cluster resources. Start with the one or two components you actually need; "install all of Kubeflow to try Pipelines" is a common over-commitment.

vs the alternatives

Tool | Best for | Trade-off
Kubeflow | Full end-to-end MLOps platform on K8s | Heavy; many moving parts
Ray | Distributed compute for train/tune/serve | Not a pipeline/governance platform
KServe | Just serving (also a Kubeflow component) | Serving only
Managed (Vertex/SageMaker) | Hands-off MLOps | Cloud lock-in

Verified against the official Kubeflow docs (kubeflow.org), May 2026.

© cvam — written in plaintext, served warm