TL;DR — kgateway (formerly Gloo Gateway, now a CNCF project) is a mature, Envoy-based Kubernetes Gateway API implementation that added a full AI Gateway mode: unified LLM API, prompt guards, LLM auth, and — its standout — inference-aware routing that picks the best model-server pod by watching GPU memory and queue depth. Agent/MCP traffic now lives in its sibling, agentgateway.
What it is
kgateway is a cloud-native API gateway and AI gateway. It's the control plane for a widely-deployed Envoy data plane, configured entirely through the Kubernetes Gateway API. Born as Gloo in 2018, it was renamed and accepted into the CNCF (Sandbox) in March 2025, bringing seven years of production history with it.
In the AI Native landscape it sits in AI Native Infra › Gateway — but unlike a purpose-built LLM proxy, it's a general-purpose ingress/API gateway that also does AI, so it can be the single front door for all your cluster traffic.
Why it exists
Most teams already need a Kubernetes gateway for normal ingress. kgateway's pitch: don't bolt a separate AI proxy next to it — use one Envoy-based, Gateway-API-native gateway for both regular APIs and LLM traffic, with AI features layered on the same battle-tested data plane.
How it works
You configure it with standard Gateway API resources (Gateway, HTTPRoute) plus kgateway policy CRDs. The control plane translates those into Envoy config; Envoy moves the traffic. AI behavior — provider backends, prompt rules, auth — attaches as policy on the routes.
Fig 1 — One Envoy-based gateway for normal APIs + LLMs, routing by live GPU/queue signals.
AI Gateway features
- Unified LLM API — one OpenAI-compatible surface; switch providers without touching app code.
- Inference-aware routing — the headline feature. Via the Gateway API Inference Extension it watches Prometheus signals (queue depth, free GPU memory) and routes to the best model-server pod, instead of dumb round-robin.
- A/B & canary model rollouts — cohort traffic between model versions; ship a new model to a slice before full rollout.
- Prompt management & guards — pre-set/append system & user prompts per route; filter unsafe or off-policy content.
- LLM auth & backend security — handle upstream provider credentials at the gateway.
agentgateway & MCP
The agent/MCP story moved to a sibling project: agentgateway, a Rust data plane purpose-built for agent-to-agent (A2A) and agent-to-tool (MCP) connectivity, including turning existing REST APIs into agent-native tools. kgateway used to be its control plane; from v2.3 that role migrated into the agentgateway repo so kgateway can stay focused on being a rock-solid Envoy API gateway. Use kgateway for LLM/API traffic, reach for agentgateway when you're wiring up MCP/agent meshes.
Quick start
Install the Gateway API CRDs, then kgateway via Helm, and enable the AI extension:
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml
helm upgrade -i kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
-n kgateway-system --create-namespace --set gateway.aiExtension.enabled=true
Then define a Gateway + HTTPRoute pointing at your LLM backends and send OpenAI-style requests at the gateway address. (Full backend/policy YAML in the AI Gateway docs.)
When to use, when to skip
Use it when you're standardizing on the Kubernetes Gateway API and want one mature Envoy gateway for both normal ingress and LLM traffic — especially if you self-host models on GPU and want inference-aware routing. Coming from Gloo or migrating off ingress-nginx, it's a natural landing spot.
Skip it for a quick, app-level multi-provider setup — LiteLLM is far lighter. If you specifically want a narrowly-scoped AI gateway CRD set, Envoy AI Gateway is more focused. For pure agent/MCP meshes, go to agentgateway.
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| kgateway | One Gateway-API gateway for APIs + LLMs, inference-aware routing | Heavier; Gateway API learning curve |
| Envoy AI Gateway | Focused AI-only CRD surface on Envoy | Narrower scope |
| LiteLLM | Fast app-level multi-provider proxy | Not a full ingress gateway |
| Higress | Plugin-rich AI gateway | Different ecosystem |
References
- Official documentation — concepts, Gateway API, policies.
- AI Gateway docs — LLM routing, prompts, auth.
- kgateway-dev/kgateway — source + releases.
- CNCF project page — maturity + governance.
- kgateway v2.1 release — recent capabilities.
Extra reads
- CNCF project spotlight — the Gloo → kgateway story.
- agentgateway — the Rust agent/MCP data plane.
- Why traditional gateways failed AI workloads — the design argument.
- AI Gateway deep dive (2026) — cross-product comparison.
Verified against the official kgateway docs (kgateway.dev) and CNCF sources, May 2026. Targets kgateway v2.x.