// AI NATIVE STACK

AI Native › AI Native Infra › Gateway › kgateway

CRASH COURSE · AI-NATIVE · intermediate · 11 min read · v2.x

kgateway — the Gateway API gateway that grew AI superpowers.

gateway ai-native kgateway envoy kubernetes

TL;DR — kgateway (formerly Gloo Gateway, now a CNCF project) is a mature, Envoy-based Kubernetes Gateway API implementation that added a full AI Gateway mode: unified LLM API, prompt guards, LLM auth, and — its standout — inference-aware routing that picks the best model-server pod by watching GPU memory and queue depth. Agent/MCP traffic now lives in its sibling, agentgateway.

What it is

kgateway is a cloud-native API gateway and AI gateway. It's the control plane for a widely-deployed Envoy data plane, configured entirely through the Kubernetes Gateway API. Born as Gloo in 2018, it was renamed and accepted into the CNCF (Sandbox) in March 2025, bringing seven years of production history with it.

In the AI Native landscape it sits in AI Native Infra › Gateway — but unlike a purpose-built LLM proxy, it's a general-purpose ingress/API gateway that also does AI, so it can be the single front door for all your cluster traffic.

Why it exists

Most teams already need a Kubernetes gateway for normal ingress. kgateway's pitch: don't bolt a separate AI proxy next to it — use one Envoy-based, Gateway-API-native gateway for both regular APIs and LLM traffic, with AI features layered on the same battle-tested data plane.

How it works

You configure it with standard Gateway API resources (Gateway, HTTPRoute) plus kgateway policy CRDs. The control plane translates those into Envoy config; Envoy moves the traffic. AI behavior — provider backends, prompt rules, auth — attaches as policy on the routes.

clientsGateway API kgateway + Envoy unified LLM API prompt guards · auth inference-aware routing cloud LLM providers vLLM model-serverpods (GPU) agentgateway → MCP

Fig 1 — One Envoy-based gateway for normal APIs + LLMs, routing by live GPU/queue signals.

AI Gateway features

  • Unified LLM API — one OpenAI-compatible surface; switch providers without touching app code.
  • Inference-aware routing — the headline feature. Via the Gateway API Inference Extension it watches Prometheus signals (queue depth, free GPU memory) and routes to the best model-server pod, instead of dumb round-robin.
  • A/B & canary model rollouts — cohort traffic between model versions; ship a new model to a slice before full rollout.
  • Prompt management & guards — pre-set/append system & user prompts per route; filter unsafe or off-policy content.
  • LLM auth & backend security — handle upstream provider credentials at the gateway.

agentgateway & MCP

The agent/MCP story moved to a sibling project: agentgateway, a Rust data plane purpose-built for agent-to-agent (A2A) and agent-to-tool (MCP) connectivity, including turning existing REST APIs into agent-native tools. kgateway used to be its control plane; from v2.3 that role migrated into the agentgateway repo so kgateway can stay focused on being a rock-solid Envoy API gateway. Use kgateway for LLM/API traffic, reach for agentgateway when you're wiring up MCP/agent meshes.

Quick start

Install the Gateway API CRDs, then kgateway via Helm, and enable the AI extension:

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml

helm upgrade -i kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
  -n kgateway-system --create-namespace --set gateway.aiExtension.enabled=true

Then define a Gateway + HTTPRoute pointing at your LLM backends and send OpenAI-style requests at the gateway address. (Full backend/policy YAML in the AI Gateway docs.)

When to use, when to skip

Use it when you're standardizing on the Kubernetes Gateway API and want one mature Envoy gateway for both normal ingress and LLM traffic — especially if you self-host models on GPU and want inference-aware routing. Coming from Gloo or migrating off ingress-nginx, it's a natural landing spot.

Skip it for a quick, app-level multi-provider setup — LiteLLM is far lighter. If you specifically want a narrowly-scoped AI gateway CRD set, Envoy AI Gateway is more focused. For pure agent/MCP meshes, go to agentgateway.

heads up "kgateway" and "agentgateway" are now distinct projects — don't conflate them. Inference-aware routing depends on the Gateway API Inference Extension and your model servers exposing the right Prometheus metrics; without those it falls back to ordinary load balancing.

vs the alternatives

ToolBest forTrade-off
kgatewayOne Gateway-API gateway for APIs + LLMs, inference-aware routingHeavier; Gateway API learning curve
Envoy AI GatewayFocused AI-only CRD surface on EnvoyNarrower scope
LiteLLMFast app-level multi-provider proxyNot a full ingress gateway
HigressPlugin-rich AI gatewayDifferent ecosystem

References

Extra reads

Verified against the official kgateway docs (kgateway.dev) and CNCF sources, May 2026. Targets kgateway v2.x.

← AI Native Stack
© cvam — written in plaintext, served warm