TL;DR — Envoy AI Gateway puts a single, OpenAI-compatible endpoint in front of every LLM provider you use. Your apps talk to one URL; the gateway handles provider routing, automatic failover, credential injection, token-based rate limits, and cost visibility. It's a Kubernetes-native layer built on top of Envoy Gateway, driven by a handful of CRDs.
What it is
Envoy AI Gateway is an open-source AI gateway built on Envoy Proxy and Envoy Gateway. It sits between your applications and the GenAI services they call — OpenAI, Anthropic, AWS Bedrock, Google, or self-hosted models like vLLM — and gives clients one unified, OpenAI/Anthropic-compatible API to hit.
It's a CNCF-adjacent Envoy subproject (started by Tetrate and Bloomberg) and reached its first production-ready API surface in v0.6.0, with its CRDs served at v1beta1. In the AI Native landscape it lives in AI Native Infra › Gateway: the traffic-control plane for AI.
Why it exists
Once more than one app calls LLMs, the same problems show up everywhere: API keys scattered across services, no shared rate limits, no failover when a provider has an outage, no idea who's spending how much, and a rewrite every time you switch model providers.
A gateway centralizes all of that. Clients stop knowing or caring which provider answers — they hit one endpoint, and policy, security, and routing live in one place instead of in every codebase.
How it works
The gateway reads the model field from each incoming request, tags it as a header (x-ai-eg-model), and routes on that. Credentials for the chosen backend are injected at the edge, so your app never holds provider keys. Token usage is parsed from the response (OpenAI schema) and fed into rate limits and cost metrics.
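The flow described above can be modeled in a few lines of Python. This is only an illustrative sketch of the idea, not the gateway's actual implementation: the route table, the backend names, the example key, and the `handle` function are all hypothetical. Only the `model` request field and the `x-ai-eg-model` header name come from the text.

```python
import json

# Hypothetical route table: model name -> backend name.
# In the real gateway this mapping is expressed declaratively, not in code.
ROUTES = {"gpt-4o-mini": "openai", "llama-3-8b": "vllm"}

# Hypothetical per-backend credentials; None means the backend needs no key.
CREDENTIALS = {"openai": "sk-example-key", "vllm": None}

MODEL_HEADER = "x-ai-eg-model"  # header name taken from the text above


def handle(body: bytes, headers: dict) -> tuple:
    """Return (backend, upstream_headers) for one OpenAI-style request."""
    model = json.loads(body)["model"]            # 1. read the model field
    upstream = dict(headers)                     # copy, so the caller's dict is untouched
    upstream[MODEL_HEADER] = model               # 2. tag the model as a header
    backend = ROUTES[upstream[MODEL_HEADER]]     # 3. route on that header (KeyError if unknown)
    key = CREDENTIALS[backend]
    if key is not None:                          # 4. inject the backend's credential
        upstream["Authorization"] = f"Bearer {key}"
    return backend, upstream


backend, upstream = handle(
    b'{"model": "gpt-4o-mini", "messages": []}',
    {"Content-Type": "application/json"},
)
print(backend, upstream[MODEL_HEADER])
```

The point of the sketch is that the client sends no provider key at all; the credential only appears on the upstream side of the hop.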
Fig 1 — Apps hit one endpoint; the gateway routes, secures, and meters traffic to every provider.
The core CRDs
You configure everything declaratively with Kubernetes resources. Five do the heavy lifting:
| CRD | What it defines |
|---|---|
| AIGatewayRoute | The unified API entry: match rules that route requests (by model, etc.) to backends. |
| AIServiceBackend | A provider/model target (OpenAI, Bedrock, a vLLM service…). |
| BackendSecurityPolicy | Injects upstream credentials (API keys, cloud auth) into requests securely. |
| GatewayConfig | Gateway-wide settings tying it to Envoy Gateway. |
| MCPRoute | Routes Model Context Protocol traffic to MCP servers. |
Because routes stay stable while backends are swapped behind them, you can add, combine, or fail over providers without touching client code.
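To make the "stable route, swappable backends" idea concrete, here is a minimal Python sketch of failover across an ordered backend list. It is a conceptual model under stated assumptions, not how the gateway is configured (that happens through the CRDs above): `BackendError`, `call_with_failover`, and the two fake backends are hypothetical names invented for illustration.

```python
from typing import Callable


class BackendError(Exception):
    """Raised by a (hypothetical) backend call when the provider is unavailable."""


def call_with_failover(backends: list, prompt: str) -> tuple:
    """Try (name, call) pairs in priority order; return (name, reply) from the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except BackendError as exc:
            errors.append(f"{name}: {exc}")      # remember why this backend was skipped
    raise BackendError("all backends failed: " + "; ".join(errors))


def flaky_openai(prompt: str) -> str:
    # Stand-in for a hosted provider that is currently having an outage.
    raise BackendError("503 from provider")


def local_vllm(prompt: str) -> str:
    # Stand-in for a self-hosted model that is healthy.
    return f"echo: {prompt}"


# The caller only ever sees one logical route; the list behind it can change freely.
route: list = [("openai", flaky_openai), ("vllm", local_vllm)]
name, reply = call_with_failover(route, "hi")
print(name, reply)
```

Reordering or replacing entries in `route` changes which provider answers without any change to the calling code, which is the property the paragraph above describes.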
Key capabilities
- Unified API — one OpenAI/Anthropic-compatible surface for all providers.
- Routing & failover — model-aware routing with automatic failover across providers and self-hosted models.
- Token-based rate limiting — limits on input/output/total tokens per model and per user, via Envoy Gateway's global rate-limit API.
- Cost & usage visibility — token usage extracted from responses for analytics and budgeting.
- Backend security — credentials injected at the edge; apps never hold provider keys.
- MCP routing — first-class support for routing to MCP servers for agent tooling.
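Token-based rate limiting from the list above can be sketched as a budget that is charged with the token counts a provider reports. The example below is a simplified model, not the gateway's implementation: the `TokenBudget` class is hypothetical, window resets are left out, and the only real-world detail it relies on is the `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`) of an OpenAI-style response.

```python
from collections import defaultdict


class TokenBudget:
    """Token budget keyed by (user, model). Window reset is omitted for brevity."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = defaultdict(int)   # (user, model) -> tokens consumed so far

    def allow(self, user: str, model: str) -> bool:
        """Admit a request only while the key is still under its budget."""
        return self.used[(user, model)] < self.limit

    def record(self, user: str, model: str, response: dict) -> int:
        """Charge the budget with the total the provider reported for this response."""
        total = response["usage"]["total_tokens"]
        self.used[(user, model)] += total
        return total


budget = TokenBudget(limit=100)

# An OpenAI-style response body, trimmed to the usage object.
response = {"usage": {"prompt_tokens": 40, "completion_tokens": 70, "total_tokens": 110}}

if budget.allow("alice", "gpt-4o-mini"):
    budget.record("alice", "gpt-4o-mini", response)

print(budget.allow("alice", "gpt-4o-mini"))  # False: 110 tokens used against a limit of 100
print(budget.allow("bob", "gpt-4o-mini"))    # True: bob has a separate budget
```

Because the token count is only known once the response arrives, a scheme like this serves the request that crosses the limit and rejects the ones after it. The same `usage` numbers can be multiplied by per-token prices to feed cost reporting.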
Quick start
It rides on Envoy Gateway, so install that first, then the AI Gateway control plane — both as Helm charts:
# 1. Envoy Gateway
helm install eg oci://docker.io/envoyproxy/gateway-helm -n envoy-gateway-system --create-namespace
# 2. Envoy AI Gateway
helm install aieg oci://docker.io/envoyproxy/ai-gateway-helm -n envoy-ai-gateway-system --create-namespace
Then apply the sample route + backend and send an OpenAI-style request at the gateway address:
kubectl apply -f basic.yaml # AIGatewayRoute + AIServiceBackend + BackendSecurityPolicy
curl "$GATEWAY_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'
The model name in the request body is what the route matches on — switch gpt-4o-mini to a Bedrock or self-hosted model and the gateway re-routes, no client change.
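The same request can be built from application code. The sketch below uses only the Python standard library and mirrors the curl call; the gateway address `http://localhost:8080` and the model name `my-self-hosted-model` are placeholders, and `build_request` and `send` are hypothetical helper names. It constructs the requests and prints them without touching the network, so it runs without a gateway.

```python
import json
import urllib.request


def build_request(gateway_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the gateway."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a reachable gateway)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


GATEWAY_URL = "http://localhost:8080"  # placeholder; use your gateway's address

# Only the model name differs between the two requests; URL and headers stay the same.
for model in ("gpt-4o-mini", "my-self-hosted-model"):
    req = build_request(GATEWAY_URL, model, "hi")
    print(req.get_method(), req.full_url, json.loads(req.data)["model"])
```

Calling `send(req)` against a running gateway would return the provider's response. Note that the client code carries no provider API key.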
When to use, when to skip
Use it when you run on Kubernetes, already use (or want) Envoy/Envoy Gateway, and need centralized routing, security, and token-level rate limiting across many apps and providers. It's the cloud-native, infra-team choice.
Skip it for a single app or a quick prototype — a library like LiteLLM gives you multi-provider routing in-process with far less setup. If you're not on Kubernetes, the operational overhead of Envoy Gateway probably isn't worth it yet.
Caveat: the project is still young (its CRDs are at v1beta1), and it supports a subset of the full OpenAI API. Check the supported-endpoints doc before assuming a specific route exists, and pin chart versions.
Vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| Envoy AI Gateway | K8s-native, Envoy shops, infra-grade policy | Young; needs Envoy Gateway |
| LiteLLM | In-process multi-provider routing, fast start | Library, not infra policy plane |
| kgateway | Gateway-API-native, broader API gateway | Less AI-specific tuning |
| Higress | AI-native gateway with rich plugins | Different ecosystem (Istio/Higress) |
References
- Official documentation — docs home, concepts, capabilities.
- envoyproxy/ai-gateway — source, CRDs, examples.
- Getting started — full install + first route walkthrough.
- Release notes — what landed in v0.6 and the v1beta1 CRDs.
- Envoy Gateway — the base it builds on.
Extra reads
- A Reference Architecture for Adopters — the two-tier gateway design.
- Usage-based rate limiting — token limits in depth.
- Tetrate & Bloomberg's journey — why the project exists.
- Supported API endpoints — what's actually implemented.
Verified against the official Envoy AI Gateway docs (aigateway.envoyproxy.io), May 2026. Targets v0.6+ (v1beta1 CRDs).