TL;DR — Higress is an Istio+Envoy-based API gateway, born at Alibaba, rebuilt around AI. One unified protocol fronts every LLM provider, with token quotas, semantic caching, multi-key load balancing, content safety, and grayscale rollouts. Its trick is Wasm plugins (Go/Rust/JS) hot-loaded with zero downtime — and it can turn any REST API into an MCP server.
What it is
Higress is a cloud-native, AI-native API gateway with an Istio + Envoy kernel, open-sourced under Apache 2.0. It connects to all mainstream LLM providers (international and Chinese) through one unified protocol, and extends through a rich Wasm plugin system. In the AI Native landscape it sits in AI Native Infra › Gateway as a full ingress gateway that doubles as an LLM and MCP gateway.
Why it exists
Higress was born inside Alibaba to fix two production pains: NGINX/Tengine reloads dropping long-lived connections, and weak load balancing for gRPC/Dubbo. Both get worse in the AI era — LLM streaming holds connections open for minutes. Using Envoy's xDS, Higress pushes config changes in milliseconds with no reload, which matters enormously for SSE streaming and gRPC.
How it works
The Istio control plane + Envoy data plane do the routing; behavior is layered on as Wasm plugins. Plugins run in a sandbox (memory-safe), can be written in Go/Rust/JS, version independently, and hot-update without dropping traffic. The ai-proxy plugin is what normalizes provider protocols; other plugins add quotas, caching, safety, and MCP.
Fig 1 — Envoy data plane; AI behavior added as hot-swappable Wasm plugins.
AI Gateway features
- Unified LLM protocol — one secure endpoint, switch models behind it; flexible multi-model switching with fallback retries.
- Token quota & rate limiting — per-key token budgets and limits; multi-API-key load balancing across keys.
- Semantic caching — cache responses by meaning to cut cost and latency on repeat queries.
- Content safety — compliance/guard filtering for prompts and responses.
- Grayscale rollouts & cost auditing — canary new models; audit per-call spend.
- Smart load balancing — newer strategies: minimum-load (Wasm), global-least-request (Redis), and prompt-prefix matching to reuse warm backends.
MCP & the marketplace
Higress hosts MCP servers through its plugin mechanism, so agents can call tools through the gateway. With openapi-to-mcp you convert an OpenAPI spec into a remote MCP server in minutes, complete with auth, rate limiting, and observability. Its open MCP Marketplace (HiMarket / mcp.higress.ai) is aimed at enterprises with many existing REST APIs that want to expose them to agents fast — "API is MCP."
Quick start
Quickest local try is the standalone Docker install; for clusters use Helm:
# standalone (Docker)
curl -fsSL https://higress.cn/ai-gateway/install.sh | bash
# Kubernetes (Helm)
helm install higress oci://higress.cn/charts/higress -n higress-system --create-namespace
Then open the console, add an LLM provider + key, and call the gateway with an OpenAI-style request. Switching providers is a console/config change, not a code change.
When to use, when to skip
Use it when you want one gateway for ingress + AI, lean on streaming/gRPC heavily, need semantic caching or content safety out of the box, or want to expose existing REST APIs as MCP servers quickly. Strong fit if you value the Wasm-plugin extensibility or run in the Alibaba/China ecosystem.
Skip it for a tiny app — LiteLLM is lighter. If you're standardizing strictly on the Kubernetes Gateway API, kgateway or Envoy AI Gateway align more closely with that spec.
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| Higress | Wasm-plugin extensibility, streaming, semantic cache, API→MCP | Own config model; Istio footprint |
| kgateway | Gateway-API-native, inference-aware routing | Gateway API learning curve |
| Envoy AI Gateway | Focused AI-only CRDs on Envoy | Younger, narrower |
| LiteLLM | Fast app-level multi-provider proxy | Not a full ingress gateway |
References
- What is Higress — official overview + docs.
- higress.ai — AI gateway product site.
- alibaba/higress — source, plugins, releases.
- HiMarket / MCP Marketplace — API-to-MCP marketplace.
Extra reads
- API is MCP — the marketplace launch — the REST→MCP pitch.
- From ingress-nginx to Higress — migration story.
- Beyond NGINX Ingress — why reloads hurt AI.
- Higress notes — Jimmy Song — concise architecture take.
Verified against the official Higress docs (higress.cn / higress.ai) and project sources, May 2026.