// AI NATIVE STACK

AI Native › AI Native Infra › Gateway › Higress

CRASH COURSE · AI-NATIVE · intermediate · 11 min read · Apache-2.0

Higress — the AI-native gateway with a plugin habit.

gateway ai-native higress envoy mcp

TL;DR — Higress is an Istio+Envoy-based API gateway, born at Alibaba, rebuilt around AI. One unified protocol fronts every LLM provider, with token quotas, semantic caching, multi-key load balancing, content safety, and grayscale rollouts. Its trick is Wasm plugins (Go/Rust/JS) hot-loaded with zero downtime — and it can turn any REST API into an MCP server.

What it is

Higress is a cloud-native, AI-native API gateway with an Istio + Envoy kernel, open-sourced under Apache 2.0. It connects to all mainstream LLM providers (international and Chinese) through one unified protocol, and extends through a rich Wasm plugin system. In the AI Native landscape it sits in AI Native Infra › Gateway as a full ingress gateway that doubles as an LLM and MCP gateway.

Why it exists

Higress was born inside Alibaba to fix two production pains: NGINX/Tengine reloads dropping long-lived connections, and weak load balancing for gRPC/Dubbo. Both get worse in the AI era — LLM streaming holds connections open for minutes. Using Envoy's xDS, Higress pushes config changes in milliseconds with no reload, which matters enormously for SSE streaming and gRPC.

How it works

The Istio control plane + Envoy data plane do the routing; behavior is layered on as Wasm plugins. Plugins run in a sandbox (memory-safe), can be written in Go/Rust/JS, version independently, and hot-update without dropping traffic. The ai-proxy plugin is what normalizes provider protocols; other plugins add quotas, caching, safety, and MCP.

clientsSSE / gRPC Higress (Envoy) Wasm plugins: ai-proxy · quota · cache safety · MCP LLM providers semantic cache REST API → MCP

Fig 1 — Envoy data plane; AI behavior added as hot-swappable Wasm plugins.

AI Gateway features

  • Unified LLM protocol — one secure endpoint, switch models behind it; flexible multi-model switching with fallback retries.
  • Token quota & rate limiting — per-key token budgets and limits; multi-API-key load balancing across keys.
  • Semantic caching — cache responses by meaning to cut cost and latency on repeat queries.
  • Content safety — compliance/guard filtering for prompts and responses.
  • Grayscale rollouts & cost auditing — canary new models; audit per-call spend.
  • Smart load balancing — newer strategies: minimum-load (Wasm), global-least-request (Redis), and prompt-prefix matching to reuse warm backends.

MCP & the marketplace

Higress hosts MCP servers through its plugin mechanism, so agents can call tools through the gateway. With openapi-to-mcp you convert an OpenAPI spec into a remote MCP server in minutes, complete with auth, rate limiting, and observability. Its open MCP Marketplace (HiMarket / mcp.higress.ai) is aimed at enterprises with many existing REST APIs that want to expose them to agents fast — "API is MCP."

Quick start

Quickest local try is the standalone Docker install; for clusters use Helm:

# standalone (Docker)
curl -fsSL https://higress.cn/ai-gateway/install.sh | bash

# Kubernetes (Helm)
helm install higress oci://higress.cn/charts/higress -n higress-system --create-namespace

Then open the console, add an LLM provider + key, and call the gateway with an OpenAI-style request. Switching providers is a console/config change, not a code change.

When to use, when to skip

Use it when you want one gateway for ingress + AI, lean on streaming/gRPC heavily, need semantic caching or content safety out of the box, or want to expose existing REST APIs as MCP servers quickly. Strong fit if you value the Wasm-plugin extensibility or run in the Alibaba/China ecosystem.

Skip it for a tiny app — LiteLLM is lighter. If you're standardizing strictly on the Kubernetes Gateway API, kgateway or Envoy AI Gateway align more closely with that spec.

heads up Higress predates and partly sits beside the Gateway API — it has its own console and Istio-style config model. Great for plugin power and streaming; a slightly different mental model if your team is all-in on vanilla Gateway API resources.

vs the alternatives

ToolBest forTrade-off
HigressWasm-plugin extensibility, streaming, semantic cache, API→MCPOwn config model; Istio footprint
kgatewayGateway-API-native, inference-aware routingGateway API learning curve
Envoy AI GatewayFocused AI-only CRDs on EnvoyYounger, narrower
LiteLLMFast app-level multi-provider proxyNot a full ingress gateway

References

Extra reads

Verified against the official Higress docs (higress.cn / higress.ai) and project sources, May 2026.

← AI Native Stack
© cvam — written in plaintext, served warm