TL;DR — LiteLLM speaks the OpenAI format to 100+ providers (OpenAI, Anthropic, Bedrock, Gemini, Azure, Ollama…). Use it as a Python SDK to normalize calls in your code, or run the Proxy — a self-hosted gateway with virtual keys, spend tracking, budgets, and load-balanced fallbacks. The fastest way to stop hard-coding one vendor.
What it is
LiteLLM is two things sharing one translation layer:
- The SDK: a Python library where litellm.completion(...) takes OpenAI-style arguments and calls whatever provider you name, returning an OpenAI-shaped response.
- The Proxy (AI Gateway): a self-hosted server that exposes that same unified API over HTTP, plus auth, keys, budgets, logging, and routing. This is the part that belongs in AI Native Infra › Gateway.
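A minimal sketch of the SDK path, assuming litellm is installed (`pip install litellm`) and provider keys are exported as `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`. The helper names `build_messages` and `ask` are illustrative, not part of LiteLLM; the model strings are the ones used in the Quick start config.

```python
def build_messages(prompt: str) -> list[dict]:
    """Return an OpenAI-style messages list for a single user turn."""
    return [{"role": "user", "content": prompt}]


def ask(model: str, prompt: str) -> str:
    """Send one prompt through LiteLLM and return the reply text."""
    # Imported lazily so build_messages() works without the dependency.
    import litellm

    response = litellm.completion(model=model, messages=build_messages(prompt))
    # The response follows OpenAI's shape whichever provider served it.
    return response.choices[0].message.content


# Same call, different vendor: only the model string changes.
# print(ask("openai/gpt-4o", "hi"))
# print(ask("anthropic/claude-sonnet-4-6", "hi"))
```

The calls at the bottom are commented out because they need network access and real provider keys.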
Why it exists
Every provider has a slightly different SDK, request shape, and error format. Hard-code one and you're locked in; support several and your code fills with branches. Multiply that across a team and nobody knows who's spending what.
LiteLLM collapses all of it to one format. Apps target one interface; switching or mixing providers becomes a config change, and the proxy gives ops a single place for keys, limits, and cost.
Fig 1 — One OpenAI-shaped endpoint in front of every provider, with keys + spend in the middle.
The Proxy — the gateway features
Run as a server and you get the LLM-ops control plane:
- Virtual keys: issue sk-... keys per app/user/team; the proxy authenticates, applies limits, logs spend, and routes.
- Spend tracking: automatic per-key, per-user, per-team cost in Postgres, with provider-specific pricing (Bedrock tiers, Vertex PayGo, Azure mapping).
- Budgets & rate limits: USD caps and TPM/RPM limits per key, resettable on a window (30s/30m/30d).
- JWT → key mapping: map SSO tokens to virtual keys so every client gets the same controls.
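As a rough illustration of virtual keys with a budget and rate limits, the sketch below builds a request to the proxy's `/key/generate` endpoint using only the standard library. It assumes the proxy runs on :4000 with a database attached and that `sk-1234` is the admin (master) key. The team name, dollar cap, and limit values are made up for the example, and `key_request` is a hypothetical helper.

```python
import json
import urllib.request

PROXY_URL = "http://0.0.0.0:4000"  # proxy address used in this article
MASTER_KEY = "sk-1234"             # assumption: the proxy's admin (master) key


def key_request(team_alias: str, max_budget_usd: float) -> urllib.request.Request:
    """Build (but do not send) a POST /key/generate request."""
    body = {
        "models": ["gpt-4o", "claude"],    # model names the key may call
        "max_budget": max_budget_usd,      # USD cap for this key
        "budget_duration": "30d",          # budget resets every 30 days
        "rpm_limit": 60,                   # requests per minute (example value)
        "tpm_limit": 100_000,              # tokens per minute (example value)
        "metadata": {"team": team_alias},  # free-form label for reporting
    }
    return urllib.request.Request(
        f"{PROXY_URL}/key/generate",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = key_request("search-team", 25.0)
# To create the key against a running proxy:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["key"])
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches the proxy.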
Routing & fallbacks
Group several deployments of the same model and the built-in Router load-balances across them. simple-shuffle is the recommended production strategy; least-busy and usage-based-routing are alternatives. On failure the Router retries within the group, then falls through to a configured fallback model — so a provider outage degrades gracefully instead of erroring.
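The same grouping and fallback idea can be sketched with the SDK's Router class. This is an illustrative setup, not a recommended production config: the Azure deployment name is hypothetical, the Azure credentials are assumed to live in `AZURE_API_BASE` / `AZURE_API_KEY`, and `num_retries=2` is an arbitrary choice.

```python
import os

# Two deployments share the public name "gpt-4o", so they form one group.
MODEL_LIST = [
    {"model_name": "gpt-4o", "litellm_params": {"model": "openai/gpt-4o"}},
    {
        "model_name": "gpt-4o",
        "litellm_params": {
            "model": "azure/my-gpt-4o-deployment",  # hypothetical deployment name
            "api_base": os.getenv("AZURE_API_BASE"),
            "api_key": os.getenv("AZURE_API_KEY"),
        },
    },
    {"model_name": "claude", "litellm_params": {"model": "anthropic/claude-sonnet-4-6"}},
]

# If the "gpt-4o" group fails after retries, try the "claude" group.
FALLBACKS = [{"gpt-4o": ["claude"]}]


def make_router():
    """Create a Router that shuffles across the group and falls back on failure."""
    # Imported lazily so the config above can be inspected without litellm.
    from litellm import Router

    return Router(
        model_list=MODEL_LIST,
        routing_strategy="simple-shuffle",
        num_retries=2,
        fallbacks=FALLBACKS,
    )


# router = make_router()
# reply = router.completion(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
```

Callers only ever ask for "gpt-4o"; which deployment answers, and whether the fallback kicked in, stays inside the Router.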
Quick start
Point a tiny config at your models, run the proxy, then call it like OpenAI:
# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params: { model: openai/gpt-4o }
  - model_name: claude
    litellm_params: { model: anthropic/claude-sonnet-4-6 }
pip install 'litellm[proxy]'
litellm --config config.yaml # serves on :4000
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude","messages":[{"role":"user","content":"hi"}]}'
Any OpenAI client works — just change base_url to the proxy and use a virtual key. Switch "model":"claude" to "gpt-4o" and traffic re-routes, no client change.
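For instance, with the official openai Python package (`pip install openai`), the call might look like the sketch below. It assumes the proxy from the config above is running locally and that `sk-1234` is a key the proxy accepts; `chat_kwargs` and `ask_proxy` are illustrative helper names.

```python
PROXY_BASE_URL = "http://0.0.0.0:4000"  # the proxy, not api.openai.com
VIRTUAL_KEY = "sk-1234"                 # assumption: a key issued by the proxy


def chat_kwargs(model: str, prompt: str) -> dict:
    """Arguments for a single-turn chat completion."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask_proxy(model: str, prompt: str) -> str:
    """Call the proxy through the stock OpenAI client and return the reply text."""
    # Imported lazily so chat_kwargs() works without the dependency.
    from openai import OpenAI

    client = OpenAI(base_url=PROXY_BASE_URL, api_key=VIRTUAL_KEY)
    response = client.chat.completions.create(**chat_kwargs(model, prompt))
    return response.choices[0].message.content


# "claude" and "gpt-4o" are the model_name values from config.yaml.
# print(ask_proxy("claude", "hi"))
# print(ask_proxy("gpt-4o", "hi"))
```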
When to use, when to skip
Use it when you want multi-provider access fast: the SDK for in-code normalization, the Proxy when several apps/teams need shared keys, budgets, and spend reporting. It's the lowest-friction AI gateway to stand up.
Skip / graduate when you need deep Kubernetes-native, Envoy-grade traffic policy at the mesh level; Envoy AI Gateway fits better there. For a single app calling one provider, that provider's own SDK is plenty.
Some features (including /spend/report) are enterprise-tier; check before you depend on them.
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| LiteLLM | Fast multi-provider access, keys + spend, SDK or proxy | Needs Postgres/Redis at scale; some enterprise gating |
| Envoy AI Gateway | K8s/Envoy-native infra-grade policy | Heavier to operate |
| kgateway | Gateway-API-native routing | Less AI-specific tooling |
| Higress | Plugin-rich AI-native gateway | Different ecosystem |
References
- Official documentation — getting started, providers, SDK.
- LiteLLM Proxy (AI Gateway) — the server setup.
- Virtual keys — issuing and scoping keys.
- Routing & load balancing — strategies + fallbacks.
- BerriAI/litellm — source + issues.
Extra reads
- Spend tracking — how cost is attributed.
- Router architecture — retries/fallbacks internals.
- Fallbacks & reliability — graceful degradation.
- Enterprise features — what's gated.
Verified against the official LiteLLM docs (docs.litellm.ai), May 2026.