// AI NATIVE STACK


CRASH COURSE · AI-NATIVE · beginner · 10 min read · proxy

LiteLLM — call 100+ LLMs with one OpenAI-shaped API.


TL;DR — LiteLLM speaks the OpenAI format to 100+ providers (OpenAI, Anthropic, Bedrock, Gemini, Azure, Ollama…). Use it as a Python SDK to normalize calls in your code, or run the Proxy — a self-hosted gateway with virtual keys, spend tracking, budgets, and load-balanced fallbacks. The fastest way to stop hard-coding one vendor.

What it is

LiteLLM is two things sharing one translation layer:

  • The SDK — a Python library where litellm.completion(...) takes OpenAI-style arguments and calls whatever provider you name, returning an OpenAI-shaped response.
  • The Proxy (AI Gateway) — a self-hosted server that exposes that same unified API over HTTP, plus auth, keys, budgets, logging, and routing. This is the part that belongs in AI Native Infra › Gateway.
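
A minimal sketch of the SDK path, assuming litellm is installed (pip install litellm) and provider credentials such as OPENAI_API_KEY and ANTHROPIC_API_KEY are set in the environment. build_call and ask are hypothetical helper names for illustration, not part of LiteLLM; the only LiteLLM call used is litellm.completion.

```python
from typing import Any, Dict


def build_call(model: str, prompt: str) -> Dict[str, Any]:
    """Build the OpenAI-style keyword arguments passed to litellm.completion."""
    return {
        "model": model,  # "<provider>/<model>" selects the backend
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(model: str, prompt: str) -> str:
    """Send the prompt through LiteLLM and return the reply text."""
    import litellm  # imported lazily so build_call works without the package

    response = litellm.completion(**build_call(model, prompt))
    # The response is OpenAI-shaped regardless of which provider answered.
    return response.choices[0].message.content
```

Calling ask("openai/gpt-4o", "hi") and ask("anthropic/claude-sonnet-4-6", "hi") would hit two different vendors while the calling code stays identical; only the model string changes.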

Why it exists

Every provider has a slightly different SDK, request shape, and error format. Hard-code one and you're locked in; support several and your code fills with branches. Multiply that across a team and nobody knows who's spending what.

LiteLLM collapses all of it to one format. Apps target one interface; switching or mixing providers becomes a config change, and the proxy gives ops a single place for keys, limits, and cost.

[Figure: apps (sk-... key) → LiteLLM Proxy (virtual keys · budgets · spend log · routing · retries · fallbacks) → providers (OpenAI · Azure · Anthropic · Bedrock · Gemini · Ollama)]

Fig 1 — One OpenAI-shaped endpoint in front of every provider, with keys + spend in the middle.

The Proxy — the gateway features

Run as a server and you get the LLM-ops control plane:

  • Virtual keys — issue sk-... keys per app/user/team; the proxy authenticates, applies limits, logs spend, and routes.
  • Spend tracking — automatic per-key, per-user, per-team cost in Postgres, with provider-specific pricing (Bedrock tiers, Vertex PayGo, Azure mapping).
  • Budgets & rate limits — USD caps and TPM/RPM limits per key, resettable on a window (30s/30m/30d).
  • JWT → key mapping — map SSO tokens to virtual keys so every client gets the same controls.
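
The virtual-key and budget bullets above can be sketched as one admin request. This assumes the proxy runs at http://0.0.0.0:4000 with a master key of sk-1234 (both placeholders), and that the /key/generate endpoint accepts the fields shown; check the field names against the LiteLLM docs for your version. key_request and create_key are hypothetical helper names.

```python
import json
import urllib.request

PROXY_URL = "http://0.0.0.0:4000"  # assumed proxy address
MASTER_KEY = "sk-1234"             # assumed admin (master) key, placeholder only


def key_request(team: str, max_budget_usd: float, rpm: int, tpm: int) -> urllib.request.Request:
    """Build (but do not send) a request for a budget-capped virtual key."""
    body = {
        "models": ["gpt-4o", "claude"],  # model names the key may call
        "max_budget": max_budget_usd,    # USD cap
        "budget_duration": "30d",        # budget resets on this window
        "rpm_limit": rpm,                # requests per minute
        "tpm_limit": tpm,                # tokens per minute
        "metadata": {"team": team},
    }
    return urllib.request.Request(
        f"{PROXY_URL}/key/generate",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_key(req: urllib.request.Request) -> str:
    """Send the request to a running proxy and return the new sk-... key."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["key"]
```

Splitting the build step from the send step keeps the payload inspectable before anything touches the proxy; create_key(key_request("search", 25.0, 60, 100000)) would issue the key against a live instance.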

Routing & fallbacks

Group several deployments of the same model and the built-in Router load-balances across them. simple-shuffle is the recommended production strategy; least-busy and usage-based-routing are alternatives. On failure the Router retries within the group, then falls through to a configured fallback model — so a provider outage degrades gracefully instead of erroring.
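
To make the retry-then-fallback order concrete, here is a toy simulation in plain Python. It is not LiteLLM's Router: the names call_group and route are hypothetical, deployments are modeled as plain callables, and details such as cooldowns for failing deployments are left out.

```python
import random
from typing import Callable, Dict, List, Optional

Deployment = Callable[[str], str]  # takes a prompt, returns a reply or raises


def call_group(deployments: List[Deployment], prompt: str,
               retries: int, rng: random.Random) -> str:
    """Try randomly picked deployments of one model group, 1 + retries times."""
    last_error: Optional[Exception] = None
    for _ in range(1 + retries):
        deployment = rng.choice(deployments)  # uniform pick, in the spirit of simple-shuffle
        try:
            return deployment(prompt)
        except Exception as err:  # any failure counts as a failed attempt
            last_error = err
    raise RuntimeError("model group exhausted") from last_error


def route(groups: Dict[str, List[Deployment]], model: str,
          fallbacks: Dict[str, List[str]], prompt: str,
          retries: int = 2, seed: Optional[int] = None) -> str:
    """Try the requested group first, then each configured fallback group in order."""
    rng = random.Random(seed)
    for name in [model, *fallbacks.get(model, [])]:
        try:
            return call_group(groups[name], prompt, retries, rng)
        except RuntimeError:
            continue  # this group is exhausted, fall through to the next one
    raise RuntimeError(f"all groups failed for {model}")
```

With a group whose deployments all raise and a fallback mapping such as {"gpt-4o": ["claude"]}, route returns the fallback's reply instead of surfacing the outage to the caller.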

Quick start

Point a tiny config at your models, run the proxy, then call it like OpenAI:

# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params: { model: openai/gpt-4o }
  - model_name: claude
    litellm_params: { model: anthropic/claude-sonnet-4-6 }

# shell (provider keys such as OPENAI_API_KEY / ANTHROPIC_API_KEY must be set in the environment)
pip install 'litellm[proxy]'
litellm --config config.yaml          # serves on :4000

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude","messages":[{"role":"user","content":"hi"}]}'

Any OpenAI client works: change base_url to the proxy and use a virtual key. Switch "model":"claude" to "gpt-4o" and traffic re-routes with no other client change.
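
The same call can be sketched in Python with only the standard library, assuming the proxy address and placeholder key from the curl example. chat_request and send are hypothetical helper names; an OpenAI client pointed at the proxy would produce an equivalent request.

```python
import json
import urllib.request


def chat_request(model: str, prompt: str,
                 base_url: str = "http://0.0.0.0:4000",
                 api_key: str = "sk-1234") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-shaped chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request to a running proxy and return the reply text."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

chat_request("claude", "hi") and chat_request("gpt-4o", "hi") share the same URL and headers; only the model field in the JSON body differs, which is what lets the proxy re-route without touching the client.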

When to use, when to skip

Use it when you want multi-provider access fast: the SDK for in-code normalization, the Proxy when several apps/teams need shared keys, budgets, and spend reporting. It's the lowest-friction AI gateway to stand up.

Skip / graduate when you need deep Kubernetes-native, Envoy-grade traffic policy at the mesh level; Envoy AI Gateway fits better there. For a single app calling one provider, that provider's own SDK is plenty.

Heads up: the proxy needs Postgres (keys/spend) and Redis (shared rate-limit/cooldown state across replicas) to run in a highly available setup. Some advanced controls (tag budgets, log export, /spend/report) are enterprise-tier, so check before you depend on them.

vs the alternatives

Tool             | Best for                                               | Trade-off
LiteLLM          | Fast multi-provider access, keys + spend, SDK or proxy | Needs Postgres/Redis at scale; some enterprise gating
Envoy AI Gateway | K8s/Envoy-native infra-grade policy                    | Heavier to operate
kgateway         | Gateway-API-native routing                             | Less AI-specific tooling
Higress          | Plugin-rich AI-native gateway                          | Different ecosystem


Verified against the official LiteLLM docs (docs.litellm.ai), May 2026.
