LiteLLM Gateway
LiteLLM is a proxy that presents a unified OpenAI-compatible API in front of 100+ LLM providers. Deploy it to abstract away provider-specific APIs and add centralized logging and rate limiting.
What this page covers
- LiteLLM architecture and why to use it
- Docker Compose deployment
- Configuring providers (Anthropic, OpenAI, local llama.cpp)
- Model routing and fallback configuration
- API key management for downstream clients
Why use LiteLLM
With LiteLLM, your agents always talk to the same OpenAI-compatible endpoint (http://litellm:4000/v1). You can:
- Add a new provider without touching agent code
- Route
claude-3-5-sonnetcalls to Anthropic andllama3calls to a local llama.cpp instance - See all LLM spending in one dashboard
Docker Compose deployment
services:
litellm:
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000"
volumes:
- ./litellm-config.yaml:/app/config.yaml
command: ["--config", "/app/config.yaml", "--port", "4000"]
Provider configuration
# litellm-config.yaml
model_list:
- model_name: claude-sonnet
litellm_params:
model: anthropic/claude-sonnet-4-5
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: llama-local
litellm_params:
model: openai/llama3
api_base: http://llamacpp:8080/v1
api_key: none