Table of Contents

LiteLLM Gateway

LiteLLM is a proxy that presents a unified OpenAI-compatible API in front of 100+ LLM providers. Deploy it to abstract away provider-specific APIs and add centralized logging and rate limiting.

What this page covers

  • LiteLLM architecture and why to use it
  • Docker Compose deployment
  • Configuring providers (Anthropic, OpenAI, local llama.cpp)
  • Model routing and fallback configuration
  • API key management for downstream clients

Why use LiteLLM

With LiteLLM, your agents always talk to the same OpenAI-compatible endpoint (http://litellm:4000/v1). You can:

  • Add a new provider without touching agent code
  • Route claude-3-5-sonnet calls to Anthropic and llama3 calls to a local llama.cpp instance
  • See all LLM spending in one dashboard

Docker Compose deployment

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml", "--port", "4000"]

Provider configuration

# litellm-config.yaml
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: llama-local
    litellm_params:
      model: openai/llama3
      api_base: http://llamacpp:8080/v1
      api_key: none