LLM Provider Setup
Claude Code (the AI agent runtime) needs access to an LLM. You can use Anthropic's cloud API, a self-hosted model served through a LiteLLM gateway, or a combination of the two.
What this page covers
- Cloud API: Anthropic Claude (recommended for agents)
- Self-hosted inference: when and why to run your own models
- LiteLLM gateway as a unified API layer
- Provider configuration in Claude Code
Cloud API (recommended)
The Anthropic API provides access to Claude models, the same models used to build the agent pipeline. It is the easiest path to set up and is the recommended option for the agent workflow.
Requirements:
- An Anthropic account
- An API key
Set the key as an environment variable or in Claude Code's config:
export ANTHROPIC_API_KEY=sk-ant-...
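A quick preflight check can confirm that the key is visible to the shell before an agent session starts. The following is a minimal sketch, not part of Claude Code: `check_anthropic_key` is a hypothetical helper name, and the `sk-ant-` prefix test is an assumption taken from the placeholder above. It never prints the key itself.

```shell
# Hypothetical helper (not shipped with Claude Code): reports whether
# ANTHROPIC_API_KEY is set, without echoing the secret value.
check_anthropic_key() {
  # Fail if the variable is unset or empty.
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi
  # Assumption: keys start with "sk-ant-", as in the placeholder above.
  case "$ANTHROPIC_API_KEY" in
    sk-ant-*)
      echo "ANTHROPIC_API_KEY is set"
      ;;
    *)
      echo "ANTHROPIC_API_KEY is set but does not start with sk-ant-" >&2
      return 1
      ;;
  esac
}
```

Running `check_anthropic_key` in the same shell that will launch the agent gives a clear failure message if the variable is missing or looks malformed.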
Self-hosted inference
Self-hosted inference lets you run open models locally without per-token API charges; you pay for hardware and operation instead. This is useful for:
- Cost reduction on high-volume tasks
- Air-gapped environments
- Experimentation with quantized models
See Self-Hosted Inference with llama.cpp for setup details.
LiteLLM gateway
LiteLLM presents a unified OpenAI-compatible API in front of multiple providers. Use it when you want to:
- Route different workloads to different models (e.g., agents to Claude, batch tasks to a local model)
- Add rate limiting, logging, and cost tracking
- Switch providers without changing agent code
See LiteLLM Gateway for setup details.
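Switching providers without changing agent code usually comes down to pointing Claude Code at the gateway through environment variables. The sketch below assumes a LiteLLM proxy is already listening on `http://localhost:4000` and that a gateway-issued key is stored in `LITELLM_API_KEY`; both the URL and that variable name are illustrative assumptions, so substitute the values from your own deployment.

```shell
# Assumption: a LiteLLM proxy is already running locally on port 4000.
export ANTHROPIC_BASE_URL="http://localhost:4000"

# Assumption: LITELLM_API_KEY is a hypothetical variable holding a key
# issued by your gateway; it is sent as the bearer token on each request.
export ANTHROPIC_AUTH_TOKEN="${LITELLM_API_KEY:-}"
```

With these set, requests go to the gateway instead of directly to Anthropic, and the gateway's own configuration decides which model serves each request.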