LLM Provider Setup

Claude Code (the AI agent runtime) needs access to an LLM. You can use Anthropic's cloud API, a self-hosted model via LiteLLM, or a combination.

What this page covers

Cloud API: Anthropic Claude (recommended for agents)
Self-hosted inference: when and why to run your own models
LiteLLM gateway as a unified API layer
Provider configuration in Claude Code

Cloud API (recommended)

The Anthropic API provides access to Claude models, the same models used to build the agent pipeline. It is the simplest setup and the recommended one for the agent workflow.

Requirements:

An Anthropic account
An API key

Set the key as an environment variable or in Claude Code's config:

export ANTHROPIC_API_KEY=sk-ant-...
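Before launching Claude Code, a quick shell check can catch a missing or mistyped key. The helper below, `check_anthropic_key`, is hypothetical and not part of Claude Code. It assumes keys start with `sk-ant-`, as in the export line, and it only checks the shape of the value, not whether the key is valid or active.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm ANTHROPIC_API_KEY is exported and
# starts with "sk-ant-" (assumed prefix). Does NOT contact the API, so a
# well-formed but revoked key still passes.
check_anthropic_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi
  case "$ANTHROPIC_API_KEY" in
    sk-ant-*)
      echo "ANTHROPIC_API_KEY looks well-formed"
      ;;
    *)
      echo "ANTHROPIC_API_KEY does not start with sk-ant-" >&2
      return 1
      ;;
  esac
}
```

For example, `check_anthropic_key && claude` starts Claude Code only when the variable looks right.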

Self-hosted inference

Self-hosted inference lets you run open-weight models on your own hardware, with no per-token API charges. This is useful for:

Cost reduction on high-volume tasks
Air-gapped environments
Experimentation with quantized models
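Whether self-hosting actually reduces cost depends on volume. A self-hosted setup has a roughly fixed monthly cost, while the API bills per token, so dividing the first by the second gives a break-even volume. The helper below, `breakeven_million_tokens`, is a hypothetical illustration. You supply the inputs, and it ignores throughput limits, operations time, and differences in model quality.

```shell
#!/usr/bin/env bash
# Hypothetical break-even estimate for self-hosting:
#   break-even (millions of tokens per month) = fixed monthly cost / API price per 1M tokens
# Above this volume the fixed cost is cheaper than paying per token.
breakeven_million_tokens() {
  local fixed_monthly_cost="$1"   # e.g. hardware amortization + power, per month
  local price_per_million="$2"    # API price per 1M tokens, same currency
  # awk handles the floating-point division; a non-positive price exits with status 1
  awk -v c="$fixed_monthly_cost" -v p="$price_per_million" \
    'BEGIN { if (p <= 0) exit 1; printf "%.1f\n", c / p }'
}
```

Consider hypothetical figures of 300 per month in fixed cost and 5 per million tokens. Here `breakeven_million_tokens 300 5` prints `60.0`, so above roughly 60 million tokens a month, self-hosting would be cheaper under those assumptions.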

See Self-Hosted Inference with llama.cpp for setup details.

LiteLLM gateway

LiteLLM presents a unified OpenAI-compatible API in front of multiple providers. Use it when you want to:

Route different workloads to different models (e.g., agents to Claude, batch tasks to a local model)
Add rate limiting, logging, and cost tracking
Switch providers without changing agent code

See LiteLLM Gateway for setup details.
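As a minimal sketch of routing different workloads to different models, the script below writes a LiteLLM proxy config with two model aliases. The `claude` alias is backed by the Anthropic API, and `local` is backed by an OpenAI-compatible local server. Several values are assumptions to adapt: the alias names, the `CLAUDE_MODEL_ID` and `LOCAL_MODEL_NAME` placeholders, and the `http://localhost:8080/v1` endpoint (llama.cpp's default server port). The LiteLLM Gateway page remains the reference for the full setup.

```shell
#!/usr/bin/env bash
# Write a minimal LiteLLM proxy config with two aliases:
#   claude -> Anthropic cloud API (agent workloads)
#   local  -> OpenAI-compatible local server (batch workloads)
# CLAUDE_MODEL_ID and LOCAL_MODEL_NAME are placeholders; replace them before use.
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: claude                        # alias that agents request
    litellm_params:
      model: anthropic/CLAUDE_MODEL_ID        # placeholder: set a real Claude model ID
      api_key: os.environ/ANTHROPIC_API_KEY   # read from the environment at startup
  - model_name: local                         # alias for batch jobs
    litellm_params:
      model: openai/LOCAL_MODEL_NAME          # "openai/" targets any OpenAI-compatible endpoint
      api_base: http://localhost:8080/v1      # assumption: local llama.cpp server
      api_key: not-needed                     # assumption: local server ignores the key
EOF
echo "wrote litellm_config.yaml"
```

You can then start the proxy with `litellm --config litellm_config.yaml`. Clients request a model by alias, so moving a workload to a different provider means editing this file rather than the agent code.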
