LLM Provider Setup

Claude Code (the AI agent runtime) needs access to an LLM. You can use Anthropic's cloud API, a self-hosted model via LiteLLM, or a combination.

What this page covers

  • Cloud API: Anthropic Claude (recommended for agents)
  • Self-hosted inference: when and why to run your own models
  • LiteLLM gateway as a unified API layer
  • Provider configuration in Claude Code

Cloud API: Anthropic Claude

The Anthropic API provides access to Claude models, the same models used to build the agent pipeline. This is the simplest path and the one recommended for the agent workflow.

Requirements:

  • An Anthropic API key

Set the key as an environment variable or in Claude Code's config:

export ANTHROPIC_API_KEY=sk-ant-...
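A missing or mistyped key usually only shows up once the agent makes its first request, so it can help to check the variable beforehand. The following is a minimal sketch, not part of Claude Code: check_anthropic_key is a hypothetical helper name, and the sk-ant- prefix test is only a sanity check based on the example above, not a validation of the key itself.

```shell
# Hypothetical helper: confirm ANTHROPIC_API_KEY looks usable before starting the agent.
# Returns 0 if the key is set and has the expected prefix, 1 if unset, 2 if the prefix differs.
check_anthropic_key() {
  # Treat an unset or empty variable as missing.
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi
  # Prefix taken from the example above; this does not prove the key is valid.
  case "$ANTHROPIC_API_KEY" in
    sk-ant-*) return 0 ;;
    *)
      echo "ANTHROPIC_API_KEY does not start with sk-ant-" >&2
      return 2
      ;;
  esac
}
```

A typical use would be to call check_anthropic_key in a launch script and stop if it returns non-zero.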

Self-hosted inference

Self-hosted inference lets you run open-weight models on your own hardware, with no per-token API charges. This is useful for:

  • Cost reduction on high-volume tasks
  • Air-gapped environments
  • Experimentation with quantized models

See Self-Hosted Inference with llama.cpp for setup details.

LiteLLM gateway

LiteLLM presents a unified OpenAI-compatible API in front of multiple providers. Use it when you want to:

  • Route different workloads to different models (e.g., agents to Claude, batch tasks to a local model)
  • Add rate limiting, logging, and cost tracking
  • Switch providers without changing agent code

See LiteLLM Gateway for setup details.
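As a rough illustration of switching providers without changing agent code, the sketch below toggles Claude Code between a gateway and the direct cloud API using environment variables only. It rests on several assumptions to verify against your installed versions: that Claude Code honors ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN, and that the gateway listens on http://localhost:4000. The function names use_gateway and use_cloud and the LITELLM_KEY variable are hypothetical.

```shell
# Hypothetical helper: point Claude Code at a gateway instead of the cloud API.
# Assumes Claude Code reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN.
use_gateway() {
  # $1: gateway URL (assumed default: a proxy on localhost:4000)
  # $2: gateway key (falls back to the hypothetical LITELLM_KEY variable)
  export ANTHROPIC_BASE_URL="${1:-http://localhost:4000}"
  export ANTHROPIC_AUTH_TOKEN="${2:-${LITELLM_KEY:-}}"
}

# Hypothetical helper: return to the direct Anthropic API,
# which then relies on ANTHROPIC_API_KEY as set earlier on this page.
use_cloud() {
  unset ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN
}
```

Running use_gateway in a shell before starting the agent would send its traffic through the gateway, and use_cloud would restore the default. Keeping the switch in the environment leaves the agent configuration itself untouched.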