Claude API

Condensed reference for building with the Anthropic Claude API: models, authentication, messages, streaming, tool use, vision, structured outputs, extended thinking, prompt caching, batches, errors, and rate limits.


Table of Contents

  1. Authentication & Setup
  2. Models & Pricing
  3. Client SDKs
  4. Messages API Reference
  5. Streaming
  6. Tool Use
  7. Vision (Images)
  8. Structured Outputs
  9. Extended Thinking
  10. Prompt Caching
  11. Batch Processing
  12. Errors
  13. Rate Limits

Authentication & Setup

API Key

Get your key from the Claude Console.

export ANTHROPIC_API_KEY='your-api-key-here'

Required Headers

| Header | Value |
| --- | --- |
| x-api-key | Your API key |
| anthropic-version | 2023-06-01 |
| content-type | application/json |

Messages Endpoint

https://api.anthropic.com/v1/messages

Models & Pricing

Current Models

| Model | API ID | Alias | Input | Output | Context | Max Output |
| --- | --- | --- | --- | --- | --- | --- |
| Opus 4.6 | claude-opus-4-6 | claude-opus-4-6 | $5/MTok | $25/MTok | 200K (1M beta) | 128K |
| Sonnet 4.5 | claude-sonnet-4-5-20250929 | claude-sonnet-4-5 | $3/MTok | $15/MTok | 200K (1M beta) | 64K |
| Haiku 4.5 | claude-haiku-4-5-20251001 | claude-haiku-4-5 | $1/MTok | $5/MTok | 200K | 64K |

Legacy Models (still available)

| Model | API ID | Alias | Input | Output |
| --- | --- | --- | --- | --- |
| Opus 4.5 | claude-opus-4-5-20251101 | claude-opus-4-5 | $5/MTok | $25/MTok |
| Opus 4.1 | claude-opus-4-1-20250805 | claude-opus-4-1 | $15/MTok | $75/MTok |
| Sonnet 4 | claude-sonnet-4-20250514 | claude-sonnet-4-0 | $3/MTok | $15/MTok |
| Opus 4 | claude-opus-4-20250514 | claude-opus-4-0 | $15/MTok | $75/MTok |
| Haiku 3 | claude-3-haiku-20240307 | - | $0.25/MTok | $1.25/MTok |

Third-Party Platform IDs

| Model | AWS Bedrock | GCP Vertex AI |
| --- | --- | --- |
| Opus 4.6 | anthropic.claude-opus-4-6-v1 | claude-opus-4-6 |
| Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | claude-sonnet-4-5@20250929 |
| Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | claude-haiku-4-5@20251001 |

Batch API Pricing (50% discount)

| Model | Batch Input | Batch Output |
| --- | --- | --- |
| Opus 4.6 | $2.50/MTok | $12.50/MTok |
| Sonnet 4.5 | $1.50/MTok | $7.50/MTok |
| Haiku 4.5 | $0.50/MTok | $2.50/MTok |

Prompt Caching Pricing

| Model | Base Input | 5m Cache Write | 1h Cache Write | Cache Read | Output |
| --- | --- | --- | --- | --- | --- |
| Opus 4.6 | $5/MTok | $6.25/MTok | $10/MTok | $0.50/MTok | $25/MTok |
| Sonnet 4.5 | $3/MTok | $3.75/MTok | $6/MTok | $0.30/MTok | $15/MTok |
| Haiku 4.5 | $1/MTok | $1.25/MTok | $2/MTok | $0.10/MTok | $5/MTok |

Multipliers: cache write = 1.25x base (5m) or 2x base (1h), cache read = 0.1x base.

Model Selection Guide

| Need | Model | Use Cases |
| --- | --- | --- |
| Maximum intelligence | Opus 4.6 | Advanced agents, coding, research, complex reasoning |
| Balance of speed + intelligence | Sonnet 4.5 | Code gen, data analysis, content creation, tool use |
| Speed + economy | Haiku 4.5 | Real-time apps, high-volume processing, sub-agents |

Client SDKs

Installation

# Python
pip install anthropic

# TypeScript / Node.js
npm install @anthropic-ai/sdk

# Java (Gradle)
implementation("com.anthropic:anthropic-java:2.11.1")

# Go
go get github.com/anthropics/anthropic-sdk-go

# Ruby
bundler add anthropic

# C#
dotnet add package Anthropic

# PHP
composer require anthropic-ai/sdk

Quick Start Examples

Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message.content)

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello, Claude" }]
});
console.log(message.content);

cURL:

curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Claude"}]
  }'

Messages API Reference

POST /v1/messages

Create a message. The API is stateless — provide the full conversation in each request.

Request Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., "claude-opus-4-6") |
| max_tokens | integer | Yes | Max tokens to generate (model-dependent max) |
| messages | array | Yes | Array of message objects with role and content |
| system | string or array | No | System prompt (string or array of content blocks) |
| temperature | float | No | Randomness (0.0-1.0, default ~1.0). Lower = more deterministic |
| top_p | float | No | Nucleus sampling threshold |
| top_k | integer | No | Top-K sampling |
| stop_sequences | string[] | No | Custom stop sequences |
| stream | boolean | No | Enable SSE streaming |
| tools | array | No | Tool definitions for tool use |
| tool_choice | object | No | How Claude should use tools (auto, any, tool, none) |
| metadata | object | No | Request metadata (e.g., user_id) |
| thinking | object | No | Extended thinking config |
| output_config | object | No | Output format config (for structured outputs) |

Message Format

{
  "role": "user",
  "content": "Hello, Claude"
}

Content can be a string or an array of content blocks:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": "..."}}
  ]
}

Content Block Types (Input)

| Type | Description |
| --- | --- |
| text | Text content |
| image | Image (base64 or URL) |
| document | PDF or plain text document |
| tool_use | Tool call from assistant |
| tool_result | Result of a tool call |
| thinking | Thinking block (for multi-turn with extended thinking) |

Response Object

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hello! How can I help you today?"}
  ],
  "model": "claude-opus-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 15
  }
}
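
In the Python SDK, these fields are attributes on the returned message object; a quick access sketch:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message.stop_reason)      # e.g. "end_turn"
print(message.content[0].text)  # text of the first content block
print(message.usage.input_tokens, message.usage.output_tokens)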

Stop Reasons

| Value | Meaning |
| --- | --- |
| end_turn | Natural end of response |
| max_tokens | Hit the max_tokens limit |
| stop_sequence | Hit a custom stop sequence |
| tool_use | Claude wants to call a tool (client tools) |
| pause_turn | Server tool loop hit the iteration limit; send the response back to continue |

Multi-turn Conversations

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
        {"role": "user", "content": "What's its population?"},
    ],
)

System Prompts

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    system="You are a helpful assistant that responds in JSON.",
    messages=[{"role": "user", "content": "List 3 colors"}],
)

Streaming

Set "stream": true or use SDK streaming helpers for incremental responses via SSE.

SDK Streaming

Python:

with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="claude-opus-4-6",
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

TypeScript:

await client.messages.stream({
  messages: [{ role: "user", content: "Hello" }],
  model: "claude-opus-4-6",
  max_tokens: 1024
}).on("text", (text) => {
  console.log(text);
});

Get Final Message (no event handling needed)

with client.messages.stream(
    max_tokens=128000,
    messages=[{"role": "user", "content": "Write a detailed analysis..."}],
    model="claude-opus-4-6",
) as stream:
    message = stream.get_final_message()
print(message.content[0].text)

SSE Event Flow

  1. message_start — Message object with empty content
  2. content_block_start — Start of a content block
  3. content_block_delta — Incremental content (text_delta, input_json_delta, thinking_delta)
  4. content_block_stop — End of content block
  5. message_delta — Top-level changes (stop_reason, usage)
  6. message_stop — Stream complete
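
The same events surface when iterating a raw stream from the Python SDK; a minimal sketch that prints text deltas and the final stop reason:

import anthropic

client = anthropic.Anthropic()

# stream=True yields raw events whose types mirror the SSE flow above.
stream = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for event in stream:
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)
    elif event.type == "message_delta":
        print(f"\n[stop_reason: {event.delta.stop_reason}]")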

cURL Streaming

curl https://api.anthropic.com/v1/messages \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-opus-4-6",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 256,
    "stream": true
  }'

Tool Use

Claude can call tools (functions) you define. Two types: client tools (you execute) and server tools (Anthropic executes, e.g., web search).

Defining Tools

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                }
            },
            "required": ["location"]
        }
    }],
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
)

Tool Use Flow (Client Tools)

  1. Send request with tools and user message
  2. Claude returns stop_reason: "tool_use" with a tool_use content block
  3. Execute the tool on your side
  4. Send tool_result back in a new user message
  5. Claude uses the result to form its final response

Tool Use Response

{
  "stop_reason": "tool_use",
  "content": [
    {"type": "text", "text": "I'll check the weather for you."},
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "get_weather",
      "input": {"location": "San Francisco, CA"}
    }
  ]
}

Returning Tool Results

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[...],  # same tools
    messages=[
        {"role": "user", "content": "What's the weather in SF?"},
        {"role": "assistant", "content": [
            {"type": "text", "text": "I'll check..."},
            {"type": "tool_use", "id": "toolu_01A09q90qw90lq917835lq9",
             "name": "get_weather", "input": {"location": "San Francisco, CA"}}
        ]},
        {"role": "user", "content": [
            {"type": "tool_result",
             "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
             "content": "65°F, sunny"}
        ]}
    ],
)

Tool Choice Options

| Type | Description |
| --- | --- |
| {"type": "auto"} | Claude decides whether to use tools (default) |
| {"type": "any"} | Claude must use one of the provided tools |
| {"type": "tool", "name": "..."} | Claude must use the specified tool |
| {"type": "none"} | Claude won't use tools |

Server Tools (Web Search)

Server tools run on Anthropic's side; you only declare them in the request:

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": "What's the latest news on AI?"}],
)

Web search: $10 per 1,000 searches + standard token costs. Web fetch: No additional cost beyond tokens.

Strict Tool Use (Structured Outputs)

Add strict: true to guarantee schema validation on tool inputs:

{
  "name": "get_weather",
  "strict": true,
  "input_schema": { ... }
}
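
A minimal Python sketch, reusing the weather tool from above:

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "strict": True,  # inputs are validated against input_schema before the tool call
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)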

Vision (Images)

Claude accepts images via base64 encoding or URL. Supports JPEG, PNG, GIF, WebP.

Limits

  • Up to 100 images per API request (20 on claude.ai)
  • Max 8000x8000 px per image (2000x2000 if >20 images)
  • Optimal: resize to max 1568px on longest edge
  • Token cost: tokens = (width * height) / 750, so a 1000x1000 px image costs about 1,334 tokens

Base64 Image

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data,  # base64-encoded string
                },
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }],
)

URL Image

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "url",
                    "url": "https://example.com/image.jpg",
                },
            },
            {"type": "text", "text": "What's in this image?"},
        ],
    }],
)

Structured Outputs

Guarantee valid JSON responses matching a schema via constrained decoding.

JSON Schema Output

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract info: John Smith (john@example.com) wants Enterprise plan."}],
    output_config={
        "format": {
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "plan": {"type": "string"},
                },
                "required": ["name", "email", "plan"],
                "additionalProperties": False,
            },
        }
    },
)
# Response: {"name": "John Smith", "email": "john@example.com", "plan": "Enterprise"}

With Pydantic (Python SDK)

from pydantic import BaseModel, ConfigDict
from anthropic import Anthropic

class ContactInfo(BaseModel):
    model_config = ConfigDict(extra="forbid")  # schema gets "additionalProperties": false
    name: str
    email: str
    plan_interest: str

client = Anthropic()

# One approach: pass the model's JSON Schema via output_config, then validate
# with Pydantic (the SDK's .parse() helper automates this pattern).
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract info: John Smith (john@example.com) wants Enterprise plan."}],
    output_config={"format": {"type": "json_schema", "schema": ContactInfo.model_json_schema()}},
)
contact = ContactInfo.model_validate_json(response.content[0].text)

With Zod (TypeScript SDK)

import { z } from "zod";

const ContactInfo = z.object({ name: z.string(), email: z.string(), plan_interest: z.string() });
// Derive a JSON Schema from ContactInfo for output_config, then validate with ContactInfo.parse().

Output Format Options

| Type | Description |
| --- | --- |
| json_schema | Strict JSON matching a provided schema |
| text | Default text output |

Extended Thinking

Enhanced reasoning for complex tasks. Claude shows its step-by-step thought process.

Supported Models

All current models (Opus 4.6, Sonnet 4.5, Haiku 4.5) support extended thinking; legacy Haiku 3 does not. For Opus 4.6, use adaptive thinking (type: "adaptive") instead of a manual token budget.

Basic Usage

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Prove there are infinite primes where n mod 4 == 3."}],
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

Adaptive Thinking (Opus 4.6)

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
)

Key Points

  • budget_tokens must be < max_tokens
  • Claude 4 models return summarized thinking (charged for full tokens, not summary)
  • Claude Sonnet 3.7 returns full thinking output
  • With tool use: only tool_choice: auto or none is supported
  • When using tools with thinking, pass thinking blocks back to the API for the last assistant message (see the sketch after this list)
  • Cannot toggle thinking mid-turn (during tool use loops)
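
A minimal sketch of the last two rules, using a hypothetical get_time tool: the assistant content (thinking blocks and their signatures included) goes back verbatim before the tool_result.

import anthropic

client = anthropic.Anthropic()
tools = [{"name": "get_time", "description": "Get the current UTC time",
          "input_schema": {"type": "object", "properties": {}}}]
messages = [{"role": "user", "content": "What time is it in UTC?"}]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tools=tools,
    messages=messages,
)

if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    # Return the assistant content unmodified so the thinking blocks from
    # the last assistant turn (and their signatures) are preserved.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": tool_block.id, "content": "14:05 UTC"}
    ]})
    final = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 10000},
        tools=tools,
        messages=messages,
    )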

Response Format

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "EqQBCgIYAhIM..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}

Prompt Caching

Cache repeated content (system prompts, documents, tool definitions) to reduce cost and latency.

How to Use

Add cache_control to content blocks you want cached:

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are an expert analyst."},
        {
            "type": "text",
            "text": "<large document content here>",
            "cache_control": {"type": "ephemeral"}  # default 5-min TTL
        }
    ],
    messages=[{"role": "user", "content": "Summarize the document."}],
)

Key Rules

  • Cache hierarchy: tools → system → messages
  • Default TTL: 5 minutes (refreshed on use). Optional 1-hour TTL available
  • Min cacheable tokens: 4096 (Opus 4.6, Opus 4.5, Haiku 4.5), 1024 (Sonnet 4.5, Opus 4.1, Opus 4, Sonnet 4)
  • Up to 4 cache breakpoints per request
  • 20-block lookback window for automatic prefix checking
  • Cache reads are 10% of base input price
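
Caching follows that tools → system → messages order; a sketch placing a breakpoint on the last tool definition so the entire tool-definition prefix is cached (the get_weather tool here is illustrative):

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[
        # ... earlier tool definitions ...
        {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "input_schema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
            # A breakpoint on the last tool caches everything up to here.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Weather in SF?"}],
)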

1-Hour Cache

"cache_control": {"type": "ephemeral", "ttl": "1h"}

Tracking Cache Performance

Response usage fields:

  • cache_creation_input_tokens — tokens written to cache
  • cache_read_input_tokens — tokens read from cache
  • input_tokens — uncached tokens (after the last cache breakpoint)

total_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens
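
A sketch reading those fields off the response from the caching example above (the cache fields can be None when no cache_control is set, hence the or-0 guards):

usage = response.usage
written = usage.cache_creation_input_tokens or 0
read = usage.cache_read_input_tokens or 0
total_input = usage.input_tokens + written + read
print(f"cache writes: {written}, cache reads: {read}, "
      f"uncached: {usage.input_tokens}, total input: {total_input}")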

Batch Processing

Process large volumes of requests asynchronously at 50% discount.

Create a Batch

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "request-1",
            "params": {
                "model": "claude-opus-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "Hello"}]
            }
        },
        # ... more requests
    ]
)
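
A polling sketch, assuming the retrieve and results batch endpoints of the Python SDK:

import time

# Poll until processing ends, then stream per-request results.
while True:
    status = client.messages.batches.retrieve(batch.id)
    if status.processing_status == "ended":
        break
    time.sleep(60)

for entry in client.messages.batches.results(batch.id):
    if entry.result.type == "succeeded":
        print(entry.custom_id, entry.result.message.content)
    else:
        print(entry.custom_id, "did not succeed:", entry.result.type)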

Batch Limits

| Tier | Max RPM | Max Requests in Queue | Max per Batch |
| --- | --- | --- | --- |
| 1 | 50 | 100,000 | 100,000 |
| 2 | 1,000 | 200,000 | 100,000 |
| 3 | 2,000 | 300,000 | 100,000 |
| 4 | 4,000 | 500,000 | 100,000 |

Errors

HTTP Error Codes

| Code | Type | Description |
| --- | --- | --- |
| 400 | invalid_request_error | Bad request format or content |
| 401 | authentication_error | Invalid API key |
| 403 | permission_error | Insufficient permissions |
| 404 | not_found_error | Resource not found |
| 413 | request_too_large | Exceeds max request size |
| 429 | rate_limit_error | Rate limit hit |
| 500 | api_error | Internal server error |
| 529 | overloaded_error | API temporarily overloaded |

Request Size Limits

| Endpoint | Max Size |
| --- | --- |
| Messages API | 32 MB |
| Token Counting API | 32 MB |
| Batch API | 256 MB |
| Files API | 500 MB |

Error Response Format

{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "The requested resource could not be found."
  },
  "request_id": "req_011CSHoEeqs5C35K2UUqR7Fy"
}
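
In the Python SDK these surface as typed exceptions; a minimal handling sketch:

import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
except anthropic.RateLimitError as e:      # 429
    print("rate limited; retry-after:", e.response.headers.get("retry-after"))
except anthropic.APIStatusError as e:      # any other non-2xx response
    print(e.status_code, e.message)
except anthropic.APIConnectionError:       # network-level failure
    print("connection error")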

Request ID

Every response includes a request-id header. Access via SDKs:

print(f"Request ID: {message._request_id}")

Breaking Changes (Opus 4.6)

  • Prefill removal: prefilling assistant messages now returns a 400 error. Use structured outputs or system prompts instead.

Rate Limits

Measured in RPM (requests/min), ITPM (input tokens/min), OTPM (output tokens/min).

Rate Limits by Tier

| Tier | Opus 4.x RPM | Opus 4.x ITPM | Opus 4.x OTPM | Sonnet 4.x RPM | Haiku 4.5 RPM |
| --- | --- | --- | --- | --- | --- |
| 1 | 50 | 30,000 | 8,000 | 50 | 50 |
| 2 | 1,000 | 450,000 | 90,000 | 1,000 | 1,000 |
| 3 | 2,000 | 800,000 | 160,000 | 2,000 | 2,000 |
| 4 | 4,000 | 2,000,000 | 400,000 | 4,000 | 4,000 |

Tier Requirements

| Tier | Credit Purchase | Max Credit Purchase |
| --- | --- | --- |
| 1 | $5 | $100 |
| 2 | $40 | $500 |
| 3 | $200 | $1,000 |
| 4 | $400 | $5,000 |

Cache-Aware ITPM

Only uncached input tokens count toward ITPM limits for most models:

  • input_tokens + cache_creation_input_tokens → count toward ITPM
  • cache_read_input_tokens → do NOT count toward ITPM

This means prompt caching effectively multiplies your throughput: a request whose input is 90% cache reads consumes only about 10% of the ITPM budget it otherwise would.

Rate Limit Response Headers

| Header | Description |
| --- | --- |
| retry-after | Seconds to wait before retrying |
| anthropic-ratelimit-requests-limit | Max requests in period |
| anthropic-ratelimit-requests-remaining | Remaining requests |
| anthropic-ratelimit-tokens-limit | Max tokens in period |
| anthropic-ratelimit-tokens-remaining | Remaining tokens |

Handling 429 Errors

  • Use exponential backoff with jitter
  • Check retry-after header
  • Reduce max_tokens if hitting OTPM limits
  • Use prompt caching to reduce ITPM usage
  • Ramp up traffic gradually to avoid acceleration limits
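
A sketch along those lines (the SDK also retries some failures itself; max_retries on the client controls that):

import random
import time

import anthropic

client = anthropic.Anthropic(max_retries=0)  # handle retries ourselves

def create_with_backoff(max_attempts=5, **params):
    """Retry on 429 with exponential backoff plus jitter, honoring retry-after."""
    for attempt in range(max_attempts):
        try:
            return client.messages.create(**params)
        except anthropic.RateLimitError as e:
            retry_after = e.response.headers.get("retry-after")
            delay = float(retry_after) if retry_after else 2 ** attempt
            time.sleep(delay + random.uniform(0, 1))
    raise RuntimeError("still rate limited after all retries")

message = create_with_backoff(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)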

Quick Reference: Common Patterns

Simple Text Generation

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
)

With System Prompt + Streaming

with client.messages.stream(
    model="claude-opus-4-6",
    max_tokens=2048,
    system="You are a concise technical writer.",
    messages=[{"role": "user", "content": "Explain REST APIs."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

JSON Output

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    output_config={"format": {"type": "json_schema", "schema": {
        "type": "object",
        "properties": {"summary": {"type": "string"}, "key_points": {"type": "array", "items": {"type": "string"}}},
        "required": ["summary", "key_points"],
        "additionalProperties": False,
    }}},
    messages=[{"role": "user", "content": "Summarize the benefits of exercise."}],
)

Image Analysis

import base64, httpx

image_data = base64.b64encode(httpx.get("https://example.com/chart.png").content).decode()

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
        {"type": "text", "text": "Analyze this chart."},
    ]}],
)

Tool Use Loop

import anthropic, json

client = anthropic.Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Get weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
    }
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# Step 1: Initial request
response = client.messages.create(model="claude-opus-4-6", max_tokens=1024, tools=tools, messages=messages)

# Step 2: Process tool calls
if response.stop_reason == "tool_use":
    tool_block = next(b for b in response.content if b.type == "tool_use")
    # Execute your tool
    result = "72°F, partly cloudy"

    # Step 3: Send result back
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": tool_block.id, "content": result}
    ]})

    # Step 4: Get final response
    final = client.messages.create(model="claude-opus-4-6", max_tokens=1024, tools=tools, messages=messages)
    print(final.content[0].text)

Additional Resources