OpenAI API — Condensed Reference

1. Authentication & Setup

# Set API key as environment variable
export OPENAI_API_KEY="your_api_key_here"
  • Base URL (Responses API): https://api.openai.com/v1/responses
  • Base URL (Chat Completions): https://api.openai.com/v1/chat/completions
  • Auth Header: Authorization: Bearer $OPENAI_API_KEY

All SDKs auto-read OPENAI_API_KEY from environment.
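The key can also be passed explicitly to the client constructor; a minimal Python sketch (api_key is a real constructor parameter, but where you load the key from is up to you):

import os
from openai import OpenAI

# Prefer the environment variable in production; pass api_key explicitly only when needed.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])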


2. Models & Pricing

Flagship Models (GPT-5 Family — Reasoning)

Model         | Best For                                                              | Context   | Reasoning
gpt-5.2       | Complex reasoning, broad knowledge, coding, multi-step agentic tasks | 1M tokens | none/low/medium/high/xhigh
gpt-5.2-pro   | Hardest problems requiring extended thinking                         | 1M tokens | low/medium/high/xhigh
gpt-5.2-codex | Interactive coding products, full-spectrum coding                    | 1M tokens | none/low/medium/high/xhigh
gpt-5-mini    | Cost-optimized reasoning and chat                                     | 1M tokens | low/medium/high
gpt-5-nano    | High-throughput, simple tasks, classification                         | 1M tokens | low/medium/high

Non-Reasoning Models (GPT-4 Family)

Model        | Best For                                  | Context
gpt-4.1      | Best balance of intelligence, speed, cost | 1M tokens
gpt-4.1-mini | Fast, cost-effective for simpler tasks    | 1M tokens
gpt-4.1-nano | Fastest, cheapest, simple tasks           | 1M tokens
gpt-4o       | Multimodal (text + vision + audio)        | 128K tokens
gpt-4o-mini  | Affordable multimodal                     | 128K tokens

Reasoning-Only Models

Model   | Best For                                 | Context
o3      | STEM reasoning, coding, complex analysis | 200K tokens
o4-mini | Fast reasoning, cost-efficient           | 200K tokens

Image Models

Model       | Capability
gpt-image-1 | Native multimodal image generation
dall-e-3    | Image generation
dall-e-2    | Image generation (legacy)

Model Selection

  • General tasks: gpt-4.1
  • Complex reasoning: gpt-5.2 with reasoning.effort: "medium"
  • Cost-sensitive: gpt-4.1-mini or gpt-5-mini
  • High-throughput: gpt-5-nano or gpt-4.1-nano

3. Client SDKs

Python

pip install openai
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from env

response = client.responses.create(
    model="gpt-5",
    input="Hello, world!"
)
print(response.output_text)

JavaScript/TypeScript

npm install openai
import OpenAI from "openai";
const client = new OpenAI();  // reads OPENAI_API_KEY from env

const response = await client.responses.create({
    model: "gpt-5",
    input: "Hello, world!",
});
console.log(response.output_text);

Other Official SDKs

  • .NET: dotnet add package OpenAI
  • Java: Maven/Gradle — com.openai:openai-java
  • Go: go get github.com/openai/openai-go

cURL

curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-5",
    "input": "Hello, world!"
  }'

4. Responses API

The Responses API is the recommended API for all new projects. Reasoning models perform better with the Responses API.

Basic Request

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Write a haiku about coding.",
)
print(response.output_text)

With Instructions (System Prompt)

response = client.responses.create(
    model="gpt-5",
    instructions="You are a helpful coding assistant. Be concise.",
    input="How do I reverse a string in Python?",
)

With Message Roles

response = client.responses.create(
    model="gpt-5",
    input=[
        {"role": "developer", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
    ],
)

Message role priority: developer > user > assistant

  • developer: Application-level instructions (highest priority)
  • user: End-user input
  • assistant: Model-generated messages

Key Parameters

Parameter              | Type         | Description
model                  | string       | Required. Model ID (e.g., "gpt-5")
input                  | string/array | Required. Text string or array of messages
instructions           | string       | System-level instructions (high priority)
tools                  | array        | Available tools (functions, built-in tools)
reasoning              | object       | {"effort": "none"/"low"/"medium"/"high"/"xhigh"}
text                   | object       | {"format": {...}, "verbosity": "low"/"medium"/"high"}
stream                 | boolean      | Enable streaming
max_output_tokens      | integer      | Max tokens in response (reasoning + output)
temperature            | float        | 0-2, randomness (not for reasoning models)
top_p                  | float        | Nucleus sampling
store                  | boolean      | Whether to store the response for later retrieval
previous_response_id   | string       | Continue a conversation
prompt_cache_retention | string       | "in_memory" or "24h"
prompt_cache_key       | string       | Custom cache routing key
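A minimal sketch combining several of these parameters on a non-reasoning model (parameter names are from the table above; the prompt text is illustrative):

from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",                       # non-reasoning, so temperature applies
    instructions="Answer in one sentence.",
    input="Why is the sky blue?",
    temperature=0.2,                       # low randomness
    max_output_tokens=100,                 # hard cap on generated tokens
    store=False,                           # don't persist for later retrieval
)
print(response.output_text)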

Response Object

{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "gpt-5",
  "output": [
    {
      "id": "msg_abc123",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Response text here.",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150,
    "total_tokens": 175,
    "output_tokens_details": {
      "reasoning_tokens": 100
    }
  }
}

Important: The output array can contain multiple items (text, tool calls, reasoning). Use response.output_text (SDK convenience) to get aggregated text.
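A minimal sketch of walking the output array directly instead of using the convenience property (the item types shown are the ones named above; reasoning items appear only for reasoning models):

from openai import OpenAI
client = OpenAI()

response = client.responses.create(model="gpt-5", input="Hello!")

for item in response.output:
    if item.type == "message":
        for content in item.content:
            if content.type == "output_text":
                print(content.text)
    elif item.type == "function_call":
        print("tool call requested:", item.name)
    elif item.type == "reasoning":
        pass  # internal reasoning item: pass back in follow-up turns, don't display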

Conversation State

# Option 1: Use previous_response_id for automatic context
response1 = client.responses.create(
    model="gpt-5",
    input="My name is Alice.",
)

response2 = client.responses.create(
    model="gpt-5",
    input="What's my name?",
    previous_response_id=response1.id,
)

# Option 2: Manually pass conversation history
response = client.responses.create(
    model="gpt-5",
    input=[
        {"role": "user", "content": "My name is Alice."},
        {"role": "assistant", "content": "Hello Alice!"},
        {"role": "user", "content": "What's my name?"},
    ],
)

5. Chat Completions API (Legacy, Still Supported)

Python

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)

JavaScript

const response = await client.chat.completions.create({
    model: "gpt-4.1",
    messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" },
    ],
});
console.log(response.choices[0].message.content);

cURL

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

Chat Completions Response Format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}

Key difference: Chat Completions uses messages array with system/user/assistant roles. Responses API uses input with developer/user/assistant roles.


6. Reasoning Models

Reasoning models (GPT-5 family, o3, o4-mini) generate internal chain-of-thought tokens before responding. Reasoning tokens are billed as output tokens but not visible in the API response.

Reasoning Effort

response = client.responses.create(
    model="gpt-5.2",
    reasoning={"effort": "medium"},
    input="Write a bash script that transposes a matrix.",
)

Effort | Description                                 | GPT-5.2 Default
none   | No reasoning, lowest latency (GPT-5.2 only) | ✓ (default)
low    | Minimal reasoning, fast                     |
medium | Balanced speed and accuracy                 |
high   | Thorough reasoning for complex tasks        |
xhigh  | Maximum reasoning (GPT-5.2 only)            |

Verbosity Control (GPT-5 Family)

response = client.responses.create(
    model="gpt-5.2",
    input="What is recursion?",
    reasoning={"effort": "low"},
    text={"verbosity": "low"},  # low | medium | high
)

Reasoning Summaries

response = client.responses.create(
    model="gpt-5",
    input="What is the capital of France?",
    reasoning={"effort": "low", "summary": "auto"},
)
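When a summary is produced, it arrives as a reasoning item in the output array; a hedged sketch of reading it (the summary / summary_text field names follow current Responses API docs, but treat them as assumptions):

for item in response.output:
    if item.type == "reasoning":
        for part in item.summary:
            if part.type == "summary_text":
                print("Reasoning summary:", part.text)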

Handling Incomplete Responses

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "medium"},
    input=[{"role": "user", "content": "Complex problem..."}],
    max_output_tokens=300,
)

if response.status == "incomplete":
    reason = response.incomplete_details.reason  # "max_output_tokens"
    if response.output_text:
        print("Partial:", response.output_text)
    else:
        print("Ran out of tokens during reasoning")

Tip: Reserve at least 25,000 tokens for reasoning + output when starting.

Reasoning Token Usage

{
  "usage": {
    "input_tokens": 75,
    "output_tokens": 1186,
    "output_tokens_details": {
      "reasoning_tokens": 1024
    },
    "total_tokens": 1261
  }
}
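Since reasoning tokens are billed as output tokens, the visible text accounts for only part of the output count; a quick sanity check against the usage block above:

usage = response.usage  # from the request above
reasoning = usage.output_tokens_details.reasoning_tokens   # 1024
visible = usage.output_tokens - reasoning                  # 1186 - 1024 = 162 text tokens
assert usage.total_tokens == usage.input_tokens + usage.output_tokens  # 75 + 1186 = 1261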

7. Streaming

Responses API Streaming

Python

stream = client.responses.create(
    model="gpt-5",
    input="Tell me a story.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

JavaScript

const stream = await client.responses.create({
    model: "gpt-5",
    input: "Tell me a story.",
    stream: true,
});

for await (const event of stream) {
    if (event.type === "response.output_text.delta") {
        process.stdout.write(event.delta);
    }
}

Key Streaming Events

Event                                  | Description
response.created                       | Response object created
response.output_text.delta             | Text chunk received
response.completed                     | Generation complete
response.function_call_arguments.delta | Tool call argument chunk
error                                  | Error occurred
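A minimal sketch dispatching on the event types from the table (the event.response attribute on response.completed follows the SDK's typed events; verify against your SDK version):

from openai import OpenAI
client = OpenAI()

stream = client.responses.create(model="gpt-5", input="Tell me a story.", stream=True)

final = None
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.completed":
        final = event.response  # full Response object, including usage
    elif event.type == "error":
        raise RuntimeError("stream reported an error")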

Chat Completions Streaming

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

8. Function Calling (Tool Use)

Define and Use Functions

import json
from openai import OpenAI
client = OpenAI()

# Step 1: Define tools
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g. 'Paris, France'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                }
            },
            "required": ["location", "units"],
            "additionalProperties": False
        },
        "strict": True
    }
]

# Step 2: Send request with tools
input_messages = [
    {"role": "user", "content": "What's the weather in Paris?"}
]

response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=input_messages,
)

# Step 3: Handle tool calls
input_messages += response.output

for item in response.output:
    if item.type == "function_call":
        # Step 4: Execute function and return result
        args = json.loads(item.arguments)
        result = get_weather(**args)  # your function
        input_messages.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(result)
        })

# Step 5: Get final response
final = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=input_messages,
)
print(final.output_text)

Function Definition Schema

Field       | Type    | Description
type        | string  | Always "function"
name        | string  | Function name (e.g., "get_weather")
description | string  | When/how to use the function
parameters  | object  | JSON Schema for input arguments
strict      | boolean | Enforce strict schema adherence

Built-in Tools

Tool Type            | Description
web_search_preview   | Search the web
file_search          | Search uploaded files/vector stores
code_interpreter     | Execute Python code
image_generation     | Generate images with gpt-image-1
computer_use_preview | Control computer interfaces (beta)

# Using web search
response = client.responses.create(
    model="gpt-5",
    tools=[{"type": "web_search_preview"}],
    input="What happened in tech news today?",
)

Important: When using function calling with reasoning models, pass all reasoning items from previous responses back along with the function call outputs; the `input_messages += response.output` line in Step 3 above does exactly this.


9. Structured Outputs

Ensures model responses conform to a JSON Schema.

With Responses API (Pydantic)

from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="gpt-5",
    input="Alice and Bob are going to a conference on Jan 15.",
    text_format=CalendarEvent,
)

for item in response.output:
    if item.type == "message":
        for content in item.content:
            print(content.parsed)  # CalendarEvent object

With Responses API (Zod — JavaScript)

import { z } from "zod";
import { zodTextFormat } from "openai/helpers/zod";
import OpenAI from "openai";
const client = new OpenAI();

const CalendarEvent = z.object({
    name: z.string(),
    date: z.string(),
    participants: z.array(z.string()),
});

const response = await client.responses.parse({
    model: "gpt-5",
    input: "Alice and Bob are going to a conference on Jan 15.",
    text: { format: zodTextFormat(CalendarEvent, "calendar_event") },
});

JSON Schema via text.format

response = client.responses.create(
    model="gpt-5",
    input="Extract the event details.",
    text={
        "format": {
            "type": "json_schema",
            "strict": True,
            "name": "calendar_event",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                    "participants": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["name", "date", "participants"],
                "additionalProperties": False
            }
        }
    },
)

Handling Refusals

for item in response.output:
    if item.type == "message":
        for content in item.content:
            if content.type == "refusal":
                print("Refused:", content.refusal)
            elif content.parsed:
                print(content.parsed)

10. Vision (Image Analysis)

Analyze Image by URL

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What's in this image?"},
            {
                "type": "input_image",
                "image_url": "https://example.com/photo.jpg",
            },
        ],
    }],
)
print(response.output_text)

Analyze Image by Base64

import base64

with open("image.jpg", "rb") as f:
    b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image."},
            {
                "type": "input_image",
                "image_url": f"data:image/jpeg;base64,{b64}",
            },
        ],
    }],
)

Generate Images

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Generate an image of a cat wearing a hat.",
    tools=[{"type": "image_generation"}],
)

import base64
for item in response.output:
    if item.type == "image_generation_call":
        with open("output.png", "wb") as f:
            f.write(base64.b64decode(item.result))

11. Prompt Caching

Automatic for prompts ≥1024 tokens. No code changes needed.

Benefits

  • Up to 80% latency reduction
  • Up to 90% input cost reduction (cached tokens pricing)
  • Works automatically on all recent models (gpt-4o and newer)

Optimize for Caching

  • Place static content (instructions, examples) at the beginning of prompts
  • Place dynamic content (user input) at the end (see the sketch below)
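A minimal sketch of a cache-friendly request layout (the company name and helper function are illustrative; the point is that the long static prefix stays byte-identical across calls):

STATIC_PREFIX = (
    "You are a support assistant for Acme Corp.\n"
    "...long, unchanging policy text and few-shot examples (>= 1024 tokens)..."
)

def answer(client, user_question: str):
    return client.responses.create(
        model="gpt-4.1",
        instructions=STATIC_PREFIX,  # identical prefix on every call -> cache hits
        input=user_question,         # dynamic content stays at the end
    )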

Extended Cache Retention

response = client.responses.create(
    model="gpt-5.1",
    input="Your prompt here...",
    prompt_cache_retention="24h",  # "in_memory" (default) or "24h"
)

Custom Cache Key

response = client.responses.create(
    model="gpt-5.1",
    input="Your prompt here...",
    prompt_cache_key="my-app-v2",  # improve cache routing
)

Monitor Cache Performance

{
  "usage": {
    "prompt_tokens": 2006,
    "completion_tokens": 300,
    "prompt_tokens_details": {
      "cached_tokens": 1920
    }
  }
}

Note: this shows the Chat Completions usage format; in the Responses API, the cached count appears under input_tokens_details.cached_tokens instead.

12. Batch API

Process large volumes at 50% cost reduction with 24-hour turnaround.

Usage

# 1. Create JSONL input file
# Each line: {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {...}}

# 2. Upload file
file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")

# 3. Create batch
batch = client.batches.create(
    input_file_id=file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 4. Check status
batch = client.batches.retrieve(batch.id)
print(batch.status)  # validating -> in_progress -> completed

# 5. Download results
result_file = client.files.content(batch.output_file_id)
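A minimal sketch of producing the JSONL input file from step 1 (the custom_id values and prompts are illustrative):

import json

# Illustrative prompts; real batches often contain thousands of lines.
prompts = ["Hello!", "What is 2 + 2?"]

with open("batch.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        f.write(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4.1-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }) + "\n")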

13. Errors & Error Handling

HTTP Error Codes

Code | Description               | Solution
401  | Invalid/incorrect API key | Check API key, regenerate if needed
403  | Region not supported      | Check supported countries
429  | Rate limit exceeded       | Implement backoff, reduce request rate
429  | Quota exceeded            | Check billing, upgrade plan
500  | Server error              | Retry after brief wait
503  | Server overloaded         | Retry with exponential backoff

Python SDK Exception Types

Exception             | Cause
APIConnectionError    | Network/proxy/SSL issue
APITimeoutError       | Request timed out
AuthenticationError   | Invalid/expired API key
BadRequestError       | Malformed request
RateLimitError        | Rate limit hit
InternalServerError   | Server-side issue
ConflictError         | Resource update conflict
NotFoundError         | Resource doesn't exist
PermissionDeniedError | No access to resource
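A minimal sketch catching these types individually (which errors are worth retrying is a judgment call, not an SDK rule):

import openai
from openai import OpenAI

client = OpenAI()

try:
    response = client.responses.create(model="gpt-5", input="Hello!")
except openai.AuthenticationError:
    raise  # a bad key won't fix itself; don't retry
except openai.RateLimitError:
    pass   # back off and retry (see the next snippet)
except openai.APIConnectionError as e:
    print("Network problem:", e.__cause__)
except openai.APIStatusError as e:
    print("API error, status", e.status_code)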

Retry with Exponential Backoff

from openai import OpenAI
from tenacity import retry, wait_random_exponential, stop_after_attempt

client = OpenAI()

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return client.responses.create(**kwargs)

response = completion_with_backoff(
    model="gpt-5",
    input="Hello!",
)

14. Rate Limits

  • Applied per organization and per project
  • Measured in: RPM (requests/min), TPM (tokens/min), RPD (requests/day)
  • Higher tiers unlock higher limits
  • Cached tokens still count toward TPM limits
  • Check limits at: platform.openai.com/settings/organization/limits
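Current limit state is also returned in x-ratelimit-* response headers; a hedged sketch reading them through the Python SDK's with_raw_response helper (header names follow the API docs, but verify against a live response):

from openai import OpenAI
client = OpenAI()

raw = client.responses.with_raw_response.create(model="gpt-5", input="Hi")
print("remaining requests:", raw.headers.get("x-ratelimit-remaining-requests"))
print("remaining tokens:", raw.headers.get("x-ratelimit-remaining-tokens"))
response = raw.parse()  # the usual Response object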

15. Quick Reference Patterns

Simple Text Generation

response = client.responses.create(model="gpt-5", input="Hello!")
print(response.output_text)

With Reasoning

response = client.responses.create(
    model="gpt-5.2",
    input="Solve this complex problem...",
    reasoning={"effort": "high"},
)

Fast Low-Latency Response

response = client.responses.create(
    model="gpt-5.2",
    input="Quick question?",
    reasoning={"effort": "none"},
    text={"verbosity": "low"},
)

Multi-turn Conversation

r1 = client.responses.create(model="gpt-5", input="Hi, I'm Alice.")
r2 = client.responses.create(
    model="gpt-5",
    input="What's my name?",
    previous_response_id=r1.id,
)

Streaming + Tool Use

stream = client.responses.create(
    model="gpt-5",
    input="Search the web for today's news.",
    tools=[{"type": "web_search_preview"}],
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")

Image Analysis + Text

response = client.responses.create(
    model="gpt-4.1-mini",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this."},
            {"type": "input_image", "image_url": "https://example.com/img.jpg"},
        ],
    }],
)

Structured Output

from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    key_points: list[str]
    sentiment: str

response = client.responses.parse(
    model="gpt-5",
    input="Summarize this article: ...",
    text_format=Summary,
)

Chat Completions (Legacy Pattern)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)

16. Migration: Chat Completions → Responses API

Chat Completions                    | Responses API
messages array                      | input array or string
system role                         | developer role or instructions param
response_format                     | text.format
response.choices[0].message.content | response.output_text
Manual conversation history         | previous_response_id for auto context
client.chat.completions.create()    | client.responses.create()
client.chat.completions.parse()     | client.responses.parse()