Skip to main content

Overview

Model fallback enables specifying multiple models in a models array. The API tries each model in order until one succeeds, providing automatic failover when a model is unavailable.

How It Works

Provide a models array containing up to 5 models:
  1. The API tries the first model in the array
  2. If it fails or is unavailable, the next model is tried
  3. This continues until one succeeds or all models are exhausted
The models array takes precedence over the single model field when both are provided.
Benefits:
  • Higher availability: Automatic failover when primary model is unavailable
  • Provider redundancy: Use models from different providers for maximum reliability
  • Seamless operation: No code refactoring needed, fallback is handled automatically by the API

Basic Example

from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    models=["openai/gpt-5.2", "openai/gpt-5.1", "openai/gpt-5-mini"],
    input="What are the latest developments in AI?",
    instructions="You have access to a web_search tool. Use it for questions about current events.",
)

print(f"Model used: {response.model}")

Cross-Provider Fallback

For maximum reliability, use models from different providers:
from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    models=[
        "openai/gpt-5.2",
        "anthropic/claude-sonnet-4-5",
        "google/gemini-2.5-pro"
    ],
    input="Explain quantum computing in detail",
)

Pricing

Billing is based on the model that serves the request, not all models in the fallback chain.
The model field in the response indicates which model was used, and the usage field shows the token counts for that model.
Request:
{
  "models": ["openai/gpt-5.2", "openai/gpt-5.1"],
  "input": "..."
}
Response (if first model failed):
{
  "model": "openai/gpt-5.1",
  "usage": {
    "input_tokens": 150,
    "output_tokens": 320,
    "total_tokens": 470
  }
}
In this case, billing is based on gpt-5.1 pricing for 470 tokens.
Place preferred models first in the array. Consider pricing differences when ordering the fallback chain.

Next Steps