Multi-Provider Model Comparison
A command-line tool that sends the same prompt to multiple AI models through Perplexity’s Agent API and produces a side-by-side comparison of response quality, latency, and cost. Useful for evaluating which model best fits your use case.
Features
- Send identical prompts to 5 models across different providers in a single run
- Measure response latency using wall-clock timing
- Extract per-request cost from the `response.usage.cost.total_cost` field
- Tabulated output comparing response length, latency, and cost
- Model fallback chain support using the `models=[...]` parameter for high-availability workflows
- Configurable prompt input via command-line argument or file
Supported Models
The default comparison set spans five providers:

| Model | Provider |
|---|---|
| openai/gpt-5.4 | OpenAI |
| anthropic/claude-sonnet-4-6 | Anthropic |
| google/gemini-3.1-flash-lite | Google |
| xai/grok-4.20-non-reasoning | xAI |
| perplexity/sonar | Perplexity |
Installation
API Key Setup
Set your Perplexity API key as an environment variable; the SDK reads it automatically.

Perplexity’s Agent API provides access to models from multiple providers through a single API key. You do not need separate API keys for OpenAI, Anthropic, Google, or xAI.
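A minimal setup sketch, assuming the official Perplexity Python SDK is installed and exposes a `Perplexity` client class that reads `PERPLEXITY_API_KEY` from the environment (both assumptions; adjust to your SDK version):

```python
import os

from perplexity import Perplexity  # assumption: official Perplexity Python SDK client class

# The client is assumed to pick up PERPLEXITY_API_KEY from the environment;
# set it in your shell before running the comparison tool.
if "PERPLEXITY_API_KEY" not in os.environ:
    raise RuntimeError("Set PERPLEXITY_API_KEY before running the comparison tool.")

client = Perplexity()  # one key covers all providers; no per-provider keys needed
```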
Usage
Compare models with a prompt
Read the prompt from a file
Use a custom set of models
Export results as JSON
Use model fallback chain
Instead of comparing models, you can test the fallback chain feature. The API tries each model in order until one succeeds.
How It Works
- The CLI accepts a prompt and an optional list of models.
- For each model, the tool records a start timestamp, calls `client.responses.create(model=..., input=...)`, and records the end timestamp.
- From each response, it extracts `response.usage.cost.total_cost` for the request cost and computes latency as the elapsed wall-clock time.
- Results are collected and displayed in a comparison table sorted by latency.
- In fallback mode, the tool sends a single request with `models=[...]` and reports which model was ultimately used, as sketched below.
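A sketch of that fallback mode, assuming the official Perplexity Python SDK; the `model` attribute used to report the serving model is an assumption, not confirmed by this page:

```python
from perplexity import Perplexity  # assumption: official Perplexity Python SDK

client = Perplexity()  # reads PERPLEXITY_API_KEY from the environment

# Single request with a fallback chain: the API tries each model in order
# and returns the first successful response.
response = client.responses.create(
    models=[
        "openai/gpt-5.4",
        "anthropic/claude-sonnet-4-6",
        "perplexity/sonar",
    ],
    input="Summarize the trade-offs between latency and cost for LLM APIs.",
)

# Which model actually served the request (attribute name is an assumption).
print("Served by:", getattr(response, "model", "unknown"))
print("Cost (USD):", response.usage.cost.total_cost)
```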
Full Code
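The original listing is not reproduced here. The following is a minimal sketch of the flow described above, assuming the official Perplexity Python SDK, that the generated text is exposed as `output_text` (an assumption), and a hypothetical script name `compare.py`; the `--models` and `--json` flags are illustrative, not necessarily the original tool's interface:

```python
"""compare.py - sketch of a multi-model comparison CLI (hypothetical name)."""
import argparse
import json
import time

from perplexity import Perplexity  # assumption: official Perplexity Python SDK

DEFAULT_MODELS = [
    "openai/gpt-5.4",
    "anthropic/claude-sonnet-4-6",
    "google/gemini-3.1-flash-lite",
    "xai/grok-4.20-non-reasoning",
    "perplexity/sonar",
]


def compare(client: Perplexity, prompt: str, models: list[str]) -> list[dict]:
    """Send the same prompt to each model; record latency, cost, and output length."""
    results = []
    for model in models:
        start = time.monotonic()
        response = client.responses.create(model=model, input=prompt)
        latency = time.monotonic() - start
        # output_text is an assumption; adjust to however your SDK version
        # exposes the generated text.
        text = getattr(response, "output_text", "") or ""
        results.append({
            "model": model,
            "latency_s": round(latency, 2),
            "cost_usd": response.usage.cost.total_cost,
            "chars": len(text),
        })
    # Comparison table is sorted by latency, as described above.
    return sorted(results, key=lambda r: r["latency_s"])


def main() -> None:
    parser = argparse.ArgumentParser(description="Compare models on one prompt")
    parser.add_argument("prompt", help="Prompt text to send to every model")
    parser.add_argument("--models", nargs="+", default=DEFAULT_MODELS)
    parser.add_argument("--json", action="store_true", help="Emit results as JSON")
    args = parser.parse_args()

    client = Perplexity()  # reads PERPLEXITY_API_KEY from the environment
    results = compare(client, args.prompt, args.models)

    if args.json:
        print(json.dumps(results, indent=2))
        return

    print(f"{'Model':<35} {'Latency (s)':>12} {'Cost (USD)':>12} {'Chars':>8}")
    for r in results:
        print(f"{r['model']:<35} {r['latency_s']:>12.2f} {r['cost_usd']:>12.5f} {r['chars']:>8}")


if __name__ == "__main__":
    main()
```

A run might then look like `python compare.py "Explain vector databases in one paragraph" --json`, with the JSON output piped elsewhere for analysis.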
Example Output
Running the comparison:

Model fallback is useful for production systems where availability matters more than model selection. The API tries each model in the `models` array in order and returns the first successful response. See the model fallback guide for details.
Tips for Meaningful Comparisons
- Use the same `max_output_tokens` across all models to keep output lengths comparable (see the sketch after this list).
- Run multiple trials and average the results, since latency can vary between requests due to load.
- Test with representative prompts for your actual use case rather than generic questions.
- Consider cost per token in addition to total cost, especially for high-volume applications.
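For the first tip, capping output length per request might look like this; the `max_output_tokens` parameter name follows the tip above and is otherwise an assumption, as is the `output_text` attribute:

```python
from perplexity import Perplexity  # assumption: official Perplexity Python SDK

client = Perplexity()
prompt = "Explain retrieval-augmented generation in two sentences."

# The same cap for every model keeps response lengths roughly comparable.
for model in ["openai/gpt-5.4", "perplexity/sonar"]:
    response = client.responses.create(model=model, input=prompt, max_output_tokens=512)
    print(model, len(getattr(response, "output_text", "") or ""))
```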
Limitations
- Latency measurements reflect end-to-end wall-clock time including network round trips, not pure model inference time.
- Cost values come from the API response and reflect per-request pricing at the time of the call.
- Response quality is subjective and not captured by quantitative metrics alone. Review the actual output text for qualitative evaluation.
- Rate limits vary by model and provider. Sequential comparison requests may be affected by rate limiting on high-demand models.