The Agent API supports direct access to models from multiple providers. All models are accessed directly from first-party providers with transparent, token-based pricing. Pricing rates are updated monthly and reflect first-party provider pricing with no markup. All charges are based on actual token consumption, and every API response includes exact token counts, so you know the cost of each request.
Looking for pre-configured model setups? See Presets, which are optimized for specific use cases.
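```python
from perplexity import Perplexity

# Create an Agent API client
client = Perplexity()

# Send a single request to one specific model
response = client.responses.create(
    model="openai/gpt-5.5",
    input="Explain the difference between supervised and unsupervised learning in machine learning.",
    max_output_tokens=300,  # cap the length (and therefore the cost) of the reply
)

print(f"Response ID: {response.id}")
print(response.output_text)
```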
See your costs in real time: every response includes a usage field with exact input, output, and cache-read token counts, so you can calculate the cost of each request using the pricing table above.
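Because billing is purely token-based, the cost of a request is each token count multiplied by its rate, summed. The sketch below assumes rates are quoted in USD per million tokens and that cache-read tokens are reported separately from (not included in) input tokens; both are assumptions, so check the pricing table and usage field for the model you call. The `Rates` class and `request_cost` function are hypothetical helpers, not part of the SDK, and the rate values in the example are placeholders rather than real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-million-token prices in USD for one model."""
    input_per_m: float
    output_per_m: float
    cache_read_per_m: float


def request_cost(input_tokens: int, output_tokens: int,
                 cache_read_tokens: int, rates: Rates) -> float:
    """Return the USD cost of one request from its token counts.

    Assumes cache-read tokens are counted separately from input tokens;
    if they are included in input_tokens, subtract them before calling.
    """
    total = (input_tokens * rates.input_per_m
             + output_tokens * rates.output_per_m
             + cache_read_tokens * rates.cache_read_per_m)
    return total / 1_000_000  # rates are per million tokens


# Placeholder rates, not actual provider prices
example_rates = Rates(input_per_m=1.25, output_per_m=10.0, cache_read_per_m=0.125)
cost = request_cost(input_tokens=1000, output_tokens=300,
                    cache_read_tokens=0, rates=example_rates)
print(f"${cost:.6f}")
```

Keeping the rates in a separate value makes it easy to update them when the monthly pricing changes without touching the arithmetic.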
For high-availability applications, you can specify multiple models in a fallback chain. When one model fails or is unavailable, the API automatically tries the next model in the chain.
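The fallback behavior can be illustrated with a small client-side loop: try each model in order and return the first successful result, raising only if every model fails. This is a sketch of the semantics only; the API performs the fallback server-side, and the request parameter for declaring a chain is covered in the Model Fallback Chain guide rather than shown here. `call_with_fallback`, `AllModelsFailed`, and the model names in the example are hypothetical. In practice, `call` could wrap a request such as `client.responses.create(model=m, ...)`.

```python
from typing import Callable, Dict, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllModelsFailed(Exception):
    """Hypothetical error raised when every model in the chain fails."""

    def __init__(self, errors: Dict[str, Exception]):
        super().__init__(f"all {len(errors)} models in the chain failed")
        self.errors = errors  # model name -> exception it raised


def call_with_fallback(models: Sequence[str],
                       call: Callable[[str], T]) -> Tuple[str, T]:
    """Try each model in order; return (model, result) for the first success."""
    errors: Dict[str, Exception] = {}
    for model in models:
        try:
            return model, call(model)
        except Exception as exc:  # record the failure and move on
            errors[model] = exc
    raise AllModelsFailed(errors)


# Example with a fake call: the first (hypothetical) model is unavailable
def fake_call(model: str) -> str:
    if model == "provider-a/model-1":
        raise RuntimeError("model unavailable")
    return f"answer from {model}"


used, result = call_with_fallback(["provider-a/model-1", "provider-b/model-2"], fake_call)
print(used, result)
```

Collecting every error before giving up makes it possible to see why each model in the chain failed, not just the last one.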
Model Fallback Chain: learn how to use model fallback chains to ensure high availability and reliability by automatically trying multiple models when one fails.