
Models

Yes, the Sonar Models leverage information from Perplexity’s search index and the public internet.
Context window size varies by model. For Sonar models, see the Sonar models reference; for the third-party models available through the Agent API, see the Agent API models page and the linked provider documentation.
Note that the context window (the maximum number of tokens a model can process in a single request) is different from search context size, which controls how much web information is retrieved. See the pricing glossary for the distinction.
Currently, we do not support fine-tuning.
When a model is retired, we announce it in the changelog along with a recommended replacement. To avoid disruption, pin an explicit model ID in your requests rather than relying on a default, and watch the changelog for deprecation notices. The current list of available models is always on the Sonar models and Agent API models pages.
We expose the chain of thought (CoT) for Sonar Reasoning Pro. We don’t currently expose the CoT for Deep Research.

Output & Capabilities

Yes. Use the response_format parameter to constrain a response to a JSON schema. See Structured Outputs for Sonar and Structured Outputs for the Agent API. For reasoning models, see the note below about <think> tokens preceding the JSON.
The sonar-reasoning-pro model is designed to output a <think> section containing reasoning tokens, immediately followed by a valid JSON object. As a result, the response_format parameter does not remove these reasoning tokens from the output.
We recommend using a custom parser to extract the valid JSON portion. An example implementation can be found here.
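As an illustration of the custom parser mentioned above, here is a minimal sketch in Python. It assumes the output contains at most one reasoning section closed by a literal </think> tag and that everything after that tag is the JSON object. The function name extract_json_after_think is hypothetical and not part of any SDK.

```python
import json


def extract_json_after_think(text: str):
    """Parse the JSON that follows the closing </think> tag.

    Assumption: everything after the last </think> is a single JSON value.
    If no tag is present, the whole string is treated as JSON.
    """
    marker = "</think>"
    idx = text.rfind(marker)
    payload = text if idx == -1 else text[idx + len(marker):]
    # json.loads raises json.JSONDecodeError if the remainder is not valid JSON.
    return json.loads(payload.strip())
```

Calling this on the raw message text returns the parsed object, and a json.JSONDecodeError signals that the model did not emit valid JSON after its reasoning.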
Yes. Pass stream: true to receive the response incrementally as it is generated, instead of waiting for the full result. See Streaming Responses for Sonar and Streaming Responses for the Agent API. The official SDKs expose streaming directly.
Yes. The Agent API provides built-in tools, including web search, fetch URL content, people search, finance search, and a code sandbox. The model decides when to call them while answering your request.
The Perplexity API is designed to be broadly compatible with OpenAI’s chat completions endpoint. It adopts a similar structure (including fields such as id, model, and usage) and supports analogous parameters like model, messages, and stream.
Key differences from the standard OpenAI response include:
  • Response Object Structure:
    • OpenAI responses typically have an object value of "chat.completion" and a created timestamp, whereas our response uses object: "response" and a created_at field.
    • Instead of a choices array, our response content is provided under an output array that contains detailed message objects.
  • Message Details:
    • Each message in our output includes a type (usually "message"), a unique id, and a status.
    • The actual text is nested within a content array that contains objects with type, text, and an annotations array for additional context.
  • Additional Fields:
    • Our API response provides extra meta-information (such as status, error, instructions, and max_output_tokens) that are not present in standard OpenAI responses.
    • The usage field also differs, offering detailed breakdowns of input and output tokens (including fields like input_tokens_details and output_tokens_details).
These differences are intended to provide enhanced functionality and additional context while maintaining broad compatibility with OpenAI’s API design.
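To make the structure described above concrete, the following sketch walks an already-decoded response dictionary and joins the text found under output → content. The sample payload is hypothetical: the field names come from the list above, but the values (including the "output_text" content type and the IDs) are placeholders chosen for illustration.

```python
def collect_output_text(response: dict) -> str:
    """Join the text of every content block in every output message."""
    parts = []
    for item in response.get("output", []):
        # Skip output items that are not messages.
        if item.get("type") != "message":
            continue
        for block in item.get("content", []):
            text = block.get("text")
            if isinstance(text, str):
                parts.append(text)
    return "".join(parts)


# Hypothetical sample shaped like the description above; values are made up.
sample = {
    "object": "response",
    "created_at": 1700000000,
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_example",
            "status": "completed",
            "content": [
                {"type": "output_text", "text": "Hello", "annotations": []}
            ],
        }
    ],
}
```

With this sample, collect_output_text(sample) yields the single text block. A client written against a choices array would need this kind of adapter.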
Reasoning tokens in Deep Research differ from the CoTs in the answer: these tokens are used to reason through the research material before the final output is generated via the CoTs.

Search & Results

  1. The API uses the same search system as the UI but with a different configuration, so their outputs may differ.
  2. The underlying AI model might differ between the API and the UI for a given query.
Yes, the API offers exactly the same internet data access as Perplexity’s web platform.
Yes, for the API, content filtering in the form of SafeSearch is turned on by default. This helps filter out potentially offensive and inappropriate content, including pornography, from search results. SafeSearch is an automated filter that works across search results to provide a safer experience. You can learn more about SafeSearch on the official Wikipedia page.

Account, Billing & Limits

Pricing is usage-based and varies by API. Sonar charges per token plus a per-request fee that depends on search context size; the Search API charges per request; the Agent API charges per token at direct provider rates with no markup, plus a per-invocation fee for tools; and Embeddings charge per token. See the full breakdown on the Pricing page.
Purchase API credits in the API Platform console under the billing section. Enable auto top-up to add credits automatically when your balance runs low and avoid interrupted service. You can also buy credits through the AWS Marketplace for consolidated billing, or contact our sales team for enterprise procurement. Your usage tier advances automatically as your cumulative purchases grow.
The only way for an account to be upgraded to the next usage tier is through all-time credit purchase.
Here are the spending criteria associated with each tier:
Tier      Credit Purchase (all time)
Tier 0    -
Tier 1    $50
Tier 2    $250
Tier 3    $500
Tier 4    $1000
Tier 5    $5000
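The tier thresholds above can be expressed as a small lookup. This is a sketch for estimating your tier locally, not how the platform computes it. It assumes each amount is an inclusive lower bound in US dollars, and the names TIER_THRESHOLDS and usage_tier are hypothetical.

```python
# (tier, minimum all-time credit purchase in USD), highest tier first.
TIER_THRESHOLDS = [(5, 5000), (4, 1000), (3, 500), (2, 250), (1, 50)]


def usage_tier(total_purchased_usd: float) -> int:
    """Return the usage tier implied by all-time credit purchases.

    Assumption: reaching a threshold exactly is enough to enter that tier.
    """
    for tier, minimum in TIER_THRESHOLDS:
        if total_purchased_usd >= minimum:
            return tier
    return 0
```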
You can find our rate limits here.
We offer a way to track your billing per API key. You can do this by navigating to the following location: Settings > View Dashboard > Invoice history > Invoices. Then click on any invoice; each line item in the total bill has a code at the end (e.g., pro (743S)). Those 4 characters are the last 4 characters of your API key.
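If you reconcile invoices in a script, the trailing code can be pulled out with a regular expression. The sketch below assumes line items end with a four-character alphanumeric code in parentheses, as in the pro (743S) example. The helper names and the sample keys are hypothetical placeholders.

```python
import re
from typing import List, Optional

# Matches a 4-character alphanumeric code in parentheses at the end of a line item.
_SUFFIX = re.compile(r"\(([A-Za-z0-9]{4})\)\s*$")


def key_suffix(line_item: str) -> Optional[str]:
    """Return the 4-character code at the end of an invoice line item, if any."""
    match = _SUFFIX.search(line_item)
    return match.group(1) if match else None


def keys_for_item(line_item: str, api_keys: List[str]) -> List[str]:
    """Return the API keys whose last 4 characters match the line item's code."""
    suffix = key_suffix(line_item)
    if suffix is None:
        return []
    return [key for key in api_keys if key.endswith(suffix)]
```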

Errors & Troubleshooting

A 401 error code indicates that the provided API key is invalid, has been deleted, or belongs to an account that has run out of credits. If the key itself is valid, you likely need to purchase more credits in the API Platform console. You can avoid this issue by enabling auto top-up.
A 429 means you’ve exceeded your rate limit. Rate limits scale with your usage tier and use a leaky-bucket algorithm that allows short bursts up to your limit. Retry with exponential backoff and jitter, use burst capacity for batch jobs, and upgrade your tier—or request a custom limit—for sustained higher throughput. See Error Handling for retry examples in both SDKs.
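A generic sketch of retrying with exponential backoff and jitter follows. It is not taken from the Error Handling guide or either SDK: RateLimited is a hypothetical exception that your own send function would raise on a 429, and the base, cap, and retry count are illustrative defaults rather than recommended values.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception raised by the caller's code on HTTP 429."""


def backoff_delays(max_retries, base=1.0, cap=30.0, rng=random.random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(send, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call send(); on RateLimited, wait a jittered delay and try again."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return send()
        except RateLimited:
            sleep(delay)
    # Final attempt: let any exception propagate to the caller.
    return send()
```

The sleep parameter is injectable so the retry logic can be exercised in tests without actually waiting.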
5xx responses indicate a transient server-side issue, and connection or timeout errors usually point to a network problem. Retry these with exponential backoff, set sensible client timeouts, and log the X-Request-ID response header to include when you contact support. The Error Handling guide covers the SDK exception types and recovery patterns.
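One way to apply the guidance on these status codes is to separate "retry later" statuses from "fix the request or account" statuses, and to capture the request ID for support. This sketch works on a plain status code and header mapping, so it is independent of any HTTP client. The function names are hypothetical, and treating every other status as non-retryable is a simplifying assumption.

```python
from typing import Mapping, Optional


def is_retryable(status: int) -> bool:
    """429 and 5xx are worth retrying; others (e.g. 401) need a fix first."""
    return status == 429 or 500 <= status <= 599


def request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Request-ID header value, matching the name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None
```

Logging the value returned by request_id alongside the status code gives support the identifier mentioned above.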
Current API availability and incident history are on the System Status page. If you’re seeing errors that aren’t explained there, reach out through the support channels below.

Data, Privacy & Security

We collect the following types of information:
  • API Usage Data: We collect billable usage metadata such as the number of requests and tokens. You can view your own usage in the API Platform console.
  • User Account Information: When you create an account with us, we collect your name, email address, and other relevant contact information.
We do not retain any query data sent through the API and do not train on any of your data.
Our compute is hosted via Amazon Web Services in North America. By default, the API has zero day retention of user prompt data, which is never used for AI training.
Perplexity maintains a SOC 2 Type II report, a 2025 HIPAA gap assessment, and a CAIQlite cloud security assessment. These reports, along with data-processing and other compliance documentation (for example, for DPA or GDPR requests), are available through the Perplexity Trust Center. See Privacy & Security for an overview, or contact us for documents not published there.

Support & Reliability

To file a bug report, please head to our Developer Community and create a new post in the “Bug Reports” category.
We appreciate your patience and will get back to you as soon as possible. Due to the current volume of reports, responses may take a little time.
A Feature Request is a suggestion to improve or add new functionality to the Perplexity Sonar API, such as:
  • Requesting support for a new model or capability (e.g., image processing, fine-tuning options)
  • Asking for new API parameters (e.g., additional filters, search options)
  • Suggesting performance improvements (e.g., faster response times, better citation handling)
  • Enhancing existing API features (e.g., improving streaming reliability, adding new output formats)
If your request aligns with these, please submit a feature request here: GitHub feature requests
We email users about new developments and also post in the changelog.
We do not guarantee this at the moment.
Please reach out to api@perplexity.ai or support@perplexity.ai for other API inquiries. You can also post on our discussion forum and we will get back to you.