# Chat Completions
post /chat/completions
Generates a model's response for the given chat conversation.
# Changelog
We are excited to announce the public availability of citations in the Perplexity API. We have also increased the default rate limit for the Sonar online models to 50 requests/min for all users.
Effective immediately, all API users will see citations returned as part of their requests by default. This is not a breaking change. The **return\_citations** parameter will no longer have any effect.
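For illustration, here is how you might read citations from a raw HTTPS response (a minimal sketch; it assumes the response JSON exposes a top-level `citations` list alongside the usual OpenAI-style `choices`):

```python python
import requests

# Minimal sketch: read citations from a chat completion response.
# Assumes the response JSON carries a top-level "citations" list of URLs
# alongside the usual OpenAI-style "choices" field.
response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "llama-3.1-sonar-small-128k-online",
        "messages": [{"role": "user", "content": "How many stars are in the Milky Way?"}],
    },
)
data = response.json()
print(data["choices"][0]["message"]["content"])
for url in data.get("citations", []):
    print(url)
```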
If you have any questions or need assistance, feel free to reach out to our team at [api@perplexity.ai](mailto:api@perplexity.ai).
We are excited to announce the launch of our latest Perplexity Sonar models:
**Online Models** -
`llama-3.1-sonar-small-128k-online`
`llama-3.1-sonar-large-128k-online`
**Chat Models** -
`llama-3.1-sonar-small-128k-chat`
`llama-3.1-sonar-large-128k-chat`
These new additions surpass the performance of the previous iteration. For detailed information on our supported models, please visit our model card documentation.
**\[Action Required]** Model Deprecation Notice
Please note that several models will no longer be accessible effective 8/12/2024. We recommend updating your applications to use models in the Llama-3.1 family immediately.
The following model names will no longer be available via API:
`llama-3-sonar-small-32k-online`
`llama-3-sonar-large-32k-online`
`llama-3-sonar-small-32k-chat`
`llama-3-sonar-large-32k-chat`
`llama-3-8b-instruct`
`llama-3-70b-instruct`
`mistral-7b-instruct`
`mixtral-8x7b-instruct`
We recommend switching to models in the Llama-3.1 family:
**Online Models** -
`llama-3.1-sonar-small-128k-online`
`llama-3.1-sonar-large-128k-online`
**Chat Models** -
`llama-3.1-sonar-small-128k-chat`
`llama-3.1-sonar-large-128k-chat`
**Instruct Models** -
`llama-3.1-70b-instruct`
`llama-3.1-8b-instruct`
If you have any questions, please email [support@perplexity.ai](mailto:support@perplexity.ai).
Thank you for being a Perplexity API user.
Stay curious,
Team Perplexity
Please note that as of May 14, several models and model name aliases will no longer be accessible. We recommend updating your applications to use models in the Llama-3 family immediately. The following model names will no longer be available via API:
`codellama-70b-instruct`
`mistral-7b-instruct`
`mixtral-8x22b-instruct`
`pplx-7b-chat`
`pplx-7b-online`
`sonar-small-chat`
`sonar-small-online`
`pplx-70b-chat`
`pplx-70b-online`
`pplx-8x7b-chat`
`pplx-8x7b-online`
`sonar-medium-chat`
`sonar-medium-online`
In lieu of the above, we recommend switching to models from the Llama 3 family:
`llama-3-sonar-small-32k-chat`
`llama-3-sonar-small-32k-online`
`llama-3-sonar-large-32k-chat`
`llama-3-sonar-large-32k-online`
`llama-3-8b-instruct`
`llama-3-70b-instruct`
Effective immediately, input and output tokens are charged at the same price. Previously, output tokens were more expensive than input tokens. Prices have generally gone down as a result.
**Announcing Our Newest Model**
We are excited to announce the launch of our latest Perplexity models: `sonar-small-chat` and `sonar-medium-chat`, along with their search-enhanced versions, `sonar-small-online` and `sonar-medium-online`. These new additions surpass our earlier models in cost-efficiency, speed, and performance. For detailed information on our supported models, please visit our model card documentation.
**Expanded Context Windows**
The context window length for several models has been doubled from 8k to 16k, including `mixtral-8x7b-instruct` and all Perplexity models. 4k tokens are reserved for search results in online models.
**Model Deprecation Notice**
Please note that as of March 15, the `pplx-70b-chat`, `pplx-70b-online`, `llama-2-70b-chat`, and `codellama-34b-instruct` models will no longer be available through the Perplexity API. We will gradually phase out less frequently used models in favor of our newer and more performant offerings.
**Revised Pricing Structure for 8x7b Models**
The pricing for the `mixtral-8x7b-instruct` model will be adjusted. Previously charged at \$0.14 / \$0.58 per million input and output tokens, the rates will change to \$0.60 / \$1.80 per million input and output tokens moving forward.
**Increased Public Rate Limits**
Public limits for all models have increased by \~2x. Find the current rate limits here.
**Access to Citations and Elevated Rate Limits**
Responding to popular demand in our API discussion forum, we are introducing URL citation access for our Online LLMs to approved users. For access to citations, or to request a rate limit increase, please complete this form.
**Terms of Service and Data Processing Addendum**
We wish to reiterate our commitment to data privacy for commercial application developers using the Perplexity API. The updated Terms of Service and Data Processing Addendum can be found here.
Thank you for being a Perplexity API user.
We're excited to announce that pplx-api is now serving the latest open-source mixture-of-experts model, `mixtral-8x7b-instruct`, at the blazingly fast speed of inference you are accustomed to.
We're excited to share two new PPLX models: `pplx-7b-online` and `pplx-70b-online`. These first-of-a-kind models are integrated with our in-house search technology for factual grounding. Read our blog post for more information!
[https://blog.perplexity.ai/blog/introducing-pplx-online-llms](https://blog.perplexity.ai/blog/introducing-pplx-online-llms)
We're also announcing general availability for our API. We've rolled out usage-based pricing, which enables us to gradually relax the rate limits on our models. Follow the updated steps for getting started.
We have removed support for `replit-code-v1.5-3b` and `openhermes-2-mistral-7b`. There are no immediate plans to add these models back. If you enjoyed `openhermes-2-mistral-7b`, try our in-house models `pplx-7b-chat` and `pplx-70b-chat` instead!
The Perplexity AI API is currently in beta release v0. Clients are not protected from backwards-incompatible changes and cannot specify their desired API version. Examples of backwards-incompatible changes include:
- Removing support for a given model.
- Renaming a response field.
- Removing a response field.
- Adding a required request parameter.
Backwards incompatible changes will be documented here.
Generally, the API is designed to be compatible with OpenAI client libraries. Given the same request body, swapping in the Perplexity API base URL and your Perplexity API key yields a response that can be parsed the same way as the response OpenAI would yield, except for certain explicitly unsupported body parameters documented in the API reference.
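For instance, with the official OpenAI Python client, switching to Perplexity only requires a different base URL and API key (a minimal sketch):

```python python
from openai import OpenAI

# Same client, same request shape: only the base URL and key change.
client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",
    base_url="https://api.perplexity.ai",
)
response = client.chat.completions.create(
    model="llama-3.1-sonar-small-128k-online",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```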
# Forum
We host our discussions on GitHub. You can access our GitHub discussion forum [here](https://github.com/ppl-ai/api-discussion/discussions).
# Frequently Asked Questions
The answers you get from the API should closely resemble the default search answers in the UI. The API uses the same search subsystem as the UI, with small differences in configuration. However, you could be seeing differences between the API and the UI for the following reasons:
**Pro Search**
The API doesn't support Pro Search today. Pro Search uses a multi-step reasoning process which increases the quality of the answer.
**Using third-party models**
At this time, the API only supports the Sonar models. Using other third-party models like GPT-4o or Sonnet 3.5 in the UI could lead to diverging results.
**Tuning of sampling parameters (presence\_penalty, top\_p, etc.) and the system prompt in the API**
Our defaults are tuned to give the best results from the API and to match the default search experience in the UI. We give users the power to tune the API to their use cases, but custom tuning may generalize less well and produce results that differ from the UI. For parity with the default UI experience, we recommend not explicitly setting sampling parameters in your API requests.
We collect the following types of information:
**API Usage Data:** We collect billable usage metadata such as the number of requests and tokens. You can view your own usage in the [Perplexity API dashboard](https://perplexity.ai/settings/api).
**User Account Information:** When you create an account with us, we collect your name, email address, and other relevant contact information.
Yes, the API can answer with up-to-date information from the web: use the [Perplexity Sonar Models](https://docs.perplexity.ai/guides/model-cards), which leverage information from Perplexity's search index and the public internet.
You can find our [rate limits here](https://docs.perplexity.ai/guides/rate-limits).
We email users about new developments and also post in the [changelog](/changelog.mdx).
Please fill out this [form](https://perplexity.typeform.com/apiaccessform?typeform-source=docs.perplexity.ai) if you require a rate limit increase for commercial purposes. We consider rate limit increases on a case-by-case basis.
A 401 error code indicates that the provided API key is invalid, deleted, or associated with an account that has run out of credits. If credits are the issue, you can purchase more in the [Perplexity API dashboard](https://perplexity.ai/settings/api). You can avoid this issue by configuring auto-top-up.
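For example, the OpenAI Python client surfaces a 401 as `openai.AuthenticationError`, which you can catch to fail gracefully (a sketch):

```python python
import openai
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.perplexity.ai")
try:
    response = client.chat.completions.create(
        model="llama-3.1-sonar-small-128k-online",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.AuthenticationError:
    # Raised on HTTP 401: verify the key is valid and the account has credits.
    print("Invalid API key or depleted credit balance; check the API dashboard.")
```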
Currently, we do not support fine-tuning.
Please reach out to [api@perplexity.ai](mailto:api@perplexity.ai) or [support@perplexity.ai](mailto:support@perplexity.ai) for other API inquiries. You can also post on our [discussion forum](https://github.com/ppl-ai/api-discussion/discussions) and we will get back to you.
We do not guarantee this at the moment.
# Getting Started with Perplexity API
You can access our API using HTTPS requests. Authenticating involves the following steps:
1. Start by visiting the [Perplexity API Settings page](https://www.perplexity.ai/pplx-api).
2. Register your credit card to get started. This step will not charge your card; it simply stores payment information for later API usage.
3. Generate an API key. The API key is a long-lived access token that can be used until it is manually refreshed or deleted.
4. Send the API key as a bearer token in the `Authorization` header with each API request, as in the sketch below.
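A minimal sketch of step 4 using the `requests` library (replace `YOUR_API_KEY` with your key; the model name is taken from the Supported Models page):

```python python
import requests

# Every request carries the API key as a bearer token in the Authorization header.
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
}
payload = {
    "model": "llama-3.1-sonar-small-128k-online",
    "messages": [{"role": "user", "content": "How many stars are in the universe?"}],
}
response = requests.post("https://api.perplexity.ai/chat/completions", headers=headers, json=payload)
print(response.json())
```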
When you run out of credits, your API keys will be blocked until you add to your credit balance. You can avoid this by configuring "Automatic Top Up", which refreshes your balance whenever you drop below \$2.
Our supported models are listed on the [Supported Models](/api-docs/model-cards) page.
The API is conveniently OpenAI client-compatible for easy integration with existing applications.
```python python
from openai import OpenAI
YOUR_API_KEY = "INSERT API KEY HERE"
messages = [
{
"role": "system",
"content": (
"You are an artificial intelligence assistant and you need to "
"engage in a helpful, detailed, polite conversation with a user."
),
},
{
"role": "user",
"content": (
"How many stars are in the universe?"
),
},
]
client = OpenAI(api_key=YOUR_API_KEY, base_url="https://api.perplexity.ai")
# chat completion without streaming
response = client.chat.completions.create(
model="llama-3.1-sonar-large-128k-online",
messages=messages,
)
print(response)
# chat completion with streaming
response_stream = client.chat.completions.create(
model="llama-3.1-sonar-large-128k-online",
messages=messages,
stream=True,
)
for response in response_stream:
print(response)
```
# Supported Models
## Perplexity Sonar Models
| Model | Parameter Count | Context Length | Model Type |
| ----------------------------------- | --------------- | -------------- | --------------- |
| `llama-3.1-sonar-small-128k-online` | 8B | 127,072 | Chat Completion |
| `llama-3.1-sonar-large-128k-online` | 70B | 127,072 | Chat Completion |
| `llama-3.1-sonar-huge-128k-online` | 405B | 127,072 | Chat Completion |
The search subsystem of the Online LLMs does not attend to the system prompt. You can use the system prompt to provide instructions related to the style, tone, and language of the response.
**Beta Access**: The Online LLMs have some features in closed beta - to request access to them, please fill out this [form](https://perplexity.typeform.com/apiaccessform?typeform-source=docs.perplexity.ai).
## Perplexity Chat Models
| Model | Parameter Count | Context Length | Model Type |
| --------------------------------- | --------------- | -------------- | --------------- |
| `llama-3.1-sonar-small-128k-chat` | 8B | 127,072 | Chat Completion |
| `llama-3.1-sonar-large-128k-chat` | 70B | 127,072 | Chat Completion |
## Open-Source Models
Where possible, we try to match the Hugging Face implementation.
| Model | Parameter Count | Context Length | Model Type |
| ------------------------ | --------------- | -------------- | --------------- |
| `llama-3.1-8b-instruct` | 8B | 131,072 | Chat Completion |
| `llama-3.1-70b-instruct` | 70B | 131,072 | Chat Completion |
## Special Tokens
We do not raise any exceptions if your chat inputs contain messages with special tokens. If avoiding prompt injections is a concern for your use case, it is recommended that you check for special tokens prior to calling the API. For more details, read [Meta's recommendations for Llama](https://github.com/meta-llama/llama/blob/008385a/UPDATES.md#token-sanitization-update).
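A minimal sketch of such a pre-flight check (the token list below covers Llama 3's reserved strings and is illustrative, not exhaustive):

```python python
# Illustrative check: reject message content containing Llama special tokens
# before sending it to the API, as a guard against prompt injection.
LLAMA_SPECIAL_TOKENS = (
    "<|begin_of_text|>",
    "<|end_of_text|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>",
)

def contains_special_tokens(text: str) -> bool:
    return any(token in text for token in LLAMA_SPECIAL_TOKENS)

messages = [{"role": "user", "content": "How many stars are in the universe?"}]
for message in messages:
    if contains_special_tokens(message["content"]):
        raise ValueError("Message content contains reserved special tokens")
```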
# PerplexityBot
We strive to improve our service every day. To provide the best search experience, we need to collect data. We use web crawlers to gather information from the internet and index it for our search engine.
You can identify our web crawler by its user agent:
```text
User agent token: PerplexityBot
Full user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
```
## Customizing access
### Disallow PerplexityBot
To prevent PerplexityBot from accessing your site data, add a record to your site's robots.txt:
```text
User-Agent: PerplexityBot
Disallow: /
```
### Custom access rules
You can also customize access, disallowing data retrieval only from specific paths.
```text
User-Agent: PerplexityBot
Allow: /public/
Disallow: /private/
```
# Pricing
## Perplexity Sonar Models
| Model | Price per 1000 requests | Price per 1M tokens |
| ----------------------------------- | ----------------------- | ------------------- |
| `llama-3.1-sonar-small-128k-online` | \$5 | \$0.2 |
| `llama-3.1-sonar-large-128k-online` | \$5 | \$1 |
| `llama-3.1-sonar-huge-128k-online` | \$5 | \$5 |
The pricing for Sonar models combines a fixed price per request with a small variable price based on the number of input and output tokens in the request.
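For example, a single request to `llama-3.1-sonar-small-128k-online` that uses 1,000 input tokens and 1,000 output tokens would cost \$5 / 1000 = \$0.005 for the request, plus 2,000 × \$0.2 / 1M = \$0.0004 in token charges, for a total of about \$0.0054.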
## Perplexity Chat Models
| Model | Price per 1M tokens |
| --------------------------------- | ------------------- |
| `llama-3.1-sonar-small-128k-chat` | \$0.2 |
| `llama-3.1-sonar-large-128k-chat` | \$1 |
## Open-Source Models
| Model | Price per 1M tokens |
| ------------------------ | ------------------- |
| `llama-3.1-8b-instruct` | \$0.2 |
| `llama-3.1-70b-instruct` | \$1 |
# Rate Limits
## Perplexity Sonar Models
| Model | Request Rate Limit |
| ----------------------------------- | ------------------ |
| `llama-3.1-sonar-small-128k-online` | 50/min |
| `llama-3.1-sonar-large-128k-online` | 50/min |
| `llama-3.1-sonar-huge-128k-online` | 50/min |
## Perplexity Chat Models
| Model | Request Rate Limit |
| --------------------------------- | ------------------ |
| `llama-3.1-sonar-small-128k-chat` | 50/min |
| `llama-3.1-sonar-large-128k-chat` | 50/min |
## Open-Source Models
| Model                    | Request Rate Limit |
| ------------------------ | ------------------ |
| `llama-3.1-8b-instruct`  | 100/min            |
| `llama-3.1-70b-instruct` | 60/min             |
Requests beyond a model's rate limit are throttled.
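A simple client-side backoff can smooth over short bursts (a sketch; it assumes rate-limited requests come back with HTTP status 429):

```python python
import time

import requests

# Sketch: retry with exponential backoff when a request is rate-limited.
# Assumes the API signals rate limiting with HTTP 429.
def post_with_backoff(url, headers, payload, max_retries=5):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    return response
```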
To request elevated rate limits, fill out this [form](https://perplexity.typeform.com/apiaccessform).
# Welcome to Perplexity API
Our online models are focused on delivering helpful, up-to-date, and factual responses to all your questions.
# Application Status
You can check the status of our services [here](https://status.perplexity.com/).
If you experience any issues or have any questions, please contact us at [api@perplexity.ai](mailto:api@perplexity.ai) or file a bug report in our [Discord](https://discord.com/invite/perplexity-ai) channel.