Rate Limits

We limit usage of a model if a user's request rate or token usage rate exceeds any of the limits for that model.

Perplexity Models

| Model                          | Request rate limit | Token rate limit |
|--------------------------------|--------------------|------------------|
| llama-3-sonar-small-32k-online | 20/min             | 2,000,000/min    |
| llama-3-sonar-small-32k-chat   | 20/min             | 2,000,000/min    |
| llama-3-sonar-large-32k-online | 20/min             | 2,000,000/min    |
| llama-3-sonar-large-32k-chat   | 20/min             | 2,000,000/min    |

Open-Source Models

| Model                 | Request rate limit | Token rate limit |
|-----------------------|--------------------|------------------|
| mixtral-8x7b-instruct | 24/min             | 16,000/min       |
| llama-3-8b-instruct   | 100/min            | 160,000/min      |
| llama-3-70b-instruct  | 60/min             | 40,000/min       |
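Because the limits above are enforced per minute, one way to avoid hitting them is to throttle requests on the client side before they are sent. The sketch below is a minimal sliding-window limiter, not an official SDK feature; the class name and the injectable `clock` parameter are illustrative choices. Set `max_calls` to the request rate limit of the model you are calling (e.g. 20 for the sonar models).

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` calls per `window` seconds (client-side)."""

    def __init__(self, max_calls, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable for deterministic testing
        self.calls = deque()    # timestamps of recent calls

    def acquire(self):
        """Record and permit the call if under the limit; otherwise refuse it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False
```

A caller would check `limiter.acquire()` before each API request and wait (or queue the request) when it returns `False`. Note this only covers the request rate limit; staying under the token rate limit additionally requires tracking tokens consumed per window.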

To request elevated rate limits, fill out this form and send an email describing your use case to [email protected].
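Even with client-side throttling, a request can still be rejected for exceeding a limit (typically an HTTP 429 response). A common pattern, sketched here as an assumption rather than documented Perplexity behavior, is to retry with exponential backoff and full jitter so that concurrent clients do not retry in lockstep:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay (seconds) before retry `attempt` (0-based): exponential backoff
    capped at `cap`, with full jitter drawn uniformly from [0, ceiling]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

In a retry loop, the caller would sleep for `backoff_delay(attempt)` after each 429 before resending, honoring any `Retry-After` header if the server provides one.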