Rate Limits

We limit usage for a model if a user's request rate or token usage rate exceeds any of the limits for that model.

Perplexity Models

| Model | Request rate limit | Token rate limit |
|---|---|---|
| llama-3-sonar-small-32k-chat | 20/minute | 2,000,000/minute |
| llama-3-sonar-small-32k-online | 20/minute | 2,000,000/minute |
| llama-3-sonar-large-32k-chat | 20/minute | 2,000,000/minute |
| llama-3-sonar-large-32k-online | 20/minute | 2,000,000/minute |

Open-Source Models

| Model | Request rate limit | Token rate limit |
|---|---|---|
| mixtral-8x7b-instruct | 8/5 seconds, 24/minute, 240/hour | 16,000/minute, 64,000/10 minutes |
| llama-3-8b-instruct | 20/5 seconds, 100/minute, 1,000/hour | 16,000/10 seconds, 160,000/minute, 512,000/10 minutes |
| llama-3-70b-instruct | 20/5 seconds, 60/minute, 600/hour | 40,000/minute, 160,000/10 minutes |
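Because several limits apply to one model at once (e.g. per-5-seconds, per-minute, and per-hour request caps), a client that wants to stay under the limits must track every window simultaneously. Below is a minimal client-side sliding-window sketch; the limits are enforced server-side, so this is only an illustrative way to pace requests locally, and the class name and structure are not part of the API.

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """Illustrative client-side limiter for one rate window,
    e.g. 20 requests per 60 seconds. The API enforces limits
    server-side; this sketch only helps a client pace itself."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps: deque = deque()  # send times within the window

    def acquire(self, now: Optional[float] = None) -> bool:
        """Return True and record the request if it fits in the
        current window; return False if it would exceed the limit."""
        t = time.monotonic() if now is None else now
        # Evict timestamps that have aged out of the window.
        while self.timestamps and t - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(t)
            return True
        return False

# A model with several request windows needs one limiter per window;
# a request may be sent only when every limiter admits it.
# Example: the llama-3-70b-instruct request limits from the table above.
limiters = [
    SlidingWindowLimiter(20, 5),      # 20 / 5 seconds
    SlidingWindowLimiter(60, 60),     # 60 / minute
    SlidingWindowLimiter(600, 3600),  # 600 / hour
]
```

A request should be attempted only when `all(l.acquire() for l in limiters)` would succeed; in practice you would check all windows before recording the send in any of them.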

To request elevated rate limits, fill out this form and send an email describing your use case to [email protected].