Rate Limits

We limit usage of a model if a user's request rate or token usage rate exceeds any of the limits for that model.

Perplexity Models

Model                 Request rate limit                   Token rate limit
sonar-small-chat      8/5 seconds, 24/minute, 240/hour     16000/minute, 64000/10 minutes
sonar-small-online    20/minute                            N/A
sonar-medium-chat     8/5 seconds, 24/minute, 240/hour     16000/minute, 64000/10 minutes
sonar-medium-online   20/minute                            N/A

Open-Source Models

Model                    Request rate limit                     Token rate limit
mistral-7b-instruct      20/5 seconds, 100/minute, 1000/hour    16000/10 seconds, 160000/minute, 512000/10 minutes
mixtral-8x7b-instruct    8/5 seconds, 24/minute, 240/hour       16000/minute, 64000/10 minutes
codellama-70b-instruct   20/5 seconds, 60/minute, 600/hour      40000/minute, 160000/10 minutes

Note: these rate limits are subject to change.
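
Because a request is limited as soon as any one window is exceeded, a client may want to check all of a model's windows before sending. The following is a minimal client-side sketch, not the service's own implementation: the names WindowedUsage and try_send are hypothetical, the sliding-window accounting is an assumption (the server may count differently), and the hardcoded values are the sonar-small-chat limits listed above, which may change.

```python
import time
from collections import deque


class WindowedUsage:
    """Tracks usage against several (max_units, window_seconds) limits."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = list(limits)
        self.clock = clock
        self.events = deque()  # (timestamp, units), oldest first
        # An empty limit list (e.g. a token limit of N/A) never blocks.
        self.longest = max((w for _, w in self.limits), default=0)

    def _prune(self, now):
        # Events older than the longest window cannot affect any limit.
        while self.events and now - self.events[0][0] >= self.longest:
            self.events.popleft()

    def allows(self, units=1):
        """Return True if adding `units` now stays within every window."""
        now = self.clock()
        self._prune(now)
        for max_units, window in self.limits:
            used = sum(u for ts, u in self.events if now - ts < window)
            if used + units > max_units:
                return False
        return True

    def record(self, units=1):
        """Record `units` of usage at the current time."""
        self.events.append((self.clock(), units))


def try_send(request_usage, token_usage, token_count):
    """Record one request only if both request and token limits allow it."""
    if request_usage.allows(1) and token_usage.allows(token_count):
        request_usage.record(1)
        token_usage.record(token_count)
        return True
    return False


# sonar-small-chat limits from the table, as (max_units, window_seconds).
SONAR_SMALL_CHAT_REQUESTS = [(8, 5), (24, 60), (240, 3600)]
SONAR_SMALL_CHAT_TOKENS = [(16000, 60), (64000, 600)]
```

A caller would create one WindowedUsage per limit type for each model and call try_send before each API request, waiting and retrying when it returns False. How tokens are counted toward the limit (prompt, completion, or both) is not stated here, so token_count is whatever estimate the caller chooses.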