Rate Limits
We limit usage for a model if a user's request rate or token usage rate exceeds any of the limits for that model.
Perplexity Models
Model | Request rate limit | Token rate limit
---|---|---
llama-3-sonar-small-32k-online | 20/min | 2,000,000/min
llama-3-sonar-small-32k-chat | 20/min | 2,000,000/min
llama-3-sonar-large-32k-online | 20/min | 2,000,000/min
llama-3-sonar-large-32k-chat | 20/min | 2,000,000/min
Open-Source Models
Model | Request rate limit | Token rate limit
---|---|---
mixtral-8x7b-instruct | 24/min | 16,000/min
llama-3-8b-instruct | 100/min | 160,000/min
llama-3-70b-instruct | 60/min | 40,000/min
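Until elevated limits are granted, a client can stay under these caps by spacing its own calls and backing off when the API returns HTTP 429 (rate limited). A minimal sketch, assuming the published limits above; the class and function names are illustrative, not part of the Perplexity API:

```python
import time


class RequestPacer:
    """Spaces calls evenly so a client stays under a requests-per-minute cap.

    Illustrative helper, not part of the Perplexity API.
    """

    def __init__(self, requests_per_min: int):
        self.min_interval = 60.0 / requests_per_min  # seconds between calls
        self._last_call = 0.0

    def wait(self) -> float:
        """Sleep just long enough to honor the cap; return the delay used."""
        now = time.monotonic()
        delay = max(0.0, self._last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self._last_call = time.monotonic()
        return delay


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, retries: int = 5) -> list[float]:
    """Exponential backoff schedule (seconds) for retrying 429 responses."""
    return [min(cap, base * factor ** i) for i in range(retries)]


# Example: pace a 20 requests/min model; retry 429s at 1s, 2s, 4s, 8s, 16s.
pacer = RequestPacer(20)   # one call every 3 seconds
schedule = backoff_delays()
```

Calling `pacer.wait()` before each request keeps the client at or below the cap; on a 429 response, sleep for the next delay in `schedule` before retrying.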
To request elevated rate limits, fill out this form and send an email describing your use case to [email protected]