Rate Limits
We limit usage of a model when a user's request rate or token usage rate exceeds any of the limits set for that model.
Perplexity Models
| Model | Request rate limit | Token rate limit |
|---|---|---|
| llama-3-sonar-small-32k-chat | 20/minute | 2,000,000/minute |
| llama-3-sonar-small-32k-online | 20/minute | 2,000,000/minute |
| llama-3-sonar-large-32k-chat | 20/minute | 2,000,000/minute |
| llama-3-sonar-large-32k-online | 20/minute | 2,000,000/minute |
Open-Source Models
| Model | Request rate limit | Token rate limit |
|---|---|---|
| mixtral-8x7b-instruct | 8/5 seconds, 24/minute, 240/hour | 16,000/minute, 64,000/10 minutes |
| llama-3-8b-instruct | 20/5 seconds, 100/minute, 1,000/hour | 16,000/10 seconds, 160,000/minute, 512,000/10 minutes |
| llama-3-70b-instruct | 20/5 seconds, 60/minute, 600/hour | 40,000/minute, 160,000/10 minutes |
To request elevated rate limits, fill out this form and send an email describing your use case to [email protected].
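The docs above don't show what happens client-side when a limit is exceeded. A minimal retry sketch, assuming the API signals rate limiting with HTTP 429 (the conventional status code for this; not confirmed on this page) — the `request_fn` callable and parameter names are illustrative, not part of the API:

```python
import random
import time


def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry request_fn with exponential backoff while it is rate limited.

    request_fn is any zero-argument callable returning a response-like
    object with a `status_code` attribute (e.g. a requests.Response).
    HTTP 429 is assumed to indicate a rate limit was hit.
    """
    for attempt in range(max_retries):
        response = request_fn()
        if response.status_code != 429:
            return response
        # Exponential backoff with jitter: base_delay, ~2x, ~4x, ...
        delay = base_delay * (2 ** attempt) * (1 + random.random())
        time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

Spacing retries out like this keeps a client under the per-minute request limits in the tables above instead of hammering the endpoint while it is throttled.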