Supported Models
Where possible, we try to match the Hugging Face implementation. We are open to adjusting the API, so please reach out with feedback regarding these details.
Model | Context Length | Model Type |
---|---|---|
codellama-34b-instruct | 16384 | Chat Completion |
llama-2-70b-chat | 4096 | Chat Completion |
mistral-7b-instruct \[2\] | 4096 [1] | Chat Completion |
mixtral-8x7b-instruct | 4096 [1] | Chat Completion |
pplx-7b-chat | 8192 | Chat Completion |
pplx-70b-chat | 4096 | Chat Completion |
pplx-7b-online | 4096 | Chat Completion |
pplx-70b-online | 4096 | Chat Completion |
[1] We will be increasing the context length of
mistral-7b-instruct
to 32k tokens (see roadmap).[2] This model refers to the
v0.2
release ofmistral-7b-instruct
.
Special Tokens
We do not raise any exceptions if your chat inputs contain messages with special tokens. If avoiding prompt injections is a concern for your use case, it is recommended that you check for special tokens prior to calling the API. For more details, read Meta’s recommendations for Llama.
Online LLMs
It is recommended to use only single-turn conversations for the online LLMs (pplx-7b-online
and pplx-70b-online
). Any system messages given in the request will additionally be ignored.