Chat Completions
Generates a model’s response for the given chat conversation.
Authorizations
Bearer authentication header of the form Bearer <token>
, where <token>
is your auth token.
Body
The name of the model that will complete your prompt. Refer to Supported Models to find all the models offered.
"sonar"
A list of messages comprising the conversation so far.
[
{
"role": "system",
"content": "Be precise and concise."
},
{
"role": "user",
"content": "How many stars are there in our galaxy?"
}
]
The maximum number of completion tokens returned by the API. Controls the length of the model's response. If the response would exceed this limit, it will be truncated. Higher values allow for longer responses but may increase processing time and costs.
The amount of randomness in the response, valued between 0 and 2. Lower values (e.g., 0.1) make the output more focused, deterministic, and less creative. Higher values (e.g., 1.5) make the output more random and creative. Use lower values for factual/information retrieval tasks and higher values for creative applications.
0 <= x < 2
The nucleus sampling threshold, valued between 0 and 1. Controls the diversity of generated text by considering only the tokens whose cumulative probability exceeds the top_p value. Lower values (e.g., 0.5) make the output more focused and deterministic, while higher values (e.g., 0.95) allow for more diverse outputs. Often used as an alternative to temperature.
A list of domains to limit search results to. Currently limited to only 3 domains for whitelisting and blacklisting. For blacklisting, add a -
at the beginning of the domain string. More information about this here.
Determines whether search results should include images.
Determines whether related questions should be returned.
Filters search results based on time (e.g., 'week', 'day').
The number of tokens to keep for top-k filtering. Limits the model to consider only the k most likely next tokens at each step. Lower values (e.g., 10) make the output more focused and deterministic, while higher values allow for more diverse outputs. A value of 0 disables this filter. Often used in conjunction with top_p to control output randomness.
Determines whether to stream the response incrementally.
Positive values increase the likelihood of discussing new topics. Applies a penalty to tokens that have already appeared in the text, encouraging the model to talk about new concepts. Values typically range from 0 (no penalty) to 2.0 (strong penalty). Higher values reduce repetition but may lead to more off-topic text.
Decreases likelihood of repetition based on prior frequency. Applies a penalty to tokens based on how frequently they've appeared in the text so far. Values typically range from 0 (no penalty) to 2.0 (strong penalty). Higher values (e.g., 1.5) reduce repetition of the same words and phrases. Useful for preventing the model from getting stuck in loops.
Enables structured JSON output formatting.
Configuration for using web search in model responses.
{ "search_context_size": "high" }
Response
The response is of type any
.