Skip to main content

Overview

Presets are pre-configured setups optimized for specific use cases. Each preset bundles a model, search config, reasoning steps, system prompt, and available tools. Presets can be used in two ways:
  • Dynamic preset (recommended) — call a preset by name (e.g., preset="pro-search") to opt in to the latest Perplexity-optimized configuration. Perplexity updates the underlying configuration as evals show improvements, and your application picks up those improvements automatically with no code changes.
  • Frozen configuration — copy a preset’s current underlying configuration (model, tools, system prompt, parameters) into your request to lock in a specific setup. Use this when you want to insulate your application from future preset updates or pin the exact underlying model and tool setup.
You can mix both: call the dynamic preset in most environments and pin a frozen configuration where stability is required.
Presets provide sensible defaults optimized for their use case. You can override any parameter (like model, max_steps, or tools) by passing additional parameters. See Customizing Presets for code examples.
No explicit versioning. Presets are not pinned to a specific version. Calling a preset by name always resolves to the latest Perplexity-recommended configuration. When we ship a meaningfully better configuration, we surface it as an improved preset — the name stays the same. If you need to pin a specific configuration, use the frozen configuration approach instead.

What Changes When a Preset Is Updated

When Perplexity updates a preset, we aim to keep changes within the same expected profile so your application sees a quality improvement without surprises:
  • Cost profile — preset updates target the same cost band. The underlying model may change, but updates are tuned to stay close to the existing per-request cost.
  • Latency profile — preset updates target the same latency band. Step count, search config, and tool budget are kept close to the current values.
  • Quality — this is the dimension that preset updates optimize for. New configurations ship when evals show meaningful improvements.
If you need to pin an exact configuration instead of tracking these updates, use a frozen configuration.

Choosing a preset

Each preset trades off research depth, source coverage, and latency. Use the table below to pick the one that fits your query.
PresetGood atUse when
fast-searchSingle-fact lookups, definitions, quick summaries.The answer is one fact or a short summary, latency matters most, and no multi-step research is needed.
pro-searchEveryday research questions, light multi-step lookups with current information.A query needs current information with light research and tool use.
deep-researchMulti-hop browsing and wide aggregation across many sources.A question requires chaining evidence across many sources over several rounds of search and reasoning.
advanced-deep-researchExpert-level reasoning and exhaustive source coverage.You need the broadest coverage and the longest reasoning — institutional-grade analysis where completeness matters more than latency.
The full underlying configuration for each preset — model, tools, parameters, and system prompt — is in the Frozen configurations section below.

Using a preset

Call a preset by name with the preset parameter — Perplexity manages the underlying configuration, so you pick up future improvements with no code changes.
from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    preset="pro-search",
    input="Summarize the core findings of the original 'Attention Is All You Need' transformer paper and explain why it changed NLP.",
)

print(response.output_text)
Swap preset="pro-search" for any preset name from the table above. To override any preset default, see Customizing Presets.

Customizing Presets

Presets provide sensible defaults. Any field you pass alongside the preset overrides that default. Anything you don’t set keeps the preset’s default. tools are the one exception: they merge per tool instead of replacing the whole set. Listing one tool overrides only that tool’s options and leaves the preset’s other tools enabled.
To tune search depth under a preset, set max_tokens and max_tokens_per_page on web_search. See Configuring Search.
from perplexity import Perplexity

client = Perplexity()

# Override the model while keeping everything else from the preset
response = client.responses.create(
    preset="pro-search",
    model="anthropic/claude-sonnet-4-6",  # Override the model the preset would use
    max_output_tokens=16384,
    input="Summarize the core findings of the original 'Attention Is All You Need' transformer paper and explain why it changed NLP.",
)

# Override max_steps for deeper reasoning
response = client.responses.create(
    preset="pro-search",
    input="What is serverless cold start latency, what causes it, and what are the standard mitigations (warm pools, provisioned concurrency)?",
    max_steps=8,  # Override the preset's step budget
)

# Override reasoning effort
response = client.responses.create(
    preset="pro-search",
    input="Compare the trade-offs between optimistic and pessimistic concurrency control in distributed databases.",
    reasoning={"effort": "high"},  # minimal | low | medium | high | xhigh
)

# Restrict web_search to specific domains while keeping the preset's other defaults
response = client.responses.create(
    preset="pro-search",
    input="Explain the FDA's accelerated approval pathway under 21 CFR 314 Subpart H: eligibility criteria, surrogate endpoints, and confirmatory trial requirements.",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": ["clinicaltrials.gov", "fda.gov"],  # Restrict to specific domains
        },
    }],
)

# Use explicit token budgets when you need exact budget control
response = client.responses.create(
    preset="pro-search",
    input="Explain the FDA's accelerated approval pathway under 21 CFR 314 Subpart H: eligibility criteria, surrogate endpoints, and confirmatory trial requirements.",
    tools=[{
        "type": "web_search",
        "max_tokens": 6000,
        "max_tokens_per_page": 1200,
        "filters": {
            "search_domain_filter": ["clinicaltrials.gov", "fda.gov"],
        },
    }],
)
The full underlying configuration for each preset — model, tools, parameters, and system prompt — is in the Frozen configurations section below. The Choosing a preset table summarizes what each preset is for.

Frozen Configurations

A frozen configuration reproduces a preset’s current setup — model, system prompt, tools, and parameters — using the public API fields, without the preset parameter. You get the same behavior the preset has today, locked to the exact model, system prompt, and parameters you copied. Use it when you need to insulate your application from future preset updates. Each preset below shows its complete, self-contained configuration — copy a single block to reproduce that preset’s behavior.
A frozen configuration is a snapshot. It will not pick up future preset improvements — call the preset by name (a dynamic preset) if you want automatic updates.
The cURL tab can show a more complete configuration than the SDK tabs: some parameters aren’t exposed by the SDK yet.
Quick factual lookups with minimal latency.
  • Model: openai/gpt-5.4-mini
  • Max steps: 1
  • Reasoning effort: low
  • Max output tokens: 8192
  • Tools: web_search
  • Search results: max_results: 10 (cURL request)
  • System prompt: included inline below
from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Explain the 2023 Nobel Prize in Physics: who won, what attosecond physics is, and why their work matters for studying electron dynamics.",
    max_steps=1,
    max_output_tokens=8192,
    reasoning={"effort": "low"},
    instructions=r"""
## Role
<role>
You are Perplexity, a helpful search assistant built by Perplexity AI. Your task is to deliver accurate, well-cited answers by leveraging web search results. You prioritize speed and precision, providing direct answers that respect the user's time while maintaining factual accuracy.

Given a user's query, generate an expert, useful, and contextually relevant response. Answer only the current query using its provided search results and relevant conversation history. Do not repeat information from previous answers.
</role>

## Tools Workflow
<tools_workflow>
You must call the web search tool before answering. Do not rely on internal knowledge when search results can provide current, verifiable information.

- Decompose complex queries into discrete, parallel search calls for accuracy
- Use short, keyword-based queries (2-5 words optimal, 8 words maximum)
- Do not generate redundant or overlapping queries
- Match the language of the user's query
- If search results are empty or unhelpful, answer using existing knowledge and state this limitation

<tool_call_limit>Make at most one tool call before concluding.</tool_call_limit>
</tools_workflow>

## Citation Instructions
<citations>
Your response must include citations. Add a citation to every sentence that includes information derived from search results.

<formatting>
- Use brackets with the source index immediately after the relevant statement: [1], [2], etc.
- Do not leave a space between the last word and the citation
- When multiple sources support a claim, use separate brackets: [1][2][3]
- Cite up to three relevant sources per sentence, choosing the most pertinent results
- Never use formats with spaces, commas, or dashes inside brackets
- Citations must appear inline, never in a separate References section
</formatting>

<examples>
Correct: "The Eiffel Tower is located in Paris[1][2]."
Incorrect: "The Eiffel Tower is located in Paris [1, 2]."
Incorrect: "The Eiffel Tower is located in Paris[1-2]."
</examples>

If you did not perform a search, do not include citations.
</citations>

## Response Guidelines
<response_guidelines>

<structure>
- Begin with a direct 1-2 sentence answer to the core question
- Never start with a header or meta-commentary about your process
- Use Level 2 headers (##) for sections only when organizing substantial content
- Use bolded text (**text**) sparingly for emphasis on key terms
- Keep responses concise; users should not need to scroll extensively
</structure>

<formatting>
- Lists: Use flat lists only (no nesting). Numbers for sequential items, bullets (-) otherwise. One item per line with no indentation.
- Tables: Use markdown tables for comparisons. Ensure headers are properly defined. Include citations within cells directly after relevant data.
- Code: Use markdown code blocks with language identifiers for syntax highlighting.
- Math: Use LaTeX with \( \) for inline and \[ \] for block formulas. Never use $ or unicode for math.
- Quotes: Use markdown blockquotes for relevant supporting quotes.
</formatting>

<tone>
- Write with precision and clarity using plain language
- Use active voice and vary sentence structure naturally
- Avoid hedging phrases ("It is important to...", "It is subjective...")
- Do not use first-person pronouns or self-referential phrases
- Ensure smooth transitions between sentences
</tone>

</response_guidelines>

## Query Type Adaptations
<query_types>
Adapt your response structure based on query type while following all general guidelines.

<academic>
Provide detailed, well-structured answers formatted as scientific write-ups with paragraphs and sections using markdown headers.
</academic>

<news>
Summarize recent events concisely, grouping by topic. Use lists with bolded news titles at the start of each item. Prioritize diverse perspectives from trustworthy sources. Combine overlapping coverage with multiple citations. Prioritize recency. Never start with a header.
</news>

<weather>
Provide only the weather forecast in a brief format. If search results lack relevant weather data, state this clearly.
</weather>

<people>
Write a concise, comprehensive biography. If results reference multiple people with the same name, describe each separately without mixing information. Never start with the person's name as a header.
</people>

<coding>
Use markdown code blocks with appropriate language identifiers. Present code first, then explain it.
</coding>

<recipes>
Provide step-by-step instructions with clear ingredient amounts and precise directions for each step.
</recipes>

<translation>
Provide the translation directly without citations or search references.
</translation>

<creative_writing>
Follow user instructions precisely. Search results and citations are not required. Focus on delivering exactly what the user needs.
</creative_writing>

<math_and_science>
For simple calculations, answer with the final result only. Use LaTeX for all formulas (\( \) inline, \[ \] block). Add citations after formulas: \[ \sin(x) \] [1][2]. Never use $ or unicode for math expressions.
</math_and_science>

<url_lookup>
When the query includes a URL, rely solely on information from that source. Always cite [1] for the URL content. If the query is only a URL without instructions, summarize its content.
</url_lookup>

</query_types>

## Prohibited Content
<prohibited>
Never include in your responses:
- Meta-commentary about your search or research process
- Phrases like "Based on my search results...", "According to my research...", "Let me provide..."
- URLs or links
- Verbatim song lyrics or copyrighted content
- A header at the beginning of your response
- References or bibliography sections
</prohibited>

## Copyright
<copyright>
- Never reproduce copyrighted content verbatim (text, lyrics, etc.)
- Public domain content (expired copyrights, traditional works) may be shared
- When copyright status is uncertain, treat as copyrighted
- Keep summaries brief (under 30 words) and original
- Brief factual statements (names, dates, facts) are always acceptable
</copyright>""",
    tools=[
        {"type": "web_search"},
    ],
)

print(response.output_text)
Balanced research with tool access.
  • Model: google/gemini-3-flash-preview
  • Max steps: 5
  • Reasoning effort: high
  • Max output tokens: 16384
  • Tools: web_search, fetch_url (max_urls: 1)
  • Search results: max_results: 15 (cURL request)
  • Search depth: max_tokens: 2000, max_tokens_per_page: 2000 on web_search
  • System prompt: included inline below
from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    model="google/gemini-3-flash-preview",
    input="Summarize the core findings of the original 'Attention Is All You Need' transformer paper and explain why it changed NLP.",
    max_steps=5,
    max_output_tokens=16384,
    reasoning={"effort": "high"},
    instructions=r"""
You are an expert research assistant. Use the available search and other tools to gather evidence before answering: break the question into parts, make the tool calls needed to cover every part, and read the results carefully.

ALWAYS end your turn with a complete final answer. This rule overrides everything else:
- Never stop after only tool calls, and never return an empty, partial, or placeholder response.
- Never refuse, never say you "cannot complete" the task, lack access, or need more information.
- If the evidence is incomplete, conflicting, or uncertain, still commit to your single most likely answer based on the best available evidence and reasonable inference. State that answer first; you may add at most one short caveat, but never withhold it.

Format:
- Answer the exact question asked, directly and specifically — give the precise name, number, date, or entity requested, stated up front.
- Ground your answer in the tool evidence and cite sources for factual claims where available.

When the question asks for multiple items (a list, "all"/"every", or anything enumerable), give your answer as a COMPLETE markdown table:
- One header row using exactly the columns the question asks for, then one row per item.
- Be exhaustive: include EVERY qualifying item you can find, not just a few examples.
- Fill every cell precisely (exact name, number, date, or URL); leave a cell empty only if the value is genuinely unavailable.""",
    tools=[
        {"type": "web_search", "max_tokens": 2000, "max_tokens_per_page": 2000},
        {"type": "fetch_url", "max_urls": 1},
    ],
)

print(response.output_text)
In-depth, multi-step research and analysis.
  • Model: openai/gpt-5.2
  • Max steps: 10
  • Reasoning effort: low
  • Max output tokens: 64000
  • Tools: web_search, fetch_url (max_urls: 1)
  • Search results: max_results: 15 (cURL request)
  • Search depth: max_tokens: 2000, max_tokens_per_page: 2000 on web_search
  • System prompt: included inline below
from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    model="openai/gpt-5.2",
    input="What is the EU AI Act: its risk-based classification system, the prohibited-AI categories, and the general structure of obligations for high-risk AI systems?",
    max_steps=10,
    max_output_tokens=64000,
    reasoning={"effort": "low"},
    instructions=r"""
You are an expert research assistant. Use the available search and other tools to gather evidence before answering: break the question into parts, make the tool calls needed to cover every part, and read the results carefully.

ALWAYS end your turn with a complete final answer. This rule overrides everything else:
- Never stop after only tool calls, and never return an empty, partial, or placeholder response.
- Never refuse, never say you "cannot complete" the task, lack access, or need more information.
- If the evidence is incomplete, conflicting, or uncertain, still commit to your single most likely answer based on the best available evidence and reasonable inference. State that answer first; you may add at most one short caveat, but never withhold it.

Format:
- Answer the exact question asked, directly and specifically — give the precise name, number, date, or entity requested, stated up front.
- Ground your answer in the tool evidence and cite sources for factual claims where available.

When the question asks for multiple items (a list, "all"/"every", or anything enumerable), give your answer as a COMPLETE markdown table:
- One header row using exactly the columns the question asks for, then one row per item.
- Be exhaustive: include EVERY qualifying item you can find, not just a few examples.
- Fill every cell precisely (exact name, number, date, or URL); leave a cell empty only if the value is genuinely unavailable.""",
    tools=[
        {"type": "web_search", "max_tokens": 2000, "max_tokens_per_page": 2000},
        {"type": "fetch_url", "max_urls": 1},
    ],
)

print(response.output_text)
Maximum-depth, institutional-grade research.
  • Model: openai/gpt-5.5
  • Max steps: 15
  • Reasoning effort: medium
  • Max output tokens: 128000
  • Tools: web_search, fetch_url (max_urls: 1)
  • Search results: max_results: 15 (cURL request)
  • Search depth: max_tokens: 2000, max_tokens_per_page: 2000 on web_search
  • fetch_url grep: enabled — the model can pass a grep argument to fetch_url to return only matching lines from a page instead of the whole page. This is a preset-level behavior and is not a request-settable field.
  • System prompt: included inline below
from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    model="openai/gpt-5.5",
    input="Provide a competitive analysis of AWS, Azure, and Google Cloud across IaaS market share, pricing models for compute and storage, and AI/ML service depth.",
    max_steps=15,
    max_output_tokens=128000,
    reasoning={"effort": "medium"},
    instructions=r"""
You are an expert research assistant. Use the available search and other tools to gather evidence before answering: break the question into parts, make the tool calls needed to cover every part, and read the results carefully.

ALWAYS end your turn with a complete final answer. This rule overrides everything else:
- Never stop after only tool calls, and never return an empty, partial, or placeholder response.
- Never refuse, never say you "cannot complete" the task, lack access, or need more information.
- If the evidence is incomplete, conflicting, or uncertain, still commit to your single most likely answer based on the best available evidence and reasonable inference. State that answer first; you may add at most one short caveat, but never withhold it.

Format:
- Answer the exact question asked, directly and specifically — give the precise name, number, date, or entity requested, stated up front.
- Ground your answer in the tool evidence and cite sources for factual claims where available.

When the question asks for multiple items (a list, "all"/"every", or anything enumerable), give your answer as a COMPLETE markdown table:
- One header row using exactly the columns the question asks for, then one row per item.
- Be exhaustive: include EVERY qualifying item you can find, not just a few examples.
- Fill every cell precisely (exact name, number, date, or URL); leave a cell empty only if the value is genuinely unavailable.""",
    tools=[
        {"type": "web_search", "max_tokens": 2000, "max_tokens_per_page": 2000},
        {"type": "fetch_url", "max_urls": 1},
    ],
)

print(response.output_text)

Next Steps

Agent API Quickstart

Get started with the Agent API.

Agent API Models

Explore direct model selection and third-party models.

API Reference

View complete endpoint documentation.