The Agent API runs a bounded multi-turn loop: on each turn the model can call a tool (such as web_search), read the result, and decide whether to continue or answer. Prompts that work well with single-shot LLMs often underperform here, because the same text shapes tool selection, search query generation, and final response together.
Two parameters drive most of the prompt design:
instructions sets the role, tone, formatting, and grounding rules that apply regardless of the user’s question.
input holds the actual question. It also seeds the first search query, so specificity here directly improves retrieval.
Instructions
Use the instructions parameter for role, tone, language, formatting, and grounding rules. Instructions apply on every turn of the agent loop, so put things here that hold regardless of the user’s question.
Example instructions block:
Instructions
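A minimal sketch of what such a block might contain; the wording is illustrative, not a required template:

```text
You are a concise research assistant. Respond in English with short,
plainly formatted paragraphs. Ground every claim in the retrieved search
results and cite sources inline. If the results do not fully answer the
question, say so explicitly instead of guessing.
```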
Keep instructions focused. They are re-read on every turn of the agent loop, so bloat compounds across tool calls. If your block is growing long, check whether parts of it would be better expressed as request parameters: use response_format with a JSON schema for machine-readable output, web_search filters for retrieval constraints, or move query-specific framing into input.
Built-in tools like web_search and fetch_url are tuned to work well without prompt-side guidance. You don’t need to describe what they do, when to call them, or how to construct queries. Adjust tool-call count with the max_steps parameter and search constraints with web_search filters. If you’re using custom instructions and want to nudge how the model uses built-in tools, you can reference them there as well.
For custom function tools you define yourself, the model relies on the description and parameter schema you provide, so make those as clear as you can. You can reinforce the tool’s role in instructions if the description alone isn’t enough to steer behavior.
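For illustration, a custom tool definition might look like the sketch below. The exact registration format is an assumption here (see the Tools page for the authoritative schema); the point is that the description and parameter schema carry most of the steering:

```python
# Hypothetical custom tool definition -- the wrapper format is an assumption,
# not the documented Agent API schema. A clear description and parameter
# schema are what guide the model's tool use.
inventory_tool = {
    "name": "check_inventory",
    "description": (
        "Look up current stock for a product SKU in the internal warehouse "
        "database. Use this instead of web search for availability questions."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "sku": {"type": "string", "description": "Product SKU, e.g. 'A-10442'"},
            "warehouse": {"type": "string", "description": "Optional warehouse code"},
        },
        "required": ["sku"],
    },
}
```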
Input
Use the input parameter for the actual query you want answered. Input strongly shapes search behavior, so descriptive and specific phrasing directly improves retrieval. Vague inputs lead to vague searches.
Example user prompt:
Input
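For instance, a specific, retrieval-friendly input might read:

```text
Compare energy efficiency ratings of residential air-source heat pumps vs.
traditional HVAC systems, citing sources from 2023 or later.
```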
API Example
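A minimal request sketch in Python. The parameter names instructions, input, and max_steps come from this page, but the endpoint placeholder, auth header, environment variable names, and body nesting are assumptions for illustration, not the authoritative API reference:

```python
import os

import requests

# Placeholders -- substitute the real endpoint and credentials from the
# API reference. The body shape below is illustrative, not authoritative.
AGENT_API_URL = os.environ["AGENT_API_URL"]
API_KEY = os.environ["PPLX_API_KEY"]

response = requests.post(
    AGENT_API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "instructions": (
            "You are a concise research assistant. Cite sources inline and "
            "state explicitly when results do not fully answer the question."
        ),
        "input": (
            "Compare energy efficiency ratings of heat pumps vs. traditional "
            "HVAC for residential use"
        ),
        "max_steps": 5,  # cap on tool-call turns
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```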
Best Practices
Be Specific and Descriptive
Use natural language, but include the vocabulary and context that would actually appear on relevant pages. Add a few words of context to disambiguate when a term could mean multiple things. Specificity in input directly improves retrieval.
Good Example: “Compare energy efficiency ratings of heat pumps vs. traditional HVAC for residential use”
Poor Example: “Tell me which home heating is better”
Cap Result Counts
If you want a list, say how long. Without an explicit cap, the model picks an arbitrary length.
Good Example: “List the top 5 sushi restaurants in Tokyo”
Poor Example: “Give me a list of sushi restaurants”
Use Instructions to Shape Tool Output
This can be useful if you want to nudge how the model handles tool output. Things like citation style, grounding behavior, or response formatting fit naturally here, since instructions apply on every turn of the agent loop.
Example (instructions): “Cite sources inline by domain (e.g., reuters.com). State explicitly when tool results don’t fully answer the question.”
Reading Sources from the Response
Read URLs and source metadata from the response payload, not from the model’s written answer. For non-streaming responses, search results are available at the top level as response.search_results and inside response.output[] as items where type == "search_results" (both carry the same data). Pull URLs from results[].url. For streaming, listen for response.reasoning.search_results events. See Output Control for the full response shape.
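A sketch of collecting source URLs from a parsed, non-streaming response body; the exact nesting is an assumption based on the field names described above, so adjust it to the payload shape documented in Output Control:

```python
def extract_source_urls(payload: dict) -> list[str]:
    """Collect source URLs from a parsed (non-streaming) response body.

    The nesting below is an assumption inferred from the field names
    described on this page, not the authoritative payload schema.
    """
    results = payload.get("search_results")
    if not results:
        # Fall back to the output[] item that carries the same data.
        for item in payload.get("output", []):
            if item.get("type") == "search_results":
                results = item.get("results", [])
                break
    return [r["url"] for r in results or [] if r.get("url")]
```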
The model has access to URLs from tool output and can include them in its response if asked, but it’s prone to mistyping or paraphrasing them. Presets also configure the model to cite by index (e.g., [web:1]), not by URL, so asking for URLs in prose fights the default citation format. Treat the model’s text as the prose answer and the structured search_results field as the authoritative source list.
Reduce Hallucinations
LLMs are tuned to be helpful, which can occasionally lead them to provide an answer when search results are thin or off-target rather than flagging the gap. The agent loop helps, since the model can refine queries and search again, but it does not eliminate the failure modes. Hallucination is most likely when the information isn’t web-accessible (LinkedIn posts, private documents, paywalled content), when repeated searches return related but non-matching results, or when very recent information isn’t indexed yet. A few short additions to instructions cover most of these cases. Grounding rules belong here because instructions are re-read on every turn of the agent loop, so the same rule applies to the first search and to any follow-ups.
Give the model permission to say it didn’t find anything. With an explicit out, the model is more likely to acknowledge insufficient results instead of leaning on training data to fill the gap.
Instructions
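A sketch of grounding additions along these lines; the wording is illustrative:

```text
Only make claims that are supported by the search results. If the results
are thin, off-target, or don't cover the question, say that you could not
find a reliable answer instead of filling the gap from memory. Note when
the information may be behind a paywall or not publicly indexed.
```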
Use Parameters, Not Prose, for Hard Constraints
For source, date, or region constraints, prefer the web_search parameters over describing the constraint in prose. Parameters are applied by the search backend on every call, while prose-based filters are interpreted by the model and may not carry through every turn of the loop.
Keep input focused on the question itself, and move structural constraints into the tool config:
Avoid
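For instance, constraints folded into the prose of input (sketch only):

```python
# Constraints described in prose -- the model interprets these itself and
# may not honor them on every turn of the loop.
request_body = {
    "input": (
        "Find peer-reviewed studies on microplastics in drinking water, but "
        "only from nature.com or sciencedirect.com, only from the last 12 "
        "months, and only European sources."
    ),
}
```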
Prefer
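The same request with the constraints moved into the tool config. The filter key names below are placeholders, not the documented parameter names; see the Filters page for the real ones:

```python
# Illustrative only -- the web_search filter keys here are assumptions.
request_body = {
    "input": "Find peer-reviewed studies on microplastics in drinking water.",
    "web_search": {
        "allowed_domains": ["nature.com", "sciencedirect.com"],  # hypothetical key
        "recency": "year",                                       # hypothetical key
        "region": "EU",                                          # hypothetical key
    },
}
```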
Next Steps
Output Control
Shape responses with response_format and learn the full response payload structure.
Filters
Constrain search with domain, recency, and region parameters.
Tools
Configure web_search and other tools available to the Agent API.
Presets
Choose a preset that matches your latency, depth, and tool requirements.