Every agent run begins with one Agent API request. Before prompts, tools, or output format, you decide what runs and how far it can go: which model answers, whether you start from a preset, and how many tool-use steps the agent can take. This page sets up the call the rest of this section builds on.

Pick the engine: model, models, or preset

Each run needs an engine. There are three ways to set one, and at least one is required:
| Field | What it is |
| --- | --- |
| model | A single model in provider/model format, for example openai/gpt-5.5. |
| models | A fallback chain of up to 5 models, tried in order until one succeeds. |
| preset | A named, pre-tuned bundle: model, system prompt, search config, and tools. |
If you set models, it takes precedence over model. A preset gives you a working agent in one field. Override individual settings on top of it as needed.
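from perplexity import Perplexity

client = Perplexity()

response = client.responses.create(
    model="openai/gpt-5.5",  # engine: a single model in provider/model format
    input="Summarize the key risks in this quarter's earnings call.",  # the question for this run
    instructions="You are a financial analyst. Be precise and cite figures.",  # standing rules for every turn
    max_steps=5,  # cap the agent loop at five steps
)

# Print the text of the agent's answer
print(response.output_text)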
Use a preset to start from a tuned configuration, then move to an explicit model (or models chain) when you need full control. See Presets for the available presets and what each one bundles, and Models for the model catalog and pricing.
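The engine rules above (at least one of model, models, or preset; at most 5 entries in models; models wins over model) can be checked before a request is sent. The helper below is a minimal sketch, not part of the SDK: `engine_fields` is a hypothetical name, and `example/fallback-model` and `example-preset` are placeholder values, not real catalog entries.

```python
from typing import Optional, Sequence

MAX_FALLBACK_MODELS = 5  # the "up to 5 models" limit described above


def engine_fields(
    model: Optional[str] = None,
    models: Optional[Sequence[str]] = None,
    preset: Optional[str] = None,
) -> dict:
    """Return the engine-related request fields (hypothetical helper)."""
    if model is None and not models and preset is None:
        raise ValueError("set at least one of model, models, or preset")
    if models and len(models) > MAX_FALLBACK_MODELS:
        raise ValueError(f"models accepts at most {MAX_FALLBACK_MODELS} entries")

    fields: dict = {}
    if preset is not None:
        fields["preset"] = preset
    if models:
        # models takes precedence over model, so model is left out here
        # to keep the request unambiguous (a choice made in this sketch).
        fields["models"] = list(models)
    elif model is not None:
        fields["model"] = model
    return fields


# Placeholder names below are illustrative only.
single = engine_fields(model="openai/gpt-5.5")
chain = engine_fields(
    model="openai/gpt-5.5",
    models=["openai/gpt-5.5", "example/fallback-model"],
)
from_preset = engine_fields(preset="example-preset")
```

The returned dictionary could then be unpacked into the request, for example `client.responses.create(input=..., **single)`.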

Set the standing rules: instructions

instructions are the system rules that hold for every turn of the agent loop — role, tone, and grounding rules. They apply regardless of what the user asks on a given turn.
instructions="You are a research assistant. Cite every claim by source domain, and never speculate beyond the retrieved evidence."
A preset ships its own tuned system prompt. Setting instructions replaces that prompt rather than appending to it; omit instructions to keep the preset’s prompt. Keep instructions to durable rules — per-question framing belongs in input, which the next page covers.
Keep instructions lean. Every token here is re-processed on each step of the loop, so it adds up across tool calls. For machine-readable output or retrieval constraints, prefer request parameters (see Shape the output and Give it tools) over prose rules.
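To see why lean instructions matter, here is a rough back-of-the-envelope estimate. It assumes the instructions are processed once per step and ignores any caching or the final pass; the token counts are made-up inputs, and `instruction_overhead` is a hypothetical helper.

```python
def instruction_overhead(instruction_tokens: int, steps: int) -> int:
    """Tokens spent re-reading the instructions across a run (rough estimate)."""
    return instruction_tokens * steps


# Illustrative numbers only: a lean prompt versus a verbose one over 10 steps.
lean = instruction_overhead(instruction_tokens=150, steps=10)
verbose = instruction_overhead(instruction_tokens=1200, steps=10)
print(lean, verbose)  # 1500 12000
```

Under these assumptions, the verbose prompt costs eight times as many instruction tokens over the same run.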

Bound the loop: max_steps

An agent run is a loop: the model reasons, optionally calls tools, reads the results, and repeats until it answers. One step is one pass through that cycle — a single model turn that may call tools. max_steps caps how many steps a run may take. Use it to bound runaway loops and to trade latency against depth. When a run reaches the cap, the agent doesn’t error — it makes one final pass to answer from what it has gathered so far.
| You want | Set |
| --- | --- |
| One pass (a tool can still run, but no looping on its results) | max_steps: 1 |
| A few rounds of search-and-refine | max_steps: 3-5 |
| Deep, multi-step research | max_steps: 10+ |
A low max_steps doesn’t disable tools; it limits how many times the agent can act on what they return. At max_steps: 1 the model still gets one turn and can call a tool, but the run won’t loop back to reason over the result. If you pass max_steps alongside a preset, it overrides the preset’s value; for chat-style presets, the override is capped at the preset’s own ceiling. To cap raw generation length rather than loop iterations, set max_output_tokens instead.
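The step cap described above can be pictured with a toy loop. This is a sketch of the semantics as this page describes them, not the service's actual implementation; `run_agent`, the fake model, and the fake search tool are all hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple

# A model turn returns (tool_query, answer); exactly one of the two is set.
Turn = Tuple[Optional[str], Optional[str]]


def run_agent(
    model_turn: Callable[[List[str]], Turn],
    tool: Callable[[str], str],
    final_answer: Callable[[List[str]], str],
    max_steps: int,
) -> Tuple[str, int]:
    """Toy agent loop: returns (answer, steps_used)."""
    gathered: List[str] = []
    for step in range(1, max_steps + 1):
        tool_query, answer = model_turn(gathered)
        if answer is not None:
            return answer, step  # the model answered before hitting the cap
        gathered.append(tool(tool_query))  # result is visible on the next step
    # Cap reached: no error, just one final pass over what was gathered.
    return final_answer(gathered), max_steps


def needs_three_searches(gathered: List[str]) -> Turn:
    """Fake model that wants three tool results before answering."""
    if len(gathered) < 3:
        return f"query {len(gathered) + 1}", None
    return None, f"answer from {len(gathered)} results"


def fake_search(query: str) -> str:
    return f"result for {query}"


def wrap_up(gathered: List[str]) -> str:
    return f"best effort from {len(gathered)} results"


print(run_agent(needs_three_searches, fake_search, wrap_up, max_steps=1))
# ('best effort from 1 results', 1)
print(run_agent(needs_three_searches, fake_search, wrap_up, max_steps=5))
# ('answer from 3 results', 4)
```

With max_steps=1 the tool still runs once, but the answer comes from the final pass rather than from another reasoning step; with max_steps=5 the fake model finishes on its own at step 4.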

Run it in the background

Long agent runs don’t have to block. Set background: true to submit the run and retrieve it later by ID with GET /v1/responses/{id} — useful for deep research, sandbox work, or any run that may take minutes. Background and streaming runs are covered in Shape the output.
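The submit-then-retrieve flow can be sketched with the standard library, without sending anything over the network. Several details here are assumptions: `BASE_URL` is a placeholder host, Bearer-token authentication is assumed, `resp_123` is a made-up ID, and both helper names are hypothetical. Only the background field and the GET /v1/responses/{id} path come from this page.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def background_payload(model: str, input_text: str, max_steps: int) -> bytes:
    """JSON body for a run submitted with background: true."""
    body = {
        "model": model,
        "input": input_text,
        "max_steps": max_steps,
        "background": True,
    }
    return json.dumps(body).encode("utf-8")


def retrieval_request(response_id: str, api_key: str) -> Request:
    """Build (but do not send) GET /v1/responses/{id}."""
    return Request(
        f"{BASE_URL}/v1/responses/{response_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


payload = background_payload("openai/gpt-5.5", "Survey recent battery research.", 10)
request = retrieval_request("resp_123", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would perform the retrieval; how often to poll and which fields signal completion are not specified on this page.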

Next steps

Prompt the agent: Split standing rules from the question, and ground the run.

Give it tools: Enable built-in and function tools, then read their results.

Presets: See the tuned presets you can start from.

Models: Browse the model catalog and token pricing.