Overview

This guide shows how to expose Perplexity’s Search API to an OpenAI model as a function tool. The Responses API uses a manual tool-call loop: the model emits a function_call item with a call_id, you execute the tool (in this case, call client.search.create), and you send the result back as a function_call_output item paired by call_id. The loop continues until the response no longer contains pending function calls.
This page uses the Responses API (client.responses.create), OpenAI's newer API surface. Its tool shape is flat ({type: "function", name, description, parameters, strict}). The legacy chat.completions function-calling shape is different: it nests those fields under function: {…}.
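Porting a tool written for chat.completions is mostly mechanical: you lift the fields out of the nested function object. The helper below is a minimal sketch of that conversion. to_responses_tool is a hypothetical name and is not part of either SDK. The sketch assumes the nested object carries name plus optional description, parameters, and strict.

```python
from typing import Any


def to_responses_tool(chat_tool: dict[str, Any]) -> dict[str, Any]:
    """Flatten a chat.completions-style tool ({"type": "function", "function": {...}})
    into the flat Responses API shape ({"type": "function", "name": ..., ...})."""
    if chat_tool.get("type") != "function" or "function" not in chat_tool:
        raise ValueError("expected a chat.completions function tool")
    fn = chat_tool["function"]
    flat: dict[str, Any] = {"type": "function", "name": fn["name"]}
    # Copy optional fields only when present so the result stays minimal.
    for key in ("description", "parameters", "strict"):
        if key in fn:
            flat[key] = fn[key]
    return flat


legacy = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web.",
        "parameters": {"type": "object", "properties": {}},
    },
}
print(to_responses_tool(legacy))
# {'type': 'function', 'name': 'web_search', 'description': 'Search the web.', 'parameters': {'type': 'object', 'properties': {}}}
```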

Prerequisites

pip install openai perplexityai
export OPENAI_API_KEY="your_openai_key"
export PERPLEXITY_API_KEY="your_perplexity_key"

Tool definition

The Responses API takes a flat tool object with type, name, description, parameters (JSON Schema), and an optional strict flag. The description below was tuned for Perplexity’s Search API. Keep it verbatim, because its wording is what steers the model toward good, short, keyword-style queries.
WEB_SEARCH_TOOL_DESCRIPTION = """\
Searches the web for current and factual information to answer user queries, returning relevant results with titles, URLs, and content snippets, similar to Google or Bing. Intended for questions about up-to-date or externally verified information beyond your knowledge cutoff. The tool works best with an array of short, keyword-focused queries. Complex queries that require multi-step reasoning are not supported. Time-sensitive queries are supported if the date is included in the query.

Best practices for using this tool:
- Limit the number of queries in each request to a maximum of three to maintain efficiency.
- For multi-entity questions, break them into separate, single-entity queries:
  - Preferred:
    [
      "Brand A protein powder review",
      "Brand B protein powder review"
    ]
  - Not recommended:
    [
      "Brand A vs Brand B protein powder review"
    ]

- For simple queries, keep each query straightforward and focused:
  - Preferred: ["inflation rate Canada"]
  - Not recommended: ["What is the inflation rate in Canada?"]

Each query should be short to ensure optimal tool performance. Make sure all provided examples and generated queries follow this guideline."""

QUERIES_PARAM_DESCRIPTION = (
    "An array of keyword-based search queries. Each query should be short, "
    "as longer queries may reduce performance. Do not provide more than three "
    "queries to maintain efficiency."
)

WEB_SEARCH_TOOL = {
    "type": "function",
    "name": "web_search",
    "description": WEB_SEARCH_TOOL_DESCRIPTION,
    "strict": True,
    "parameters": {
        "type": "object",
        "additionalProperties": False,
        "properties": {
            "queries": {
                "type": "array",
                "description": QUERIES_PARAM_DESCRIPTION,
                "items": {"type": "string"},
            },
        },
        "required": ["queries"],
    },
}
Strict mode constraints. When strict: true, the JSON Schema must set additionalProperties: false and list every property in required. OpenAI rejects maxItems/minItems and similar constraints in strict mode — enforce the “max three queries” guidance through the description, then truncate defensively in your handler.
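A schema that breaks these rules is only rejected when the request is sent, so it can be handy to check tool schemas locally first. The sketch below is a hypothetical checker; strict_schema_problems is not an OpenAI API. It encodes only the rules stated above:

- every object sets additionalProperties: false;
- every object lists all of its properties in required;
- arrays carry no minItems/maxItems.

It is not a complete reimplementation of OpenAI's strict-mode validation.

```python
from typing import Any

# Array keywords the rules above say strict mode rejects. Treat this set as an
# assumption and compare it with OpenAI's current Structured Outputs docs.
REJECTED_ARRAY_KEYWORDS = {"minItems", "maxItems"}


def strict_schema_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return readable descriptions of strict-mode rule violations in a schema."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = set(props) - set(schema.get("required", []))
        if missing:
            problems.append(f"{path}: not in required: {sorted(missing)}")
        # Recurse into each property's own schema.
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array":
        for kw in sorted(REJECTED_ARRAY_KEYWORDS & set(schema)):
            problems.append(f"{path}: {kw} not allowed in strict mode")
        if "items" in schema:
            problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# Same shape as the web_search parameters above (descriptions omitted).
params = {
    "type": "object",
    "additionalProperties": False,
    "properties": {"queries": {"type": "array", "items": {"type": "string"}}},
    "required": ["queries"],
}
print(strict_schema_problems(params))  # []

bad = {**params, "properties": {"queries": {"type": "array", "items": {"type": "string"}, "maxItems": 3}}}
print(strict_schema_problems(bad))  # ['$.queries: maxItems not allowed in strict mode']
```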

Tool handler

The handler is a thin wrapper around client.search.create. The Search API natively accepts an array of queries, so the array the model emits can be passed straight through.
from perplexity import Perplexity

perplexity = Perplexity()

def run_web_search(queries: list[str]) -> str:
    """Call Perplexity Search and format the results for the model."""
    # Defensive cap — the description asks for ≤3, but trust nothing.
    queries = queries[:3]

    response = perplexity.search.create(query=queries, max_results=5)

    lines = []
    for result in response.results:
        snippet = (result.snippet or "").strip().replace("\n", " ")
        if len(snippet) > 400:
            snippet = snippet[:400] + "…"
        lines.append(f"- {result.title}\n  {result.url}\n  {snippet}")
    return "\n\n".join(lines) if lines else "No results."
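The formatting half of run_web_search does not need the network. Pulling it into its own function lets you unit-test it without a Perplexity API key. The sketch below uses format_results, a hypothetical helper that mirrors the logic above. SimpleNamespace objects stand in for the result objects returned by perplexity.search.create, and only .title, .url, and .snippet are read from them.

```python
from types import SimpleNamespace


def format_results(results, max_snippet: int = 400) -> str:
    """Same formatting as run_web_search above, minus the API call."""
    lines = []
    for result in results:
        # Collapse newlines and cap snippet length to keep the tool output compact.
        snippet = (result.snippet or "").strip().replace("\n", " ")
        if len(snippet) > max_snippet:
            snippet = snippet[:max_snippet] + "…"
        lines.append(f"- {result.title}\n  {result.url}\n  {snippet}")
    return "\n\n".join(lines) if lines else "No results."


# Stand-ins for Perplexity result objects.
fake = [
    SimpleNamespace(title="Example", url="https://example.com", snippet="line one\nline two"),
    SimpleNamespace(title="Empty", url="https://example.org", snippet=None),
]
print(format_results(fake))
print(format_results([]))  # No results.
```

With this split, run_web_search could end with return format_results(response.results).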

Tool-call loop

The Responses API returns response.output as a flat list of items. Walk it for items whose type is "function_call". Append the response's output items to the running input array, so the next request includes the model's own calls and any reasoning items. Then execute each call and append a paired function_call_output item. Re-call responses.create until the response contains no more function calls.
import json
from openai import OpenAI

openai_client = OpenAI()

def chat_with_search(user_prompt: str, model: str = "gpt-5.5") -> str:
    # The Responses API's `input` is an ordered list of items. We append the
    # model's output items and our function_call_output items as the loop runs.
    input_items: list = [{"role": "user", "content": user_prompt}]

    while True:
        response = openai_client.responses.create(
            model=model,
            input=input_items,
            tools=[WEB_SEARCH_TOOL],
        )

        function_calls = [
            item for item in response.output if item.type == "function_call"
        ]

        if not function_calls:
            return response.output_text

        # Persist every output item (function_call items plus any reasoning
        # items). The SDK serializes these objects when they are sent back.
        input_items.extend(response.output)

        # Run each function call and append a paired function_call_output.
        for call in function_calls:
            args = json.loads(call.arguments)
            if call.name == "web_search":
                output = run_web_search(args["queries"])
            else:
                output = json.dumps({"error": f"unknown tool: {call.name}"})
            input_items.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": output,
            })


if __name__ == "__main__":
    answer = chat_with_search(
        "What were the major AI infrastructure announcements this week?"
    )
    print(answer)
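The loop above hard-codes a single tool. If you register more tools, a dispatch table keeps the call_id pairing in one place. The sketch below makes some assumptions:

- run_function_calls and the handlers mapping are hypothetical names.
- Malformed-argument handling is an addition not in the original loop.
- SimpleNamespace objects stand in for SDK output items, so the sketch runs offline; only .type, .name, .call_id, and .arguments are read.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical handler type: takes parsed arguments, returns the tool output string.
Handler = Callable[[dict[str, Any]], str]


def run_function_calls(output_items, handlers: dict[str, Handler]) -> list[dict[str, str]]:
    """Build one function_call_output per function_call item, paired by call_id."""
    results = []
    for item in output_items:
        if item.type != "function_call":
            continue  # skip messages, reasoning items, etc.
        handler = handlers.get(item.name)
        try:
            args = json.loads(item.arguments)
            if handler is None:
                output = json.dumps({"error": f"unknown tool: {item.name}"})
            else:
                output = handler(args)
        except json.JSONDecodeError as exc:
            output = json.dumps({"error": f"bad arguments: {exc}"})
        results.append({"type": "function_call_output", "call_id": item.call_id, "output": output})
    return results


handlers = {"web_search": lambda args: "results for " + ", ".join(args["queries"])}
items = [
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(type="function_call", name="web_search", call_id="call_1",
                    arguments='{"queries": ["inflation rate Canada"]}'),
    SimpleNamespace(type="function_call", name="get_weather", call_id="call_2", arguments="{}"),
]
for out in run_function_calls(items, handlers):
    print(out["call_id"], out["output"])
# call_1 results for inflation rate Canada
# call_2 {"error": "unknown tool: get_weather"}
```

Inside chat_with_search, the inner for loop could then become input_items.extend(run_function_calls(response.output, {"web_search": lambda a: run_web_search(a["queries"])})).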

Streaming

For streaming, use client.responses.stream(...). The SDK emits typed events: response.output_item.added when a function_call item starts, response.function_call_arguments.delta for each chunk of the arguments JSON, and response.function_call_arguments.done when the argument string is complete. The loop structure is otherwise identical to the non-streaming version.
def stream_turn(input_items: list):
    """Stream one model turn, printing text as it arrives; return the final response."""
    with openai_client.responses.stream(
        model="gpt-5.5",
        input=input_items,
        tools=[WEB_SEARCH_TOOL],
    ) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                print(event.delta, end="", flush=True)
        return stream.get_final_response()

# Inspect the returned response's output for function_call items, then resume
# the loop the same way as in the non-streaming version.
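The streaming snippet above only prints text deltas. The final response from stream.get_final_response() already holds complete function_call items, so that is the simplest place to read arguments. If you want arguments as they arrive, for example to show pending queries in a UI, you can reassemble them from the events.

This sketch assumes the event fields exposed by the openai Python SDK:

- item on response.output_item.added;
- item_id and delta on response.function_call_arguments.delta;
- arguments on response.function_call_arguments.done.

Check these against your SDK version. Note that item_id is the output item's id, not its call_id, which is why the sketch records the mapping when the item is added. collect_streamed_calls is a hypothetical helper, and SimpleNamespace objects stand in for real events.

```python
from collections import defaultdict
from types import SimpleNamespace


def collect_streamed_calls(events) -> dict[str, dict[str, str]]:
    """Map call_id -> {"name", "arguments"} from a Responses API event stream."""
    items: dict[str, dict[str, str]] = {}  # keyed by output item id
    buffers: dict[str, list[str]] = defaultdict(list)
    for event in events:
        if event.type == "response.output_item.added" and event.item.type == "function_call":
            items[event.item.id] = {"call_id": event.item.call_id, "name": event.item.name, "arguments": ""}
        elif event.type == "response.function_call_arguments.delta":
            buffers[event.item_id].append(event.delta)  # partial JSON, not parseable yet
        elif event.type == "response.function_call_arguments.done" and event.item_id in items:
            assembled = "".join(buffers.pop(event.item_id, []))
            # The done event carries the full string; fall back to our deltas if empty.
            items[event.item_id]["arguments"] = event.arguments or assembled
    return {v["call_id"]: {"name": v["name"], "arguments": v["arguments"]} for v in items.values()}


events = [
    SimpleNamespace(type="response.output_item.added",
                    item=SimpleNamespace(type="function_call", id="fc_1", call_id="call_1", name="web_search")),
    SimpleNamespace(type="response.function_call_arguments.delta", item_id="fc_1", delta='{"queries": ['),
    SimpleNamespace(type="response.function_call_arguments.delta", item_id="fc_1", delta='"AI news"]}'),
    SimpleNamespace(type="response.function_call_arguments.done", item_id="fc_1",
                    arguments='{"queries": ["AI news"]}'),
]
print(collect_streamed_calls(events))
# {'call_1': {'name': 'web_search', 'arguments': '{"queries": ["AI news"]}'}}
```

Calls whose done event has not arrived yet keep an empty arguments string, so only completed calls should be executed.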

Notes

  • call_id pairs requests with results. Every function_call_output item must include the originating call_id. Multiple parallel function calls in one assistant turn each get their own paired output.
  • Server-side state. Instead of resending the whole input array on each turn, you can pass previous_response_id=<id> (the id of the previous response) and send only the new function_call_output items as input. This keeps requests small in long agent loops.
  • output_text shortcut. response.output_text concatenates the assistant's text content for you. If you need granular access (annotations, segments), iterate response.output and read the output_text content parts of each message item.
  • Domains and dates. Pass search_domain_filter, country, and other Search API parameters inside run_web_search if you want fixed retrieval constraints. See the Search API quickstart for the full parameter list.
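The server-side state note changes the loop shape: after the first request, each turn sends only the new function_call_output items plus previous_response_id. Below is a minimal sketch under several assumptions:

- Earlier responses are stored server-side, which is the Responses API default (store=True).
- Tools are re-sent on every turn rather than assumed to carry over.
- The max_turns guard is an addition not in the loop above.
- chat_with_search_stateful and FakeResponses are hypothetical names.
- The client is passed in, so a scripted fake can exercise the loop offline.

```python
import json
from types import SimpleNamespace


def chat_with_search_stateful(client, user_prompt, tools, handlers, model="gpt-5.5", max_turns=8):
    """Tool loop that chains turns with previous_response_id instead of resending history."""
    response = client.responses.create(model=model, input=user_prompt, tools=tools)
    for _ in range(max_turns):
        calls = [item for item in response.output if item.type == "function_call"]
        if not calls:
            return response.output_text
        outputs = [
            {
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": handlers[call.name](json.loads(call.arguments)),
            }
            for call in calls
        ]
        # Only the new outputs are sent; the server already holds earlier turns.
        response = client.responses.create(
            model=model, previous_response_id=response.id, input=outputs, tools=tools
        )
    raise RuntimeError(f"no final answer after {max_turns} tool rounds")


class FakeResponses:
    """Scripted stand-in for client.responses: one tool call, then a text answer."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        if len(self.calls) == 1:
            call = SimpleNamespace(type="function_call", name="web_search", call_id="call_1",
                                   arguments='{"queries": ["AI news"]}')
            return SimpleNamespace(id="resp_1", output=[call], output_text="")
        return SimpleNamespace(id="resp_2", output=[], output_text="done")


fake_client = SimpleNamespace(responses=FakeResponses())
answer = chat_with_search_stateful(fake_client, "What's new in AI?", tools=[],
                                   handlers={"web_search": lambda args: "stub"})
print(answer)  # done
print(fake_client.responses.calls[1]["previous_response_id"])  # resp_1
```

With the real SDK you would pass openai_client, [WEB_SEARCH_TOOL], and a handler that calls run_web_search.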

Next Steps

Use with Anthropic SDK

Wire Search API into the Anthropic Messages API.

Use with Gemini SDK

Wire Search API into Google’s google-genai SDK.

Search API Quickstart

Full Search API parameter reference.

Search Best Practices

Patterns for production search workloads.