Streaming allows you to receive partial responses from the Perplexity API as they are generated, rather than waiting for the complete response. This is particularly useful for real-time user experiences, long responses, and interactive applications.
Streaming is supported across all models available through the Agentic Research API.
To enable streaming, set `stream=True` (Python) or `stream: true` (TypeScript) when creating responses:
```python
from perplexity import Perplexity

client = Perplexity()

# Create streaming response
stream = client.responses.create(
    preset="fast-search",
    input="What is the latest in AI research?",
    stream=True
)

# Process streaming response
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
    elif event.type == "response.completed":
        print(f"\n\nCompleted: {event.response.usage}")
```
```python
import perplexity
from perplexity import Perplexity

client = Perplexity()

try:
    stream = client.responses.create(
        preset="fast-search",
        input="Explain machine learning concepts",
        stream=True
    )
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="")
        elif event.type == "response.completed":
            print(f"\n\nCompleted: {event.response.usage}")
except perplexity.APIConnectionError as e:
    print(f"Network connection failed: {e}")
except perplexity.RateLimitError as e:
    print(f"Rate limit exceeded, please retry later: {e}")
except perplexity.APIStatusError as e:
    print(f"API error {e.status_code}: {e.response}")
```
If displaying search results immediately is critical to your real-time user experience, consider using non-streaming requests so the results are available as soon as the response arrives.
Structured outputs enable you to enforce specific response formats from Perplexity’s models, ensuring consistent, machine-readable data that can be integrated directly into your applications without manual parsing. We currently support JSON Schema structured outputs. To enable structured outputs, add a `response_format` field to your request:
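A minimal sketch of such a request, assuming the `response_format` payload follows the common JSON Schema convention (the field names in the schema and the commented request are illustrative):

```python
import json

# JSON Schema payload for response_format. The top-level name must be
# 1-64 alphanumeric characters; the nested schema is a standard JSON Schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ResearchSummary",  # illustrative name
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "key_points": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "key_points"],
        },
    },
}

# Hedged request sketch (requires an API key; the exact shape may differ):
# client = Perplexity()
# response = client.responses.create(
#     preset="fast-search",
#     input="Summarize the latest AI research",
#     response_format=response_format,
# )
# data = json.loads(response.output_text)

# Because the output conforms to the schema, it parses without manual cleanup:
example_output = '{"title": "AI Research", "key_points": ["LLMs", "Agents"]}'
data = json.loads(example_output)
print(data["title"])
```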
The `name` field is required and must be 1-64 alphanumeric characters. The `schema` should be a valid JSON Schema object. LLM responses will match the specified format unless the output exceeds `max_tokens`.
Improve Schema Compliance: Give the LLM some hints about the output format in your prompts to improve adherence to the structured format. For example, include phrases like “Please return the data as a JSON object with the following structure…” or “Extract the information and format it as specified in the schema.”
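As an illustration, such a hint can simply be prepended to the request input (the wording and structure here are hypothetical):

```python
# Hypothetical prompt hint describing the expected JSON structure.
schema_hint = (
    "Please return the data as a JSON object with the following structure: "
    '{"name": string, "founded": integer}.'
)
question = "Tell me about SpaceX."

# Combine the format hint and the actual question into one input string,
# so the model sees the expected shape alongside the task.
prompt = f"{schema_hint}\n\n{question}"
print(prompt)
```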
The first request with a new JSON Schema may incur a delay before the first token. Preparing a new schema typically takes 10 to 30 seconds and can cause timeout errors. Once the schema has been prepared, subsequent requests will not see this delay.
Links in JSON Responses: Requesting links as part of a JSON response may not work reliably and can result in hallucinated or broken links, because models may generate invalid URLs when forced to include links directly in structured outputs. To ensure all links are valid, use the links returned in the `citations` or `search_results` fields of the API response. Never rely on the model to return valid links as part of the JSON response content.
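A hedged sketch of collecting links from the response metadata instead of the model's JSON body (the `search_results` item shape assumed here is illustrative):

```python
# Hedged sketch: assume the response exposes a search_results list whose
# items carry a url field (field names here are assumptions).
# response = client.responses.create(preset="fast-search", input="...")
# urls = [result.url for result in response.search_results]

# Local illustration with stand-in data shaped like such a response:
search_results = [
    {"title": "Example article", "url": "https://example.com/article"},
    {"title": "Example paper", "url": "https://example.com/paper"},
]

# Collect verified links from the API metadata rather than from the
# model-generated JSON content.
urls = [result["url"] for result in search_results]
print(urls)
```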