Streaming allows you to receive partial responses from the Perplexity API as they are generated, rather than waiting for the complete response. This is particularly useful for:

- **Real-time user experiences**: display responses as they’re generated
- **Long responses**: start showing content immediately for lengthy analyses
- **Interactive applications**: provide immediate feedback to users
Streaming is supported across all Perplexity models including Sonar, Sonar Pro, and reasoning models.
The easiest way to get started is with the Perplexity SDKs, which handle all the streaming parsing automatically. To enable streaming, set `stream=True` (Python) or `stream: true` (TypeScript) when creating completions:
```python
from perplexity import Perplexity

# Initialize the client (uses PERPLEXITY_API_KEY environment variable)
client = Perplexity()

# Create a streaming completion
stream = client.chat.completions.create(
    model="sonar",
    messages=[{"role": "user", "content": "What is the latest in AI research?"}],
    stream=True
)

# Process the streaming response
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
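If you also need the complete text once the stream finishes (for logging or caching, say), accumulate the deltas as you print. A minimal variant of the processing loop above:

```python
# Variant of the loop above that also keeps the complete response text
full_response = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        full_response += delta
        print(delta, end="", flush=True)

print()  # trailing newline once the stream is complete
```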
1. Implement reconnection logic

Add retry logic with exponential backoff so robust streaming applications recover from dropped connections.
```python
import random
import time

import perplexity
from perplexity import Perplexity

def robust_streaming(query: str, max_retries: int = 3):
    client = Perplexity()
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="sonar",
                messages=[{"role": "user", "content": query}],
                stream=True
            )
            for chunk in stream:
                if chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="", flush=True)
            return  # Success, exit retry loop
        except (perplexity.APIConnectionError, perplexity.APITimeoutError) as e:
            if attempt < max_retries - 1:
                delay = (2 ** attempt) + random.uniform(0, 1)
                print(f"\nConnection error, retrying in {delay:.1f}s...")
                time.sleep(delay)
            else:
                print(f"\nFailed after {max_retries} attempts: {e}")
                raise

robust_streaming("Explain quantum computing")
```
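The `(2 ** attempt) + random.uniform(0, 1)` delay implements exponential backoff with jitter: each retry waits roughly twice as long as the previous one, and the random component keeps many clients from reconnecting in lockstep after a shared outage.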
2. Implement proper buffering

Use a buffering strategy that matches your application’s needs.
```python
# For real-time chat applications
buffer_size = 1    # character-by-character for immediate display

# For document processing
buffer_size = 100  # larger chunks for efficiency

# For API responses
buffer_size = 500  # balance between responsiveness and efficiency
```
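To make the strategy concrete, here is a minimal sketch of size-based buffering applied to a stream like the one in the quickstart. `stream_with_buffer` is a hypothetical helper, not part of the SDK:

```python
# A minimal sketch of size-based buffering (hypothetical helper)
def stream_with_buffer(stream, buffer_size: int = 100) -> None:
    buffer = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            buffer += delta
            if len(buffer) >= buffer_size:
                print(buffer, end="", flush=True)  # flush a full buffer
                buffer = ""
    if buffer:
        print(buffer, end="", flush=True)  # flush any remainder
```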
3. Handle metadata appropriately

Remember that search results and usage metadata arrive at the end of the stream.
```python
# Don't expect search results until the stream is complete
if chunk.choices[0].finish_reason == "stop":
    # Now search results and usage info are available
    process_search_results(chunk.search_results)
    log_usage_stats(chunk.usage)
```
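Putting this together with the streaming loop, one way to capture the final metadata is to keep a reference to the last chunk. This is a sketch that assumes each search result exposes `title` and `url` fields:

```python
# Sketch: stream the text, then read metadata from the final chunk
last_chunk = None
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    last_chunk = chunk

# The final chunk carries search results and usage info
if last_chunk and last_chunk.choices[0].finish_reason == "stop":
    for result in last_chunk.search_results or []:
        print(f"\n{result.title}: {result.url}")  # assumed result fields
```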
4. Optimize for your use case

Choose streaming parameters based on your application’s requirements.

Important: If your interface needs to display search results immediately, use a non-streaming request instead; with streaming, search results only become available once the stream completes.
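For example, with a non-streaming request (a sketch assuming the response object exposes `search_results`, like the final streaming chunk above), everything is available as soon as the call returns:

```python
# Non-streaming request: the full response, including search results,
# is available as soon as the call returns
response = client.chat.completions.create(
    model="sonar",
    messages=[{"role": "user", "content": "What is the latest in AI research?"}],
)

print(response.choices[0].message.content)
print(response.search_results)  # no need to wait for a stream to finish
```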