
Overview

Streaming allows you to receive partial responses from the Perplexity API as they are generated, rather than waiting for the complete response. This is particularly useful for:
  • Real-time user experiences - Display responses as they’re generated
  • Long responses - Start showing content immediately for lengthy analyses
  • Interactive applications - Provide immediate feedback to users
Streaming is supported across all Perplexity models including Sonar, Sonar Pro, and reasoning models.

Quick Start

To enable streaming, add "stream": true to your API request:
curl https://api.perplexity.ai/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": true
  }'
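The response arrives as Server-Sent Events (SSE): each event line begins with a `data: ` prefix followed by a JSON chunk, and the stream terminates with `data: [DONE]`. An abridged, illustrative exchange (field values are placeholders, not actual output):

```
data: {"choices":[{"delta":{"content":"Quantum"},"finish_reason":null}]}
data: {"choices":[{"delta":{"content":" computing"},"finish_reason":null}]}
data: {"choices":[{"delta":{"content":""},"finish_reason":"stop"}]}
data: [DONE]
```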

Using OpenAI Python Client

The OpenAI Python client provides the most convenient way to handle streaming responses.

Installation

1. Install the OpenAI client:

pip install openai

2. Run a basic streaming example:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.perplexity.ai"
)

stream = client.chat.completions.create(
    model="sonar",
    messages=[{"role": "user", "content": "What is the latest in AI research?"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
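If you reuse this loop in several places, it can help to factor the delta extraction into a small generator. A minimal sketch (the `iter_text` helper name is ours, not part of the OpenAI client); it works on any iterable of chunk objects shaped like the client's stream:

```python
def iter_text(stream):
    """Yield only the text deltas from a chat-completions stream."""
    for chunk in stream:
        # Metadata-only chunks carry a delta whose content is None; skip those.
        delta = chunk.choices[0].delta.content
        if delta is not None:
            yield delta

# Usage: print the response as it arrives, then keep the full text.
# full_text = "".join(iter_text(stream))
```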

Streaming Best Practices

Accumulate content as it streams, and collect citations and usage metadata when they arrive; these are delivered in the final chunk(s) of the response rather than progressively.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.perplexity.ai"
)

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "Compare renewable energy technologies"}
    ],
    stream=True
)

content = ""
citations = []
usage_info = None

for chunk in stream:
    # Content arrives progressively
    if chunk.choices[0].delta.content is not None:
        content_chunk = chunk.choices[0].delta.content
        content += content_chunk
        print(content_chunk, end="")
    
    # Metadata arrives in final chunks
    if hasattr(chunk, 'citations') and chunk.citations:
        citations = chunk.citations
    
    if hasattr(chunk, 'usage') and chunk.usage:
        usage_info = chunk.usage
    
    # Handle completion
    if chunk.choices[0].finish_reason is not None:
        print(f"\n\nFinish reason: {chunk.choices[0].finish_reason}")
        print(f"Citations: {citations}")
        print(f"Usage: {usage_info}")

Using Requests Library

For more control or when working without the OpenAI client, you can use the requests library directly.

Basic Implementation

The snippet below streams raw response lines from the Perplexity API using the requests library. Note that you must parse each line yourself to extract the content, citations, and metadata.
import requests

# Set up the API endpoint and headers
url = "https://api.perplexity.ai/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "sonar-pro",
    "messages": [
        {"role": "user", "content": "Who are the top 5 tech influencers on X?"}
    ],
    "stream": True  # Enable streaming for real-time responses
}

response = requests.post(url, headers=headers, json=payload, stream=True)

# Process the streaming response (simplified example)
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))
For production use, you should properly parse Server-Sent Events (SSE) format:
import requests
import json

def stream_with_proper_parsing():
    url = "https://api.perplexity.ai/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "sonar",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": True
    }
    
    response = requests.post(url, headers=headers, json=payload, stream=True)
    
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data_str = line[6:]  # Remove 'data: ' prefix
                if data_str == '[DONE]':
                    break
                try:
                    chunk_data = json.loads(data_str)
                    content = chunk_data['choices'][0]['delta'].get('content', '')
                    if content:
                        print(content, end='')
                except json.JSONDecodeError:
                    continue

stream_with_proper_parsing()
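The SSE parsing logic above can also be separated from the request itself. A sketch of a reusable generator (our own helper, not part of the requests library) that turns raw SSE lines into parsed chunk dicts:

```python
import json

def iter_sse_chunks(lines):
    """Parse raw SSE lines (bytes or str) into chunk dicts, stopping at [DONE]."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and other noise
        data_str = line[len("data: "):]
        if data_str == "[DONE]":
            return  # end of stream
        try:
            yield json.loads(data_str)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks

# Usage with requests:
# response = requests.post(url, headers=headers, json=payload, stream=True)
# for chunk in iter_sse_chunks(response.iter_lines()):
#     print(chunk["choices"][0]["delta"].get("content", ""), end="")
```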

Citations and Metadata During Streaming

Citations and metadata are delivered in the final chunk(s) of a streaming response, not progressively during the stream.

How Metadata Works with Streaming

When streaming, you receive:
  1. Content chunks - arrive progressively in real time
  2. Citations - delivered in the final chunk(s)
  3. Search results - delivered in the final chunk(s)
  4. Usage stats - delivered in the final chunk(s)
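The final chunk(s) therefore carry both the finish_reason and the top-level metadata fields. An abridged, illustrative final chunk (field values are placeholders; the exact shape may vary):

```
data: {
  "choices": [{"delta": {"content": ""}, "finish_reason": "stop"}],
  "citations": ["https://example.com/source-1"],
  "search_results": [...],
  "usage": {"prompt_tokens": 12, "completion_tokens": 245, "total_tokens": 257}
}
```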

Using Requests Library for Metadata

import requests
import json

def stream_with_requests_metadata():
    url = "https://api.perplexity.ai/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "sonar",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": True
    }
    
    response = requests.post(url, headers=headers, json=payload, stream=True)
    
    content = ""
    metadata = {}
    
    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data_str = line[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)
                    
                    # Process content
                    if 'choices' in chunk and chunk['choices'][0]['delta'].get('content'):
                        content_piece = chunk['choices'][0]['delta']['content']
                        content += content_piece
                        print(content_piece, end='', flush=True)
                    
                    # Collect metadata
                    for key in ['citations', 'search_results', 'usage']:
                        if key in chunk:
                            metadata[key] = chunk[key]
                            
                    # Check if streaming is complete
                    if chunk['choices'][0].get('finish_reason'):
                        print(f"\n\nMetadata: {metadata}")
                        
                except json.JSONDecodeError:
                    continue
    
    return content, metadata

stream_with_requests_metadata()
Important: Because citations only arrive at the end of the stream, consider using non-streaming requests for use cases where immediate citation display is critical to the user experience.