Overview
The stream_mode parameter gives you control over how streaming responses are formatted. Choose between two modes:
- `full` (default): Traditional streaming format with complete message objects in each chunk
- `concise`: Optimized streaming format with reduced redundancy and enhanced reasoning visibility
The concise mode is designed to minimize bandwidth usage and provide better visibility into the model’s reasoning process.
Quick Comparison
| Feature | Full Mode | Concise Mode |
| --- | --- | --- |
| Message aggregation | Server-side (includes `choices.message`) | Client-side (delta only) |
| Chunk types | Single type (`chat.completion.chunk`) | Multiple types for different stages |
| Search results | Repeated multiple times during the stream | Only in `done` chunks |
| Bandwidth | Higher (includes redundant data) | Lower (optimized for efficiency) |
Using Concise Mode
Set `stream_mode: "concise"` when creating streaming completions:
Python SDK
```python
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    stream=True,
    stream_mode="concise",
)

for chunk in stream:
    print(f"Chunk type: {chunk.object}")
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Understanding Chunk Types
In concise mode, you’ll receive four different types of chunks during the stream:
1. chat.reasoning
Streamed during the reasoning stage, containing real-time reasoning steps and search operations.
Structure
```json
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441590,
  "object": "chat.reasoning",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [{
        "thought": "Searching the web for Seattle's current weather...",
        "type": "web_search",
        "web_search": {
          "search_results": [...],
          "search_keywords": ["Seattle current weather"]
        }
      }]
    }
  }],
  "type": "message"
}
```
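A minimal Python handler for this chunk type might look like the sketch below, assuming the SDK exposes `reasoning_steps` on the delta exactly as in the structure above:

```python
def handle_reasoning(chunk):
    """Surface live reasoning steps from a chat.reasoning chunk."""
    delta = chunk.choices[0].delta
    # reasoning_steps may be absent or empty on some deltas
    for step in getattr(delta, "reasoning_steps", None) or []:
        print(f"💭 {step.thought}")
        if step.type == "web_search":
            print(f"🔍 Keywords: {step.web_search.search_keywords}")
```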
2. chat.reasoning.done
Marks the end of the reasoning stage and includes all search results (web, images, videos) and reasoning steps.
Structure
```json
{
  "id": "3dd9d463-0fef-47e3-af70-92f9fcc4db1f",
  "model": "sonar-pro",
  "created": 1759459505,
  "object": "chat.reasoning.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 0,
    "total_tokens": 6,
    "search_context_size": "low"
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
```
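At this point a handler can snapshot the aggregated results. A minimal sketch, assuming `search_results` and `images` are plain lists as shown above:

```python
def handle_reasoning_done(chunk):
    """Capture search results and images when reasoning finishes."""
    # These fields only appear on done chunks
    results = getattr(chunk, "search_results", None) or []
    images = getattr(chunk, "images", None) or []
    print(f"\n🔍 Found {len(results)} sources, {len(images)} images")
```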
3. chat.completion.chunk
Streamed during the response generation stage, containing the actual content being generated.
Structure
```json
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441592,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": " tonight"
    }
  }]
}
```
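Because concise mode never aggregates content server-side, a handler for these chunks should append each delta to a caller-owned buffer, for example:

```python
def handle_content(chunk, parts):
    """Append streamed text to a caller-owned list of content parts."""
    text = chunk.choices[0].delta.content
    if text:
        parts.append(text)
        print(text, end="", flush=True)

# The caller joins the parts once the stream ends:
# answer = "".join(parts)
```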
4. chat.completion.done
Final chunk indicating the stream is complete, including final search results, usage statistics, and cost information.
Structure
```json
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441595,
  "object": "chat.completion.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 238,
    "total_tokens": 244,
    "search_context_size": "low",
    "cost": {
      "input_tokens_cost": 0.0,
      "output_tokens_cost": 0.004,
      "request_cost": 0.006,
      "total_cost": 0.01
    }
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "## Seattle Weather Forecast\n\nSeattle is experiencing...",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
```
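A final handler can read usage and cost here; since `usage.cost` appears only on this chunk, the sketch below guards for it. The complete implementation in the next section combines all four handlers.

```python
def handle_done(chunk):
    """Report final token usage and cost from the terminal chunk."""
    usage = chunk.usage
    print(f"\n\n✅ Complete | Tokens: {usage.total_tokens}")
    if hasattr(usage, "cost"):
        print(f"💰 Cost: ${usage.cost.total_cost:.4f}")
```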
Complete Implementation Examples
Full Concise Mode Handler
Python SDK
```python
from perplexity import Perplexity


class ConciseStreamHandler:
    def __init__(self):
        self.content = ""
        self.reasoning_steps = []
        self.search_results = []
        self.images = []
        self.usage = None

    def stream_query(self, query: str, model: str = "sonar-pro"):
        """Handle a complete concise streaming request"""
        client = Perplexity()
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": query}],
            stream=True,
            stream_mode="concise",
        )
        for chunk in stream:
            self.process_chunk(chunk)
        return self.get_result()

    def process_chunk(self, chunk):
        """Route chunk to the appropriate handler"""
        chunk_type = chunk.object
        if chunk_type == "chat.reasoning":
            self.handle_reasoning(chunk)
        elif chunk_type == "chat.reasoning.done":
            self.handle_reasoning_done(chunk)
        elif chunk_type == "chat.completion.chunk":
            self.handle_content(chunk)
        elif chunk_type == "chat.completion.done":
            self.handle_done(chunk)

    def handle_reasoning(self, chunk):
        """Process reasoning updates"""
        delta = chunk.choices[0].delta
        # Guard against deltas that carry no reasoning steps
        for step in getattr(delta, "reasoning_steps", None) or []:
            self.reasoning_steps.append(step)
            print(f"💭 {step.thought}")

    def handle_reasoning_done(self, chunk):
        """Process end of reasoning"""
        if hasattr(chunk, "search_results"):
            self.search_results = chunk.search_results
            print(f"\n🔍 Found {len(self.search_results)} sources")
        if hasattr(chunk, "images"):
            self.images = chunk.images
            print(f"🖼️ Found {len(self.images)} images")
        print("\n📝 Generating response...\n")

    def handle_content(self, chunk):
        """Process content chunks"""
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            self.content += delta.content
            print(delta.content, end="", flush=True)

    def handle_done(self, chunk):
        """Process completion"""
        if hasattr(chunk, "usage"):
            self.usage = chunk.usage
            print(f"\n\n✅ Complete | Tokens: {self.usage.total_tokens}")
            if hasattr(self.usage, "cost"):
                print(f"💰 Cost: ${self.usage.cost.total_cost:.4f}")

    def get_result(self):
        """Return the complete result"""
        return {
            "content": self.content,
            "reasoning_steps": self.reasoning_steps,
            "search_results": self.search_results,
            "images": self.images,
            "usage": self.usage,
        }


# Usage
handler = ConciseStreamHandler()
result = handler.stream_query("What's the latest news in AI?")
print(f"\n\nFinal content length: {len(result['content'])} characters")
print(f"Sources used: {len(result['search_results'])}")
```
Best Practices
Aggregate content on the client side
In concise mode, `choices.message` is not incrementally updated; you must aggregate the deltas yourself.

```python
# Track content yourself
content = ""
for chunk in stream:
    if chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content
```
Use reasoning steps for transparency
Display reasoning steps to users for better transparency and trust.

```python
def display_reasoning(step):
    """Show reasoning to users"""
    print(f"🔍 Searching for: {step.web_search.search_keywords}")
    print(f"💭 {step.thought}")
```
Handle search results from done chunks only
Search results and usage information appear only in `chat.reasoning.done` and `chat.completion.done` chunks.

```python
# Don't check for search_results in other chunk types
if chunk.object in ["chat.reasoning.done", "chat.completion.done"]:
    if hasattr(chunk, "search_results"):
        process_search_results(chunk.search_results)
```
Implement proper type checking
Use the `object` field to route chunks to appropriate handlers.

```python
chunk_handlers = {
    "chat.reasoning": handle_reasoning,
    "chat.reasoning.done": handle_reasoning_done,
    "chat.completion.chunk": handle_content,
    "chat.completion.done": handle_done,
}

handler = chunk_handlers.get(chunk.object)
if handler:
    handler(chunk)
```
Track cost from the final chunk
Cost information is only available in the `chat.completion.done` chunk.

```python
if chunk.object == "chat.completion.done":
    if hasattr(chunk.usage, "cost"):
        total_cost = chunk.usage.cost.total_cost
        print(f"Request cost: ${total_cost:.4f}")
```
Migration from Full Mode
If you’re migrating from full mode to concise mode, here are the key changes:
Before (Full Mode)
```python
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True,
    # stream_mode defaults to "full"
)

for chunk in stream:
    # All chunks are chat.completion.chunk
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
    # Search results may appear in multiple chunks
    if hasattr(chunk, "search_results"):
        print(f"Sources: {len(chunk.search_results)}")
```
When to Use Each Mode
Use Full Mode
- Simple integrations where you want the SDK to handle aggregation
- Backward compatibility with existing implementations
- When you don’t need reasoning visibility
Use Concise Mode
- Production applications optimizing for bandwidth
- Applications that need reasoning transparency
- Real-time chat interfaces with reasoning display
- Cost-sensitive applications