Overview
The stream_mode parameter gives you control over how streaming responses are formatted. Choose between two modes:
- full (default) - Traditional streaming format with complete message objects in each chunk
- concise - Optimized streaming format with reduced redundancy and enhanced reasoning visibility
The concise mode is designed to minimize bandwidth usage and provide better visibility into the model’s reasoning process.
Quick Comparison
| Feature | Full Mode | Concise Mode |
|---|---|---|
| Message aggregation | Server-side (includes choices.message) | Client-side (delta only) |
| Chunk types | Single type (chat.completion.chunk) | Multiple types for different stages |
| Search results | Multiple times during stream | Only in done chunks |
| Bandwidth | Higher (includes redundant data) | Lower (optimized for efficiency) |
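As a quick sanity check, you can tally the chunk types a stream emits. The sketch below (Python, using the same SDK setup as the examples that follow) should show only chat.completion.chunk in full mode, and the four chunk types described later in this guide in concise mode:

from collections import Counter
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    stream=True,
    stream_mode="concise"  # switch to "full" (or omit) to compare
)

# Count each chunk's object field to see which chunk types were emitted
print(Counter(chunk.object for chunk in stream))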
Using Concise Mode
Set stream_mode: "concise" when creating streaming completions:
Python SDK:
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    stream=True,
    stream_mode="concise"
)

for chunk in stream:
    print(f"Chunk type: {chunk.object}")
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
TypeScript SDK:
import Perplexity from '@perplexity-ai/perplexity_ai';

const client = new Perplexity();

const stream = await client.chat.completions.create({
  model: "sonar-pro",
  messages: [{ role: "user", content: "What's the weather in Seattle?" }],
  stream: true,
  stream_mode: "concise"
});

for await (const chunk of stream) {
  console.log(`Chunk type: ${chunk.object}`);
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
cURL:
curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar-pro",
    "messages": [{"role": "user", "content": "What is the weather in Seattle?"}],
    "stream": true,
    "stream_mode": "concise"
  }'
Understanding Chunk Types
In concise mode, you’ll receive four different types of chunks during the stream:
1. chat.reasoning
Streamed during the reasoning stage, containing real-time reasoning steps and search operations.
Structure:
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441590,
  "object": "chat.reasoning",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [{
        "thought": "Searching the web for Seattle's current weather...",
        "type": "web_search",
        "web_search": {
          "search_results": [...],
          "search_keywords": ["Seattle current weather"]
        }
      }]
    }
  }],
  "type": "message"
}
Python Handler:
def handle_reasoning_chunk(chunk):
    """Process reasoning stage updates"""
    if chunk.object == "chat.reasoning":
        delta = chunk.choices[0].delta
        if hasattr(delta, 'reasoning_steps'):
            for step in delta.reasoning_steps:
                print(f"\n[Reasoning] {step.thought}")
                if step.type == "web_search":
                    keywords = step.web_search.search_keywords
                    print(f"[Search] Keywords: {', '.join(keywords)}")
TypeScript Handler:
function handleReasoningChunk(chunk: any) {
  if (chunk.object === "chat.reasoning") {
    const delta = chunk.choices[0].delta;
    if (delta.reasoning_steps) {
      for (const step of delta.reasoning_steps) {
        console.log(`\n[Reasoning] ${step.thought}`);
        if (step.type === "web_search") {
          const keywords = step.web_search.search_keywords;
          console.log(`[Search] Keywords: ${keywords.join(', ')}`);
        }
      }
    }
  }
}
2. chat.reasoning.done
Marks the end of the reasoning stage and includes all search results (web, images, videos) and reasoning steps.
Structure:
{
  "id": "3dd9d463-0fef-47e3-af70-92f9fcc4db1f",
  "model": "sonar-pro",
  "created": 1759459505,
  "object": "chat.reasoning.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 0,
    "total_tokens": 6,
    "search_context_size": "low"
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
Python Handler:
def handle_reasoning_done(chunk):
    """Process end of reasoning stage"""
    if chunk.object == "chat.reasoning.done":
        print("\n[Reasoning Complete]")

        # Access all search results
        if hasattr(chunk, 'search_results'):
            print(f"Found {len(chunk.search_results)} sources")
            for result in chunk.search_results[:3]:
                print(f"  • {result['title']}")

        # Access image results
        if hasattr(chunk, 'images'):
            print(f"Found {len(chunk.images)} images")

        # Partial usage stats available
        if hasattr(chunk, 'usage'):
            print(f"Tokens used so far: {chunk.usage.total_tokens}")
TypeScript Handler:
function handleReasoningDone(chunk: any) {
  if (chunk.object === "chat.reasoning.done") {
    console.log("\n[Reasoning Complete]");

    // Access all search results
    if (chunk.search_results) {
      console.log(`Found ${chunk.search_results.length} sources`);
      chunk.search_results.slice(0, 3).forEach((result: any) => {
        console.log(`  • ${result.title}`);
      });
    }

    // Access image results
    if (chunk.images) {
      console.log(`Found ${chunk.images.length} images`);
    }

    // Partial usage stats available
    if (chunk.usage) {
      console.log(`Tokens used so far: ${chunk.usage.total_tokens}`);
    }
  }
}
3. chat.completion.chunk
Streamed during the response generation stage, containing the actual content being generated.
Structure:
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441592,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": " tonight"
    }
  }]
}
Python Handler:
def handle_completion_chunk(chunk):
    """Process content generation updates"""
    if chunk.object == "chat.completion.chunk":
        delta = chunk.choices[0].delta
        if hasattr(delta, 'content') and delta.content:
            # Stream content to user
            print(delta.content, end='', flush=True)
            return delta.content
    return ""
TypeScript Handler:
function handleCompletionChunk(chunk: any): string {
  if (chunk.object === "chat.completion.chunk") {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) {
      // Stream content to user
      process.stdout.write(delta.content);
      return delta.content;
    }
  }
  return "";
}
4. chat.completion.done
Final chunk indicating the stream is complete, including final search results, usage statistics, and cost information. Note that in the sample below, the component costs sum to usage.cost.total_cost: 0.0 (input tokens) + 0.004 (output tokens) + 0.006 (request) = 0.01.
Structure:
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441595,
  "object": "chat.completion.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 238,
    "total_tokens": 244,
    "search_context_size": "low",
    "cost": {
      "input_tokens_cost": 0.0,
      "output_tokens_cost": 0.004,
      "request_cost": 0.006,
      "total_cost": 0.01
    }
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "## Seattle Weather Forecast\n\nSeattle is experiencing...",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
Python Handler:
def handle_completion_done(chunk):
    """Process stream completion"""
    if chunk.object == "chat.completion.done":
        print("\n\n[Stream Complete]")

        # Final aggregated message
        full_message = chunk.choices[0].message.content

        # Final search results
        if hasattr(chunk, 'search_results'):
            print(f"\nFinal sources: {len(chunk.search_results)}")

        # Complete usage and cost information
        if hasattr(chunk, 'usage'):
            usage = chunk.usage
            print(f"\nTokens: {usage.total_tokens}")
            if hasattr(usage, 'cost'):
                print(f"Cost: ${usage.cost.total_cost:.4f}")

        return {
            'content': full_message,
            'search_results': getattr(chunk, 'search_results', []),
            'images': getattr(chunk, 'images', []),
            'usage': getattr(chunk, 'usage', None)
        }
TypeScript Handler:
function handleCompletionDone(chunk: any) {
  if (chunk.object === "chat.completion.done") {
    console.log("\n\n[Stream Complete]");

    // Final aggregated message
    const fullMessage = chunk.choices[0].message.content;

    // Final search results
    if (chunk.search_results) {
      console.log(`\nFinal sources: ${chunk.search_results.length}`);
    }

    // Complete usage and cost information
    if (chunk.usage) {
      console.log(`\nTokens: ${chunk.usage.total_tokens}`);
      if (chunk.usage.cost) {
        console.log(`Cost: $${chunk.usage.cost.total_cost.toFixed(4)}`);
      }
    }

    return {
      content: fullMessage,
      search_results: chunk.search_results || [],
      images: chunk.images || [],
      usage: chunk.usage || null
    };
  }
}
Complete Implementation Examples
Full Concise Mode Handler
Python SDK:
from perplexity import Perplexity

class ConciseStreamHandler:
    def __init__(self):
        self.content = ""
        self.reasoning_steps = []
        self.search_results = []
        self.images = []
        self.usage = None

    def stream_query(self, query: str, model: str = "sonar-pro"):
        """Handle a complete concise streaming request"""
        client = Perplexity()
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": query}],
            stream=True,
            stream_mode="concise"
        )
        for chunk in stream:
            self.process_chunk(chunk)
        return self.get_result()

    def process_chunk(self, chunk):
        """Route chunk to appropriate handler"""
        chunk_type = chunk.object
        if chunk_type == "chat.reasoning":
            self.handle_reasoning(chunk)
        elif chunk_type == "chat.reasoning.done":
            self.handle_reasoning_done(chunk)
        elif chunk_type == "chat.completion.chunk":
            self.handle_content(chunk)
        elif chunk_type == "chat.completion.done":
            self.handle_done(chunk)

    def handle_reasoning(self, chunk):
        """Process reasoning updates"""
        delta = chunk.choices[0].delta
        if hasattr(delta, 'reasoning_steps'):
            for step in delta.reasoning_steps:
                self.reasoning_steps.append(step)
                print(f"💭 {step.thought}")

    def handle_reasoning_done(self, chunk):
        """Process end of reasoning"""
        if hasattr(chunk, 'search_results'):
            self.search_results = chunk.search_results
            print(f"\n🔍 Found {len(self.search_results)} sources")
        if hasattr(chunk, 'images'):
            self.images = chunk.images
            print(f"🖼️ Found {len(self.images)} images")
        print("\n📝 Generating response...\n")

    def handle_content(self, chunk):
        """Process content chunks"""
        delta = chunk.choices[0].delta
        if hasattr(delta, 'content') and delta.content:
            self.content += delta.content
            print(delta.content, end='', flush=True)

    def handle_done(self, chunk):
        """Process completion"""
        if hasattr(chunk, 'usage'):
            self.usage = chunk.usage
            print(f"\n\n✅ Complete | Tokens: {self.usage.total_tokens}")
            if hasattr(self.usage, 'cost'):
                print(f"💰 Cost: ${self.usage.cost.total_cost:.4f}")

    def get_result(self):
        """Return complete result"""
        return {
            'content': self.content,
            'reasoning_steps': self.reasoning_steps,
            'search_results': self.search_results,
            'images': self.images,
            'usage': self.usage
        }

# Usage
handler = ConciseStreamHandler()
result = handler.stream_query("What's the latest news in AI?")
print(f"\n\nFinal content length: {len(result['content'])} characters")
print(f"Sources used: {len(result['search_results'])}")
TypeScript SDK:
import Perplexity from '@perplexity-ai/perplexity_ai';

interface StreamResult {
  content: string;
  reasoning_steps: any[];
  search_results: any[];
  images: any[];
  usage: any;
}

class ConciseStreamHandler {
  private content: string = "";
  private reasoning_steps: any[] = [];
  private search_results: any[] = [];
  private images: any[] = [];
  private usage: any = null;

  async streamQuery(query: string, model: string = "sonar-pro"): Promise<StreamResult> {
    const client = new Perplexity();
    const stream = await client.chat.completions.create({
      model,
      messages: [{ role: "user", content: query }],
      stream: true,
      stream_mode: "concise"
    });
    for await (const chunk of stream) {
      this.processChunk(chunk);
    }
    return this.getResult();
  }

  private processChunk(chunk: any) {
    const chunkType = chunk.object;
    switch (chunkType) {
      case "chat.reasoning":
        this.handleReasoning(chunk);
        break;
      case "chat.reasoning.done":
        this.handleReasoningDone(chunk);
        break;
      case "chat.completion.chunk":
        this.handleContent(chunk);
        break;
      case "chat.completion.done":
        this.handleDone(chunk);
        break;
    }
  }

  private handleReasoning(chunk: any) {
    const delta = chunk.choices[0].delta;
    if (delta.reasoning_steps) {
      for (const step of delta.reasoning_steps) {
        this.reasoning_steps.push(step);
        console.log(`💭 ${step.thought}`);
      }
    }
  }

  private handleReasoningDone(chunk: any) {
    if (chunk.search_results) {
      this.search_results = chunk.search_results;
      console.log(`\n🔍 Found ${this.search_results.length} sources`);
    }
    if (chunk.images) {
      this.images = chunk.images;
      console.log(`🖼️ Found ${this.images.length} images`);
    }
    console.log("\n📝 Generating response...\n");
  }

  private handleContent(chunk: any) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.content) {
      this.content += delta.content;
      process.stdout.write(delta.content);
    }
  }

  private handleDone(chunk: any) {
    if (chunk.usage) {
      this.usage = chunk.usage;
      console.log(`\n\n✅ Complete | Tokens: ${this.usage.total_tokens}`);
      if (this.usage.cost) {
        console.log(`💰 Cost: $${this.usage.cost.total_cost.toFixed(4)}`);
      }
    }
  }

  private getResult(): StreamResult {
    return {
      content: this.content,
      reasoning_steps: this.reasoning_steps,
      search_results: this.search_results,
      images: this.images,
      usage: this.usage
    };
  }
}

// Usage
const handler = new ConciseStreamHandler();
const result = await handler.streamQuery("What's the latest news in AI?");
console.log(`\n\nFinal content length: ${result.content.length} characters`);
console.log(`Sources used: ${result.search_results.length}`);
Raw HTTP:
import requests
import json

def stream_concise_mode(query: str):
    """Handle concise streaming with raw HTTP"""
    url = "https://api.perplexity.ai/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "sonar-pro",
        "messages": [{"role": "user", "content": query}],
        "stream": True,
        "stream_mode": "concise"
    }

    response = requests.post(url, headers=headers, json=payload, stream=True)

    content = ""
    search_results = []
    usage = None

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data_str = line[6:]
                if data_str == '[DONE]':
                    break
                try:
                    chunk = json.loads(data_str)
                    chunk_type = chunk.get('object')

                    if chunk_type == 'chat.reasoning':
                        # Handle reasoning
                        delta = chunk['choices'][0]['delta']
                        if 'reasoning_steps' in delta:
                            for step in delta['reasoning_steps']:
                                print(f"💭 {step['thought']}")

                    elif chunk_type == 'chat.reasoning.done':
                        # Handle reasoning completion
                        if 'search_results' in chunk:
                            search_results = chunk['search_results']
                            print(f"\n🔍 Found {len(search_results)} sources\n")

                    elif chunk_type == 'chat.completion.chunk':
                        # Handle content
                        delta = chunk['choices'][0]['delta']
                        if 'content' in delta and delta['content']:
                            content += delta['content']
                            print(delta['content'], end='', flush=True)

                    elif chunk_type == 'chat.completion.done':
                        # Handle completion
                        if 'usage' in chunk:
                            usage = chunk['usage']
                            print(f"\n\n✅ Tokens: {usage['total_tokens']}")
                except json.JSONDecodeError:
                    continue

    return {
        'content': content,
        'search_results': search_results,
        'usage': usage
    }

# Usage
result = stream_concise_mode("What's the latest news in AI?")
Best Practices
1. Aggregate content on the client side
In concise mode, choices.message is not incrementally updated. You must aggregate chunks yourself.
# Track content yourself
content = ""
for chunk in stream:
    if chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content
2. Use reasoning steps for transparency
Display reasoning steps to users for better transparency and trust.
def display_reasoning(step):
    """Show reasoning to users"""
    print(f"🔍 Searching for: {step.web_search.search_keywords}")
    print(f"💭 {step.thought}")
3. Handle search results from done chunks only
Search results and usage information only appear in chat.reasoning.done and chat.completion.done chunks.
# Don't check for search_results in other chunk types
if chunk.object in ["chat.reasoning.done", "chat.completion.done"]:
    if hasattr(chunk, 'search_results'):
        process_search_results(chunk.search_results)
4. Implement proper type checking
Use the object field to route chunks to appropriate handlers.
chunk_handlers = {
    "chat.reasoning": handle_reasoning,
    "chat.reasoning.done": handle_reasoning_done,
    "chat.completion.chunk": handle_content,
    "chat.completion.done": handle_done
}

handler = chunk_handlers.get(chunk.object)
if handler:
    handler(chunk)
5. Track cost from the final chunk
Cost information is only available in the chat.completion.done chunk.
if chunk.object == "chat.completion.done":
    if hasattr(chunk.usage, 'cost'):
        total_cost = chunk.usage.cost.total_cost
        print(f"Request cost: ${total_cost:.4f}")
Migration from Full Mode
If you’re migrating from full mode to concise mode, here are the key changes:
Before (Full Mode):
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True
    # stream_mode defaults to "full"
)

for chunk in stream:
    # All chunks are chat.completion.chunk
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Search results may appear in multiple chunks
    if hasattr(chunk, 'search_results'):
        print(f"Sources: {len(chunk.search_results)}")
After (Concise Mode):
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True,
    stream_mode="concise"  # Enable concise mode
)

for chunk in stream:
    # Multiple chunk types - route appropriately
    if chunk.object == "chat.reasoning":
        # New: Handle reasoning steps
        if chunk.choices[0].delta.reasoning_steps:
            print("Reasoning in progress...")
    elif chunk.object == "chat.reasoning.done":
        # New: Reasoning complete, search results available
        if hasattr(chunk, 'search_results'):
            print(f"Sources: {len(chunk.search_results)}")
    elif chunk.object == "chat.completion.chunk":
        # Content chunks (similar to full mode)
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
    elif chunk.object == "chat.completion.done":
        # Final chunk with complete metadata
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")
When to Use Each Mode
Use Full Mode
- Simple integrations where you want the SDK to handle aggregation
- Backward compatibility with existing implementations
- When you don’t need reasoning visibility
Use Concise Mode
- Production applications optimizing for bandwidth
- Applications that need reasoning transparency
- Real-time chat interfaces with reasoning display
- Cost-sensitive applications
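If one code path has to serve both cases, a small helper can make the trade-off explicit. This is a sketch, not part of the SDK, and the flag names need_reasoning and low_bandwidth are hypothetical:

def pick_stream_mode(need_reasoning: bool, low_bandwidth: bool) -> str:
    # Hypothetical helper: prefer concise mode for reasoning visibility
    # or bandwidth savings; fall back to full mode for simple integrations.
    return "concise" if (need_reasoning or low_bandwidth) else "full"

# Pass the result as stream_mode to client.chat.completions.create(...)
print(pick_stream_mode(need_reasoning=True, low_bandwidth=False))  # concise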
Resources
- Streaming Responses Guide - General streaming documentation
- Chat Completions SDK - Complete SDK guide
- API Reference - Chat Completions - API documentation