Stream Mode: Concise vs Full

Overview

The stream_mode parameter gives you control over how streaming responses are formatted. Choose between two modes:

full (default) - Traditional streaming format with complete message objects in each chunk
concise - Optimized streaming format with reduced redundancy and enhanced reasoning visibility

Concise mode is designed to reduce bandwidth usage and to expose the model’s reasoning process while the response is being produced.

Quick Comparison

Feature	Full Mode	Concise Mode
Message aggregation	Server-side (includes `choices.message`)	Client-side (delta only)
Chunk types	Single type (`chat.completion.chunk`)	Multiple types for different stages
Search results	Multiple times during stream	Only in `done` chunks
Bandwidth	Higher (includes redundant data)	Lower (optimized for efficiency)

Using Concise Mode

Set stream_mode: "concise" when creating streaming completions:

Python SDK
TypeScript SDK
cURL

from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    stream=True,
    stream_mode="concise"
)

for chunk in stream:
    print(f"Chunk type: {chunk.object}")
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

import Perplexity from '@perplexity-ai/perplexity_ai';

const client = new Perplexity();

const stream = await client.chat.completions.create({
  model: "sonar-pro",
  messages: [{ role: "user", content: "What's the weather in Seattle?" }],
  stream: true,
  stream_mode: "concise"
});

for await (const chunk of stream) {
  console.log(`Chunk type: ${chunk.object}`);
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write((chunk.choices[0]?.delta?.content ?? '') as string);
  }
}

curl -X POST "https://api.perplexity.ai/v1/sonar" \
  -H "Authorization: Bearer $PERPLEXITY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sonar-pro",
    "messages": [{"role": "user", "content": "What is the weather in Seattle?"}],
    "stream": true,
    "stream_mode": "concise"
  }'

Understanding Chunk Types

In concise mode, you’ll receive four different types of chunks during the stream:

1. `chat.reasoning`

Streamed during the reasoning stage, containing real-time reasoning steps and search operations.

Structure
Python Handler
TypeScript Handler

{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441590,
  "object": "chat.reasoning",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [{
        "thought": "Searching the web for Seattle's current weather...",
        "type": "web_search",
        "web_search": {
          "search_results": [...],
          "search_keywords": ["Seattle current weather"]
        }
      }]
    }
  }],
  "type": "message"
}

def handle_reasoning_chunk(chunk):
    """Process reasoning stage updates"""
    if chunk.object == "chat.reasoning":
        delta = chunk.choices[0].delta

        # reasoning_steps may be missing or None, so check its value
        if getattr(delta, 'reasoning_steps', None):
            for step in delta.reasoning_steps:
                print(f"\n[Reasoning] {step.thought}")

                if step.type == "web_search":
                    keywords = step.web_search.search_keywords
                    print(f"[Search] Keywords: {', '.join(keywords)}")

function handleReasoningChunk(chunk: any) {
  if (chunk.object === "chat.reasoning") {
    const delta = chunk.choices[0].delta;

    if (delta.reasoning_steps) {
      for (const step of delta.reasoning_steps) {
        console.log(`\n[Reasoning] ${step.thought}`);

        if (step.type === "web_search") {
          const keywords = step.web_search.search_keywords;
          console.log(`[Search] Keywords: ${keywords.join(', ')}`);
        }
      }
    }
  }
}

2. `chat.reasoning.done`

Marks the end of the reasoning stage and includes all search results (web, images, videos) and reasoning steps.

Structure
Python Handler
TypeScript Handler

{
  "id": "3dd9d463-0fef-47e3-af70-92f9fcc4db1f",
  "model": "sonar-pro",
  "created": 1759459505,
  "object": "chat.reasoning.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 0,
    "total_tokens": 6,
    "search_context_size": "low"
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}

def handle_reasoning_done(chunk):
    """Process end of reasoning stage"""
    if chunk.object == "chat.reasoning.done":
        print("\n[Reasoning Complete]")

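        # Access all search results (the field may be absent or None)
        search_results = getattr(chunk, 'search_results', None) or []
        if search_results:
            print(f"Found {len(search_results)} sources")
            for result in search_results[:3]:
                # Results may be dicts (raw JSON) or SDK objects
                title = result['title'] if isinstance(result, dict) else result.title
                print(f"  • {title}")

        # Access image results
        images = getattr(chunk, 'images', None) or []
        if images:
            print(f"Found {len(images)} images")

        # Partial usage stats available
        usage = getattr(chunk, 'usage', None)
        if usage:
            print(f"Tokens used so far: {usage.total_tokens}")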

function handleReasoningDone(chunk: any) {
  if (chunk.object === "chat.reasoning.done") {
    console.log("\n[Reasoning Complete]");

    // Access all search results
    if (chunk.search_results) {
      console.log(`Found ${chunk.search_results.length} sources`);
      chunk.search_results.slice(0, 3).forEach((result: any) => {
        console.log(`  • ${result.title}`);
      });
    }

    // Access image results
    if (chunk.images) {
      console.log(`Found ${chunk.images.length} images`);
    }

    // Partial usage stats available
    if (chunk.usage) {
      console.log(`Tokens used so far: ${chunk.usage.total_tokens}`);
    }
  }
}

3. `chat.completion.chunk`

Streamed during the response generation stage, containing the actual content being generated.

Structure
Python Handler
TypeScript Handler

{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441592,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": " tonight"
    }
  }]
}

def handle_completion_chunk(chunk):
    """Process content generation updates"""
    if chunk.object == "chat.completion.chunk":
        delta = chunk.choices[0].delta

        if hasattr(delta, 'content') and delta.content:
            # Stream content to user
            print(delta.content, end='', flush=True)
            return delta.content

    return ""

function handleCompletionChunk(chunk: any): string {
  if (chunk.object === "chat.completion.chunk") {
    const delta = chunk.choices[0]?.delta;

    if (delta?.content) {
      // Stream content to user
      process.stdout.write(delta.content);
      return delta.content;
    }
  }

  return "";
}

4. `chat.completion.done`

Final chunk indicating the stream is complete, including final search results, usage statistics, and cost information.

Structure
Python Handler
TypeScript Handler

{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441595,
  "object": "chat.completion.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 238,
    "total_tokens": 244,
    "search_context_size": "low",
    "cost": {
      "input_tokens_cost": 0.0,
      "output_tokens_cost": 0.004,
      "request_cost": 0.006,
      "total_cost": 0.01
    }
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "## Seattle Weather Forecast\n\nSeattle is experiencing...",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}

def handle_completion_done(chunk):
    """Process stream completion"""
    if chunk.object == "chat.completion.done":
        print("\n\n[Stream Complete]")

        # Final aggregated message
        full_message = chunk.choices[0].message.content

        # Final search results (the field may be absent or None)
        search_results = getattr(chunk, 'search_results', None)
        if search_results:
            print(f"\nFinal sources: {len(search_results)}")

        # Complete usage and cost information
        usage = getattr(chunk, 'usage', None)
        if usage:
            print(f"\nTokens: {usage.total_tokens}")

            cost = getattr(usage, 'cost', None)
            if cost:
                print(f"Cost: ${cost.total_cost:.4f}")

        return {
            'content': full_message,
            'search_results': getattr(chunk, 'search_results', []),
            'images': getattr(chunk, 'images', []),
            'usage': getattr(chunk, 'usage', None)
        }

function handleCompletionDone(chunk: any) {
  if (chunk.object === "chat.completion.done") {
    console.log("\n\n[Stream Complete]");

    // Final aggregated message
    const fullMessage = chunk.choices[0].message.content;

    // Final search results
    if (chunk.search_results) {
      console.log(`\nFinal sources: ${chunk.search_results.length}`);
    }

    // Complete usage and cost information
    if (chunk.usage) {
      console.log(`\nTokens: ${chunk.usage.total_tokens}`);

      if (chunk.usage.cost) {
        console.log(`Cost: $${chunk.usage.cost.total_cost.toFixed(4)}`);
      }
    }

    return {
      content: fullMessage,
      search_results: chunk.search_results || [],
      images: chunk.images || [],
      usage: chunk.usage || null
    };
  }
}
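
The four chunk types above describe two stages: reasoning updates closed by `chat.reasoning.done`, then content chunks closed by `chat.completion.done`. The following is a minimal sketch of how a client might map each `object` value to a stage, for example to switch a UI from a "thinking" indicator to response text. The names `StreamStage`, `stage_for`, and `stages_for_stream` are hypothetical, and leaving the stage unchanged for unrecognized chunk types is a design assumption rather than documented behavior.

```python
from enum import Enum


class StreamStage(Enum):
    """Hypothetical stages derived from the four concise-mode chunk types."""
    REASONING = "reasoning"
    GENERATING = "generating"
    DONE = "done"


# Stage the stream is in after a chunk with the given `object` value arrives.
STAGE_AFTER_CHUNK = {
    "chat.reasoning": StreamStage.REASONING,
    "chat.reasoning.done": StreamStage.GENERATING,
    "chat.completion.chunk": StreamStage.GENERATING,
    "chat.completion.done": StreamStage.DONE,
}


def stage_for(chunk_type, current=StreamStage.REASONING):
    """Return the stage after seeing `chunk_type`; unknown types keep `current`."""
    return STAGE_AFTER_CHUNK.get(chunk_type, current)


def stages_for_stream(chunk_types):
    """Return the stage reached after each `object` value in the iterable."""
    stage = StreamStage.REASONING
    stages = []
    for chunk_type in chunk_types:
        stage = stage_for(chunk_type, stage)
        stages.append(stage)
    return stages
```

With the SDK, the input would be `chunk.object` for each chunk in the stream.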

Complete Implementation Examples

Full Concise Mode Handler

Python SDK
TypeScript SDK
Raw HTTP

from perplexity import Perplexity

class ConciseStreamHandler:
    def __init__(self):
        self.content = ""
        self.reasoning_steps = []
        self.search_results = []
        self.images = []
        self.usage = None

    def stream_query(self, query: str, model: str = "sonar-pro"):
        """Handle a complete concise streaming request"""
        client = Perplexity()

        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": query}],
            stream=True,
            stream_mode="concise"
        )

        for chunk in stream:
            self.process_chunk(chunk)

        return self.get_result()

    def process_chunk(self, chunk):
        """Route chunk to appropriate handler"""
        chunk_type = chunk.object

        if chunk_type == "chat.reasoning":
            self.handle_reasoning(chunk)
        elif chunk_type == "chat.reasoning.done":
            self.handle_reasoning_done(chunk)
        elif chunk_type == "chat.completion.chunk":
            self.handle_content(chunk)
        elif chunk_type == "chat.completion.done":
            self.handle_done(chunk)

    def handle_reasoning(self, chunk):
        """Process reasoning updates"""
        delta = chunk.choices[0].delta

        # reasoning_steps may be missing or None, so check its value
        if getattr(delta, 'reasoning_steps', None):
            for step in delta.reasoning_steps:
                self.reasoning_steps.append(step)
                print(f"💭 {step.thought}")

    def handle_reasoning_done(self, chunk):
        """Process end of reasoning"""
        # Fields may be absent or None, so check their values
        if getattr(chunk, 'search_results', None):
            self.search_results = chunk.search_results
            print(f"\n🔍 Found {len(self.search_results)} sources")

        if getattr(chunk, 'images', None):
            self.images = chunk.images
            print(f"🖼️  Found {len(self.images)} images")

        print("\n📝 Generating response...\n")

    def handle_content(self, chunk):
        """Process content chunks"""
        delta = chunk.choices[0].delta

        if hasattr(delta, 'content') and delta.content:
            self.content += delta.content
            print(delta.content, end='', flush=True)

    def handle_done(self, chunk):
        """Process completion"""
        if getattr(chunk, 'usage', None):
            self.usage = chunk.usage
            print(f"\n\n✅ Complete | Tokens: {self.usage.total_tokens}")

            # cost may be absent or None
            if getattr(self.usage, 'cost', None):
                print(f"💰 Cost: ${self.usage.cost.total_cost:.4f}")

    def get_result(self):
        """Return complete result"""
        return {
            'content': self.content,
            'reasoning_steps': self.reasoning_steps,
            'search_results': self.search_results,
            'images': self.images,
            'usage': self.usage
        }

# Usage
handler = ConciseStreamHandler()
result = handler.stream_query("What's the latest news in AI?")

print(f"\n\nFinal content length: {len(result['content'])} characters")
print(f"Sources used: {len(result['search_results'])}")

import Perplexity from '@perplexity-ai/perplexity_ai';

interface StreamResult {
  content: string;
  reasoning_steps: any[];
  search_results: any[];
  images: any[];
  usage: any;
}

class ConciseStreamHandler {
  private content: string = "";
  private reasoning_steps: any[] = [];
  private search_results: any[] = [];
  private images: any[] = [];
  private usage: any = null;

  async streamQuery(query: string, model: string = "sonar-pro"): Promise<StreamResult> {
    const client = new Perplexity();

    const stream = await client.chat.completions.create({
      model,
      messages: [{ role: "user", content: query }],
      stream: true,
      stream_mode: "concise"
    });

    for await (const chunk of stream) {
      this.processChunk(chunk);
    }

    return this.getResult();
  }

  private processChunk(chunk: any) {
    const chunkType = chunk.object;

    switch (chunkType) {
      case "chat.reasoning":
        this.handleReasoning(chunk);
        break;
      case "chat.reasoning.done":
        this.handleReasoningDone(chunk);
        break;
      case "chat.completion.chunk":
        this.handleContent(chunk);
        break;
      case "chat.completion.done":
        this.handleDone(chunk);
        break;
    }
  }

  private handleReasoning(chunk: any) {
    const delta = chunk.choices[0].delta;

    if (delta.reasoning_steps) {
      for (const step of delta.reasoning_steps) {
        this.reasoning_steps.push(step);
        console.log(`💭 ${step.thought}`);
      }
    }
  }

  private handleReasoningDone(chunk: any) {
    if (chunk.search_results) {
      this.search_results = chunk.search_results;
      console.log(`\n🔍 Found ${this.search_results.length} sources`);
    }

    if (chunk.images) {
      this.images = chunk.images;
      console.log(`🖼️  Found ${this.images.length} images`);
    }

    console.log("\n📝 Generating response...\n");
  }

  private handleContent(chunk: any) {
    const delta = chunk.choices[0]?.delta;

    if (delta?.content) {
      this.content += delta.content;
      process.stdout.write(delta.content);
    }
  }

  private handleDone(chunk: any) {
    if (chunk.usage) {
      this.usage = chunk.usage;
      console.log(`\n\n✅ Complete | Tokens: ${this.usage.total_tokens}`);

      if (this.usage.cost) {
        console.log(`💰 Cost: $${this.usage.cost.total_cost.toFixed(4)}`);
      }
    }
  }

  private getResult(): StreamResult {
    return {
      content: this.content,
      reasoning_steps: this.reasoning_steps,
      search_results: this.search_results,
      images: this.images,
      usage: this.usage
    };
  }
}

// Usage
const handler = new ConciseStreamHandler();
const result = await handler.streamQuery("What's the latest news in AI?");

console.log(`\n\nFinal content length: ${result.content.length} characters`);
console.log(`Sources used: ${result.search_results.length}`);

import os
import requests
import json

def stream_concise_mode(query: str):
    """Handle concise streaming with raw HTTP"""
    url = "https://api.perplexity.ai/v1/sonar"
    headers = {
        "Authorization": f"Bearer {os.environ.get('PERPLEXITY_API_KEY')}",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "sonar-pro",
        "messages": [{"role": "user", "content": query}],
        "stream": True,
        "stream_mode": "concise"
    }

    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()  # fail fast on HTTP errors instead of parsing an error body

    content = ""
    search_results = []
    usage = None

    for line in response.iter_lines():
        if line:
            line = line.decode('utf-8')
            if line.startswith('data: '):
                data_str = line[6:]
                if data_str == '[DONE]':
                    break

                try:
                    chunk = json.loads(data_str)
                    chunk_type = chunk.get('object')

                    if chunk_type == 'chat.reasoning':
                        # Handle reasoning
                        delta = chunk['choices'][0]['delta']
                        if 'reasoning_steps' in delta:
                            for step in delta['reasoning_steps']:
                                print(f"💭 {step['thought']}")

                    elif chunk_type == 'chat.reasoning.done':
                        # Handle reasoning completion
                        if 'search_results' in chunk:
                            search_results = chunk['search_results']
                            print(f"\n🔍 Found {len(search_results)} sources\n")

                    elif chunk_type == 'chat.completion.chunk':
                        # Handle content
                        delta = chunk['choices'][0]['delta']
                        if 'content' in delta and delta['content']:
                            content += delta['content']
                            print(delta['content'], end='', flush=True)

                    elif chunk_type == 'chat.completion.done':
                        # Handle completion
                        if 'usage' in chunk:
                            usage = chunk['usage']
                            print(f"\n\n✅ Tokens: {usage['total_tokens']}")

                except json.JSONDecodeError:
                    continue

    return {
        'content': content,
        'search_results': search_results,
        'usage': usage
    }

# Usage
result = stream_concise_mode("What's the latest news in AI?")
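
The raw HTTP example parses each server-sent event line inline. As a sketch, the same parsing can be pulled into a small pure function that is easy to unit test without a network call. `parse_sse_line` and the `DONE` sentinel are hypothetical names; the function assumes, as the example above does, that each event arrives on a single `data:` line and that the stream ends with `[DONE]`.

```python
import json

DONE = object()  # sentinel returned when the `[DONE]` marker is seen


def parse_sse_line(raw_line):
    """Parse one line as yielded by `response.iter_lines()`.

    Returns the decoded JSON value for a data line, `DONE` for the end
    marker, or None for blank lines, non-data lines, and malformed JSON.
    """
    if not raw_line:
        return None
    line = raw_line.decode("utf-8") if isinstance(raw_line, bytes) else raw_line
    if not line.startswith("data:"):
        return None  # comments and other SSE fields are ignored here
    data_str = line[len("data:"):].strip()
    if data_str == "[DONE]":
        return DONE
    try:
        return json.loads(data_str)
    except json.JSONDecodeError:
        return None
```

Inside the streaming loop, a caller would break when the result `is DONE`, skip `None`, and otherwise route on `chunk.get('object')`.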

Best Practices

Aggregate content on the client side

In concise mode, choices.message is not incrementally updated. You must aggregate chunks yourself.

# Track content yourself
content = ""
for chunk in stream:
    if chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content
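
As a sketch of client-side aggregation on plain dictionaries (for example, chunks decoded from the raw HTTP stream), the function below joins the `delta.content` fragments and also returns the `message.content` carried by the `chat.completion.done` chunk so the two can be compared. `aggregate_content` is a hypothetical helper, and the sample chunks are hand-written for illustration, not captured API output.

```python
def aggregate_content(chunks):
    """Return (aggregated_text, final_text) from an iterable of chunk dicts.

    aggregated_text: concatenation of delta.content across
        `chat.completion.chunk` chunks.
    final_text: message.content from `chat.completion.done`, or None
        if that chunk was never seen.
    """
    parts = []
    final_text = None
    for chunk in chunks:
        choices = chunk.get("choices") or [{}]
        choice = choices[0]
        if chunk.get("object") == "chat.completion.chunk":
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
        elif chunk.get("object") == "chat.completion.done":
            final_text = (choice.get("message") or {}).get("content")
    return "".join(parts), final_text


# Hand-written sample chunks (illustrative only).
sample = [
    {"object": "chat.reasoning", "choices": [{"delta": {"content": ""}}]},
    {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Rain"}}]},
    {"object": "chat.completion.chunk", "choices": [{"delta": {"content": " tonight"}}]},
    {"object": "chat.completion.done",
     "choices": [{"message": {"content": "Rain tonight"}, "delta": {"content": ""}}]},
]
aggregated, final = aggregate_content(sample)
print(aggregated == final)
```

Collecting fragments in a list and joining once avoids repeated string concatenation on long responses.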

Use reasoning steps for transparency

Display reasoning steps to users for better transparency and trust.

def display_reasoning(step):
    """Show reasoning to users"""
    # web_search details are only present on web_search steps
    if step.type == "web_search":
        print(f"🔍 Searching for: {', '.join(step.web_search.search_keywords)}")
    print(f"💭 {step.thought}")

Handle search results from done chunks only

Search results and usage information only appear in chat.reasoning.done and chat.completion.done chunks.

# Don't check for search_results in other chunk types
if chunk.object in ["chat.reasoning.done", "chat.completion.done"]:
    if getattr(chunk, 'search_results', None):  # field may be absent or None
        process_search_results(chunk.search_results)
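
Because both done chunks can carry `search_results`, one possible approach is to keep whichever list arrived most recently, so the list from `chat.completion.done` replaces the one from `chat.reasoning.done`. The sketch below works on dictionary chunks. `latest_search_results` is a hypothetical name, and preferring the later list is an assumption based on the description of `chat.completion.done` as carrying the final search results.

```python
DONE_TYPES = ("chat.reasoning.done", "chat.completion.done")


def latest_search_results(chunks):
    """Return the most recent search_results list found in a done chunk."""
    results = []
    for chunk in chunks:
        # Ignore search_results on any chunk type other than the two done types.
        if chunk.get("object") in DONE_TYPES and chunk.get("search_results") is not None:
            results = chunk["search_results"]
    return results
```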

Implement proper type checking

Use the object field to route chunks to appropriate handlers.

chunk_handlers = {
    "chat.reasoning": handle_reasoning,
    "chat.reasoning.done": handle_reasoning_done,
    "chat.completion.chunk": handle_content,
    "chat.completion.done": handle_done
}

handler = chunk_handlers.get(chunk.object)
if handler:
    handler(chunk)
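
A self-contained variant of the dispatch table, as a sketch: handlers are registered in a dictionary, and unrecognized `object` values are counted rather than raising, so a client keeps working if a chunk type it does not know about appears. The `ChunkRouter` class and its method names are hypothetical, and `SimpleNamespace` stands in for SDK chunk objects.

```python
from collections import Counter
from types import SimpleNamespace


class ChunkRouter:
    """Route chunks to handlers keyed by their `object` field."""

    def __init__(self):
        self.handlers = {}
        self.unhandled = Counter()  # counts of chunk types with no handler

    def on(self, chunk_type, handler):
        """Register `handler` for `chunk_type`; returns self for chaining."""
        self.handlers[chunk_type] = handler
        return self

    def dispatch(self, chunk):
        """Call the matching handler, or record the type if none exists."""
        handler = self.handlers.get(chunk.object)
        if handler is None:
            self.unhandled[chunk.object] += 1
            return None
        return handler(chunk)


# Stand-in chunks for illustration; real chunks come from the SDK stream.
seen = []
router = ChunkRouter().on(
    "chat.completion.chunk",
    lambda c: seen.append(c.choices[0].delta.content),
)
content_chunk = SimpleNamespace(
    object="chat.completion.chunk",
    choices=[SimpleNamespace(delta=SimpleNamespace(content="hi"))],
)
router.dispatch(content_chunk)
router.dispatch(SimpleNamespace(object="chat.unknown"))
```

After the stream ends, `router.unhandled` shows which chunk types were skipped.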

Track cost from the final chunk

Cost information is only available in the chat.completion.done chunk.

if chunk.object == "chat.completion.done":
    if getattr(chunk.usage, 'cost', None):  # cost may be absent or None
        total_cost = chunk.usage.cost.total_cost
        print(f"Request cost: ${total_cost:.4f}")
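
Since cost arrives once per request, an application that issues many requests may want a running total. A minimal sketch for accumulating cost and token counts from dictionary-form `usage` data follows. `CostTracker` is a hypothetical name, and treating a missing `cost` field as zero is an assumption.

```python
class CostTracker:
    """Accumulate total_cost and token counts across completed streams."""

    def __init__(self):
        self.total_cost = 0.0
        self.total_tokens = 0
        self.requests = 0

    def record(self, usage):
        """Add one request's usage dict (from `chat.completion.done`)."""
        if not usage:
            return
        cost = usage.get("cost") or {}
        self.total_cost += cost.get("total_cost", 0.0)
        self.total_tokens += usage.get("total_tokens", 0)
        self.requests += 1


tracker = CostTracker()
# Usage values copied from the sample chat.completion.done chunk.
tracker.record({
    "prompt_tokens": 6,
    "completion_tokens": 238,
    "total_tokens": 244,
    "cost": {"input_tokens_cost": 0.0, "output_tokens_cost": 0.004,
             "request_cost": 0.006, "total_cost": 0.01},
})
tracker.record({"total_tokens": 10})  # no cost field: counted as 0.0
print(f"{tracker.requests} requests, {tracker.total_tokens} tokens, ${tracker.total_cost:.4f}")
```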

Migration from Full Mode

If you’re migrating from full mode to concise mode, here are the key changes:

Before (Full Mode)
After (Concise Mode)

from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True
    # stream_mode defaults to "full"
)

for chunk in stream:
    # All chunks are chat.completion.chunk
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Search results may appear in multiple chunks
    if getattr(chunk, 'search_results', None):  # field may be absent or None
        print(f"Sources: {len(chunk.search_results)}")

from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True,
    stream_mode="concise"  # Enable concise mode
)

for chunk in stream:
    # Multiple chunk types - route appropriately
    if chunk.object == "chat.reasoning":
        # New: Handle reasoning steps
        if chunk.choices[0].delta.reasoning_steps:
            print("Reasoning in progress...")

    elif chunk.object == "chat.reasoning.done":
        # New: Reasoning complete, search results available
        if getattr(chunk, 'search_results', None):  # field may be absent or None
            print(f"Sources: {len(chunk.search_results)}")

    elif chunk.object == "chat.completion.chunk":
        # Content chunks (similar to full mode)
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

    elif chunk.object == "chat.completion.done":
        # Final chunk with complete metadata
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")
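
For code that already consumes a full-mode stream as plain text, one migration path is a thin generator that filters a concise-mode stream down to its content fragments, leaving the rest of the calling code unchanged. This is a sketch: `iter_content` and `fake_chunk` are hypothetical helpers, and the stand-in chunks built with `SimpleNamespace` only mimic the attributes used in the examples above.

```python
from types import SimpleNamespace


def iter_content(stream):
    """Yield only non-empty content fragments from a concise-mode stream."""
    for chunk in stream:
        if chunk.object != "chat.completion.chunk":
            continue  # skip reasoning and done chunks
        if not chunk.choices:
            continue
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:
            yield content


def fake_chunk(obj, content=""):
    """Build a stand-in chunk for illustration (not an SDK type)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(object=obj, choices=[SimpleNamespace(delta=delta)])


stream = [
    fake_chunk("chat.reasoning"),
    fake_chunk("chat.reasoning.done"),
    fake_chunk("chat.completion.chunk", "Sunny"),
    fake_chunk("chat.completion.chunk", " today"),
    fake_chunk("chat.completion.done"),
]
text = "".join(iter_content(stream))
```

With the SDK, `stream` would be the object returned by `client.chat.completions.create(..., stream=True, stream_mode="concise")`.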

When to Use Each Mode

Use Full Mode

Simple integrations where you want the server to handle aggregation (each chunk includes `choices.message`)
Backward compatibility with existing implementations
When you don’t need reasoning visibility

Use Concise Mode

Production applications optimizing for bandwidth
Applications that need reasoning transparency
Real-time chat interfaces with reasoning display
Cost-sensitive applications

Resources

Streaming Responses Guide - General streaming documentation
Sonar API Guide - Complete Sonar API guide
API Reference - Sonar API - API documentation
