
Overview

The stream_mode parameter gives you control over how streaming responses are formatted. Choose between two modes:
  • full (default) - Traditional streaming format with complete message objects in each chunk
  • concise - Optimized streaming format with reduced redundancy and enhanced reasoning visibility
The concise mode is designed to minimize bandwidth usage and provide better visibility into the model’s reasoning process.

Quick Comparison

Feature             | Full Mode                               | Concise Mode
Message aggregation | Server-side (includes choices.message)  | Client-side (delta only)
Chunk types         | Single type (chat.completion.chunk)     | Multiple types for different stages
Search results      | Sent multiple times during the stream   | Only in done chunks
Bandwidth           | Higher (includes redundant data)        | Lower (optimized for efficiency)

Using Concise Mode

Set stream_mode="concise" when creating a streaming completion (Python SDK shown):
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    stream=True,
    stream_mode="concise"
)

for chunk in stream:
    print(f"Chunk type: {chunk.object}")
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Understanding Chunk Types

In concise mode, you’ll receive four different types of chunks during the stream:

1. chat.reasoning

Streamed during the reasoning stage, containing real-time reasoning steps and search operations.
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441590,
  "object": "chat.reasoning",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [{
        "thought": "Searching the web for Seattle's current weather...",
        "type": "web_search",
        "web_search": {
          "search_results": [...],
          "search_keywords": ["Seattle current weather"]
        }
      }]
    }
  }],
  "type": "message"
}
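
A minimal Python handler sketch for this chunk type, based on the structure above (the getattr guard assumes the SDK may leave reasoning_steps unset or None on chunks without steps):

def handle_reasoning(chunk):
    """Print reasoning steps and their search keywords as they stream in"""
    delta = chunk.choices[0].delta
    for step in getattr(delta, "reasoning_steps", None) or []:
        print(f"💭 {step.thought}")
        if step.type == "web_search":
            print(f"🔍 Keywords: {step.web_search.search_keywords}")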

2. chat.reasoning.done

Marks the end of the reasoning stage and includes all search results (web, images, videos) and reasoning steps.
{
  "id": "3dd9d463-0fef-47e3-af70-92f9fcc4db1f",
  "model": "sonar-pro",
  "created": 1759459505,
  "object": "chat.reasoning.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 0,
    "total_tokens": 6,
    "search_context_size": "low"
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
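
A matching sketch for this chunk, which is the first place search results and images become available (again guarding against unset fields):

def handle_reasoning_done(chunk):
    """Reasoning stage finished; collect search results and images"""
    results = getattr(chunk, "search_results", None) or []
    images = getattr(chunk, "images", None) or []
    print(f"\n🔍 {len(results)} sources, 🖼️ {len(images)} images")
    return results, images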

3. chat.completion.chunk

Streamed during the response generation stage, containing the actual content being generated.
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441592,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": " tonight"
    }
  }]
}
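
Because concise mode does not aggregate content server-side, a content handler needs a client-side buffer; this sketch takes a plain list supplied by the caller:

def handle_content(chunk, buffer):
    """Append delta content to a client-side buffer and echo it"""
    text = chunk.choices[0].delta.content
    if text:
        buffer.append(text)
        print(text, end="", flush=True)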

4. chat.completion.done

Final chunk indicating the stream is complete, including final search results, usage statistics, and cost information.
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441595,
  "object": "chat.completion.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 238,
    "total_tokens": 244,
    "search_context_size": "low",
    "cost": {
      "input_tokens_cost": 0.0,
      "output_tokens_cost": 0.004,
      "request_cost": 0.006,
      "total_cost": 0.01
    }
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "## Seattle Weather Forecast\n\nSeattle is experiencing...",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
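
A sketch for the final chunk, the only place usage and cost are reported:

def handle_done(chunk):
    """Stream complete; report token usage and cost"""
    usage = chunk.usage
    print(f"\n✅ Complete | Tokens: {usage.total_tokens}")
    if getattr(usage, "cost", None):
        print(f"💰 Cost: ${usage.cost.total_cost:.4f}")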

Complete Implementation Examples

Full Concise Mode Handler

from perplexity import Perplexity

class ConciseStreamHandler:
    def __init__(self):
        self.content = ""
        self.reasoning_steps = []
        self.search_results = []
        self.images = []
        self.usage = None

    def stream_query(self, query: str, model: str = "sonar-pro"):
        """Handle a complete concise streaming request"""
        client = Perplexity()

        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": query}],
            stream=True,
            stream_mode="concise"
        )

        for chunk in stream:
            self.process_chunk(chunk)

        return self.get_result()

    def process_chunk(self, chunk):
        """Route chunk to appropriate handler"""
        chunk_type = chunk.object

        if chunk_type == "chat.reasoning":
            self.handle_reasoning(chunk)
        elif chunk_type == "chat.reasoning.done":
            self.handle_reasoning_done(chunk)
        elif chunk_type == "chat.completion.chunk":
            self.handle_content(chunk)
        elif chunk_type == "chat.completion.done":
            self.handle_done(chunk)

    def handle_reasoning(self, chunk):
        """Process reasoning updates"""
        delta = chunk.choices[0].delta

        # reasoning_steps may be unset or None on chunks without steps
        if getattr(delta, 'reasoning_steps', None):
            for step in delta.reasoning_steps:
                self.reasoning_steps.append(step)
                print(f"💭 {step.thought}")

    def handle_reasoning_done(self, chunk):
        """Process end of reasoning"""
        if getattr(chunk, 'search_results', None):
            self.search_results = chunk.search_results
            print(f"\n🔍 Found {len(self.search_results)} sources")

        if getattr(chunk, 'images', None):
            self.images = chunk.images
            print(f"🖼️  Found {len(self.images)} images")

        print("\n📝 Generating response...\n")

    def handle_content(self, chunk):
        """Process content chunks"""
        delta = chunk.choices[0].delta

        if hasattr(delta, 'content') and delta.content:
            self.content += delta.content
            print(delta.content, end='', flush=True)

    def handle_done(self, chunk):
        """Process completion"""
        if getattr(chunk, 'usage', None):
            self.usage = chunk.usage
            print(f"\n\n✅ Complete | Tokens: {self.usage.total_tokens}")

            if getattr(self.usage, 'cost', None):
                print(f"💰 Cost: ${self.usage.cost.total_cost:.4f}")

    def get_result(self):
        """Return complete result"""
        return {
            'content': self.content,
            'reasoning_steps': self.reasoning_steps,
            'search_results': self.search_results,
            'images': self.images,
            'usage': self.usage
        }

# Usage
handler = ConciseStreamHandler()
result = handler.stream_query("What's the latest news in AI?")

print(f"\n\nFinal content length: {len(result['content'])} characters")
print(f"Sources used: {len(result['search_results'])}")

Best Practices

1. Aggregate content on the client side

In concise mode, choices.message is not incrementally updated. You must aggregate chunks yourself.
# Track content yourself
content = ""
for chunk in stream:
    if chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content

2. Use reasoning steps for transparency

Display reasoning steps to users for better transparency and trust.
def display_reasoning(step):
    """Show a reasoning step to users"""
    print(f"💭 {step.thought}")
    # Only web_search steps carry search details
    if step.type == "web_search":
        print(f"🔍 Searching for: {step.web_search.search_keywords}")

3. Handle search results from done chunks only

Search results and usage information only appear in chat.reasoning.done and chat.completion.done chunks.
# Don't check for search_results in other chunk types
if chunk.object in ["chat.reasoning.done", "chat.completion.done"]:
    if hasattr(chunk, 'search_results'):
        process_search_results(chunk.search_results)

4. Implement proper type checking

Use the object field to route chunks to appropriate handlers.
chunk_handlers = {
    "chat.reasoning": handle_reasoning,
    "chat.reasoning.done": handle_reasoning_done,
    "chat.completion.chunk": handle_content,
    "chat.completion.done": handle_done
}

handler = chunk_handlers.get(chunk.object)
if handler:
    handler(chunk)

5. Track cost from the final chunk

Cost information is only available in the chat.completion.done chunk.
if chunk.object == "chat.completion.done":
    if hasattr(chunk.usage, 'cost'):
        total_cost = chunk.usage.cost.total_cost
        print(f"Request cost: ${total_cost:.4f}")

Migration from Full Mode

If you’re migrating from full mode to concise mode, here are the key changes:
Before (Full Mode):
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True
    # stream_mode defaults to "full"
)

for chunk in stream:
    # All chunks are chat.completion.chunk
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Search results may appear in multiple chunks
    if hasattr(chunk, 'search_results'):
        print(f"Sources: {len(chunk.search_results)}")

When to Use Each Mode

Use Full Mode

  • Simple integrations where you want the SDK to handle aggregation
  • Backward compatibility with existing implementations
  • When you don’t need reasoning visibility

Use Concise Mode

  • Production applications optimizing for bandwidth
  • Applications that need reasoning transparency
  • Real-time chat interfaces with reasoning display
  • Cost-sensitive applications

Resources