
Overview

The stream_mode parameter gives you control over how streaming responses are formatted. Choose between two modes:
  • full (default) - Traditional streaming format with complete message objects in each chunk
  • concise - Optimized streaming format with reduced redundancy and enhanced reasoning visibility
The concise mode is designed to minimize bandwidth usage and provide better visibility into the model’s reasoning process.

Quick Comparison

Feature             | Full Mode                               | Concise Mode
Message aggregation | Server-side (includes choices.message)  | Client-side (delta only)
Chunk types         | Single type (chat.completion.chunk)     | Multiple types for different stages
Search results      | Sent multiple times during the stream   | Only in done chunks
Bandwidth           | Higher (includes redundant data)        | Lower (optimized for efficiency)

Using Concise Mode

Set stream_mode="concise" when creating a streaming completion (Python SDK shown):
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    stream=True,
    stream_mode="concise"
)

for chunk in stream:
    print(f"Chunk type: {chunk.object}")
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Understanding Chunk Types

In concise mode, you’ll receive four different types of chunks during the stream:

1. chat.reasoning

Streamed during the reasoning stage, containing real-time reasoning steps and search operations.
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441590,
  "object": "chat.reasoning",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [{
        "thought": "Searching the web for Seattle's current weather...",
        "type": "web_search",
        "web_search": {
          "search_results": [...],
          "search_keywords": ["Seattle current weather"]
        }
      }]
    }
  }],
  "type": "message"
}
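
A minimal Python handler sketch for this chunk type, based on the structure above (the getattr guard assumes the SDK may leave reasoning_steps unset or None on chunks without steps):

def handle_reasoning(chunk):
    """Print reasoning steps and their search keywords as they stream in"""
    delta = chunk.choices[0].delta
    for step in getattr(delta, "reasoning_steps", None) or []:
        print(f"💭 {step.thought}")
        if step.type == "web_search":
            print(f"🔍 Keywords: {step.web_search.search_keywords}")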

2. chat.reasoning.done

Marks the end of the reasoning stage and includes all search results (web, images, videos) and reasoning steps.
{
  "id": "3dd9d463-0fef-47e3-af70-92f9fcc4db1f",
  "model": "sonar-pro",
  "created": 1759459505,
  "object": "chat.reasoning.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 0,
    "total_tokens": 6,
    "search_context_size": "low"
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": "",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
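
A matching sketch for this chunk, which is the first place search results and images become available (again guarding against unset fields):

def handle_reasoning_done(chunk):
    """Reasoning stage finished; collect search results and images"""
    results = getattr(chunk, "search_results", None) or []
    images = getattr(chunk, "images", None) or []
    print(f"\n🔍 {len(results)} sources, 🖼️ {len(images)} images")
    return results, images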

3. chat.completion.chunk

Streamed during the response generation stage, containing the actual content being generated.
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441592,
  "object": "chat.completion.chunk",
  "choices": [{
    "index": 0,
    "finish_reason": null,
    "message": {
      "role": "assistant",
      "content": ""
    },
    "delta": {
      "role": "assistant",
      "content": " tonight"
    }
  }]
}
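
Because concise mode does not aggregate content server-side, a content handler needs a client-side buffer; this sketch takes a plain list supplied by the caller:

def handle_content(chunk, buffer):
    """Append delta content to a client-side buffer and echo it"""
    text = chunk.choices[0].delta.content
    if text:
        buffer.append(text)
        print(text, end="", flush=True)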

4. chat.completion.done

Final chunk indicating the stream is complete, including final search results, usage statistics, and cost information.
{
  "id": "cfa38f9d-fdbc-4ac6-a5d2-a3010b6a33a6",
  "model": "sonar-pro",
  "created": 1759441595,
  "object": "chat.completion.done",
  "usage": {
    "prompt_tokens": 6,
    "completion_tokens": 238,
    "total_tokens": 244,
    "search_context_size": "low",
    "cost": {
      "input_tokens_cost": 0.0,
      "output_tokens_cost": 0.004,
      "request_cost": 0.006,
      "total_cost": 0.01
    }
  },
  "search_results": [...],
  "images": [...],
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "## Seattle Weather Forecast\n\nSeattle is experiencing...",
      "reasoning_steps": [...]
    },
    "delta": {
      "role": "assistant",
      "content": ""
    }
  }]
}
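
A sketch for the final chunk, the only place usage and cost are reported:

def handle_done(chunk):
    """Stream complete; report token usage and cost"""
    usage = chunk.usage
    print(f"\n✅ Complete | Tokens: {usage.total_tokens}")
    if getattr(usage, "cost", None):
        print(f"💰 Cost: ${usage.cost.total_cost:.4f}")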

Complete Implementation Examples

Full Concise Mode Handler

from perplexity import Perplexity

class ConciseStreamHandler:
    def __init__(self):
        self.content = ""
        self.reasoning_steps = []
        self.search_results = []
        self.images = []
        self.usage = None

    def stream_query(self, query: str, model: str = "sonar-pro"):
        """Handle a complete concise streaming request"""
        client = Perplexity()

        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": query}],
            stream=True,
            stream_mode="concise"
        )

        for chunk in stream:
            self.process_chunk(chunk)

        return self.get_result()

    def process_chunk(self, chunk):
        """Route chunk to appropriate handler"""
        chunk_type = chunk.object

        if chunk_type == "chat.reasoning":
            self.handle_reasoning(chunk)
        elif chunk_type == "chat.reasoning.done":
            self.handle_reasoning_done(chunk)
        elif chunk_type == "chat.completion.chunk":
            self.handle_content(chunk)
        elif chunk_type == "chat.completion.done":
            self.handle_done(chunk)

    def handle_reasoning(self, chunk):
        """Process reasoning updates"""
        delta = chunk.choices[0].delta

        # reasoning_steps may be unset or None on chunks without steps
        if getattr(delta, 'reasoning_steps', None):
            for step in delta.reasoning_steps:
                self.reasoning_steps.append(step)
                print(f"💭 {step.thought}")

    def handle_reasoning_done(self, chunk):
        """Process end of reasoning"""
        if getattr(chunk, 'search_results', None):
            self.search_results = chunk.search_results
            print(f"\n🔍 Found {len(self.search_results)} sources")

        if getattr(chunk, 'images', None):
            self.images = chunk.images
            print(f"🖼️  Found {len(self.images)} images")

        print("\n📝 Generating response...\n")

    def handle_content(self, chunk):
        """Process content chunks"""
        delta = chunk.choices[0].delta

        if hasattr(delta, 'content') and delta.content:
            self.content += delta.content
            print(delta.content, end='', flush=True)

    def handle_done(self, chunk):
        """Process completion"""
        if getattr(chunk, 'usage', None):
            self.usage = chunk.usage
            print(f"\n\n✅ Complete | Tokens: {self.usage.total_tokens}")

            if getattr(self.usage, 'cost', None):
                print(f"💰 Cost: ${self.usage.cost.total_cost:.4f}")

    def get_result(self):
        """Return complete result"""
        return {
            'content': self.content,
            'reasoning_steps': self.reasoning_steps,
            'search_results': self.search_results,
            'images': self.images,
            'usage': self.usage
        }

# Usage
handler = ConciseStreamHandler()
result = handler.stream_query("What's the latest news in AI?")

print(f"\n\nFinal content length: {len(result['content'])} characters")
print(f"Sources used: {len(result['search_results'])}")

Best Practices

1. Aggregate content on the client side

In concise mode, choices.message is not incrementally updated. You must aggregate chunks yourself.
# Track content yourself
content = ""
for chunk in stream:
    if chunk.object == "chat.completion.chunk":
        if chunk.choices[0].delta.content:
            content += chunk.choices[0].delta.content

2. Use reasoning steps for transparency

Display reasoning steps to users for better transparency and trust.
def display_reasoning(step):
    """Show a reasoning step to users"""
    print(f"💭 {step.thought}")
    # Only web_search steps carry search details
    if step.type == "web_search":
        print(f"🔍 Searching for: {step.web_search.search_keywords}")

3. Handle search results from done chunks only

Search results and usage information only appear in chat.reasoning.done and chat.completion.done chunks.
# Don't check for search_results in other chunk types
if chunk.object in ["chat.reasoning.done", "chat.completion.done"]:
    if hasattr(chunk, 'search_results'):
        process_search_results(chunk.search_results)

4. Implement proper type checking

Use the object field to route chunks to appropriate handlers.
chunk_handlers = {
    "chat.reasoning": handle_reasoning,
    "chat.reasoning.done": handle_reasoning_done,
    "chat.completion.chunk": handle_content,
    "chat.completion.done": handle_done
}

handler = chunk_handlers.get(chunk.object)
if handler:
    handler(chunk)

5. Track cost from the final chunk

Cost information is only available in the chat.completion.done chunk.
if chunk.object == "chat.completion.done":
    if hasattr(chunk.usage, 'cost'):
        total_cost = chunk.usage.cost.total_cost
        print(f"Request cost: ${total_cost:.4f}")

Migration from Full Mode

If you’re migrating from full mode to concise mode, here are the key changes:
Before (Full Mode):
from perplexity import Perplexity

client = Perplexity()

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the weather?"}],
    stream=True
    # stream_mode defaults to "full"
)

for chunk in stream:
    # All chunks are chat.completion.chunk
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Search results may appear in multiple chunks
    if hasattr(chunk, 'search_results'):
        print(f"Sources: {len(chunk.search_results)}")

When to Use Each Mode

Use Full Mode

  • Simple integrations where you want the SDK to handle aggregation
  • Backward compatibility with existing implementations
  • When you don’t need reasoning visibility

Use Concise Mode

  • Production applications optimizing for bandwidth
  • Applications that need reasoning transparency
  • Real-time chat interfaces with reasoning display
  • Cost-sensitive applications

Resources