# Core Features

> Streaming and structured outputs for the Sonar API

## Overview

This guide covers two core capabilities of the Sonar API: streaming responses for real-time output, and structured outputs for consistent, machine-readable data. For prompting guidance, see the [Prompt Guide](/docs/sonar/prompt-guide).

## Streaming Responses

Streaming allows you to receive partial responses from the Sonar API as they are generated, rather than waiting for the complete response. This is particularly useful for real-time user experiences, long responses, and interactive applications.

<Info>
  Streaming is supported across all Sonar models.
</Info>

### How Streaming Works

A streaming response delivers three kinds of data:

1. **Content chunks**, which arrive progressively as the model generates text
2. **Search results**, delivered in the final chunk(s)
3. **Usage stats** and other metadata, also delivered in the final chunk(s)

<Warning>
  Search results and metadata are delivered in the **final chunk(s)** of a streaming response, not progressively during the stream.
</Warning>
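
The delivery order described above can be sketched with a small accumulator. This is a minimal sketch, not the SDK's own helper: `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks built with `SimpleNamespace` only mimic the attributes used in this guide (`choices[0].delta.content`, `search_results`, `usage`).

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate text from every chunk and keep the last metadata seen."""
    content_parts = []
    search_results = None
    usage = None
    for chunk in chunks:
        # Some chunks may carry no choices or an empty delta; skip their text.
        choices = getattr(chunk, "choices", None) or []
        if choices and choices[0].delta.content:
            content_parts.append(choices[0].delta.content)
        # Metadata is expected only on the final chunk(s), so keep overwriting.
        search_results = getattr(chunk, "search_results", None) or search_results
        usage = getattr(chunk, "usage", None) or usage
    return "".join(content_parts), search_results, usage


def fake_chunk(text, search_results=None, usage=None):
    """Hypothetical stand-in for an SDK chunk, for offline experimentation."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=delta)],
        search_results=search_results,
        usage=usage,
    )


# Made-up stream: two content chunks, then a final chunk carrying metadata.
stream = [
    fake_chunk("RAG pairs "),
    fake_chunk("an LLM with retrieval."),
    fake_chunk("", search_results=[{"url": "https://example.com"}], usage={"total_tokens": 12}),
]
text, results, usage = collect_stream(stream)
print(text)
```

Because the helper accepts any iterable, the same function could be passed a real stream object, assuming its chunks expose the attributes listed above.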

### Example

```python theme={null}
from perplexity import Perplexity

client = Perplexity()

# Create streaming completion
stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "Explain the architecture of retrieval-augmented generation (RAG) and the tradeoffs vs fine-tuning for domain adaptation."}],
    stream=True
)

# Process streaming response
content = ""
search_results = None
usage_info = None
for chunk in stream:
    # Print each content piece as soon as it arrives
    if chunk.choices and chunk.choices[0].delta.content:
        content_piece = chunk.choices[0].delta.content
        content += content_piece
        print(content_piece, end="", flush=True)

    # Collect metadata from final chunks
    if hasattr(chunk, 'search_results') and chunk.search_results:
        search_results = chunk.search_results

    if hasattr(chunk, 'usage') and chunk.usage:
        usage_info = chunk.usage
```

<Accordion title="Response">
  ```json theme={null}
  {
    "id": "d85813fc-9af2-4ded-b192-bb58f1bc1c76",
    "choices": [
      {
        "delta": {
          "content": "",
          "role": "assistant",
          "reasoning_steps": null,
          "tool_call_id": null,
          "tool_calls": null
        },
        "index": 0,
        "message": {
          "content": "Retrieval-augmented generation (RAG) is an architecture that pairs an LLM with an external knowledge source and a retrieval system, so the model can “look things up” at inference time instead of relying only on what’s in its weights.\n\nBelow is a compact walkthrough of the architecture and then a comparison with fine‑tuning for domain adaptation: when to use which, and why.\n\n---\n\n## 1. RAG Architecture\n\nConceptually, RAG has two big pieces:\n\n1. **Retrieval** – Find relevant information from external data sources.\n2. **Generation** – Use the LLM to answer, grounded in that retrieved information.\n\nA typical RAG pipeline looks like this:\n\n### 1.1 Data Ingestion & Indexing (Offline)\n\nBefore any user queries, you prepare your knowledge base.\n\n1. **Source data**\n   - Documents, PDFs, wikis, tickets, APIs, databases, etc.\n\n2. **Preprocessing**\n   - Cleaning (strip boilerplate, remove duplicates).\n   - **Chunking** into passages (e.g., 200–1,000 tokens) with overlap so context isn’t cut mid‑section.\n   - Optional: enrich with metadata (titles, timestamps, doc type, permissions).\n\n3. **Embedding & Indexing**\n   - Use an **embedding model** to convert each chunk into a vector.\n   - Store vectors + metadata in a **vector database** (Pinecone, FAISS, Chroma, Azure AI Search, etc.).\n   - Often combined with **hybrid search**:\n     - Vector similarity (semantic)\n     - Keyword / BM25\n     - Optionally “semantic ranking” to re-rank results using an LLM or specialized ranker.\n\nResult: a searchable index that maps queries to relevant text chunks.\n\n---\n\n### 1.2 Retrieval (Online, per query)\n\nWhen a user asks something:\n\n1. **Query understanding**\n   - (Optional) LLM reformulates or expands the query (e.g., agentic retrieval, multi-query).\n   - Convert the query to an embedding vector.\n\n2. 
**Search**\n   - Retrieve top‑k chunks from the index via:\n     - Vector similarity (cosine, dot product, etc.).\n     - Possibly hybrid scoring (combine lexical + vector + metadata filters).\n\n3. **Context selection**\n   - Filter / re-rank results.\n   - Trim to fit within the **context window** budget.\n   - Possibly structure as:\n     - “Context blocks” with citations.\n     - Grouped by document or section.\n\nOutput: a set of relevant passages plus metadata that will be given to the LLM.\n\n---\n\n### 1.3 Augmentation & Generation\n\n1. **Prompt construction (“augmentation”)**\n   - Build a prompt with:\n     - System / instruction message (e.g., “Answer only using the context; if unknown, say so.”)\n     - User query.\n     - Retrieved context (often with explicit separators and IDs for citations).\n\n2. **LLM response generation**\n   - The LLM reads the augmented prompt and generates an answer.\n   - Good prompts make the LLM:\n     - Prefer retrieved facts over prior knowledge.\n     - Use citations / references.\n     - Refuse to guess when context is insufficient.\n\n3. **Post-processing (optional)**\n   - Add citations from retrieved chunks.\n   - Apply formatting, guardrails, or validation.\n   - Possibly store conversation state or retrieved info in a **memory** store for future turns.\n\n---\n\n### 1.4 Variants / Architectures\n\nBeyond “simple RAG”, there are common extensions:\n\n- **Classic single‑query RAG**  \n  One query → one retrieval → one generation. 
Simple orchestration.\n\n- **Agentic / multi‑step RAG**\n  - LLM decomposes a complex query into sub‑queries.\n  - Runs multiple retrieval calls (possibly across different tools/indices/APIs).\n  - Synthesizes a final answer from multiple sources.\n\n- **RAG with short‑term memory**\n  - Stores conversation history / past retrieved items.\n  - Uses them as additional context in subsequent turns.\n\n- **Advanced fusion strategies**\n  - **Fusion‑in‑Decoder (FiD)** style: LLM encodes separate retrieved passages and fuses them during decoding for better grounding.\n\n---\n\n## 2. Fine‑Tuning for Domain Adaptation\n\n**Fine‑tuning** changes the model’s weights using domain-specific examples. Types:\n\n- **Instruction / supervised fine‑tuning (SFT)**  \n  Train on (input, output) pairs showing desired behavior in your domain (e.g., “Given a contract section, extract obligations as JSON”).\n\n- **Continued pretraining / domain-adaptive pretraining**  \n  Train the model further on large volumes of *unlabeled* domain text to align its internal representations with domain language.\n\nResults: the model “bakes in” the domain style, terminology, and typical answers into its parameters.\n\n---\n\n## 3. 
RAG vs Fine‑Tuning for Domain Adaptation: Tradeoffs\n\n### 3.1 What they each do best\n\n**RAG is strongest for:**\n\n- **Fresh, changing information**\n  - Policies, prices, inventories, knowledge bases that change weekly/daily.\n- **Large proprietary corpora**\n  - Millions of documents that you can’t feasibly bake into weights.\n- **Traceability & compliance**\n  - Need to show *where* an answer came from (citations).\n- **Access control**\n  - Different users see different subsets of data; you can filter retrieval by permissions.\n\n**Fine‑tuning is strongest for:**\n\n- **Capability and behavior adaptation**\n  - New tasks or formats: classification, structured extraction, reasoning patterns.\n  - Domain style, tone, “how to talk like us.”\n- **Low-latency, low-dependency deployments**\n  - No retrieval infrastructure; just the model.\n\n---\n\n### 3.2 Data and maintenance\n\n**RAG:**\n\n- **Data requirements**\n  - Mostly raw text; no labels required.\n  - Basic cleaning and chunking needed.\n- **Maintenance**\n  - Update index when docs change.\n  - Swap or improve embedding model, retriever, ranker without retraining the LLM.\n- **Scalability of updates**\n  - Adding a new manual or policy: just ingest and re-index.\n\n**Fine‑tuning:**\n\n- **Data requirements**\n  - High‑quality labeled examples (for SFT).\n  - Larger unlabeled corpora (for continued pretraining).\n- **Maintenance**\n  - When domain changes, you may need more fine‑tuning.\n  - Updating knowledge in weights is slower and more expensive than updating an index.\n\n---\n\n### 3.3 Cost and complexity\n\n**RAG:**\n\n- **Inference cost**\n  - Extra cost for:\n    - Vector search / retrieval.\n    - Larger prompts (context tokens).\n  - But can reuse a *single* base model across many domains.\n- **Infrastructure complexity**\n  - Requires:\n    - Vector DB or search engine.\n    - Ingestion pipelines.\n    - Orchestrator / application layer.\n- **Training cost**\n  - Very low; mostly retrieval 
configuration (no or minimal model training).\n\n**Fine‑tuning:**\n\n- **Training cost**\n  - GPU time, engineering, and experimentation for each domain/task.\n- **Inference cost**\n  - Similar or slightly higher per token vs base model; no retrieval overhead.\n  - If fine‑tuned model is smaller, can *reduce* inference cost vs using a big general model + RAG.\n- **Operational simplicity**\n  - Serving a single model endpoint is simple; but you might end up with many specialized models.\n\n---\n\n### 3.4 Accuracy, hallucinations, and robustness\n\n**RAG:**\n\n- **Accuracy on factual, document‑answerable queries**\n  - Often higher, because the model is explicitly grounded in the correct passages.\n- **Hallucination**\n  - Substantially reduced if:\n    - Retrieval is high‑quality.\n    - Prompts enforce “answer only from context”.\n  - Still possible if retrieval returns wrong/irrelevant chunks.\n- **Robustness to domain shifts**\n  - If docs are up to date, RAG answers stay current; no retraining needed.\n\n**Fine‑tuning:**\n\n- **Accuracy**\n  - Great when:\n    - The domain is stable.\n    - The task is repeatable and captured in training data.\n  - But knowledge is “fuzzy” inside weights; the model may still fabricate details.\n- **Hallucinations**\n  - Not eliminated; model may confidently invent facts if training didn’t cover specific items or if they changed.\n- **Robustness to domain shifts**\n  - Degrades as real-world domain diverges from training data; needs fresh fine‑tuning.\n\n---\n\n### 3.5 Latency and UX\n\n**RAG:**\n\n- Additional network hops (search + LLM).\n- Index size and retrieval strategy affect speed.\n- Can often be optimized (caching, pre‑fetching, approximate nearest neighbor search), but still higher latency than “model‑only” in many setups.\n\n**Fine‑tuning:**\n\n- Single call to the model → lower baseline latency.\n- Useful for low-latency scenarios where a retrieval step is too slow or too brittle.\n\n---\n\n### 3.6 Security and 
governance\n\n**RAG:**\n\n- **Pros**\n  - Data stays in your controlled stores; you can:\n    - Enforce row‑level, document‑level access.\n    - Log exactly which documents influenced an answer.\n  - Easier to comply with “right to be forgotten” (remove/alter docs and re-index).\n- **Cons**\n  - Need to secure search infrastructure and any caching layers.\n  - Must integrate authorization into retrieval (e.g., filter by user permissions).\n\n**Fine‑tuning:**\n\n- **Pros**\n  - After fine‑tuning, no need to expose live databases to the model.\n- **Cons**\n  - Hard to remove specific facts once baked into weights.\n  - Harder to prove what data influenced which answer.\n\n---\n\n## 4. When to Use RAG, Fine‑Tuning, or Both\n\n### Use RAG primarily when:\n\n- Your knowledge changes frequently (support docs, product specs, policies).\n- You care about traceability, citations, and grounding.\n- You have lots of text but little or no labeled data.\n- Different users should see different slices of the knowledge base.\n\n### Use fine‑tuning primarily when:\n\n- You need **new capabilities / tasks** rather than just access to new facts:\n  - E.g., extract structured fields, follow complex internal workflows, reason in a specific way.\n- Domain is **relatively stable** and you can collect representative training data.\n- Latency and simplicity of runtime stack matter more than dynamic knowledge.\n\n### Combine both (common in production systems):\n\n- **RAG + fine‑tuned model**:\n  - Fine‑tune for:\n    - Domain style, tone, and task formats.\n    - Better adherence to instructions and constraints.\n  - Use RAG for:\n    - Live, fine‑grained factual grounding.\n    - Access to large knowledge bases.\n- This often gives:\n  - Better reasoning + better grounding than either alone.\n\n---\n\n## 5. 
Practical heuristic\n\nFor “domain adaptation” specifically:\n\n- If your main need is:  \n  “The model must accurately answer questions based on our docs and stay up to date” → **Start with RAG.**\n- If your main need is:  \n  “The model must perform a specialized task or format complex outputs that we can demonstrate via examples” → **Add fine‑tuning.**\n- As your system matures:  \n  “We already have RAG, but answers are stylistically off / brittle on tricky tasks” → **Fine‑tune the model that sits behind your RAG pipeline.**\n\nIf you share your concrete domain (e.g., legal, medical, customer support) and constraints (latency, budget, data volume), I can outline a recommended architecture and whether to invest first in RAG, fine‑tuning, or both.",
          "role": "assistant",
          "reasoning_steps": null,
          "tool_call_id": null,
          "tool_calls": null
        },
        "finish_reason": "stop"
      }
    ],
    "created": 1779391571,
    "model": "sonar-pro",
    "citations": [
      "https://humanloop.com/blog/rag-architectures",
      "https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview",
      "https://aws.amazon.com/what-is/retrieval-augmented-generation/",
      "https://www.geeksforgeeks.org/nlp/rag-architecture/",
      "https://www.youtube.com/watch?v=fZM3oX4xEyg",
      "https://www.databricks.com/blog/what-is-retrieval-augmented-generation",
      "https://www.ibm.com/think/topics/retrieval-augmented-generation",
      "https://www.k2view.com/what-is-retrieval-augmented-generation",
      "https://www.nvidia.com/en-us/glossary/retrieval-augmented-generation/"
    ],
    "object": "chat.completion",
    "search_results": [
      {
        "title": "8 Retrieval Augmented Generation (RAG) Architectures You Should ...",
        "url": "https://humanloop.com/blog/rag-architectures",
        "date": "2025-02-01",
        "last_updated": "2026-05-19",
        "snippet": "Unlike traditional models that rely solely on their pre-trained knowledge, RAG allows models to search external databases or documents during ...",
        "source": "web"
      },
      {
        "title": "RAG and Generative AI - Azure AI Search - Microsoft Learn",
        "url": "https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview",
        "date": "2026-01-15",
        "last_updated": "2026-03-19",
        "snippet": "Retrieval-augmented generation (RAG) is a pattern that extends LLM capabilities by grounding responses in your proprietary content. While ...",
        "source": "web"
      },
      {
        "title": "What is RAG? - Retrieval-Augmented Generation AI Explained - AWS",
        "url": "https://aws.amazon.com/what-is/retrieval-augmented-generation/",
        "date": "2026-05-13",
        "last_updated": "2026-05-17",
        "snippet": "RAG is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources ...",
        "source": "web"
      },
      {
        "title": "RAG Architecture - GeeksforGeeks",
        "url": "https://www.geeksforgeeks.org/nlp/rag-architecture/",
        "date": "2026-05-09",
        "last_updated": "2026-05-19",
        "snippet": "Retrieval-Augmented Generation (RAG) is an architecture that enhances LLMs by combining them with external knowledge sources, ...",
        "source": "web"
      },
      {
        "title": "Introduction To Undertsanding RAG(Retrieval-Augmented Generation)",
        "url": "https://www.youtube.com/watch?v=fZM3oX4xEyg",
        "date": "2025-08-31",
        "last_updated": "2026-05-21",
        "snippet": "Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative ...",
        "source": "web"
      },
      {
        "title": "What is Retrieval Augmented Generation (RAG)? | Databricks",
        "url": "https://www.databricks.com/blog/what-is-retrieval-augmented-generation",
        "date": "2023-10-18",
        "last_updated": "2026-05-20",
        "snippet": "Retrieval augmented generation is an AI pattern that improves large language model answers by first retrieving relevant documents from external data sources ...",
        "source": "web"
      },
      {
        "title": "What is RAG (Retrieval Augmented Generation)? - IBM",
        "url": "https://www.ibm.com/think/topics/retrieval-augmented-generation",
        "date": "2024-10-21",
        "last_updated": "2026-04-24",
        "snippet": "RAG is an architecture for optimizing the performance of an artificial intelligence (AI) model by connecting it with external knowledge bases.",
        "source": "web"
      },
      {
        "title": "What is Retrieval-Augmented Generation (RAG)? A Practical Guide",
        "url": "https://www.k2view.com/what-is-retrieval-augmented-generation",
        "date": null,
        "last_updated": "2026-04-18",
        "snippet": "RAG is a Generative AI (GenAI) architecture that augments a Large Language Model (LLM) with fresh, trusted data retrieved from authoritative internal knowledge ...",
        "source": "web"
      },
      {
        "title": "What is Retrieval-Augmented Generation (RAG)? | NVIDIA Glossary",
        "url": "https://www.nvidia.com/en-us/glossary/retrieval-augmented-generation/",
        "date": "2026-03-16",
        "last_updated": "2026-05-21",
        "snippet": "RAG is an AI technique where an external data source is connected to a large language model (LLM) to generate domain-specific or the most up-to-date responses ...",
        "source": "web"
      }
    ],
    "status": null,
    "type": null,
    "usage": {
      "completion_tokens": 2538,
      "cost": {
        "input_tokens_cost": 8e-05,
        "output_tokens_cost": 0.03807,
        "total_cost": 0.04415,
        "citation_tokens_cost": null,
        "reasoning_tokens_cost": null,
        "request_cost": 0.006,
        "search_queries_cost": null
      },
      "prompt_tokens": 27,
      "total_tokens": 2565,
      "citation_tokens": null,
      "num_search_queries": null,
      "reasoning_tokens": null,
      "search_context_size": "low"
    }
  }
  ```
</Accordion>
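
In the sample response above, `usage.cost.total_cost` equals the sum of the non-null cost components. The sketch below checks that arithmetic on values copied from the sample. Treating `total_cost` as the sum of the other fields is an observation about this one response, not a documented guarantee, and `sum_cost_components` is a hypothetical helper.

```python
import math

# Cost block copied from the sample response above.
cost = {
    "input_tokens_cost": 8e-05,
    "output_tokens_cost": 0.03807,
    "total_cost": 0.04415,
    "citation_tokens_cost": None,
    "reasoning_tokens_cost": None,
    "request_cost": 0.006,
    "search_queries_cost": None,
}


def sum_cost_components(cost):
    """Add every non-null component except the reported total."""
    return sum(
        value
        for key, value in cost.items()
        if key != "total_cost" and value is not None
    )


computed = sum_cost_components(cost)
# Compare with a tolerance, since the components are floating-point values.
matches = math.isclose(computed, cost["total_cost"], abs_tol=1e-9)
print(f"computed={computed:.5f} reported={cost['total_cost']:.5f} match={matches}")
```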

## Structured Outputs

Structured outputs enable you to enforce specific response formats from Perplexity's models, ensuring consistent, machine-readable data that can be directly integrated into your applications without manual parsing.

We support **JSON Schema** structured outputs. To enable structured outputs, add a `response_format` field to your request with the following structure:

```json theme={null}
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "schema": { /* your JSON schema object */ }
    }
  }
}
```
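
As a rough illustration of the structure above, the following sketch wraps a hand-written JSON Schema in a `response_format` payload. The schema contents and the helper name `build_response_format` are made up for this example; only the `type`, `json_schema`, and `schema` keys come from the structure shown above.

```python
import json


def build_response_format(schema: dict) -> dict:
    """Wrap a JSON Schema object in the response_format structure."""
    return {"type": "json_schema", "json_schema": {"schema": schema}}


# Illustrative schema; the field names are assumptions for this example.
company_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "revenue": {"type": "number"},
    },
    "required": ["company", "revenue"],
}

response_format = build_response_format(company_schema)
print(json.dumps({"response_format": response_format}, indent=2))
```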

<Tip>
  **Improve Schema Compliance**: Give the LLM hints about the output format in your prompts to improve adherence to the structured format. Include phrases like "Please return the data as a JSON object with the following structure..."
</Tip>
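
One way to follow the tip above is to generate the format hint from the schema itself, so the prompt and the schema are less likely to drift apart. This is a sketch under that assumption: `format_hint` is a hypothetical helper, the schema and question are made up, and the wording of the hint is illustrative, not a phrase the API requires.

```python
import json


def format_hint(schema: dict) -> str:
    """Describe the top-level fields of a JSON Schema as a prompt suffix."""
    example = {
        name: f"<{spec.get('type', 'any')}>"
        for name, spec in schema.get("properties", {}).items()
    }
    return (
        "Please return the data as a JSON object with the following structure: "
        + json.dumps(example)
    )


# Illustrative schema; the field names are assumptions for this example.
schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "revenue": {"type": "number"},
    },
}

question = "What was the company's revenue last quarter?"
prompt = f"{question}\n\n{format_hint(schema)}"
print(prompt)
```

Fields without a plain `type` entry (for example, optional fields expressed with `anyOf`) are rendered as `<any>` in this sketch.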

<Info>
  The first request with a new JSON Schema may see a delay before the first token (typically 10 to 30 seconds) while the schema is prepared. Subsequent requests that use the same schema do not incur this delay.
</Info>

### Example: Financial Analysis

```python theme={null}
from perplexity import Perplexity
from typing import List, Optional
from pydantic import BaseModel

class FinancialMetrics(BaseModel):
    company: str
    quarter: str
    revenue: float
    net_income: float
    eps: float
    revenue_growth_yoy: Optional[float] = None
    key_highlights: Optional[List[str]] = None

client = Perplexity()

completion = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {
            "role": "user",
            "content": "Summarize Apple's most recent annual 10-K filing: revenue by segment, gross margin trend, and operating cash flow."
        }
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "schema": FinancialMetrics.model_json_schema()
        }
    }
)

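# Parse the JSON string in the message content and validate it against the model
metrics = FinancialMetrics.model_validate_json(completion.choices[0].message.content)
# The schema does not fix a unit for revenue; this example assumes billions of USD
print(f"Revenue: ${metrics.revenue}B")
```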
```

<Accordion title="Response">
  ```json theme={null}
  {
    "id": "3edb88c4-495d-4767-ae43-c38d21c7ad30",
    "choices": [
      {
        "delta": {
          "content": "",
          "role": "assistant",
          "reasoning_steps": null,
          "tool_call_id": null,
          "tool_calls": null
        },
        "index": 0,
        "message": {
          "content": "Apple’s **most recent annual 10-K** in the search results is the **fiscal 2024 Form 10-K** for the year ended **September 28, 2024**.[2][3]\n\n- **Revenue by segment:** Apple reported **net sales by reportable segment** of **Americas: $112.0B**, **Europe: $101.3B**, **Greater China: $66.9B**, **Japan: $25.0B**, and **Rest of Asia Pacific: $32.1B** in fiscal 2024.[2]\n- **Gross margin trend:** Gross margin rose from **$169.1B in fiscal 2023** to **$180.7B in fiscal 2024**; as a percentage of net sales, that is **~44.1%** in 2024 versus **~44.1%** in 2023, so the *level* increased but the margin rate was essentially flat year over year.[2]\n- **Operating cash flow:** Apple generated **$118.3B** of **net cash from operating activities** in fiscal 2024, up from **$110.5B** in fiscal 2023.[2]\n\nIf you want, I can also break this down into a compact table with **three-year trends** from the 10-K.",
          "role": "assistant",
          "reasoning_steps": null,
          "tool_call_id": null,
          "tool_calls": null
        },
        "finish_reason": "stop"
      }
    ],
    "created": 1779895999,
    "model": "sonar-pro",
    "citations": [
      "https://www.apple.com/newsroom/pdfs/fy2025-q2/FY25_Q2_Consolidated_Financial_Statements.pdf",
      "https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm",
      "https://investor.apple.com/sec-filings/sec-filings-details/default.aspx?FilingId=17933082",
      "https://investor.apple.com/sec-filings/default.aspx",
      "https://investor.apple.com/investor-relations/default.aspx",
      "https://www.annualreports.com/Company/apple-inc",
      "https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm"
    ],
    "object": "chat.completion",
    "search_results": [
      {
        "title": "[PDF] Consolidated Financial Statements - Apple",
        "url": "https://www.apple.com/newsroom/pdfs/fy2025-q2/FY25_Q2_Consolidated_Financial_Statements.pdf",
        "date": null,
        "last_updated": "2025-08-29",
        "snippet": "(1) Net sales by reportable segment: Americas. $. 40,315 $. 37,273 $. 92,963 $. 87,703. Europe. 24,454. 24,123. 58,315. 54,520. Greater China. 16,002. 16,372.",
        "source": "web"
      },
      {
        "title": "aapl-20240928 - SEC.gov",
        "url": "https://www.sec.gov/Archives/edgar/data/320193/000032019324000123/aapl-20240928.htm",
        "date": "2024-09-28",
        "last_updated": "2026-03-30",
        "snippet": "Operating income for each segment consists of net sales to third parties, related cost of sales, and operating ... Apple Inc. | 2024 Form 10-K | 57.",
        "source": "web"
      },
      {
        "title": "SEC Filings Details - Apple Investor Relations",
        "url": "https://investor.apple.com/sec-filings/sec-filings-details/default.aspx?FilingId=17933082",
        "date": "2024-11-01",
        "last_updated": "2026-05-15",
        "snippet": "SEC Filings · Leadership and Governance · Our Values · FAQ · Contact. SEC Filings Details. Form 10-K. Nov 01, 2024. Annual Report. HTML Format ...",
        "source": "web"
      },
      {
        "title": "SEC Filings - Apple Investor Relations",
        "url": "https://investor.apple.com/sec-filings/default.aspx",
        "date": null,
        "last_updated": "2026-05-15",
        "snippet": "SEC Filings. SEC Groupings. All Filings, Annual Filings, Quarterly Filings, Current Reports, Proxy Filings, Registration Statements, Section 16 Filings ...",
        "source": "web"
      },
      {
        "title": "Investor Relations - Apple",
        "url": "https://investor.apple.com/investor-relations/default.aspx",
        "date": null,
        "last_updated": "2026-04-17",
        "snippet": "2024 10-K · 2023 10-K · 2022 10-K. Additional Reports. Net Sales by Category. Reclassification of FY18 net sales ... Green Bond Report. Annual green bond impact ...",
        "source": "web"
      },
      {
        "title": "Apple Inc. - AnnualReports.com",
        "url": "https://www.annualreports.com/Company/apple-inc",
        "date": "2024-01-01",
        "last_updated": "2025-12-16",
        "snippet": "Apple Inc. MOST RECENT 2024 Annual Report and Form 10K. View PDF View Form 10K (HTML).",
        "source": "web"
      },
      {
        "title": "Form 10-K - SEC.gov",
        "url": "https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm",
        "date": "2010-09-25",
        "last_updated": "2026-03-08",
        "snippet": "The following table shows net sales by operating segment and net sales and unit sales ... Includes sales of Apple TV, Apple Watch, Beats products, iPod and ...",
        "source": "web"
      }
    ],
    "status": null,
    "type": null,
    "usage": {
      "completion_tokens": 282,
      "cost": {
        "input_tokens_cost": 8e-05,
        "output_tokens_cost": 0.00423,
        "total_cost": 0.01031,
        "citation_tokens_cost": null,
        "reasoning_tokens_cost": null,
        "request_cost": 0.006,
        "search_queries_cost": null
      },
      "prompt_tokens": 27,
      "total_tokens": 309,
      "citation_tokens": null,
      "num_search_queries": null,
      "reasoning_tokens": null,
      "search_context_size": "low"
    }
  }
  ```
</Accordion>
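
Message content arrives as a string, and the sample above shows that it can contain prose rather than a JSON object, so it may be worth checking the content before using it. A minimal defensive sketch using only the standard library follows; `parse_structured` and the `required` field list are hypothetical names, and the sample inputs are made up.

```python
import json


def parse_structured(content: str, required: list[str]):
    """Return the parsed object, or None if content is not usable JSON."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Reject objects that are missing any field the caller depends on.
    if any(key not in data for key in required):
        return None
    return data


# Made-up inputs: one valid JSON object and one prose answer.
good = parse_structured('{"company": "Example Corp", "revenue": 1.5}', ["company", "revenue"])
bad = parse_structured("The most recent annual 10-K is ...", ["company", "revenue"])
print(good, bad)
```

Returning `None` rather than raising keeps the retry or fallback decision with the caller; a Pydantic model, as in the example above, would give stricter type checks.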

<Warning>
  **Links in JSON Responses**: Requesting links as part of a JSON response may not always work reliably. Use the links returned in the `citations` or `search_results` fields from the API response instead.
</Warning>
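
Following the warning above, this sketch keeps the parsed JSON and the source links separate, reading URLs from `search_results` and falling back to `citations`. The response object is a hand-built stand-in that mirrors the field names in the sample responses. `extract_sources` is a hypothetical helper, not part of the SDK, and it assumes each search result is a dict; SDK objects may expose these fields differently.

```python
import json
from types import SimpleNamespace


def extract_sources(response) -> list[str]:
    """Collect source URLs from search_results, falling back to citations."""
    results = getattr(response, "search_results", None) or []
    # Assumption: each search result is a dict with an optional "url" key.
    urls = [item["url"] for item in results if item.get("url")]
    return urls or list(getattr(response, "citations", None) or [])


# Stand-in response with the same field names as the samples in this guide.
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content='{"company": "Example Corp"}'))],
    citations=["https://example.com/report"],
    search_results=[{"title": "Report", "url": "https://example.com/report"}],
)

data = json.loads(response.choices[0].message.content)
data_with_sources = {**data, "sources": extract_sources(response)}
print(data_with_sources)
```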

## Next Steps

<CardGroup cols={2}>
  <Card title="Sonar Prompt Guide" icon="book" href="/docs/sonar/prompt-guide">
    Sonar-specific prompting caveats and best practices.
  </Card>

  <Card title="Pro Search for Sonar Pro" icon="bolt" href="/docs/sonar/pro-search/quickstart">
    Enhanced search with automated tools, multi-step reasoning, and real-time thought streaming.
  </Card>

  <Card title="Sonar API Search Filters" icon="filter" href="/docs/sonar/filters">
    Learn how to control search behavior with filters and parameters.
  </Card>

  <Card title="Sonar API Media Attachments" icon="photo" href="/docs/sonar/media">
    Send and receive images, videos, and files with the Sonar API.
  </Card>
</CardGroup>
