Overview

Sonar models support document analysis through file uploads. You can provide files either as URLs to publicly accessible documents or as base64 encoded bytes. Ask questions about document content, get summaries, extract information, and perform detailed analysis of uploaded files in multiple formats including PDF, DOC, DOCX, TXT, and RTF.
SDK Installation: To use the official SDK, install it first with pip install perplexityai for Python or npm install @perplexity-ai/perplexity_ai for TypeScript/JavaScript. The curl examples on this page call the HTTP API directly and do not need the SDK.
Document files can be provided as:
  • A public URL pointing to the file
  • Base64 encoded bytes (without any prefix)
Supported formats: PDF, DOC, DOCX, TXT, RTF.
The maximum file size is 50MB. Files larger than this limit will not be processed.
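As a minimal sketch of the base64 option described above, the following Python helpers turn raw file bytes into a prefix-free base64 string and strip a data: URI prefix if one slipped in. The function names are illustrative and not part of any SDK.

```python
import base64


def to_base64_payload(data: bytes) -> str:
    """Encode file bytes as a plain base64 string (no data: URI prefix)."""
    return base64.b64encode(data).decode("ascii")


def strip_data_uri_prefix(value: str) -> str:
    """Drop a leading 'data:<mime>;base64,' prefix if one is present."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


# Example: the first bytes of a PDF header
encoded = to_base64_payload(b"%PDF-1.4")
```

In practice the bytes would come from reading the document in binary mode, for example open(path, "rb").read(), and the resulting string goes into the url field of the file_url object.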

Supported Features

  • Document Summarization: Get concise summaries of document content
  • Question Answering: Ask specific questions about the document
  • Content Extraction: Extract key information, data, and insights
  • Multi-language Support: Analyze documents in various languages
  • Large Document Handling: Process lengthy documents efficiently
  • Multiple Formats: Support for PDF, DOC, DOCX, TXT, and RTF files

Basic Usage

Simple Document Analysis

Using a Public URL

curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "content": [
          {
            "type": "text",
            "text": "Summarize this document"
          },
          {
            "type": "file_url",
            "file_url": {
              "url": "https://example.com/document.pdf",
              "file_name": "document.pdf"
            }
          }
        ],
        "role": "user"
      }
    ],
    "model": "sonar-pro"
  }'

Using Base64 Encoded Bytes

curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "content": [
          {
            "type": "text",
            "text": "Summarize this document"
          },
          {
            "type": "file_url",
            "file_url": {
              "url": "JVBERi0xLjQKJeLjz9MKNCAwIG9iago...",
              "file_name": "report.pdf"
            }
          }
        ],
        "role": "user"
      }
    ],
    "model": "sonar-pro"
  }'

Advanced Analysis with Web Search

curl -X POST "https://api.perplexity.ai/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "content": [
          {
            "type": "text",
            "text": "What are the key findings in this research paper? Provide additional context from recent studies."
          },
          {
            "type": "file_url",
            "file_url": {
              "url": "https://example.com/research-paper.pdf",
              "file_name": "research-paper.pdf"
            }
          }
        ],
        "role": "user"
      }
    ],
    "model": "sonar-pro",
    "web_search_options": {"search_type": "pro"}
  }'
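
The curl requests above can also be assembled in Python. The sketch below uses only the standard library and mirrors the JSON bodies shown in the curl examples; build_payload and build_request are hypothetical helper names, and the request is constructed without being sent.

```python
import json
import urllib.request

API_URL = "https://api.perplexity.ai/chat/completions"


def build_payload(question, file_ref, file_name, model="sonar-pro", pro_search=False):
    """Build the request body; file_ref is a public URL or a raw base64 string."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "file_url",
                        "file_url": {"url": file_ref, "file_name": file_name},
                    },
                ],
            }
        ],
    }
    if pro_search:
        # Same option as in the web search curl example
        payload["web_search_options"] = {"search_type": "pro"}
    return payload


def build_request(payload, api_key):
    """Wrap the payload in a POST request with the headers used by the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it; replace YOUR_API_KEY with a real key before doing so.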

File Requirements

Format Support

  • PDF files (.pdf extension)
  • Word documents (.doc, .docx extensions)
  • Text files (.txt extension)
  • Rich Text Format (.rtf extension)
  • Text-based documents (not scanned images)
  • Base64 encoded file bytes
  • Password-protected files (if publicly accessible)

Size Limits

  • Maximum file size: 50MB
  • Recommended: Under 50MB for optimal performance
  • Maximum processing time: 60 seconds
  • Large files may take longer to analyze
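
A local check before uploading can catch unsupported formats and oversized files early. The sketch below is one way to do that; it assumes "50MB" means 50 × 1024 × 1024 bytes, which this page does not specify, and check_file is an illustrative name.

```python
from pathlib import Path
from typing import List

# Assumption: 50MB is interpreted as 50 MiB; adjust if the API counts 50,000,000 bytes
MAX_FILE_BYTES = 50 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".rtf"}


def check_file(file_name: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(file_name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_FILE_BYTES}-byte limit")
    return problems
```

For a file on disk, the size can be taken from Path(path).stat().st_size. This check cannot tell whether a PDF is text-based or a scanned image.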

Common Use Cases

Academic Research

question = "What methodology was used in this study and what were the main conclusions?"

Legal Documents

question = "Extract the key terms and conditions from this contract"

Financial Reports

question = "What are the revenue trends and key financial metrics mentioned?"

Technical Documentation

question = "Explain the implementation details and provide a step-by-step guide"

Best Practices

  • Be specific about what information you need
  • Ask one focused question per request for best results
  • Use follow-up questions to dive deeper into specific sections
  • Ensure documents are text-based, not scanned images
  • For URLs: Use publicly accessible URLs (Google Drive, Dropbox, etc.)
  • For URLs: Verify the URL returns the document directly, not a preview page
  • For base64: Encode the entire file content properly
  • For base64: Provide only the base64 string without any prefix (no data: URI scheme)
  • Break down complex questions into smaller parts
  • Consider processing large documents in sections
  • Use streaming for real-time responses on lengthy analyses
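
Two of the practices above, follow-up questions and streaming, can be sketched as small changes to the request body. This assumes the endpoint accepts assistant turns in the messages list and a boolean stream field, as OpenAI-style chat completion APIs commonly do; neither appears in the examples on this page, and both helper names are hypothetical.

```python
import copy


def add_follow_up(messages, assistant_reply, follow_up_question):
    """Return a new message list with the previous answer and a follow-up question appended."""
    history = copy.deepcopy(messages)  # leave the caller's list untouched
    history.append({"role": "assistant", "content": assistant_reply})
    history.append({"role": "user", "content": follow_up_question})
    return history


def with_streaming(payload):
    """Return a copy of the payload with streaming requested (assumed 'stream' field)."""
    return {**payload, "stream": True}
```

The first user message, which carries the file_url entry, stays in the history so the follow-up question still refers to the same document.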

Error Handling

Common Issues

Error | Cause | Solution
Invalid URL | URL not accessible or invalid base64 | Verify URL returns file directly or check base64 encoding
File too large | File exceeds 50MB limit | Compress or split the document
Processing timeout | Document too complex | Simplify question or use smaller sections
Invalid base64 | Malformed base64 string | Ensure proper base64 encoding without prefix
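
Some of the issues listed above can be caught locally before a request is sent. The following sketch checks whether a value for the url field looks like an http(s) URL or a well-formed base64 string. The function name and the returned messages are illustrative, not API error codes, and the check does not confirm that a URL is reachable.

```python
import base64
import binascii
from urllib.parse import urlparse


def diagnose_file_reference(value: str) -> str:
    """Return 'ok' or a short description of the likely problem."""
    if not value:
        return "Invalid URL: empty value"
    if value.startswith("data:"):
        return "Invalid base64: remove the data: URI prefix"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        return "ok" if parsed.netloc else "Invalid URL: missing host"
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return "Invalid base64: malformed string"
    return "ok"
```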

Pricing

Document analysis follows standard Sonar pricing based on:
  • Input tokens (document content + question)
  • Output tokens (AI response)
  • Web search usage (if enabled)
Large documents consume more input tokens. Consider the document size when estimating costs.
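
The three cost components can be combined into a rough estimate. The sketch below is arithmetic only: the per-million-token rates and the search fee are caller-supplied placeholders, not actual Sonar prices, and estimate_cost is a hypothetical helper.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_million, output_rate_per_million,
                  search_fee=0.0):
    """Estimate request cost from token counts and caller-supplied rates."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost + search_fee
```

Take the rates from the current pricing page; the input token count includes both the document content and the question.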