This guide covers search domain filtering on the Agent API. You will learn how to use allowlists to restrict search to trusted domains, denylists to exclude unwanted sources, and practical patterns for common use cases like news-only search, government data, and competitor exclusion.
Domain filtering is configured per-tool under the tools array via tools[].filters.search_domain_filter. For the full reference, see Agent API Filters.

Prerequisites

Install the Perplexity SDK:
pip install perplexityai
If you don’t have an API key yet, navigate to the API Keys tab in the API Portal and generate a new key.
Then export your API key as an environment variable:
export PERPLEXITY_API_KEY="your-api-key"

How Domain Filtering Works

The search_domain_filter parameter accepts a list of domain strings:
  • Allowlist (no prefix): Include only results from these domains. ["reuters.com", "apnews.com"] means search only Reuters and AP News.
  • Denylist (- prefix): Exclude results from these domains. ["-reddit.com", "-twitter.com"] means exclude Reddit and Twitter.
Never mix allowlist and denylist entries in the same request. The API does not support combining "reuters.com" and "-reddit.com" in the same array. Use either all allowlist or all denylist entries.
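
The no-mixing rule can be checked on the client before a request is sent. The sketch below is one way to do that; `classify_domain_filter` is a hypothetical helper name and is not part of the Perplexity SDK.

```python
def classify_domain_filter(domains: list[str]) -> str:
    """Return "allowlist" or "denylist"; raise ValueError for mixed or empty lists."""
    if not domains:
        raise ValueError("filter list is empty; omit search_domain_filter instead")
    # Denylist entries are the ones that start with the "-" prefix.
    denied = [d for d in domains if d.startswith("-")]
    if denied and len(denied) != len(domains):
        raise ValueError("cannot mix allowlist and denylist entries in one filter")
    return "denylist" if denied else "allowlist"


print(classify_domain_filter(["reuters.com", "apnews.com"]))
print(classify_domain_filter(["-reddit.com", "-twitter.com"]))
```

Calling the helper with a mixed list such as ["reuters.com", "-reddit.com"] raises ValueError, so the mistake surfaces locally rather than in the API call.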

Basic Domain Filtering

Set search_domain_filter inside the filters object of a web_search tool entry. The example below restricts search to three news domains.
from perplexity import Perplexity

client = Perplexity()

# Allowlist: search only specific domains
response = client.responses.create(
    model="openai/gpt-5.4",
    input="What are the latest developments in AI regulation?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": ["reuters.com", "apnews.com", "bbc.com"],
        },
    }],
)
print(response.output_text)

Pattern: Denylist Filtering

Use the - prefix to exclude specific domains from search results.
from perplexity import Perplexity

client = Perplexity()

# Denylist: exclude social media and user-generated content
response = client.responses.create(
    model="openai/gpt-5.4",
    input="What are the latest developments in AI regulation?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": ["-reddit.com", "-twitter.com", "-quora.com", "-medium.com"],
        },
    }],
)
print(response.output_text)

Pattern: News-Only Search

Restrict results to major news outlets for current events and breaking news.
from perplexity import Perplexity

client = Perplexity()

NEWS_DOMAINS = [
    "reuters.com",
    "apnews.com",
    "bbc.com",
    "nytimes.com",
    "washingtonpost.com",
    "theguardian.com",
    "bloomberg.com",
    "ft.com",
]

response = client.responses.create(
    model="openai/gpt-5.4",
    input="What happened in global markets today?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": NEWS_DOMAINS,
            "search_recency_filter": "day",
        },
    }],
)
print(response.output_text)
Combine search_domain_filter with search_recency_filter for time-sensitive queries. search_recency_filter accepts day, week, month, or year.
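
Because the recency value is a plain string, a typo is easy to miss. The following sketch builds the tool entry used in the examples above and rejects recency values other than the four listed in this guide. `build_search_tool` and `ALLOWED_RECENCY` are hypothetical names, and the check runs only on the client.

```python
# Recency values as listed in this guide; the API may accept others.
ALLOWED_RECENCY = {"day", "week", "month", "year"}


def build_search_tool(domains, recency=None):
    """Return a web_search tool dict with a domain filter and an optional recency filter."""
    filters = {"search_domain_filter": list(domains)}
    if recency is not None:
        if recency not in ALLOWED_RECENCY:
            raise ValueError(f"recency must be one of {sorted(ALLOWED_RECENCY)}, got {recency!r}")
        filters["search_recency_filter"] = recency
    return {"type": "web_search", "filters": filters}


tool = build_search_tool(["reuters.com", "bloomberg.com"], recency="day")
print(tool)
```

The returned dict can be passed as an element of the tools list in client.responses.create.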

Pattern: Government and Official Sources

Restrict to government domains for policy, regulation, and official statistics.
from perplexity import Perplexity

client = Perplexity()

GOV_DOMAINS = [
    ".gov",          # US federal and state
    ".gov.uk",       # UK government
    ".europa.eu",    # EU institutions
    "who.int",       # World Health Organization
    "worldbank.org", # World Bank
]

response = client.responses.create(
    model="openai/gpt-5.4",
    input="What are the current US federal guidelines on AI usage in healthcare?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": GOV_DOMAINS,
        },
    }],
)
print(response.output_text)
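
Entries such as .gov are TLD-style filters, which this guide describes as matching any domain ending in that suffix. The sketch below mirrors that rule locally so you can audit a list of source hostnames against a filter. The actual matching happens on the server; treating a plain entry like who.int as also covering its subdomains is an assumption, and `host_matches` is a hypothetical helper.

```python
def host_matches(host: str, entry: str) -> bool:
    """Approximate whether a hostname falls under one filter entry (assumed suffix semantics)."""
    host = host.lower().rstrip(".")
    entry = entry.lower().lstrip("-")  # ignore the denylist prefix for matching
    if entry.startswith("."):
        # TLD-style entry such as ".gov": match any hostname ending in it.
        return host.endswith(entry)
    # Plain domain: exact match, or a subdomain of it (assumption).
    return host == entry or host.endswith("." + entry)


for host in ["whitehouse.gov", "ca.gov", "data.gov.uk", "www.who.int"]:
    print(host, host_matches(host, ".gov"))
```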

Pattern: Academic and Research Filtering

Target educational and research institutions.
from perplexity import Perplexity

client = Perplexity()

ACADEMIC_DOMAINS = [
    ".edu",
    "arxiv.org",
    "scholar.google.com",
    "pubmed.ncbi.nlm.nih.gov",
    "nature.com",
    "science.org",
    "ieee.org",
]

response = client.responses.create(
    model="openai/gpt-5.4",
    input="What are recent advances in protein structure prediction?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": ACADEMIC_DOMAINS,
        },
    }],
)
print(response.output_text)

Pattern: Competitor Exclusion

Use denylists to exclude competitor websites from search results when building customer-facing content.
from perplexity import Perplexity

client = Perplexity()

# Exclude competitor domains from product research
EXCLUDED_DOMAINS = [
    "-competitor-a.com",
    "-competitor-b.io",
    "-competitor-c.ai",
]

response = client.responses.create(
    model="openai/gpt-5.4",
    input="What are the best practices for building real-time data pipelines?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": EXCLUDED_DOMAINS,
        },
    }],
)
print(response.output_text)
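
If competitor domains are stored elsewhere without the - prefix, a small conversion step keeps the stored list readable. This is a minimal sketch; `to_denylist` is a hypothetical helper name.

```python
def to_denylist(domains):
    """Prefix each domain with "-" unless it already carries the prefix."""
    return [d if d.startswith("-") else f"-{d}" for d in domains]


competitors = ["competitor-a.com", "competitor-b.io", "-competitor-c.ai"]
print(to_denylist(competitors))
```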

Configurable Filter Builder

The following reusable helper runs a search using a named domain filter preset.
from typing import Optional

from perplexity import Perplexity

client = Perplexity()

# Named filter presets
FILTER_PRESETS = {
    "news": ["reuters.com", "apnews.com", "bbc.com", "bloomberg.com", "ft.com"],
    "academic": [".edu", "arxiv.org", "nature.com", "science.org", "pubmed.ncbi.nlm.nih.gov"],
    "government": [".gov", ".gov.uk", ".europa.eu", "who.int"],
    "tech": ["techcrunch.com", "arstechnica.com", "theverge.com", "wired.com"],
    "no_social": ["-reddit.com", "-twitter.com", "-facebook.com", "-tiktok.com", "-quora.com"],
    "no_seo_spam": ["-pinterest.com", "-medium.com", "-hubspot.com"],
}


def search_with_preset(query: str, preset: str, recency: Optional[str] = None) -> str:
    """Run a search with a named domain filter preset."""
    if preset not in FILTER_PRESETS:
        raise ValueError(f"Unknown preset: {preset}. Options: {list(FILTER_PRESETS.keys())}")

    filters = {"search_domain_filter": FILTER_PRESETS[preset]}
    if recency:
        filters["search_recency_filter"] = recency

    response = client.responses.create(
        model="openai/gpt-5.4",
        input=query,
        tools=[{"type": "web_search", "filters": filters}],
    )
    return response.output_text


# Usage
print("--- News Search ---")
print(search_with_preset("Latest AI regulation news", "news", recency="week"))

print("\n--- Academic Search ---")
print(search_with_preset("CRISPR gene editing recent papers", "academic"))

print("\n--- Clean Search (no social media) ---")
print(search_with_preset("Best Python testing frameworks", "no_social"))
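
Presets of the same kind can be merged, for example two denylists, as long as the result still obeys the no-mixing rule. The sketch below shows one way to do this. `combine_presets` is a hypothetical helper, and the small `PRESETS` dict is a shortened stand-in for a preset table like the one above.

```python
PRESETS = {
    "news": ["reuters.com", "apnews.com"],
    "no_social": ["-reddit.com", "-twitter.com"],
    "no_seo_spam": ["-pinterest.com", "-medium.com"],
}


def combine_presets(presets: dict[str, list[str]], *names: str) -> list[str]:
    """Merge presets in order, dropping duplicates; refuse allowlist/denylist mixes."""
    merged: list[str] = []
    for name in names:
        for domain in presets[name]:
            if domain not in merged:
                merged.append(domain)
    # True for denylist entries, False for allowlist entries.
    kinds = {domain.startswith("-") for domain in merged}
    if len(kinds) > 1:
        raise ValueError(f"presets {names} mix allowlist and denylist entries")
    return merged


print(combine_presets(PRESETS, "no_social", "no_seo_spam"))
```

Combining "news" with "no_social" raises ValueError, because the merged list would contain both kinds of entry.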

Common Pitfalls

Mixing allowlist and denylist

# ❌ WRONG: mixing allowlist and denylist
search_domain_filter=["reuters.com", "-reddit.com"]

# ✅ CORRECT: use only allowlist
search_domain_filter=["reuters.com", "apnews.com", "bbc.com"]

# ✅ CORRECT: use only denylist
search_domain_filter=["-reddit.com", "-twitter.com"]

Using wildcards incorrectly

# ❌ WRONG: wildcards are not supported
search_domain_filter=["*.gov"]

# ✅ CORRECT: use the TLD directly
search_domain_filter=[".gov"]
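
When filter entries come from user input or older configuration, wildcard forms may slip in. The sketch below rewrites a leading "*." to the TLD form shown above and rejects any other use of "*". `normalize_entry` is a hypothetical helper, and the rewrite rule is an assumption based only on the example in this section.

```python
def normalize_entry(entry: str) -> str:
    """Rewrite "*.gov" to ".gov"; raise ValueError for any other wildcard usage."""
    if entry.startswith("*."):
        entry = entry[1:]  # drop the "*" and keep the leading dot
    if "*" in entry:
        raise ValueError(f"wildcards are not supported: {entry!r}")
    return entry


print([normalize_entry(e) for e in ["*.gov", "who.int", ".europa.eu"]])
```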

Empty filter arrays

# ❌ WRONG: empty array has undefined behavior
search_domain_filter=[]

# ✅ CORRECT: omit the parameter to search all domains
# (simply don't include search_domain_filter)
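
When the domain list is computed at runtime it can end up empty. One way to avoid sending an empty array is to add the filter only when there is something to filter on, as in the sketch below. `optional_filter_tool` is a hypothetical helper name.

```python
def optional_filter_tool(domains=None):
    """Build a web_search tool entry; add a domain filter only if the list is non-empty."""
    tool = {"type": "web_search"}
    if domains:  # None and [] both mean "no domain filter"
        tool["filters"] = {"search_domain_filter": list(domains)}
    return tool


print(optional_filter_tool([]))
print(optional_filter_tool(["reuters.com"]))
```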

Tips and Best Practices

  1. Keep allowlists focused. Five to ten domains are usually sufficient. Too many domains dilute the filter’s purpose.
  2. Use denylists for broad exclusion. When you want to exclude a few noisy sources but otherwise search the full web, denylists are more practical than trying to allowlist everything else.
  3. Combine with recency filters. For time-sensitive queries, add search_recency_filter alongside domain filters.
  4. Test your filters. Run the same query with and without filters to verify that results change as expected.
  5. TLD filters work broadly. Using .gov matches any domain ending in .gov, including whitehouse.gov, irs.gov, and state domains like ca.gov.
  6. Store presets in configuration. Define filter presets in your app configuration rather than hardcoding them in every request.
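
As a sketch of the last tip, presets can live in a JSON document and be loaded at startup. The JSON layout (preset name mapped to a list of domain strings) and the `load_presets` helper are assumptions for illustration; in a real application the text would typically be read from a configuration file.

```python
import json


def load_presets(text: str) -> dict[str, list[str]]:
    """Parse a JSON object mapping preset names to lists of domain strings."""
    data = json.loads(text)
    presets: dict[str, list[str]] = {}
    for name, domains in data.items():
        if not isinstance(domains, list) or not all(isinstance(d, str) for d in domains):
            raise ValueError(f"preset {name!r} must be a list of strings")
        presets[name] = domains
    return presets


# Inline JSON stands in for the contents of a config file.
CONFIG = '{"news": ["reuters.com", "apnews.com"], "no_social": ["-reddit.com", "-twitter.com"]}'
presets = load_presets(CONFIG)
print(presets["news"])
```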

Next Steps

  • Agent API Filters: Full reference for domain, date range, and location filters on the Agent API.
  • Search API Filters: Domain filtering on the raw Search API for result-level control.
  • Academic Search: Specialized academic search with domain filtering.
  • Migration Guide: Migrate from Sonar to the Agent API for multi-provider access and tools.