This guide shows how to use the Agent API’s search_domain_filter to restrict search results to academic and scholarly sources. You will learn how to extract paper metadata (DOIs, authors, publication dates), build citation chains across related papers, and produce properly attributed research summaries.
The search_domain_filter parameter on the Agent API’s web_search tool controls which domains the search draws from. By filtering to academic domains like arxiv.org, nature.com, and .edu, you restrict results to peer-reviewed journals, preprint servers, and academic databases. For more on filtering, see the Agent API Filters docs.

Prerequisites

Install the Perplexity SDK:
pip install perplexityai
If you don’t have an API key yet, navigate to the API Keys tab in the API Portal and generate a new key.
Then export your API key as an environment variable:
export PERPLEXITY_API_KEY="your-api-key"
Filtering to Academic Sources

Use search_domain_filter to restrict the Agent API’s web_search tool to academic sources only.
from perplexity import Perplexity

client = Perplexity()

ACADEMIC_DOMAINS = [
    "arxiv.org",
    "pubmed.ncbi.nlm.nih.gov",
    "nature.com",
    "science.org",
    ".edu",
    "scholar.google.com",
    "semanticscholar.org",
]

response = client.responses.create(
    model="openai/gpt-5.4",
    input="What are the latest findings on the relationship between gut microbiome and mental health?",
    tools=[{
        "type": "web_search",
        "filters": {
            "search_domain_filter": ACADEMIC_DOMAINS,
        },
    }],
    instructions="Focus on peer-reviewed academic sources. Cite papers with authors and publication years when possible.",
)

print(response.output_text)
Academic domain filtering targets papers from PubMed, arXiv, Google Scholar, Semantic Scholar, and major journal publishers. Combine search_domain_filter with clear instructions to ensure the model focuses on peer-reviewed or pre-print academic content.
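Allowlisting is one approach; Perplexity’s domain filtering docs also describe a denylist form, where a leading `-` excludes a domain from results. A minimal sketch (the excluded sites here are illustrative, not a recommendation):

```python
# Sketch: instead of allowlisting academic domains, exclude noisy
# non-academic sources using the "-" prefix denylist form described
# in Perplexity's domain filtering docs. Listed sites are illustrative.
EXCLUDED_DOMAINS = ["-reddit.com", "-quora.com", "-pinterest.com"]

tools = [{
    "type": "web_search",
    "filters": {"search_domain_filter": EXCLUDED_DOMAINS},
}]

# This tools list plugs into client.responses.create(..., tools=tools)
# exactly as in the allowlist example above.
print(tools[0]["filters"]["search_domain_filter"])
```

A denylist is useful when you want broad coverage but need to cut out forums and aggregators; an allowlist like ACADEMIC_DOMAINS is stricter and better suited to literature review.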

Extracting Paper Metadata

Use structured outputs to extract detailed paper metadata from academic search results.
import json
from perplexity import Perplexity

client = Perplexity()

# Use Agent API with web_search for structured extraction
response = client.responses.create(
    model="openai/gpt-5.4",
    input="Find the 5 most cited recent papers on transformer architectures in computer vision (Vision Transformers).",
    tools=[{"type": "web_search"}],
    instructions=(
        "Search for academic papers only. For each paper, extract the title, authors, "
        "publication year, journal or venue, DOI if available, and a one-sentence summary of the key contribution."
    ),
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "academic_papers",
            "schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "papers": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "title": {"type": "string"},
                                "authors": {"type": "string"},
                                "year": {"type": "integer"},
                                "venue": {"type": "string"},
                                "doi": {"type": "string"},
                                "key_contribution": {"type": "string"},
                            },
                            "required": ["title", "authors", "year", "venue", "doi", "key_contribution"],
                            "additionalProperties": False,
                        },
                    },
                },
                "required": ["query", "papers"],
                "additionalProperties": False,
            },
        },
    },
)

data = json.loads(response.output_text)
print(f"Query: {data['query']}\n")

for paper in data["papers"]:
    print(f"  {paper['title']}")
    print(f"    Authors: {paper['authors']}")
    print(f"    Venue: {paper['venue']} ({paper['year']})")
    if paper["doi"]:
        print(f"    DOI: {paper['doi']}")
    print(f"    Contribution: {paper['key_contribution']}")
    print()

Building Citation Chains

Trace how papers cite each other to understand the evolution of an idea across the literature.
import json
from perplexity import Perplexity

client = Perplexity()


def find_citing_papers(paper_title: str, depth: int = 0, max_depth: int = 2) -> dict:
    """Recursively find papers that cite a given paper."""
    indent = "  " * depth
    print(f"{indent}Searching citations for: {paper_title}...")

    response = client.responses.create(
        model="openai/gpt-5.4",
        input=f"What are the 3 most important papers that directly cite or build upon '{paper_title}'?",
        tools=[{"type": "web_search"}],
        instructions="Focus on academic papers only. Return papers that explicitly reference or extend the given work.",
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "citing_papers",
                "schema": {
                    "type": "object",
                    "properties": {
                        "source_paper": {"type": "string"},
                        "citing_papers": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "title": {"type": "string"},
                                    "authors": {"type": "string"},
                                    "year": {"type": "integer"},
                                    "relationship": {"type": "string"},
                                },
                                "required": ["title", "authors", "year", "relationship"],
                                "additionalProperties": False,
                            },
                        },
                    },
                    "required": ["source_paper", "citing_papers"],
                    "additionalProperties": False,
                },
            },
        },
    )

    data = json.loads(response.output_text)
    result = {
        "paper": paper_title,
        "cited_by": [],
    }

    for citing in data["citing_papers"]:
        entry = {
            "title": citing["title"],
            "authors": citing["authors"],
            "year": citing["year"],
            "relationship": citing["relationship"],
        }

        # Recurse for deeper citation chains
        if depth < max_depth:
            entry["cited_by"] = find_citing_papers(citing["title"], depth + 1, max_depth).get("cited_by", [])

        result["cited_by"].append(entry)

    return result


# Start with a foundational paper
chain = find_citing_papers("Attention Is All You Need", max_depth=1)
print(json.dumps(chain, indent=2))
Citation chain depth grows exponentially. Keep max_depth low (1-2) to avoid excessive API calls. For comprehensive citation graphs, use dedicated tools like Semantic Scholar’s API alongside Perplexity for summaries.
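The warning above can be made concrete: find_citing_papers issues one search per node of a full citation tree, so with b citing papers per node the call count is a geometric series. A quick back-of-the-envelope helper (b = 3 matches the "3 most important papers" prompt used above):

```python
# Estimate how many Agent API calls find_citing_papers() makes:
# one call per node of a full b-ary citation tree, with calls
# occurring at every depth from 0 through max_depth.
def estimated_calls(branching: int = 3, max_depth: int = 2) -> int:
    # Geometric series: sum of branching**d for d in 0..max_depth
    return sum(branching ** d for d in range(max_depth + 1))


for depth in range(4):
    print(f"max_depth={depth}: ~{estimated_calls(3, depth)} searches")
# max_depth=2 already needs 13 searches; max_depth=3 needs 40.
```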

Research Summary with Attribution

Generate a research summary that properly attributes each claim to its source paper.
from perplexity import Perplexity

client = Perplexity()

ACADEMIC_DOMAINS = [
    "arxiv.org", "pubmed.ncbi.nlm.nih.gov", "nature.com",
    "science.org", ".edu", "scholar.google.com",
]


def academic_research_summary(topic: str) -> str:
    """Generate an academic research summary with proper citations."""
    response = client.responses.create(
        model="openai/gpt-5.4",
        input=(
            f"Provide a comprehensive academic literature review on: {topic}. "
            "Include specific findings, methodologies, and conclusions from recent papers. "
            "Cite each claim with its source."
        ),
        tools=[{
            "type": "web_search",
            "filters": {
                "search_domain_filter": ACADEMIC_DOMAINS,
            },
        }],
        instructions=(
            "Search for peer-reviewed academic sources only. For each claim, "
            "attribute it to the specific paper with author names and year. "
            "Format the output as a structured literature review with a references section."
        ),
    )

    return f"# Literature Review: {topic}\n\n{response.output_text}"


report = academic_research_summary(
    "the effectiveness of large language models for automated code review"
)
print(report)
Field-Specific Domain Filters

Use field-specific domain filters to search across different academic disciplines.
from perplexity import Perplexity

client = Perplexity()

ACADEMIC_DOMAINS = {
    "biomedical": ["pubmed.ncbi.nlm.nih.gov", "nih.gov", "thelancet.com", "nejm.org"],
    "computer_science": ["arxiv.org", "dl.acm.org", "ieee.org", "openreview.net"],
    "social_science": ["jstor.org", "ssrn.com", "journals.sagepub.com"],
}


def field_specific_search(query: str, field: str) -> dict:
    """Search academic literature within a specific field."""
    domains = ACADEMIC_DOMAINS.get(field, [])

    response = client.responses.create(
        model="openai/gpt-5.4",
        input=query,
        tools=[{
            "type": "web_search",
            "filters": {
                "search_domain_filter": domains,
            },
        }] if domains else [{"type": "web_search"}],
        instructions=f"Search for peer-reviewed academic sources in the {field.replace('_', ' ')} field. Cite papers with authors and years.",
    )

    return {
        "field": field,
        "content": response.output_text,
    }


# Search across multiple fields
query = "What are the ethical implications of AI-generated content?"
fields = ["computer_science", "social_science"]

for field in fields:
    result = field_specific_search(query, field)
    print(f"\n{'='*60}")
    print(f"Field: {result['field']}")
    print(f"{'='*60}")
    print(result["content"][:500])

Tips and Best Practices

  1. Use search_domain_filter with academic domains to restrict results to peer-reviewed sources. Target domains like arxiv.org, nature.com, pubmed.ncbi.nlm.nih.gov, and .edu.
  2. Use instructions to guide academic focus. Tell the model to prioritize peer-reviewed papers, cite authors and years, and focus on specific fields.
  3. Use field-specific domain lists to narrow results to specific publishers or databases (e.g., PubMed for biomedical, arXiv for CS).
  4. Use structured outputs for metadata extraction. JSON schemas ensure consistent paper metadata across queries.
  5. Request specific details in your prompt. Ask for “authors, year, journal, and key findings” to get more complete metadata in the response.
  6. Combine search_domain_filter with search_recency_filter for time-sensitive research. Use "week", "month", or "year" to find recent publications.
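Tip 6 in practice: a sketch of a single web_search tool entry combining both filters. The request shape follows the examples above; the recency values ("week", "month", "year") come from the Agent API filter docs. Pass this entry in the tools list to client.responses.create() as in the earlier examples.

```python
# Sketch: combine a domain allowlist with a recency window so the
# search only draws on academic sources published in the last month.
recent_academic_search = {
    "type": "web_search",
    "filters": {
        "search_domain_filter": ["arxiv.org", "pubmed.ncbi.nlm.nih.gov", "nature.com"],
        "search_recency_filter": "month",  # also accepts "week" or "year"
    },
}

print(recent_academic_search["filters"]["search_recency_filter"])
```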

Next Steps

Agent API Filters

Full reference for domain, recency, and location filters on the Agent API.

Structured Outputs

Extract typed JSON for paper metadata and research findings.

Domain Filtering

Control which domains the search includes or excludes.

Migration Guide

Migrate from Sonar to the Agent API for multi-provider access and tools.