
Overview

This guide builds an agent that takes a company name and returns a citation-grounded VC investment memo with seven sections: Snapshot, Team, Financials, Product, Market, Risks, and a Thesis ending in a one-line recommendation. Every claim is traced back to a primary source. The agent runs on the Perplexity Agent API and its built-in web_search and finance_search tools, is orchestrated with LangGraph, and is evaluated in LangSmith. Each memo takes about ninety seconds and costs roughly $0.40.

The design lesson generalizes beyond finance: separating search from synthesis is a structural reliability fix for a research agent. Four research nodes fan out in parallel, each calling the Agent API with its own tools. A final synthesizer node has no tools, so it cannot fetch new sources and is instructed to cite only evidence the research nodes already gathered.

Figure: the company name flows into START and fans out to four parallel research nodes (team, financials, product, market), which all feed a single synthesizer node that produces the memo at END.

Features

  • Parallel research fan-out. Four focused research nodes (team, financials, product, market) run concurrently, each with its own tools and search budget.
  • Tool-less synthesizer. The final memo is composed only from upstream evidence, a structural guard against fabricated citations.
  • Built-in Agent API tools. web_search and finance_search work out of the box; no client-side search plumbing for the core agent.
  • Auditable in LangSmith. Every node’s tool calls and outputs are captured, so any claim traces back to the search result that produced it.
  • Provider eval harness. A LangSmith comparison that scores search providers on primary-source rate, financial-concept coverage, latency, and cost.

Prerequisites

Setup

pip install "langchain-perplexity>=1.4.0" langgraph langsmith
# ChatPerplexity reads PPLX_API_KEY.
export PPLX_API_KEY="pplx-..."
export LANGSMITH_API_KEY="ls__..."
export LANGSMITH_TRACING="true"   # capture every node's tool calls end-to-end
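Before the first run, it can help to confirm the variables above are actually set. The following is a hypothetical standalone preflight helper, not part of the cookbook; it treats both API keys as required and reports whether tracing is on.

```python
import os
from collections.abc import Mapping

# Variables exported in Setup. This sketch assumes both keys are required;
# LANGSMITH_TRACING is optional and only controls whether runs are traced.
REQUIRED_VARS = ("PPLX_API_KEY", "LANGSMITH_API_KEY")


def missing_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def tracing_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True when LANGSMITH_TRACING is set to "true" (case-insensitive)."""
    return env.get("LANGSMITH_TRACING", "").lower() == "true"
```

Calling `missing_env()` with no argument checks the real process environment, so a one-line guard at the top of a script can fail fast with a readable message instead of an authentication error mid-run.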

Build the agent

Everything in this section goes in one file. Paste the blocks in order into memo.py and you have the complete agent.

Graph state

Each research node reads company from the shared state and writes its findings into research_output; a reducer merges the parallel writes.
from __future__ import annotations

from datetime import datetime, timezone
from typing import Annotated, Any, TypedDict

from langchain_core.messages import AIMessage
from langchain_perplexity import ChatPerplexity
from langgraph.graph import END, START, StateGraph


def merge_research_output(left: dict[str, str], right: dict[str, str]) -> dict[str, str]:
    """Each research node returns {"<section>": "..."}; merge into one dict."""
    return {**(left or {}), **(right or {})}


class MemoState(TypedDict):
    company: str
    research_output: Annotated[dict[str, str], merge_research_output]
    memo: str

Models and tools

The Agent API exposes Perplexity’s built-in tools directly. The financials node adds finance_search; the other three use web_search only. max_steps caps each node’s internal search loop, so it serves as the per-node search budget.
SUBNODE_MODEL_NAME = "openai/gpt-5.5"
SYNTHESIZER_MODEL_NAME = "openai/gpt-5.5"


def _agent_model(model: str) -> ChatPerplexity:
    """Build a ChatPerplexity client wired to the Responses API."""
    # The Responses (Agent) API ignores sampling params like temperature, so we omit it.
    return ChatPerplexity(model=model, use_responses_api=True)


SUBNODE_MODEL = _agent_model(SUBNODE_MODEL_NAME)
SYNTHESIZER_MODEL = _agent_model(SYNTHESIZER_MODEL_NAME)


# Per-research-node tool specs.
TEAM_TOOLS = [{
    "type": "web_search",
    "filters": {"search_recency_filter": "year"},
}]

PRODUCT_TOOLS = [{"type": "web_search"}]

MARKET_TOOLS = [{"type": "web_search"}]

FINANCIALS_TOOLS = [{"type": "finance_search"}, {"type": "web_search"}]


# Per-research-node max_steps caps the Perplexity Agent API's internal search loop.
RESEARCH_MAX_STEPS = {
    "team": 2, "financials": 5, "product": 2, "market": 2,
}
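With four nodes and per-node budgets, a quick sanity check can catch a misnamed section or a missing budget before any API call is made. This standalone sketch (not part of memo.py) copies the tool specs and budgets from above; KNOWN_TOOL_TYPES and check_research_config are hypothetical names, not part of the Agent API. It also makes the arithmetic explicit: the worst case across all four nodes is 2 + 5 + 2 + 2 = 11 internal steps per memo.

```python
from typing import Any

# Copies of the tool specs and budgets defined above.
TOOLS_BY_SECTION: dict[str, list[dict[str, Any]]] = {
    "team": [{"type": "web_search", "filters": {"search_recency_filter": "year"}}],
    "financials": [{"type": "finance_search"}, {"type": "web_search"}],
    "product": [{"type": "web_search"}],
    "market": [{"type": "web_search"}],
}
RESEARCH_MAX_STEPS = {"team": 2, "financials": 5, "product": 2, "market": 2}

# Hypothetical allow-list: the two built-in tool types this guide uses.
KNOWN_TOOL_TYPES = {"web_search", "finance_search"}


def check_research_config(
    tools_by_section: dict[str, list[dict[str, Any]]],
    max_steps: dict[str, int],
) -> int:
    """Validate per-node tools and budgets; return the worst-case total steps."""
    if set(tools_by_section) != set(max_steps):
        raise ValueError("every research node needs both a tool list and a budget")
    for section, tools in tools_by_section.items():
        unknown = {tool["type"] for tool in tools} - KNOWN_TOOL_TYPES
        if not tools or unknown:
            raise ValueError(f"{section}: bad tool spec {tools!r}")
        if max_steps[section] < 1:
            raise ValueError(f"{section}: max_steps must be positive")
    return sum(max_steps.values())


total = check_research_config(TOOLS_BY_SECTION, RESEARCH_MAX_STEPS)  # 11
```

Financials gets the largest budget, presumably because it may call finance_search and then cross-check the structured data with web_search, as its guidance asks.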

Research prompts

One prompt template serves all four nodes; per-section guidance steers what each node hunts for and which sources to prefer.
RESEARCH_PROMPT = """You are a VC analyst writing the {section} section of the research output for {company}.

{guidance}

Return a markdown section, then end the document with a "### Citations" header \
followed by a markdown list of:

  - <url> — one-sentence evidence quoted from the source

Cite only URLs that came back from your tool calls; never fabricate URLs. \
Keep the section focused — 250-400 words is appropriate for the body."""


GUIDANCE = {
    "team": (
        "Search for the founders, CEO, and other named executives. Capture each "
        "leader's prior roles and education. Prioritize the company's own About/Team "
        "page and professional-network sources."
    ),
    "financials": (
        "If the company is public, use finance_search for revenue, margins, and analyst "
        "estimates. If private, use web_search for funding rounds, valuation, and "
        "disclosed revenue. Cross-check structured data against recent news."
    ),
    "product": (
        "Describe the company's flagship product, recent launches, and technical "
        "differentiators. Cite the company's own product or engineering pages where "
        "possible, plus tech-press coverage for context."
    ),
    "market": (
        "Map the competitive landscape, name direct competitors, and surface market "
        "sizing. Prefer analyst and trade-press sources."
    ),
}
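RESEARCH_PROMPT relies on two Python details worth knowing before you edit it. A backslash at the end of a line inside a regular (non-raw) triple-quoted string joins that line to the next one. And because the template is filled with str.format, any literal brace must be doubled. This standalone snippet uses a trimmed, hypothetical copy of the template and a made-up company name to show both.

```python
# A trimmed, illustrative copy of RESEARCH_PROMPT (not the full template).
PROMPT = """You are a VC analyst writing the {section} section for {company}.

Cite only URLs that came back from your tool calls; never fabricate URLs. \
Keep the section focused."""

# The trailing backslash removes the newline, so both sentences share one line.
rendered = PROMPT.format(section="team", company="Acme Robotics")

# Doubled braces survive str.format as single literal braces.
escaped = "Output shape: {{url, evidence}} for {section}".format(section="market")
```

The section name also has to match a key in GUIDANCE, since _run_research looks the guidance up with GUIDANCE[section].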

Research nodes

All four nodes share one runner: a single Agent API call with that node’s tools and search budget. The API runs the search loop server-side, so there is no client-side tool plumbing here.
def _run_research(
    state: MemoState,
    *,
    section: str,
    tools: list[dict[str, Any]],
    max_steps: int,
) -> dict[str, dict[str, str]]:
    """Run one research section with the given tools and return its output."""
    msg: AIMessage = SUBNODE_MODEL.invoke(
        [
            {"role": "system", "content": RESEARCH_PROMPT.format(
                section=section, company=state["company"], guidance=GUIDANCE[section],
            )},
            {"role": "user", "content": f"Research the {section} of {state['company']}."},
        ],
        tools=tools,
        extra_body={"max_steps": max_steps},
    )
    return {"research_output": {section: msg.content}}


def team_node(state: MemoState) -> dict[str, dict[str, str]]:
    """Research the founders and leadership team."""
    return _run_research(state, section="team",
                         tools=TEAM_TOOLS, max_steps=RESEARCH_MAX_STEPS["team"])


def financials_node(state: MemoState) -> dict[str, dict[str, str]]:
    """Research revenue, funding, and financial metrics."""
    return _run_research(state, section="financials",
                         tools=FINANCIALS_TOOLS, max_steps=RESEARCH_MAX_STEPS["financials"])


def product_node(state: MemoState) -> dict[str, dict[str, str]]:
    """Research the product, launches, and technical differentiators."""
    return _run_research(state, section="product",
                         tools=PRODUCT_TOOLS, max_steps=RESEARCH_MAX_STEPS["product"])


def market_node(state: MemoState) -> dict[str, dict[str, str]]:
    """Research the competitive landscape and market sizing."""
    return _run_research(state, section="market",
                         tools=MARKET_TOOLS, max_steps=RESEARCH_MAX_STEPS["market"])
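The four wrappers differ only in the arguments they pass to _run_research. If you prefer not to repeat them, a closure factory can generate them. The sketch below is a hypothetical refactor, not how the cookbook writes it; make_research_node and fake_runner are illustrative names, and the stand-in runner lets it run offline.

```python
from collections.abc import Callable
from typing import Any

State = dict[str, Any]
Update = dict[str, dict[str, str]]


def make_research_node(
    runner: Callable[..., Update],
    section: str,
    tools: list[dict[str, Any]],
    max_steps: int,
) -> Callable[[State], Update]:
    """Bind one section's settings to the shared runner, like team_node above."""

    def node(state: State) -> Update:
        return runner(state, section=section, tools=tools, max_steps=max_steps)

    # A readable name and docstring make the generated function easier to debug.
    node.__name__ = f"{section}_node"
    node.__doc__ = f"Research the {section} section."
    return node


def fake_runner(state: State, *, section: str, tools: list, max_steps: int) -> Update:
    """Hypothetical stand-in for _run_research that makes no API call."""
    summary = f"{state['company']}: {len(tools)} tools, {max_steps} steps"
    return {"research_output": {section: summary}}


nodes = {
    section: make_research_node(fake_runner, section, [{"type": "web_search"}], 2)
    for section in ("team", "product", "market")
}
```

In memo.py you would pass _run_research as the runner, for example make_research_node(_run_research, "team", TEAM_TOOLS, RESEARCH_MAX_STEPS["team"]). The explicit wrappers above are easier to read; the factory only pays off if the number of sections grows.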

The synthesizer

The synthesizer has no tools. It composes all seven memo sections from the four nodes’ research outputs, so every cited claim is grounded in research one of the nodes actually did. Sections 1–6 each end with a ### Citations list pairing every source URL with the evidence it supports. The Thesis is the one analysis-only section, with no citations.
SYNTH_PROMPT = """You are a senior VC partner writing the final memo for {company}.

You may only cite evidence that appears in the research outputs below. You have no \
tools; do not browse or fabricate sources.

Produce a markdown memo with these seven sections, in order:

  1. Snapshot — what the company is, founded, valuation, positioning (3-4 sentences)
  2. Team — founders, leadership, recent senior hires
  3. Financials — revenue, growth, funding history, comparables
  4. Product — what they sell, technology, distribution
  5. Market — TAM, direct competitors, category dynamics
  6. Risks — top 3-5 risks with brief reasoning
  7. Thesis — 1-2 paragraphs of analysis, ending with a single line:
     "Recommendation: <PASS | TRACK | ADVANCE | LEAD>"

Each section's H2 heading must be exactly `## <N> · <Section Name>` \
(e.g. `## 1 · Snapshot`), using a middle-dot separator — the evaluator depends \
on this format.

Each of sections 1-6 must end with a `### Citations` subsection listing the \
<url> — <evidence> pairs drawn from the research outputs. Section 7 (Thesis) does \
not need its own citations.

If a research output lacks evidence for a section, write "Insufficient evidence in \
research outputs." in that section's body instead of guessing."""


def synthesizer_node(state: MemoState) -> dict[str, str]:
    """Combine all research outputs into the final memo. No tools attached."""
    research_output_block = "\n\n".join(
        f"## Research output: {name}\n\n{body}"
        for name, body in sorted(state["research_output"].items())
    )
    msg: AIMessage = SYNTHESIZER_MODEL.invoke([
        {"role": "system", "content": SYNTH_PROMPT.format(company=state["company"])},
        {"role": "user", "content": (
            f"Company: {state['company']}\n"
            f"As-of: {datetime.now(timezone.utc).isoformat(timespec='seconds')}\n\n"
            f"Research outputs:\n\n{research_output_block}"
        )},
    ])
    return {"memo": msg.content}
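Because SYNTH_PROMPT fixes the heading format and the recommendation line, the memo can be checked mechanically. This standalone sketch (hypothetical helpers, not the cookbook's evaluator) splits a memo into sections, extracts the recommendation, and lists any cited URL that none of the research outputs contains, a cheap post-hoc check on the tool-less synthesizer. It assumes the model writes the recommendation as a plain line, without quotes or bold.

```python
import re
from typing import Optional

# The heading format SYNTH_PROMPT demands, e.g. "## 1 · Snapshot".
HEADING = re.compile(r"^## (\d) · (.+?)[ \t]*$", re.MULTILINE)
RECOMMENDATION = re.compile(r"^Recommendation: (PASS|TRACK|ADVANCE|LEAD)[ \t]*$", re.MULTILINE)
URL = re.compile(r"https?://[^\s)>\]]+")


def _urls(text: str) -> set[str]:
    """All http(s) URLs in `text`, minus trailing sentence punctuation."""
    return {u.rstrip(".,;:") for u in URL.findall(text)}


def split_sections(memo: str) -> dict[str, str]:
    """Map each section name to its body (everything up to the next H2)."""
    matches = list(HEADING.finditer(memo))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(memo)
        sections[m.group(2)] = memo[m.end():end].strip()
    return sections


def recommendation(memo: str) -> Optional[str]:
    """Return PASS, TRACK, ADVANCE, or LEAD from the Thesis, or None if absent."""
    m = RECOMMENDATION.search(memo)
    return m.group(1) if m else None


def ungrounded_urls(memo: str, research_output: dict[str, str]) -> set[str]:
    """URLs cited in the memo that no research node returned."""
    return _urls(memo) - _urls("\n".join(research_output.values()))
```

Pass it the memo and the research_output dict from the graph's final state. An empty result means every URL in the memo appeared upstream; it does not verify that the quoted evidence actually supports the claim.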

Wiring the graph

Four research nodes fan out from START in parallel and converge on the synthesizer. The wiring is short:
def build_graph():
    """Wire the four research nodes in parallel from START into the synthesizer, then END."""
    g = StateGraph(MemoState)
    g.add_node("team", team_node)
    g.add_node("financials", financials_node)
    g.add_node("product", product_node)
    g.add_node("market", market_node)
    g.add_node("synthesizer", synthesizer_node)

    for section in ("team", "financials", "product", "market"):
        g.add_edge(START, section)
        g.add_edge(section, "synthesizer")

    g.add_edge("synthesizer", END)
    return g.compile()
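Because the four research nodes run concurrently, a memo's wall-clock time is set roughly by the slowest research node plus the synthesizer, not by the sum of all four. This dependency-free sketch reproduces that fan-out/fan-in shape with a thread pool and scaled-down, hypothetical latencies. It is an illustration only, not a replacement for the LangGraph wiring.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node latencies in seconds, scaled far down for the demo.
LATENCY = {"team": 0.1, "financials": 0.3, "product": 0.1, "market": 0.1}


def fake_research(section: str) -> dict[str, str]:
    """Stand-in for a research node: wait, then return a one-key update."""
    time.sleep(LATENCY[section])
    return {section: f"{section} notes"}


def fan_out_fan_in(sections: list[str]) -> tuple[dict[str, str], float]:
    """Run every section concurrently, merge the results, and time the step."""
    start = time.perf_counter()
    merged: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        # pool.map yields results in input order; all calls run concurrently.
        for update in pool.map(fake_research, sections):
            merged = {**merged, **update}
    return merged, time.perf_counter() - start


research, elapsed = fan_out_fan_in(list(LATENCY))
```

The elapsed time lands near the slowest node (0.3 s) rather than the 0.6 s sum. LangGraph's add_edge also accepts a list of start nodes, as in g.add_edge(["team", "financials", "product", "market"], "synthesizer"), which makes the wait-for-all join explicit.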

Running it

import argparse
import asyncio


async def run_memo(company: str) -> str:
    """Run the full memo agent for one company and return the final markdown memo."""
    graph = build_graph()
    final = await graph.ainvoke({"company": company, "research_output": {}, "memo": ""})
    return final["memo"]


def main() -> None:
    """CLI entrypoint: parse `--company` and print the generated memo."""
    parser = argparse.ArgumentParser(description="VC investment memo agent.")
    parser.add_argument("--company", required=True)
    args = parser.parse_args()
    print(asyncio.run(run_memo(args.company)))


if __name__ == "__main__":
    main()
python memo.py --company "Anthropic"
A memo takes about ninety seconds and costs roughly $0.40. With LANGSMITH_TRACING="true", the full run appears in LangSmith with every node’s tool calls. Here is a public trace of one run to explore.

Figure: a LangSmith trace for one memo run. The four parallel research nodes feed the synthesizer, and the memo output shows its Citations section with primary-source URLs.
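Since run_memo is a coroutine, several companies can be processed in one event loop. Each memo already issues four Agent API calls in parallel, so this sketch caps how many memos run at once with a semaphore. run_many and its default limit of 2 are hypothetical choices; check them against your own rate limits.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def run_many(
    companies: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 2,
) -> dict[str, str]:
    """Run `run_one` (e.g. run_memo) per company, at most `max_concurrent` at once."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(company: str) -> tuple[str, str]:
        # Wait for a free slot before starting this company's memo.
        async with gate:
            return company, await run_one(company)

    results = await asyncio.gather(*(guarded(c) for c in companies))
    return dict(results)
```

For example, asyncio.run(run_many(["Anthropic", "Perplexity"], run_memo)) returns a dict mapping each company name to its memo.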

Choosing a search provider

Which search provider should back the agent? memo/profiles.py runs the same graph with three swappable client-side search tools (PerplexitySearchResults, ParallelSearchTool, ExaSearchResults), and memo/compare.py scores them in LangSmith so the same metrics apply to each. Two custom evaluators score memo quality, alongside LangSmith’s built-in latency and cost:
  • primary_source_rate: share of citations from primary sources (IR pages, SEC, official press) rather than aggregators.
  • financial_concept_coverage: whether the Financials section covers valuation, revenue, funding, and operating metrics.
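The actual evaluators live in memo/evaluators.py and are not reproduced in this guide. The sketch below is a hypothetical approximation of both metrics, meant only to make the definitions concrete: it treats SEC pages and the company's own domain as primary sources, and it matches keyword lists for the four financial concepts. Both rules are assumptions and are cruder than a real evaluator would likely be.

```python
import re
from urllib.parse import urlparse

# Assumption: a citation is "primary" if it points at the SEC or at the
# company's own domain, where IR pages and official press usually live.
PRIMARY_HOSTS = {"sec.gov", "www.sec.gov"}

# Hypothetical keyword lists for the four concepts the metric names.
CONCEPTS = {
    "valuation": ("valuation", "valued at", "market cap"),
    "revenue": ("revenue", "arr", "sales"),
    "funding": ("funding", "raised", "series"),
    "operating metrics": ("margin", "customers", "headcount", "burn"),
}


def citation_urls(memo: str) -> list[str]:
    """Collect URLs from every '### Citations' list in the memo."""
    urls: list[str] = []
    for block in re.split(r"^### Citations[ \t]*$", memo, flags=re.MULTILINE)[1:]:
        # A citation list runs until the next markdown heading.
        body = re.split(r"^#", block, maxsplit=1, flags=re.MULTILINE)[0]
        urls += [u.rstrip(".,;:") for u in re.findall(r"https?://[^\s)>\]]+", body)]
    return urls


def is_primary(url: str, company_domain: str) -> bool:
    """True for SEC hosts and for the company's domain or its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return (
        host in PRIMARY_HOSTS
        or host == company_domain
        or host.endswith("." + company_domain)
    )


def primary_source_rate(memo: str, company_domain: str) -> float:
    """Share of citations that point at primary sources (0.0 if none)."""
    urls = citation_urls(memo)
    if not urls:
        return 0.0
    return sum(is_primary(u, company_domain) for u in urls) / len(urls)


def financial_concept_coverage(financials_section: str) -> float:
    """Fraction of the four concepts mentioned at least once."""
    text = financials_section.lower()
    hits = sum(
        any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)
        for keywords in CONCEPTS.values()
    )
    return hits / len(CONCEPTS)
```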
The harness is a small package rather than a single file, so it lives in the api-cookbook repository:
git clone https://github.com/ppl-ai/api-cookbook.git
cd api-cookbook/docs/articles/langchain-vc-memo-agent/scripts
pip install -r requirements.txt
python -m memo.compare

Results

Scored across ten public and private companies on openai/gpt-5.5:

| Metric | Perplexity | Parallel | Exa |
|---|---|---|---|
| Primary-source rate | 1.00 | 0.82 | 0.85 |
| Financial-concept coverage | 0.70 | 0.70 | 0.50 |
| Latency p50 (s/memo) | 91 | 192 | 143 |
| Cost (USD/memo) | $0.38 | $0.60 | $0.67 |

Figure: a LangSmith comparison of the Perplexity, Parallel, and Exa experiments across feedback scores, latency, and cost.
Perplexity posted a perfect primary-source rate, the fastest memos, the lowest cost per run, and tied for the best financial-concept coverage in this comparison. Re-score the providers on your own dataset to see how they compare for your use case.
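The latency row reports p50, the median memo time, which keeps one slow run from dominating the figure. This standalone sketch summarizes per-run records that way, using toy numbers rather than the measurements above; the record fields are hypothetical, and averaging cost is an assumption about how the cost row was computed.

```python
from statistics import mean, median


def summarize(runs: list[dict[str, float]]) -> dict[str, float]:
    """p50 latency is the median run; cost is averaged (an assumption)."""
    return {
        "latency_p50_s": median(r["latency_s"] for r in runs),
        "cost_mean_usd": mean(r["cost_usd"] for r in runs),
    }


# Toy records, not measurements: one slow outlier among three runs.
toy_runs = [
    {"latency_s": 80.0, "cost_usd": 0.30},
    {"latency_s": 90.0, "cost_usd": 0.40},
    {"latency_s": 200.0, "cost_usd": 0.50},
]
summary = summarize(toy_runs)
```

Here the median latency is 90 s while the mean would be about 123 s, which is why a p50 is the steadier headline number for a small set of memos.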

Directory structure

Inside docs/articles/langchain-vc-memo-agent/ in api-cookbook:
scripts/
├── requirements.txt
├── .env.example
└── memo/
    ├── graph.py          # typed state, parallel research nodes, tool-less synthesizer, build_graph()
    ├── main.py           # CLI entrypoint (python -m memo.main --company "...")
    ├── profiles.py       # the three swappable provider profiles
    ├── evaluators.py     # LangSmith evaluators
    ├── eval_dataset.py   # companies used as eval inputs
    └── compare.py        # runs the LangSmith comparison

Limitations

  • Quality tracks source coverage. The agent is only as strong as the primary sources it can find: solid for well-documented companies, shakier for thinly covered private startups where little has been published.
  • The section template is just a convention. The seven sections and the PASS / TRACK / ADVANCE / LEAD scale are the format we picked; swap in whatever your team uses.