Talent Sourcer
Some tasks aren’t hard. They’re just big, and that’s what trips up agents. Sourcing every engineer who fits a hiring brief - the right skill, the right city, enough tenure to be worth a call - with each person’s current role and their public links, is a hundred near-identical lookups, not a hard reasoning problem. This example is a small CLI for that job. Give it a role, a skill, a location, and a minimum tenure, and it returns a verified shortlist as an HTML table: name, current role, company, location, years at the company, a relevance score, public links (GitHub, profile, a notable publication), and a verified flag per person. It runs as a single Agent API request built around one tool, sandbox, whose
code calls people_search and web_search from inside the run. You don’t write
the collection logic. The agent writes and runs it.
Why one call isn’t enough
The obvious first try is one people_search call on “engineers who work on LLM
inference in NYC”. It returns the obvious dozen names and nothing else: no
long-tail coverage, no check that anyone’s role is current, no confirmation of
location or tenure, no links, no scoring. Wide collection needs many searches,
then per-row verification and bookkeeping the model can’t hold in its head. The
fix isn’t a smarter search. It’s collection discipline, and that’s what
sandbox provides.
The sandbox does the work
sandbox is an
isolated container where the agent writes and runs its own Python inside the
request. You describe the job in plain language, and the model writes the code:
the segment list, the loop, the hard-filter checks, dedup by (name, company),
scoring, the sort, the file write. A loop doesn’t forget candidate #47, and code
only writes rows it actually has, so there’s nothing to hallucinate. The run
ends with a real file, returned via share_file, instead of text you still have
to parse.
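The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the code the agent actually writes: the `Candidate` fields, the toy `score` formula, and the `search` callable (a stand-in for a `people_search` call) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    company: str
    title: str
    location: str
    years_at_company: float
    score: float = 0.0


def passes_hard_filters(c, location, min_tenure):
    """Location must match when one is given; min_tenure of 0 skips the tenure check."""
    if location and c.location.lower() != location.lower():
        return False
    return c.years_at_company >= min_tenure


def score(c, skill):
    """Toy relevance score (an assumption): skill named in the title, plus capped tenure."""
    title_hit = 1.0 if skill.lower() in c.title.lower() else 0.0
    return title_hit + min(c.years_at_company, 10) / 10


def collect(segments, search, skill, location, min_tenure, target):
    """Loop over segments, dedup by (name, company), filter, score, sort, keep top N."""
    seen = {}
    for segment in segments:
        for c in search(segment):  # `search` is a hypothetical stand-in for people_search
            key = (c.name.lower(), c.company.lower())
            if key in seen or not passes_hard_filters(c, location, min_tenure):
                continue
            c.score = score(c, skill)
            seen[key] = c
    ranked = sorted(seen.values(), key=lambda c: c.score, reverse=True)
    return ranked[:target]
```

Because the accumulation lives in a dictionary keyed by `(name, company)`, a person who shows up in three segments is still one row, and nothing is written that a search did not return.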
The run leans on three Agent API tools, sandbox first and the other two called
from inside its code:
- sandbox - code execution in an isolated container. The engine: the agent writes the collection loop and runs it server-side.
- people_search - a dedicated people-finding tool, not a generic web search. It returns professional details (name, title, company) from public sources, queried the way a recruiter thinks (role, company, seniority, skill, education, location), as structured data the code can dedupe and score.
- web_search - the verification pass: confirm each candidate’s current role, location, and tenure, and collect their public links - a GitHub profile, a social or professional profile, a notable publication or talk - each with a real URL. Its domain, recency, and date filters let the agent lean on fresh or trusted sources.
Declare only sandbox in the request. From inside the run its code reaches
people_search and web_search with no separate declaration, each still billed
per call. That’s what lets the whole loop live in one request.
people_search returns publicly available professional information only.
Keep the task framed that way: recruiting, sourcing, or org mapping over broad
professional criteria, not a private dossier on one named individual.
Installation
Keep talent_sourcer.py and requirements.txt in the same directory.
- Install the dependencies (just the Perplexity Python SDK), pinned in requirements.txt.
- Set your Perplexity API key:
The sandbox tool is in preview and needs Agent API access. See the Sandbox docs for current availability.
Usage
- --role - the kind of person to source, e.g. "engineers" (default engineers).
- --skill - the experience to require, e.g. "LLM inference".
- --location - where the candidate must be based, e.g. "NYC" (optional).
- --min-tenure - minimum years at the current company, e.g. 3 (0 to skip the filter).
- --target - exactly how many candidates to return, the top N by score (default 25).
- --output - HTML path (default candidate-shortlist-<time>.html).
The full script is talent_sourcer.py in this folder.
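For orientation, the flags above map naturally onto `argparse`. The sketch below is an assumption about how such a CLI could be wired, not the script's actual code: treating `--skill` as required, defaulting `--min-tenure` to 0, and the timestamp format in the default output name are all guesses made for illustration.

```python
import argparse
import time


def default_output_path(now=None):
    """Build candidate-shortlist-<time>.html; the timestamp format is an assumption."""
    stamp = time.strftime("%Y%m%d-%H%M%S", now or time.localtime())
    return f"candidate-shortlist-{stamp}.html"


def build_parser():
    p = argparse.ArgumentParser(prog="talent_sourcer.py")
    p.add_argument("--role", default="engineers", help="kind of person to source")
    # Assumption: --skill has no documented default, so it is required here.
    p.add_argument("--skill", required=True, help="experience to require")
    p.add_argument("--location", default=None, help="where the candidate must be based")
    # Assumption: 0 (skip the filter) as the default.
    p.add_argument("--min-tenure", type=float, default=0.0,
                   help="minimum years at the current company (0 skips the filter)")
    p.add_argument("--target", type=int, default=25,
                   help="how many candidates to return, the top N by score")
    p.add_argument("--output", default=None, help="HTML path")
    return p


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    if args.output is None:
        args.output = default_output_path()
    return args
```

Calling `parse_args(["--skill", "LLM inference", "--location", "NYC", "--min-tenure", "3"])` yields a namespace with `role="engineers"`, `target=25`, and a timestamped output path.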
A full run takes a few minutes (often 2-5), not seconds. The wait is the
verification: dozens of sequential
people_search and web_search calls are what
buy completeness. Because the run streams, you watch that progress live instead of
staring at a blank terminal.
How it works
The whole job is one Agent API request with the sandbox tool, run with
stream=True so events arrive as the work happens:
response.output_text.delta carries the model’s reply
token by token. response.sandbox.results fires once per sandbox execution while
the run is still going, which is the live progress you watch. response.completed
returns the finished response, which we keep for the file download and cost:
verified=false instead of an invented role or URL,
and dedup, scoring, and rendering as code so the accumulation is a program, not a
memory exercise.
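The three streaming events named above can be handled with a small dispatch loop. This is a sketch under stated assumptions: the event type strings come from the text, but the `delta` and `response` attribute names are guesses, and the events here are plain objects rather than anything returned by the real SDK.

```python
def consume(events, on_text, on_progress):
    """Dispatch streamed events by type and return (final_response, sandbox_runs)."""
    final = None
    sandbox_runs = 0
    for event in events:
        if event.type == "response.output_text.delta":
            # Assumption: the text fragment is exposed as `event.delta`.
            on_text(event.delta)
        elif event.type == "response.sandbox.results":
            # Fires once per sandbox execution: this is the live progress signal.
            sandbox_runs += 1
            on_progress(f"[sandbox run {sandbox_runs} finished]")
        elif event.type == "response.completed":
            # Assumption: the finished response is exposed as `event.response`.
            final = event.response
    return final, sandbox_runs
```

In a real run, `events` would be the stream returned by the request, `on_text` could write to stdout without a newline, and the returned `final` is what you would keep for the file download and cost.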
Full code
The whole tool is one short file.
Full code - talent_sourcer.py
Example Output
A real run of python talent_sourcer.py --role "engineers" --skill "LLM inference" --location "NYC" --min-tenure 3 --target 25 (results vary with live
coverage):
Because the run streams, each phase line appears as it happens, first the segment
sweep, then the per-candidate verification, so you watch the work instead of
waiting on a blank terminal (progress abridged):
The agent collects more candidates than --target, then
keeps the top N by relevance score. Here it swept two segment rounds, verified
about 40 people in all, and returned the best 25.
The shared candidates.html is a styled table, one row per candidate, with name,
title, company, location, years at the company, relevance, public links, and a
verified flag. The candidates are real people surfaced via People Search, each
with source links. They’re sourcing leads for outreach, not endorsements, so
always confirm before reaching out.
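A table like the one described can be rendered with the standard library alone. The sketch below is illustrative, not the agent's actual rendering code: the dictionary keys and the plain unstyled markup are assumptions, and the column order simply follows the description above.

```python
from html import escape

COLUMNS = ["Name", "Title", "Company", "Location", "Years",
           "Relevance", "Links", "Verified"]


def render_row(c):
    """Render one candidate dict (hypothetical keys) as an escaped table row."""
    links = " ".join(
        f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
        for label, url in c["links"].items()
    )
    cells = [
        escape(c["name"]),
        escape(c["title"]),
        escape(c["company"]),
        escape(c["location"]),
        f'{c["years"]:g}',
        f'{c["score"]:.2f}',
        links,  # already escaped above
        "yes" if c["verified"] else "no",
    ]
    return "<tr>" + "".join(f"<td>{cell}</td>" for cell in cells) + "</tr>"


def render_table(candidates):
    """Render the header plus one row per candidate."""
    head = "".join(f"<th>{escape(h)}</th>" for h in COLUMNS)
    body = "\n".join(render_row(c) for c in candidates)
    return (f"<table>\n<thead><tr>{head}</tr></thead>\n"
            f"<tbody>\n{body}\n</tbody>\n</table>")
```

Escaping every field matters here because names, titles, and URLs come from live search results, not from trusted input.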
On the run above, the agent returned 25 verified candidates for about $1.88.
That covers model tokens over the sandbox loop, one $0.03 sandbox session, and
the people_search / web_search calls billed per invocation. That’s a list a
recruiter would spend half a day assembling, done in minutes. Depth is the dial:
--target, verification breadth, and the model all move the cost.
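The cost structure above is simple arithmetic: model tokens, plus the sandbox session, plus one charge per tool call. The sketch below only restates that sum. The $0.03 session price comes from the text; the token cost and the per-call price are parameters you must supply from current pricing, and the numbers in the usage note are hypothetical.

```python
SANDBOX_SESSION_USD = 0.03  # per-session price quoted in the text; may change


def estimate_run_cost(token_cost_usd, tool_calls, price_per_call_usd,
                      sandbox_sessions=1):
    """Sum the three cost components of a run, rounded to cents.

    price_per_call_usd is a caller-supplied figure for people_search /
    web_search calls; no real price is assumed here.
    """
    total = (token_cost_usd
             + sandbox_sessions * SANDBOX_SESSION_USD
             + tool_calls * price_per_call_usd)
    return round(total, 2)
```

For example, with a made-up $1.00 of tokens and 100 calls at a made-up $0.005 each, `estimate_run_cost(1.00, 100, 0.005)` sums to $1.53. Raising `--target` or verification breadth mostly moves the `tool_calls` term.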
Limitations
- Cost scales with depth. Each run pays for model tokens, a $0.03 sandbox session, and one billed call per people_search / web_search invocation. A thorough run is dollars, not cents.
- sandbox is in preview. Availability, quotas, and pricing may change.
- Coverage varies. Output depends on live results. Not every candidate has a public GitHub or confirmable tenure - the verified flag and Links column reflect what could actually be sourced.
- Keep it professional and wide. people_search returns public professional information, so frame the task as recruiting, sourcing, or org mapping, not a deep dossier on one person.