The Perplexity SDKs provide several features to optimize performance for high-throughput applications. This guide covers async operations, connection pooling, raw response access, and other performance optimization techniques.
Process large datasets efficiently by batching requests and streaming results:
```python
import asyncio

from perplexity import AsyncPerplexity, DefaultAioHttpClient


async def process_large_dataset(queries, process_fn):
    """Process queries in batches to manage memory usage"""
    async with AsyncPerplexity(
        http_client=DefaultAioHttpClient()
    ) as client:

        async def process_single(query):
            try:
                result = await client.search.create(query=query)
                # Process immediately to avoid storing in memory
                processed = process_fn(result)
                # Clear the original result from memory
                del result
                return processed
            except Exception as e:
                return f"Error processing {query}: {e}"

        # Process in small batches
        batch_size = 5
        for i in range(0, len(queries), batch_size):
            batch = queries[i:i + batch_size]

            # Process the batch concurrently
            tasks = [process_single(query) for query in batch]
            batch_results = await asyncio.gather(*tasks)

            # Yield results instead of accumulating them in memory
            for result in batch_results:
                yield result

            # Optional: small delay to avoid overwhelming the API
            await asyncio.sleep(0.1)


# Usage
def summarize_result(search_result):
    """Process function that extracts only what we need"""
    return {
        "query": search_result.query,
        "result_count": len(search_result.results),
        "top_title": search_result.results[0].title if search_result.results else None,
    }


async def main():
    queries = [f"query {i}" for i in range(100)]
    async for processed_result in process_large_dataset(queries, summarize_result):
        print(processed_result)


asyncio.run(main())
```
Always use async clients when you need to process multiple requests simultaneously.
For CPU-bound processing after API calls, consider using worker threads or processes.
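For example, a blocking post-processing step can be pushed off the event loop with `asyncio.to_thread` (or a `ProcessPoolExecutor` for heavier work). This is a minimal stdlib sketch; `parse_result` is a hypothetical stand-in for your own CPU-bound code:

```python
import asyncio


def parse_result(text: str) -> int:
    """Hypothetical CPU-bound work, e.g. heavy parsing or scoring."""
    return sum(len(word) for word in text.split())


async def handle_response(text: str) -> int:
    # Run the blocking function in a worker thread so the event
    # loop stays free to issue other API requests concurrently.
    return await asyncio.to_thread(parse_result, text)


async def main() -> list[int]:
    texts = ["alpha beta", "gamma delta epsilon"]
    return await asyncio.gather(*(handle_response(t) for t in texts))


print(asyncio.run(main()))  # [9, 17]
```

For truly heavy computation, swap the thread for a `concurrent.futures.ProcessPoolExecutor` via `loop.run_in_executor` to sidestep the GIL.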
2. Implement connection pooling
Configure appropriate connection limits based on your application’s needs.
```python
import httpx

# Good: optimized for your use case
limits = httpx.Limits(
    max_keepalive_connections=20,  # Based on expected concurrency
    max_connections=50,
    keepalive_expiry=30.0,
)
```
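Assuming the SDK follows the common pattern of accepting a custom `httpx` client (the `DefaultAsyncHttpxClient` name below is an assumption; check the exports of your SDK version), the limits are applied when constructing the client:

```python
import httpx
from perplexity import AsyncPerplexity, DefaultAsyncHttpxClient  # assumed export

limits = httpx.Limits(
    max_keepalive_connections=20,
    max_connections=50,
    keepalive_expiry=30.0,
)

# Reuse one client for the whole application so pooled
# connections are actually shared across requests.
client = AsyncPerplexity(
    http_client=DefaultAsyncHttpxClient(limits=limits)
)
```

Creating a fresh client per request defeats the pool entirely, since each new client starts with zero warm connections.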
3. Monitor and tune performance
Use metrics to identify bottlenecks and optimize accordingly.
Don’t optimize prematurely: measure first, then optimize based on actual performance data.
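As a minimal sketch (not an SDK feature), request latencies can be recorded with `time.perf_counter` and summarized to spot slow percentiles:

```python
import time
import statistics
from contextlib import contextmanager

latencies_ms: list[float] = []


@contextmanager
def timed():
    """Record the wall-clock duration of a block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies_ms.append((time.perf_counter() - start) * 1000)


def summarize(samples: list[float]) -> dict:
    """Basic latency summary; p95 via the 'inclusive' quantile method."""
    return {
        "count": len(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": statistics.quantiles(samples, n=20, method="inclusive")[-1],
        "max_ms": max(samples),
    }


# Usage: wrap each API call, then inspect the summary periodically.
for _ in range(10):
    with timed():
        time.sleep(0.001)  # stand-in for an awaited API call

print(summarize(latencies_ms))
```

A rising p95 with a flat p50 often points at connection-pool exhaustion or rate limiting rather than a uniformly slow backend.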
4. Handle backpressure
Implement proper rate limiting and backpressure handling for high-throughput applications.
```python
import asyncio

# Use a semaphore to limit concurrent requests
semaphore = asyncio.Semaphore(10)  # Max 10 concurrent requests

async def rate_limited_request(client, query):
    async with semaphore:
        return await client.search.create(query=query)
```
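A bounded `asyncio.Queue` is another way to apply backpressure: the producer blocks on `put` once the queue is full, so work is only generated as fast as the worker drains it. This is a stdlib sketch; `fake_request` is a hypothetical stand-in for the actual API call:

```python
import asyncio


async def fake_request(query: str) -> str:
    """Hypothetical stand-in for client.search.create(query=query)."""
    await asyncio.sleep(0.01)
    return f"result for {query}"


async def producer(queue: asyncio.Queue, queries: list[str]) -> None:
    for query in queries:
        await queue.put(query)   # blocks when the queue is full
    await queue.put(None)        # sentinel: no more work


async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    while True:
        query = await queue.get()
        if query is None:
            break
        results.append(await fake_request(query))


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=5)  # bounded -> backpressure
    results: list[str] = []
    queries = [f"query {i}" for i in range(20)]
    await asyncio.gather(producer(queue, queries), worker(queue, results))
    return results


print(len(asyncio.run(main())))  # 20
```

With multiple workers you would enqueue one sentinel per worker; a single worker keeps the sketch simple and preserves result order.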