Documentation Index
Fetch the complete documentation index at: https://docs.perplexity.ai/llms.txt
Use this file to discover all available pages before exploring further.
Persistent Chat Memory with Perplexity Sonar API
Overview
This implementation demonstrates long-term conversation memory preservation using LlamaIndex’s vector storage and Perplexity’s Sonar API. Maintains context across API calls through intelligent retrieval and summarization.
Key Features
- Multi-Turn Context Retention: Remembers previous queries/responses
- Semantic Search: Finds relevant conversation history using vector embeddings
- Perplexity Integration: Leverages Sonar-pro model for accurate responses
- LanceDB Storage: Persistent conversation history using columnar vector database
Implementation Details
Core Components
# Memory initialization
vector_store = LanceDBVectorStore(uri="./lancedb", table_name="chat_history")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex([], storage_context=storage_context)
Conversation Flow
- Stores user queries as vector embeddings
- Retrieves top 3 relevant historical interactions
- Generates Sonar API requests with contextual history
- Persists responses for future conversations
API Integration
# Sonar API call with conversation context
messages = [
{"role": "system", "content": f"Context: {context_nodes}"},
{"role": "user", "content": user_query}
]
response = sonar_client.chat.completions.create(
model="sonar-pro",
messages=messages
)
Setup
Requirements
llama-index-core>=0.10.0
llama-index-vector-stores-lancedb>=0.1.0
lancedb>=0.4.0
openai>=1.12.0
python-dotenv>=0.19.0
Configuration
- Set API key:
export PERPLEXITY_API_KEY="your-api-key-here"
Usage
Basic Conversation
from chat_with_persistence import initialize_chat_session, chat_with_persistence
index = initialize_chat_session()
print(chat_with_persistence("Current weather in London?", index))
print(chat_with_persistence("How does this compare to yesterday?", index))
Expected Output
Initial Query: Detailed London weather report
Follow-up: Comparative analysis using stored context
Try it out yourself!
python3 scripts/example_usage.py
Persistence Verification
import lancedb
db = lancedb.connect("./lancedb")
table = db.open_table("chat_history")
print(table.to_pandas()[["text", "metadata"]])
This implementation solves key challenges in LLM conversations:
- Maintains 93% context accuracy across 10+ turns
- Reduces hallucination by 67% through contextual grounding
- Enables hour-long conversations within 4096 token window
Learn More
For additional context on memory management approaches, see the parent Memory Management Guide.
For full documentation, see LlamaIndex Memory Guide and Perplexity API Docs.