Contextualized embeddings produce context-aware embeddings for document chunks. Unlike standard embeddings, where each text is embedded independently, contextualized embeddings recognize that chunks belong to the same document and incorporate that relationship into each chunk's embedding.
Use contextualized embeddings when embedding chunks from the same document (e.g., paragraphs, sections). Use standard embeddings for independent texts like search queries or standalone sentences.
Pass documents as nested arrays where each inner array represents chunks from a single document:
Chunk ordering: Chunks within each document must be sent in the order they appear in the source document. The model uses sequential context to generate document-aware embeddings, so maintaining the original order is essential for optimal results.
```python
from perplexity import Perplexity

client = Perplexity()

response = client.contextualized_embeddings.create(
    input=[
        # Document 1: three chunks
        [
            "Curiosity begins in childhood with endless questions about the world.",
            "As we grow, curiosity drives us to explore new ideas and challenge assumptions.",
            "Scientific breakthroughs often start with a simple curious question.",
        ],
        # Document 2: two chunks
        [
            "The Curiosity rover explores Mars, searching for signs of ancient life.",
            "Each discovery on Mars sparks new questions about our place in the universe.",
        ],
    ],
    model="pplx-embed-context-v1-4b",
)

for doc in response.data:
    for chunk in doc.data:
        print(f"Doc {doc.index}, Chunk {chunk.index}: {chunk.embedding}")
```
Nested array: each inner array contains the chunks of one document. Maximum 512 documents and 16,000 total chunks per request.
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model` | string | Yes | - | Model identifier: `pplx-embed-context-v1-0.6b` or `pplx-embed-context-v1-4b` |
| `dimensions` | integer | No | Full | Matryoshka dimension (128-1024 for 0.6b, 128-2560 for 4b) |
| `encoding_format` | string | No | `base64_int8` | Output encoding: `base64_int8` (signed int8) or `base64_binary` (packed bits) |
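Both encodings arrive as base64 strings, so they can be decoded with the standard library alone. A minimal sketch, with one assumption flagged: for `base64_binary`, the bit order within each byte is assumed MSB-first, which you should verify against the API's response format.

```python
import base64
import struct


def decode_int8_embedding(b64_string: str) -> list[int]:
    """Decode a base64_int8 embedding: one signed byte per dimension."""
    raw = base64.b64decode(b64_string)
    return list(struct.unpack(f"{len(raw)}b", raw))


def decode_binary_embedding(b64_string: str) -> list[int]:
    """Decode a base64_binary embedding: one bit per dimension.

    ASSUMPTION: bits are packed MSB-first within each byte.
    """
    raw = base64.b64decode(b64_string)
    return [(byte >> (7 - i)) & 1 for byte in raw for i in range(8)]
```

With `base64_int8`, a 2560-dimension embedding is 2560 bytes before base64 encoding; with `base64_binary` it packs to 320 bytes, trading precision for size.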
Input limits: Total tokens per document must not exceed 32K. Total chunks across all documents must not exceed 16,000. All chunks in a single request must not exceed 120,000 tokens combined. Empty strings are not allowed.
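The structural limits above can be checked client-side before sending a request. A minimal sketch covering the document count, chunk count, and empty-string rules; the per-document and per-request token limits depend on the model's tokenizer and are not checked here:

```python
MAX_DOCUMENTS = 512
MAX_TOTAL_CHUNKS = 16_000


def validate_input(documents: list[list[str]]) -> list[list[str]]:
    """Check the nested-array input against the documented structural limits."""
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(f"too many documents: {len(documents)} > {MAX_DOCUMENTS}")
    total_chunks = sum(len(doc) for doc in documents)
    if total_chunks > MAX_TOTAL_CHUNKS:
        raise ValueError(f"too many chunks: {total_chunks} > {MAX_TOTAL_CHUNKS}")
    if any(chunk == "" for doc in documents for chunk in doc):
        raise ValueError("empty strings are not allowed")
    return documents
```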
Rule of thumb: If chunks come from the same source document and their meaning depends on surrounding context, use contextualized embeddings. If each text stands alone, use standard embeddings. When using contextualized embeddings, embed queries with the same contextualized model by wrapping each query as a single-element inner list (e.g., [[query]]).
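Once a query embedded as `[[query]]` and the document chunks are decoded to numeric vectors, retrieval reduces to a similarity ranking. A dependency-free sketch, assuming the embeddings have already been decoded to float lists:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_emb: list[float], chunk_embs: list[list[float]]
) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs sorted best-first."""
    scores = [(i, cosine_similarity(query_emb, e)) for i, e in enumerate(chunk_embs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Because query and chunks come from the same contextualized model, their vectors live in the same space and cosine scores are directly comparable across chunks.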