
Overview

Use standard embeddings when each text is self-contained and embedded independently, such as queries, documents, and entries in a semantic search index.

Models

| Model | Dimensions | Context | MRL | Quantization | Price ($/1M tokens) |
| --- | --- | --- | --- | --- | --- |
| pplx-embed-v1-0.6b | 1024 | 32K | Yes | INT8/BINARY | $0.004 |
| pplx-embed-v1-4b | 2560 | 32K | Yes | INT8/BINARY | $0.03 |
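
Both models share the 32K-token context window; the 0.6b model returns narrower vectors at a lower price. A minimal sketch (assuming the same client and base64 int8 decoding used in the examples below) that checks the vector width of the smaller model:

import base64
import numpy as np
from perplexity import Perplexity

client = Perplexity()

# Embed a single text with the smaller model
response = client.embeddings.create(
    input=["A quick dimensionality check."],
    model="pplx-embed-v1-0.6b",
)

# Embeddings arrive base64-encoded as signed int8 values by default
vector = np.frombuffer(
    base64.b64decode(response.data[0].embedding), dtype=np.int8
)
print(vector.shape)  # expected: (1024,) for pplx-embed-v1-0.6b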

Basic Usage

Generate embeddings for a list of texts:
from perplexity import Perplexity

client = Perplexity()

response = client.embeddings.create(
    input=[
        "Scientists explore the universe driven by curiosity.",
        "Curiosity compels us to seek explanations, not just observations.",
        "Historical discoveries began with curious questions.",
        "The pursuit of knowledge distinguishes human curiosity from mere stimulus response.",
        "Philosophy examines the nature of curiosity."
    ],
    model="pplx-embed-v1-4b"
)

for emb in response.data:
    print(f"Index {emb.index}: {emb.embedding}")
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": "/* base64-encoded signed int8 values */"
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": "/* base64-encoded signed int8 values */"
    },
    {
      "object": "embedding",
      "index": 2,
      "embedding": "/* base64-encoded signed int8 values */"
    },
    {
      "object": "embedding",
      "index": 3,
      "embedding": "/* base64-encoded signed int8 values */"
    },
    {
      "object": "embedding",
      "index": 4,
      "embedding": "/* base64-encoded signed int8 values */"
    }
  ],
  "model": "pplx-embed-v1-4b",
  "usage": {
    "prompt_tokens": 42,
    "total_tokens": 42,
    "cost": {
      "input_cost": 0.0000013,
      "total_cost": 0.0000013,
      "currency": "USD"
    }
  }
}

Semantic Search Example

Build a simple semantic search system:
import base64
import numpy as np
from perplexity import Perplexity

client = Perplexity()

def decode_embedding(b64_string):
    """Decode a base64-encoded int8 embedding."""
    return np.frombuffer(base64.b64decode(b64_string), dtype=np.int8).astype(np.float32)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# 1. Embed your documents
documents = [
    "Python is a versatile programming language",
    "Machine learning automates analytical model building",
    "The Eiffel Tower is located in Paris, France"
]

doc_response = client.embeddings.create(input=documents, model="pplx-embed-v1-4b")
doc_embeddings = [decode_embedding(emb.embedding) for emb in doc_response.data]

# 2. Embed a search query
query = "What programming languages are good for data science?"
query_response = client.embeddings.create(input=[query], model="pplx-embed-v1-4b")
query_embedding = decode_embedding(query_response.data[0].embedding)

# 3. Find most similar documents
scores = [
    (i, cosine_similarity(query_embedding, doc_emb))
    for i, doc_emb in enumerate(doc_embeddings)
]
ranked = sorted(scores, key=lambda x: x[1], reverse=True)

print("Search results:")
for idx, score in ranked:
    print(f"  {score:.4f}: {documents[idx]}")

Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| input | string or array[string] | Yes | - | Text(s) to embed. Max 512 texts per request. Each input must not exceed 32K tokens. Total tokens across the request must not exceed 120,000. Empty strings are not allowed. |
| model | string | Yes | - | Model identifier: pplx-embed-v1-0.6b or pplx-embed-v1-4b |
| dimensions | integer | No | Full | Matryoshka dimension (128-1024 for 0.6b, 128-2560 for 4b) |
| encoding_format | string | No | base64_int8 | Output encoding: base64_int8 (signed int8) or base64_binary (packed bits) |
Input limits: Each text must not exceed 32K tokens. Requests exceeding this limit will be rejected. All inputs in a single request must not exceed 120,000 tokens combined.
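
The dimensions and encoding_format parameters can be combined. A minimal sketch, assuming the SDK passes both through as keyword arguments on embeddings.create, that requests 256-dimensional vectors in packed-binary form and unpacks them with NumPy:

import base64
import numpy as np
from perplexity import Perplexity

client = Perplexity()

# Request truncated Matryoshka vectors encoded as packed bits
response = client.embeddings.create(
    input=["Compact vectors for approximate retrieval."],
    model="pplx-embed-v1-4b",
    dimensions=256,
    encoding_format="base64_binary",
)

# base64_binary packs one bit per dimension: 256 dims -> 32 bytes
raw = base64.b64decode(response.data[0].embedding)
bits = np.unpackbits(np.frombuffer(raw, dtype=np.uint8))
print(bits.shape)  # expected: (256,)

For packed-binary vectors, similarity is typically computed with Hamming distance rather than cosine similarity.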