The Agent API supports image analysis through direct image uploads. Images can be provided either as base64-encoded strings within a data URI or as standard HTTPS URLs.
When using base64 encoding, the API currently accepts images of up to 50 MB each.
Supported formats for base64-encoded images: PNG (image/png), JPEG (image/jpeg), WEBP (image/webp), and GIF (image/gif).
When using an HTTPS URL, the model will attempt to fetch the image from the provided URL. Ensure the URL is publicly accessible.
Use this method when you have the image file locally and want to embed it directly in the request payload. Keep the 50 MB size limit and the supported formats (PNG, JPEG, WEBP, GIF) in mind.

    import base64
    from perplexity import Perplexity

    client = Perplexity()

    # Read and encode image as base64
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")

    image_path = "image.png"
    base64_image = encode_image(image_path)

    # Analyze the image
    response = client.responses.create(
        model="openai/gpt-5.4",
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "what's in this image?"},
                    {
                        "type": "input_image",
                        "image_url": f"data:image/png;base64,{base64_image}",
                    },
                ],
            }
        ],
    )
    print(response.output_text)

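The example above hard-codes `image/png` in the data URI, which only fits PNG files. The following is a minimal sketch of a helper that picks the MIME type from the file extension and checks the size before encoding. The names `to_data_uri`, `file_to_data_uri`, `SUPPORTED_MIME_TYPES`, and `MAX_IMAGE_BYTES` are hypothetical and not part of the SDK. It also assumes that "50 MB" means 50 × 1024 × 1024 bytes of raw file data, which the documentation does not spell out.

```python
import base64
from pathlib import Path

# MIME types listed as supported for base64-encoded images.
SUPPORTED_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
    ".gif": "image/gif",
}

# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 raw bytes.
MAX_IMAGE_BYTES = 50 * 1024 * 1024


def to_data_uri(data: bytes, filename: str, max_bytes: int = MAX_IMAGE_BYTES) -> str:
    """Build a data URI from raw image bytes, using the extension to pick the MIME type."""
    suffix = Path(filename).suffix.lower()
    mime_type = SUPPORTED_MIME_TYPES.get(suffix)
    if mime_type is None:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    if len(data) > max_bytes:
        raise ValueError(f"Image is {len(data)} bytes; limit is {max_bytes} bytes")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(image_path: str) -> str:
    """Read a local image file and return it as a data URI."""
    path = Path(image_path)
    return to_data_uri(path.read_bytes(), path.name)
```

The returned string can be passed as the `image_url` value of an `input_image` content item, in place of the hand-built f-string.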
Use this method when you have a publicly accessible image URL. The model will fetch the image from the provided URL.

    from perplexity import Perplexity

    client = Perplexity()

    image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

    # Analyze the image
    response = client.responses.create(
        model="openai/gpt-5.4",
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Can you describe the image at this URL?"},
                    {
                        "type": "input_image",
                        "image_url": image_url,
                    },
                ],
            }
        ],
    )
    print(response.output_text)

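Because the image is fetched remotely, it can help to sanity-check a URL before sending the request. The sketch below uses only the Python standard library. The helpers `is_https_url` and `looks_publicly_reachable` are hypothetical names, not SDK functions. A successful check from your own machine is only a hint: it does not guarantee that the API can reach the same URL, and some servers reject HEAD requests even when a normal GET would succeed.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def is_https_url(url: str) -> bool:
    """Return True if the string parses as an https:// URL with a host part."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return False
    return parsed.scheme == "https" and bool(parsed.netloc)


def looks_publicly_reachable(url: str, timeout: float = 10.0) -> bool:
    """Best-effort check: send a HEAD request and expect a 200 with an image content type."""
    if not is_https_url(url):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get_content_type()
            return response.status == 200 and content_type.startswith("image/")
    except (urllib.error.URLError, TimeoutError):
        return False
```

`is_https_url` performs no network access, so it is cheap to run on every input; `looks_publicly_reachable` makes one request and is better suited to debugging a failing call.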
Images are tokenized based on their pixel dimensions using the following formula:
tokens = (width px × height px) / 750
Examples:
A 1024×768 image would consume (1024 × 768) / 750 ≈ 1,048 tokens
A 512×512 image would consume (512 × 512) / 750 ≈ 349 tokens
These image tokens are then priced according to the input token pricing of the model you’re using. The image tokens are added to your total token count for the request alongside any text tokens.
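The formula above can be turned into a small estimator for budgeting requests. This is a sketch, not an official calculator. The function names are hypothetical. Rounding down is inferred from the two worked examples (1,048.58 → 1,048 and 349.53 → 349) rather than stated by the documentation. The per-million-token price is a parameter you supply from your model's input pricing.

```python
# Divisor from the documented formula: tokens = (width px × height px) / 750
PIXELS_PER_TOKEN = 750


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Estimate image tokens; rounds down (an inference from the worked examples)."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("Image dimensions must be positive")
    return (width_px * height_px) // PIXELS_PER_TOKEN


def estimate_image_cost(
    width_px: int, height_px: int, usd_per_million_input_tokens: float
) -> float:
    """Estimate the image's input cost, given a caller-supplied price per million tokens."""
    tokens = estimate_image_tokens(width_px, height_px)
    return tokens * usd_per_million_input_tokens / 1_000_000


print(estimate_image_tokens(1024, 768))  # 1048
print(estimate_image_tokens(512, 512))   # 349
```

Text tokens in the same request are billed in addition to this estimate.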