Streaming
Streaming delivers the response incrementally as Server-Sent Events instead of one final payload, which makes it the right default for chat UIs, long answers, and anything interactive. Set stream: true and iterate the events, discriminating on each event’s type. The ones you’ll handle most:
| Event type | Meaning |
|---|---|
| response.created | The initial response object |
| response.output_text.delta | A chunk of streaming text; append delta |
| response.output_text.done | The text item is complete |
| response.reasoning.search_queries / response.reasoning.search_results | Search activity during the run |
| response.sandbox.results | A sandbox invocation finished |
| response.completed | Terminal success, with the final response and usage |
| response.failed / error | Terminal failure |
Set stream_options: {"include_usage": true} to ensure token usage rides along on the final response.completed event. For the complete event catalog and per-event payloads, see the Agent API reference.
Within a run, search activity is reported before the text that uses it: the response.reasoning.search_queries and response.reasoning.search_results events arrive ahead of the response.output_text.delta events.
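The event handling described above can be sketched as a small reducer over already-parsed events. This is a minimal illustration, not the reference client: it assumes each event has been decoded into a dict with a type key, that text chunks arrive in delta, that search queries arrive under a queries key, and that usage sits at response.usage on the terminal event. The last two field names are assumptions; check the Agent API reference for the actual payloads.

```python
from typing import Any, Iterable


def consume_stream(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold parsed stream events into text, search queries, and a final state."""
    text_parts: list[str] = []
    queries: list[Any] = []
    result: dict[str, Any] = {"text": "", "queries": queries, "usage": None, "error": None}

    for event in events:
        kind = event.get("type")
        if kind == "response.output_text.delta":
            # Append each chunk; the full text is the concatenation of all deltas.
            text_parts.append(event.get("delta", ""))
        elif kind == "response.reasoning.search_queries":
            # "queries" is an assumed field name for illustration.
            queries.extend(event.get("queries", []))
        elif kind == "response.completed":
            # Assumed nesting: usage lives on the final response object.
            result["usage"] = event.get("response", {}).get("usage")
            break
        elif kind in ("response.failed", "error"):
            result["error"] = event
            break
        # Any other event type is ignored in this sketch.

    result["text"] = "".join(text_parts)
    return result
```

Because the function only needs an iterable of dicts, it can be fed from whatever SSE parser or SDK iterator you already use, and partial text is preserved even when the stream ends in a failure event.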
Background runs
Streaming keeps a connection open for the lifetime of the run. For runs that take minutes (deep research, heavy sandbox work), submit with background: true, then poll for the result by ID. The run continues server-side even if your client disconnects.
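A polling loop for a background run might look like the sketch below. It is transport-agnostic: fetch is a hypothetical callable you supply that performs GET /v1/responses/{id} and returns the decoded JSON. The status field and the set of terminal status values are assumptions for illustration, not confirmed by this page.

```python
import time
from typing import Any, Callable

# Assumed terminal status values; confirm against the API reference.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def poll_until_done(
    fetch: Callable[[str], dict[str, Any]],
    response_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll a background response by ID until it reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        response = fetch(response_id)
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if clock() >= deadline:
            raise TimeoutError(f"response {response_id} not finished after {timeout}s")
        sleep(interval)
```

Injecting sleep and clock keeps the loop easy to test; in production you would likely leave the defaults and pick an interval that matches how long your runs usually take.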
Reconnecting to a durable stream
Background runs are durable, so you can also stream one live and reconnect after a drop. Request GET /v1/responses/{id}?stream=true&starting_after=N to resume from the event after sequence number N. Reconnect is only valid within the response’s reconnect window; once that window expires, the endpoint returns 400, and you fall back to a plain GET /v1/responses/{id} for the final snapshot. See the Agent API reference.
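The resume-then-fall-back flow can be sketched as follows. The URL shape comes from the text above; everything else is hypothetical scaffolding: open_stream and fetch_snapshot are callables you provide, and ReconnectWindowExpired is an illustrative exception your transport would raise when the resume request returns 400.

```python
from typing import Any, Callable, Iterable
from urllib.parse import urlencode


class ReconnectWindowExpired(Exception):
    """Hypothetical: raised by the caller's transport when the resume request returns 400."""


def resume_url(base_url: str, response_id: str, last_seq: int) -> str:
    """Build the URL that resumes from the event after sequence number last_seq."""
    query = urlencode({"stream": "true", "starting_after": last_seq})
    return f"{base_url}/v1/responses/{response_id}?{query}"


def resume_or_snapshot(
    open_stream: Callable[[str], Iterable[dict[str, Any]]],
    fetch_snapshot: Callable[[str], dict[str, Any]],
    base_url: str,
    response_id: str,
    last_seq: int,
) -> tuple[str, Any]:
    """Try to resume the stream; fall back to the final snapshot if the window expired."""
    try:
        # open_stream must raise eagerly (at connect time) for the fallback to trigger.
        return "stream", open_stream(resume_url(base_url, response_id, last_seq))
    except ReconnectWindowExpired:
        return "snapshot", fetch_snapshot(f"{base_url}/v1/responses/{response_id}")
```

To use it, record the sequence number of the last event you processed before the drop and pass it as last_seq; the caller then branches on whether it got a live stream or a finished snapshot.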
Control response length
Two levers shape how much the model writes:
max_output_tokens: a hard cap on generated tokens. The run stops when it’s hit. When generation is cut short this way, the response carries an incomplete_details object whose reason explains why. Use it to bound cost and latency.
text.verbosity: low, medium, or high, for OpenAI models that support it. A soft preference for terse vs. expansive answers, without a hard cutoff.
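As a rough sketch of how the two levers fit into a request and how truncation can be detected afterwards: build_request and truncation_reason are hypothetical helpers, the input field name is an assumption, and text.verbosity is assumed to nest as a verbosity key inside a text object.

```python
from typing import Any, Optional

VERBOSITY_LEVELS = {"low", "medium", "high"}


def build_request(
    prompt: str,
    max_output_tokens: Optional[int] = None,
    verbosity: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body with optional length controls."""
    body: dict[str, Any] = {"input": prompt}  # "input" is an assumed field name
    if max_output_tokens is not None:
        body["max_output_tokens"] = max_output_tokens  # hard cap
    if verbosity is not None:
        if verbosity not in VERBOSITY_LEVELS:
            raise ValueError(f"verbosity must be one of {sorted(VERBOSITY_LEVELS)}")
        body["text"] = {"verbosity": verbosity}  # soft preference
    return body


def truncation_reason(response: dict[str, Any]) -> Optional[str]:
    """Return incomplete_details.reason if the response was cut short, else None."""
    details = response.get("incomplete_details")
    return details.get("reason") if details else None
```

Checking truncation_reason before using the text is a cheap way to avoid treating a cut-off answer as a complete one.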
Structured output
When another system consumes the result, free-form prose is hard to work with: you end up writing brittle parsers. Structured output makes the model return JSON that conforms to a schema you define, so you can deserialize it directly. Set response_format to a json_schema:
name is required (1–64 characters; letters, numbers, underscores, and dashes). schema is a valid JSON Schema object. The response text will conform to the schema unless the output is cut off by max_output_tokens.
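The naming rule and the truncation caveat above can be enforced client-side with a sketch like this one. The regular expression encodes the stated constraint (1 to 64 letters, numbers, underscores, or dashes). The exact nesting of the response_format object is an assumption made for illustration, and both helper names are hypothetical.

```python
import json
import re
from typing import Any

# 1-64 characters: letters, numbers, underscores, and dashes.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def json_schema_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build a response_format value after validating the schema name."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must be 1-64 letters, numbers, underscores, or dashes")
    # Assumed payload shape; confirm against the API reference.
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema}}


def parse_structured(output_text: str, truncated: bool) -> Any:
    """Deserialize structured output, refusing text that was cut off mid-generation."""
    if truncated:
        raise ValueError("output was cut off by max_output_tokens; JSON may be incomplete")
    return json.loads(output_text)
```

For example, json_schema_format("weather_report", {"type": "object"}) passes validation, while a name containing a space raises ValueError before any request is sent.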
Next steps
Keep context
Carry state across multiple turns.
Output Control reference
Every stream event, error handling, and full structured-output examples.