Endpoint
Request Headers
| Header | Required | Value |
|---|---|---|
Authorization | Yes | Bearer YOUR_API_KEY |
Content-Type | Yes | application/json |
Request Body Parameters
The ID of the model to use. For example:
deepseek-chat, glm-5, kimi-k2.5. See List Models for the full list of available IDs.An ordered array of message objects representing the conversation history. Each object must contain:
The maximum number of tokens to generate in the response. Defaults vary by model. Setting a lower value reduces cost and latency.
Sampling temperature between
0 and 2. Higher values produce more varied output; lower values produce more deterministic output. Defaults to 1.When
true, the API streams the response as server-sent events (SSE) rather than returning a single JSON object. Defaults to false. See Streaming for usage details.Nucleus sampling threshold. The model considers only the tokens comprising the top
top_p probability mass. Defaults to 1. Use either temperature or top_p, not both.Example Request
curl — basic chat completion request
Example Response
200 OK — successful completion response
Streaming
Set"stream": true in your request body to receive the response incrementally as server-sent events. Each event contains a delta with a partial content string. The stream ends with a [DONE] message.
OpenAI SDK — streaming chat completion
Response Fields
A unique identifier for this completion, prefixed with
chatcmpl-.Always
"chat.completion" for non-streaming responses.The Unix timestamp (seconds) at which the completion was created.
The model ID that generated the response, confirming which model handled the request.
An array of completion choices. Most requests return a single choice at index
0.The number of tokens in the input messages.
The number of tokens in the generated response.
The sum of
prompt_tokens and completion_tokens. This is the value used for billing.