Chat Completions Endpoint — POST /v1/chat/completions

The chat completions endpoint is the primary way to interact with A2Agent’s models. It accepts the same request format as the OpenAI Chat Completions API, so code written for OpenAI typically works once you point it at the A2Agent base URL and supply an A2Agent API key and model ID; no request restructuring is required.

Endpoint

POST https://api.a2agent.me/v1/chat/completions

Request Headers

Header	Required	Value
`Authorization`	Yes	`Bearer YOUR_API_KEY`
`Content-Type`	Yes	`application/json`
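
As a minimal sketch of how the two required headers fit together, the following builds (but does not send) a request using only Python's standard library. The helper name `build_request` is hypothetical, and `YOUR_API_KEY` is a placeholder.

```python
import json
import urllib.request

API_URL = "https://api.a2agent.me/v1/chat/completions"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying both required headers (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),  # JSON body, as Content-Type declares
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {"model": "deepseek-chat", "messages": [{"role": "user", "content": "Hi"}]},
)
# Passing `req` to urllib.request.urlopen would send it; that step is omitted here.
```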

Request Body Parameters

model

string

required

The ID of the model to use. For example: deepseek-chat, glm-5, kimi-k2.5. See List Models for the full list of available IDs.

messages

array

required

An ordered array of message objects representing the conversation history. Each object must contain:

role

string

required

The role of the message author. One of "system", "user", or "assistant".

content

string

required

The text content of the message.
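
The message fields above can be checked client-side before a request is sent. The sketch below is one possible check; `validate_messages` is a hypothetical helper, and the validation the API itself performs may differ.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: list) -> None:
    """Raise ValueError if a message lacks a valid role or string content."""
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}]: invalid role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}]: content must be a string")


# A multi-turn history, oldest message first.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
    {"role": "user", "content": "And doubled?"},
]
validate_messages(history)  # returns None when every message is well formed
```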

max_tokens

integer

The maximum number of tokens to generate in the response. Defaults vary by model. Setting a lower value reduces cost and latency.

temperature

number

Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more deterministic output. Defaults to 1.

stream

boolean

When true, the API streams the response as server-sent events (SSE) rather than returning a single JSON object. Defaults to false. See Streaming for usage details.

top_p

number

Nucleus sampling threshold. The model considers only the tokens comprising the top top_p probability mass. Defaults to 1. Use either temperature or top_p, not both.
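
To make the effect of temperature and top_p concrete, here is a small illustration on a made-up three-token distribution. It follows the usual textbook definitions of temperature scaling and nucleus filtering; it is not a description of how A2Agent's servers implement sampling, and the function names are hypothetical.

```python
import math


def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits divided by temperature (this sketch needs temperature > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
sharp = apply_temperature(logits, 0.5)  # low temperature concentrates probability
flat = apply_temperature(logits, 2.0)   # high temperature spreads it out
nucleus = top_p_filter(apply_temperature(logits, 1.0), 0.9)  # drops the least likely token
```

In this toy case the lower temperature gives the top token more probability than the higher one does, and a top_p of 0.9 removes the third token entirely.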

Example Request

curl — basic chat completion request

curl https://api.a2agent.me/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is 2 + 2?"}
    ],
    "max_tokens": 256
  }'

Example Response

200 OK — successful completion response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1720000000,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 equals 4."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
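
A short sketch of reading this response in Python follows. It parses the example body shown above; `extract_reply` is a hypothetical helper name.

```python
import json

# The example response body from this page, as a JSON string.
RAW_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1720000000,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "2 + 2 equals 4."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30}
}
"""


def extract_reply(response: dict) -> str:
    """Return the assistant text from the first choice."""
    return response["choices"][0]["message"]["content"]


response = json.loads(RAW_RESPONSE)
reply = extract_reply(response)
print(reply)  # prints: 2 + 2 equals 4.
```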

Streaming

Set "stream": true in your request body to receive the response incrementally as server-sent events. Each event contains a delta with a partial content string. The stream ends with a [DONE] message.

OpenAI SDK — streaming chat completion

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.a2agent.me/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

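# Print each partial content string as soon as it arrives.
for chunk in stream:
    if not chunk.choices:
        continue  # skip any chunk that carries no choices
    delta = chunk.choices[0].delta.content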
    if delta:
        print(delta, end="", flush=True)
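
If you are not using an SDK, the event stream can be parsed by hand. The sketch below assumes the wire format matches OpenAI's: each event is a `data: ` line holding a JSON chunk whose text sits at `choices[0].delta.content`, and the final event is `data: [DONE]`. The sample events are hand-written for illustration, not captured from the live API, and `iter_deltas` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield partial content strings from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample events (assumed shape, see the note above).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once upon"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " a time."}}]}',
    "",
    "data: [DONE]",
]
story = "".join(iter_deltas(sample))
```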

Response Fields

id

string

A unique identifier for this completion, prefixed with chatcmpl-.

object

string

Always "chat.completion" for non-streaming responses.

created

integer

The Unix timestamp (seconds) at which the completion was created.

model

string

The model ID that generated the response, confirming which model handled the request.

choices

array

An array of completion choices. Most requests return a single choice at index 0.

choices[].message.role

string

The role of the response author. Always "assistant" for model-generated messages.

choices[].message.content

string

The generated text produced by the model.

choices[].finish_reason

string

The reason the model stopped generating. Possible values:

stop — the model reached a natural stopping point
length — the max_tokens limit was reached
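
One way to act on these values is to detect a truncated reply so the caller can retry with a larger max_tokens or ask the model to continue. This is a minimal sketch; `is_truncated` is a hypothetical helper and the two response dictionaries are trimmed-down illustrations.

```python
def is_truncated(response: dict) -> bool:
    """True if the first choice stopped because the max_tokens limit was reached."""
    return response["choices"][0]["finish_reason"] == "length"


complete = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "2 + 2 equals 4."},
            "finish_reason": "stop",
        }
    ]
}
cut_off = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Once upon a"},
            "finish_reason": "length",
        }
    ]
}

for name, resp in (("complete", complete), ("cut_off", cut_off)):
    status = "truncated" if is_truncated(resp) else "finished normally"
    print(f"{name}: {status}")
```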

usage.prompt_tokens

integer

The number of tokens in the input messages.

usage.completion_tokens

integer

The number of tokens in the generated response.

usage.total_tokens

integer

The sum of prompt_tokens and completion_tokens. This is the value used for billing.
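
Because total_tokens is the billed figure, it can be useful to tally it across calls. The sketch below checks the stated sum and accumulates totals; both helper names are hypothetical, and the second usage object is made up for illustration.

```python
def usage_is_consistent(usage: dict) -> bool:
    """Check that total_tokens equals prompt_tokens plus completion_tokens."""
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]


def total_billed_tokens(responses: list[dict]) -> int:
    """Sum usage.total_tokens over a list of completion responses."""
    return sum(r["usage"]["total_tokens"] for r in responses)


responses = [
    # Usage from the example response on this page.
    {"usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30}},
    # A second, made-up response.
    {"usage": {"prompt_tokens": 45, "completion_tokens": 5, "total_tokens": 50}},
]
billed = total_billed_tokens(responses)
```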
