The chat completions endpoint is the primary way to interact with A2Agent’s models. It accepts the same request format as the OpenAI Chat Completions API, so code that already works with OpenAI should work here once you point it at the A2Agent base URL and supply an A2Agent API key. The request body itself does not need restructuring.

Endpoint

POST https://api.a2agent.me/v1/chat/completions

Request Headers

Header          Required   Value
Authorization   Yes        Bearer YOUR_API_KEY
Content-Type    Yes        application/json
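
As a rough illustration of the two required headers, the sketch below builds (but does not send) a request using only Python's standard library. The helper name build_request is hypothetical, and YOUR_API_KEY is a placeholder for a real key.

```python
import json
import urllib.request

API_URL = "https://api.a2agent.me/v1/chat/completions"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying both required headers (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    {"model": "deepseek-chat", "messages": [{"role": "user", "content": "Hello"}]},
)
# Passing `request` to urllib.request.urlopen would send it; that step is omitted here.
```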

Request Body Parameters

model (string, required)
The ID of the model to use. For example: deepseek-chat, glm-5, kimi-k2.5. See List Models for the full list of available IDs.

messages (array, required)
An ordered array of message objects representing the conversation history. Each object must contain a role (for example system, user, or assistant) and a content string holding the message text.

max_tokens (integer, optional)
The maximum number of tokens to generate in the response. Defaults vary by model. Setting a lower value reduces cost and latency.

temperature (number, optional)
Sampling temperature between 0 and 2. Higher values produce more varied output; lower values produce more deterministic output. Defaults to 1.

stream (boolean, optional)
When true, the API streams the response as server-sent events (SSE) rather than returning a single JSON object. Defaults to false. See Streaming for usage details.

top_p (number, optional)
Nucleus sampling threshold. The model considers only the tokens comprising the top top_p probability mass. Defaults to 1. Use either temperature or top_p, not both.
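
The parameter rules above can be sketched as a small client-side helper that assembles a request body. This is only an illustration: build_payload is a hypothetical name, the checks run locally, and how the server itself treats out-of-range or conflicting values is not stated here. Rejecting an empty messages list is likewise an assumption.

```python
from typing import Optional


def build_payload(
    model: str,
    messages: list,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stream: bool = False,
) -> dict:
    """Assemble a request body, omitting optional fields left unset (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty conversation is treated as a client-side error.
        raise ValueError("messages is required and must not be empty")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if temperature is not None and top_p is not None:
        # Client-side convention only: the text advises setting one or the other.
        raise ValueError("set either temperature or top_p, not both")

    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    if top_p is not None:
        payload["top_p"] = top_p
    if stream:
        payload["stream"] = True
    return payload


payload = build_payload(
    "deepseek-chat",
    [{"role": "user", "content": "What is 2 + 2?"}],
    max_tokens=256,
)
```

Leaving unset fields out of the body, rather than sending explicit defaults, lets each model apply its own default for values such as max_tokens.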

Example Request

curl — basic chat completion request
curl https://api.a2agent.me/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is 2 + 2?"}
    ],
    "max_tokens": 256
  }'

Example Response

200 OK — successful completion response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1720000000,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 equals 4."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30
  }
}
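
To show how the response above might be consumed, here is a minimal sketch that parses the same JSON and pulls out the assistant's reply and the finish reason. The function name first_message is hypothetical, and the sketch assumes at least one choice is present.

```python
import json

# The example response from above, as it would arrive in the HTTP body.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1720000000,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "2 + 2 equals 4."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30}
}
"""


def first_message(response: dict) -> tuple:
    """Return (content, finish_reason) for the choice at index 0 (hypothetical helper)."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


response = json.loads(raw)
text, reason = first_message(response)
print(text)
```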

Streaming

Set "stream": true in your request body to receive the response incrementally as server-sent events. Each event contains a delta with a partial content string. The stream ends with a [DONE] message.
OpenAI SDK — streaming chat completion
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.a2agent.me/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True
)

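for chunk in stream:
    # Defensive check: skip any chunk that arrives without choices.
    if not chunk.choices:
        continue
    # delta.content is None for chunks that carry no text.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)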
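
For clients that do not use the SDK, the event stream can be read line by line. The sketch below is an assumption-laden illustration: it presumes each event is framed as a `data: ` line whose JSON payload exposes `choices[0].delta.content`, as in OpenAI's streaming format, and that the final event is `data: [DONE]`. The function name iter_content and the sample lines are made up for the example.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield partial content strings from SSE lines until [DONE] (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Illustrative sample only; real events carry more fields.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Once "}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "upon a time."}}]}',
    "",
    "data: [DONE]",
]
story = "".join(iter_content(sample))
```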

Response Fields

id (string)
A unique identifier for this completion, prefixed with chatcmpl-.

object (string)
Always "chat.completion" for non-streaming responses.

created (integer)
The Unix timestamp (seconds) at which the completion was created.

model (string)
The model ID that generated the response, confirming which model handled the request.

choices (array)
An array of completion choices. Most requests return a single choice at index 0.

usage.prompt_tokens (integer)
The number of tokens in the input messages.

usage.completion_tokens (integer)
The number of tokens in the generated response.

usage.total_tokens (integer)
The sum of prompt_tokens and completion_tokens. This is the value used for billing.
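
Because total_tokens is the billed value, a client may want to keep a running tally across requests. The following is one possible sketch; the Usage class is hypothetical and the second response's numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Running token tally across several responses (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        # Mirrors the definition above: prompt_tokens + completion_tokens.
        return self.prompt_tokens + self.completion_tokens

    def add(self, response: dict) -> None:
        """Add the usage block of one parsed response to the tally."""
        usage = response["usage"]
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]


running = Usage()
running.add({"usage": {"prompt_tokens": 20, "completion_tokens": 10, "total_tokens": 30}})
running.add({"usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12}})
print(running.total_tokens)
```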