OpenAI Compatible API Chat Completion Message Structure
Introduction
Rather than treating AI model APIs as black boxes, it's worth understanding their internal structure so we can use them well. Let's approach large language model APIs as we would any other API: knowing what each parameter means gives us a better grasp of the model's capabilities.
For more information about OpenAI’s API, you can refer to: OpenAI’s Complete Interface Standard Definition Document
Why Study OpenAI’s Interface? The Reasons are Simple:
- OpenAI is an industry pioneer, and their interface design has become a standard
- Many language models in the market are now compatible with OpenAI’s interface, making it a universal key
- Understanding each parameter helps better control AI model behavior
Chat Completion Interface: The Most Commonly Used Dialogue Interface
OpenAI's complete message schema can be found at the link above. There's a lot of it, so let's focus on the most important parts.
Here's the request body structure for the /chat/completions interface:
| Field Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | Model name; available values are listed in each model's documentation |
| messages | object[] | Yes | - | Dialogue message list; each message carries a role (user / assistant / system / tool) and content |
| stream | boolean | No | false | Enable streaming; when enabled, tokens are returned as Server-Sent Events |
| max_tokens | integer | No | 512 | Maximum number of tokens to generate, range: 1 < x < 8192 |
| stop | string[] / null | No | null | Up to 4 sequences at which generation stops; the returned text excludes these sequences |
| temperature | number | No | 0.7 | Controls output randomness; higher values increase randomness (typically 0-2) |
| top_p | number | No | 0.7 | Nucleus sampling parameter; dynamically adjusts the token selection range |
| top_k | number | No | 50 | Number of top-k tokens to consider during sampling (offered by some OpenAI-compatible providers) |
| frequency_penalty | number | No | 0.5 | Frequency penalty; suppresses repeated token generation |
| n | integer | No | 1 | Number of completions to generate |
| response_format | object | No | {"type": "text"} | Output format object |
| tools | object[] | No | - | Tool (function call) list; each entry includes type: "function" and function metadata |
Example Request Body
```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What opportunities and challenges will 2025 bring?"
    }
  ],
  "stream": false,
  "max_tokens": 512,
  "temperature": 0.7,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "analyze_industry_trend",
        "description": "Analyze AI industry trends",
        "parameters": {
          "type": "object",
          "properties": {
            "year": { "type": "integer", "description": "Year to analyze" },
            "region": { "type": "string", "description": "Region to analyze" }
          }
        }
      }
    }
  ]
}
```
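To make this concrete, here's a minimal sketch of sending that request with Python's requests library. The endpoint URL and the OPENAI_API_KEY environment variable are assumptions; any OpenAI-compatible base URL works the same way.

```python
import os

import requests

# Assumed endpoint; swap in any OpenAI-compatible base URL.
API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]  # assumed to be set in your environment

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "What opportunities and challenges will 2025 bring?"}
    ],
    "stream": False,
    "max_tokens": 512,
    "temperature": 0.7,
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

# The reply text lives at choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])
```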
Core Parameter Analysis
- Choose Your AI Partner (model)
"model": "gpt-3.5-turbo"
This is like choosing among teachers with different strengths: some excel at creative writing, others at code analysis. Different models have varying capabilities and prices; choose based on your needs.
- Conversation History (messages)
This is your chat history with the AI, where each message carries one of these roles:
- user: The questioner
- assistant: The AI's responses
- system: Sets rules and context for the AI
- tool: Results returned to the AI after a tool call (a sketch follows this list)
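Here's a hedged sketch of what a multi-role messages list can look like; the call_abc123 id and get_weather function are illustrative, and in practice the id is generated by the model:

```python
# A multi-turn conversation touching all four roles.
# The tool message answers the tool call the model made earlier,
# so its tool_call_id must echo the id from that call.
messages = [
    {"role": "system", "content": "You are a concise weather assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",  # illustrative; real ids are model-generated
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_abc123", "content": "{\"temp_c\": 21, \"sky\": \"clear\"}"},
]
```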
- Control AI's Creative Freedom
  - temperature (range 0-2)
    - Set to 0: AI becomes conservative, answers are nearly deterministic
    - Set to 1: AI shows appropriate creativity
    - Set to 2: AI becomes highly imaginative
  - max_tokens (length limit)
    - Think of it as setting a word count limit for the AI
    - One English word is typically 1-2 tokens
    - Setting appropriate values avoids waste and overruns (see the token-counting sketch after this list)
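Before setting max_tokens, you can estimate a prompt's token count locally. A minimal sketch using OpenAI's tiktoken library (assuming pip install tiktoken):

```python
import tiktoken

# Load the tokenizer that matches the target model.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "What opportunities and challenges will 2025 bring?"
tokens = enc.encode(prompt)

# Knowing the prompt's token count helps pick a max_tokens value
# that leaves enough room for the reply without overruns.
print(f"{len(tokens)} tokens: {tokens}")
```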
- Make Conversations More Fluid (stream)
"stream": true
Enabling this option makes the AI's response arrive token by token, like someone typing, instead of all at once, which feels much more responsive. A sketch of consuming the stream follows.
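Here's a minimal sketch of reading the stream with requests. Each Server-Sent Event line carries a JSON chunk, and the stream ends with a data: [DONE] sentinel; the endpoint and key are the same assumptions as before:

```python
import json
import os

import requests

API_URL = "https://api.openai.com/v1/chat/completions"  # assumed endpoint
API_KEY = os.environ["OPENAI_API_KEY"]

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    stream=True,  # keep the HTTP connection open and read incrementally
)

for raw in resp.iter_lines():
    if not raw.startswith(b"data: "):
        continue  # skip blank keep-alive lines
    data = raw[len(b"data: "):].decode("utf-8")
    if data == "[DONE]":  # sentinel marking the end of the stream
        break
    chunk = json.loads(data)
    # Streamed chunks carry incremental text under choices[0].delta.content.
    piece = chunk["choices"][0]["delta"].get("content")
    if piece:
        print(piece, end="", flush=True)
```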
- Avoid Repetition (frequency_penalty and presence_penalty)
  - frequency_penalty:
    - Positive (0.1 to 2.0): Discourages AI from using repeated words
    - Negative (-2.0 to -0.1): Encourages word repetition
    - 0: Neutral, no intervention
  - presence_penalty:
    - Positive: Encourages new topics
    - Negative: Keeps AI focused on current topic
    - 0: Natural transition
- Sampling Control (top_p and top_k)
  - top_p (nucleus sampling):
    - Range 0-1, default 0.7
    - Lower values make AI more conservative
    - Higher values increase response diversity
    - Avoid adjusting alongside temperature
  - top_k (top-k sampling, offered by some OpenAI-compatible providers rather than OpenAI itself):
    - Default value 50
    - Controls number of candidate tokens considered
    - Lower values make responses more conservative (see the payload sketch after this list)
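For instance, a payload that tunes diversity through nucleus sampling alone might look like this; the top_k line is commented out because only some OpenAI-compatible providers accept it:

```python
# Payload tuning diversity via nucleus sampling only,
# leaving temperature at its default.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Suggest a name for a robotics startup."}],
    "top_p": 0.9,   # sample from the smallest token set covering 90% of probability mass
    # "top_k": 50,  # accepted by some OpenAI-compatible providers, not by OpenAI itself
}
```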
- Output Diversity (n)
"n": 3
- Makes AI provide multiple different answers at once
- Default value is 1
- Higher values increase API costs
- Best used with high temperature (see the sketch below)
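A quick sketch of reading all candidates back; each one sits at its own index in the response's choices array:

```python
import os

import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Name one strength of streaming APIs."}],
        "n": 3,              # ask for three candidate answers
        "temperature": 1.2,  # higher temperature makes the candidates differ
    },
    timeout=60,
).json()

# Each candidate completion sits at its own index in choices.
for choice in resp["choices"]:
    print(f"--- candidate {choice['index']} ---")
    print(choice["message"]["content"])
```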
- Format Control (response_format)
"response_format": {"type": "json_object"}
- Controls AI response format
- text: Plain text (default)
- json_object: JSON format
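A hedged sketch of requesting JSON mode and parsing the result; json.loads will raise if the model somehow returns invalid JSON, so production code may want a try/except around it:

```python
import json
import os

import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "response_format": {"type": "json_object"},
        # JSON mode still expects the prompt itself to ask for JSON.
        "messages": [
            {"role": "user", "content": "List three AI trends for 2025 as a JSON object."}
        ],
    },
    timeout=60,
).json()

trends = json.loads(resp["choices"][0]["message"]["content"])
print(trends)
```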
- Stop Sequences (stop)
"stop": ["end", "complete"]
- Sets specific words as response termination markers
- Maximum of 4 stop sequences
- AI stops generating when encountering these words
- Tool Calls (tools)
The tools parameter allows AI to call external tools for specific tasks. It’s like equipping AI with a toolbox that it can use when needed.
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather information for specified city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name"
},
"date": {
"type": "string",
"description": "Query date"
}
}
}
}
}
]
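When the model decides to use a tool, the response's message carries a tool_calls array instead of text; your code executes the function and sends the result back as a tool message so the model can phrase a final answer. Here's a minimal sketch of that round trip, with get_weather as a stand-in for a real implementation:

```python
import json
import os

import requests

API_URL = "https://api.openai.com/v1/chat/completions"  # assumed endpoint
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information for specified city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "date": {"type": "string", "description": "Query date"},
            },
        },
    },
}]

def get_weather(city: str, date: str = "today") -> str:
    """Stand-in for a real weather lookup."""
    return json.dumps({"city": city, "date": date, "temp_c": 21, "sky": "clear"})

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
resp = requests.post(API_URL, headers=HEADERS, timeout=60,
                     json={"model": "gpt-3.5-turbo", "messages": messages, "tools": TOOLS}).json()
msg = resp["choices"][0]["message"]

if msg.get("tool_calls"):
    messages.append(msg)  # keep the assistant's tool call in the history
    for call in msg["tool_calls"]:
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the model-generated call id
            "content": get_weather(**args),
        })
    # Second round trip: the model turns the tool result into a final answer.
    resp = requests.post(API_URL, headers=HEADERS, timeout=60,
                         json={"model": "gpt-3.5-turbo", "messages": messages, "tools": TOOLS}).json()

print(resp["choices"][0]["message"]["content"])
```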