MaaS_GLM

Request protocol

HTTPS

Request headers

Header	Value
Authorization	Bearer ${YOUR_AK}
Content-Type	application/json

Request URL

https://genaiapi-m2.cloudsway.net/v1/ai/{endpoint}/chat/completions

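Request body

Request parameters

Parameter	Type	Default	Description
do_sample	boolean	`true`	Whether to sample the output to increase diversity.
temperature	float	(model-dependent)	Controls the randomness of the output; higher values are more random.
top_p	float	(model-dependent)	Controls diversity through nucleus sampling; adjust either this or `temperature`, not both.
max_tokens	integer	(model-dependent)	Maximum number of tokens generated in a single call.
stream	boolean	`false`	Whether to return the response as a stream.
thinking	object	`{"type": "enabled"}`	Whether to enable chain-of-thought deep thinking; supported only on `GLM-4.5` and later.
reasoning_effort	string	`max`	Controls how much reasoning the model performs; one of `max`, `xhigh`, `high`, `medium`, `low`, `minimal`, `none`. Supported only on `GLM-5.2` and later.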

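Parameter details

do_sample

do_sample is a boolean (true or false) that determines whether the model samples its output.

true (default): Samples randomly from each token's probability distribution, which increases the diversity and creativity of the text. Suited to content creation, dialogue, and similar scenarios.
false: Uses greedy decoding, always choosing the most probable next token. The output is highly deterministic, which suits scenarios that need precise, factual answers.

Best practices:

Set it to false when you need reproducible, deterministic output.
Set it to true, combined with temperature or top_p, when you want more varied and engaging content.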
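The two modes above can be sketched as a small helper that produces the sampling-related fields of a request body. This is only an illustration: `sampling_options` is a hypothetical helper name, not part of the API, and only the `do_sample` and `temperature` fields come from the parameter table.

```python
from typing import Optional


def sampling_options(deterministic: bool, temperature: Optional[float] = None) -> dict:
    """Return the sampling-related fields for a request body (hypothetical helper)."""
    if deterministic:
        # Greedy decoding: the model always picks the most probable next token.
        return {"do_sample": False}
    options = {"do_sample": True}
    if temperature is not None:
        # Only send temperature when the caller chose one; otherwise the
        # model-dependent default applies.
        options["temperature"] = temperature
    return options


print(sampling_options(deterministic=True))
print(sampling_options(deterministic=False, temperature=0.8))
```

The returned dictionary is meant to be merged into the full request body alongside `model` and `messages`.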

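temperature

The temperature parameter controls the randomness of the model's output.

Lower values (for example 0.2): The probability distribution becomes "sharper", so the output is more deterministic and conservative.
Higher values (for example 0.8): The probability distribution becomes "flatter", so the output is more random and diverse.

Best practices:

For rigorous, fact-sensitive scenarios such as knowledge Q&A, use a lower temperature.
For creative scenarios such as content creation, try a higher temperature.
Use only one of temperature and top_p.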
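To make "sharper" and "flatter" concrete, here is a minimal sketch of the standard temperature-scaled softmax. It assumes the model applies temperature in this conventional way, which this page does not state; the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (0.2, 0.8):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature almost all of the probability mass lands on the top-scoring token, while the higher temperature spreads it across the alternatives.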

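top_p

top_p (nucleus sampling) controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds the threshold.

Lower values (for example 0.2): Narrow the sampling range, so the output is more deterministic.
Higher values (for example 0.9): Widen the sampling range, so the output is more diverse.

Best practices:

If you want some diversity while keeping content quality, top_p is a good choice (recommended range: 0.8 to 0.95).
Changing temperature and top_p at the same time is generally not recommended.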
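The "smallest set of tokens" idea can be illustrated with a short sketch. This shows the general nucleus-sampling technique rather than the service's actual implementation, and the probabilities are invented for the example.

```python
def nucleus(probs, top_p):
    """Return indices of the smallest token set whose cumulative probability reaches top_p."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


probs = [0.5, 0.3, 0.15, 0.05]  # illustrative next-token probabilities
print(nucleus(probs, 0.2))  # a low threshold keeps only the top token
print(nucleus(probs, 0.9))  # a high threshold keeps more candidates
```

Sampling then happens only among the kept tokens, which is why a lower top_p yields more predictable output.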

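max_tokens

max_tokens limits the maximum number of tokens the model generates in a single call. GLM-4.6 supports an output length of up to 128K and GLM-4.5 up to 96K; a value of at least 1024 is recommended.

A token is the basic unit of text: 1 token is roughly 0.75 English words or 1.5 Chinese characters. Setting a suitable max_tokens controls response length and cost and avoids overly long output. If the model finishes its answer before reaching max_tokens, it stops naturally; if it reaches the limit, the output may be truncated.

Purpose: Prevents overly long generations and controls API call cost.
Note: max_tokens limits the length of the generated content only; the input is not counted.

Best practices:

Set max_tokens according to the application scenario. If you need a short answer, use a small value (for example 50).

Default and maximum supported max_tokens per model: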
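The rule of thumb above (1 token is about 0.75 English words or 1.5 Chinese characters) can be turned into a rough estimator for sizing max_tokens. This is a heuristic sketch only; `estimate_tokens` is a hypothetical helper and the real tokenizer will give different counts.

```python
import math
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ratios quoted in this section (not a tokenizer)."""
    chinese_chars = len(re.findall(r"[\u4e00-\u9fff]", text))
    english_words = len(re.findall(r"[A-Za-z]+", text))
    # 1 token ~ 0.75 English words, and 1 token ~ 1.5 Chinese characters.
    return math.ceil(english_words / 0.75 + chinese_chars / 1.5)


print(estimate_tokens("Create a catchy slogan for my product"))
print(estimate_tokens("请为我的产品写一句广告语"))
```

An estimate like this is mainly useful for picking an order of magnitude; leave headroom, since a response that hits the limit may be cut off.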

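stream

stream is a boolean that controls how the API returns its response.

false (default): Returns the complete response in one piece. Simple to implement, but the wait is longer.
true: Returns content as a stream (SSE), which noticeably improves the experience of real-time interactive applications.

Best practices:

For chatbots, real-time code generation, and similar applications, setting it to true is strongly recommended.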
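A streamed response has to be reassembled on the client. The sketch below parses SSE lines and joins the text fragments. The chunk layout (`choices[].delta.content`) and the `data: [DONE]` terminator are assumptions borrowed from the common OpenAI-compatible convention; this page does not show a streamed response, so verify against real output before relying on it.

```python
import json


def iter_sse_content(lines):
    """Yield text fragments from SSE lines (assumed OpenAI-compatible chunk layout)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


# Hand-written sample lines standing in for a real streamed response.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

In a real client, `lines` would be the decoded lines of the HTTP response body read incrementally, so each fragment can be shown as soon as it arrives.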

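thinking

The thinking parameter controls whether the model enables chain of thought for deeper thinking and reasoning.

Type: object
Supported models: GLM-4.5 and later

Properties:

type (string):
enabled (default): Enables chain of thought. With GLM-5.2, GLM-5.1, GLM-5, GLM-5-Turbo, GLM-5v-Turbo, GLM-4.6, GLM-4.6V, and GLM-4.5, the model decides automatically whether to think; with GLM-4.7 and GLM-4.5V, thinking is forced.
disabled: Disables chain of thought.

Best practices:

Enable it when the model needs to perform complex reasoning or planning.
For simple tasks, disable it to get a faster response.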
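The per-model behavior listed above can be captured in a small lookup, together with a helper that builds the `thinking` field. Both function names are hypothetical, and the lowercase model identifiers are an assumption based on the `glm-5.2` spelling used in the request example.

```python
# Models that decide for themselves whether to think when type is "enabled".
AUTO_THINKING = {
    "glm-5.2", "glm-5.1", "glm-5", "glm-5-turbo", "glm-5v-turbo",
    "glm-4.6", "glm-4.6v", "glm-4.5",
}
# Models that always think when type is "enabled".
FORCED_THINKING = {"glm-4.7", "glm-4.5v"}


def thinking_field(enabled: bool) -> dict:
    """Build the `thinking` part of a request body."""
    return {"thinking": {"type": "enabled" if enabled else "disabled"}}


def thinking_behavior(model: str, enabled: bool = True) -> str:
    """Describe what the model is documented to do for a given setting."""
    if not enabled:
        return "off"
    name = model.lower()
    if name in FORCED_THINKING:
        return "forced"
    if name in AUTO_THINKING:
        return "auto"
    return "unknown"  # model not listed in this section


print(thinking_field(False))
print(thinking_behavior("GLM-4.7"), thinking_behavior("glm-5.2"))
```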

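reasoning_effort

The reasoning_effort parameter controls how much reasoning the model performs when chain of thought is enabled.

Type: string
Supported models: GLM-5.2 and later
Supported values: max, xhigh, high, medium, low, minimal, none
high: Enhanced reasoning
max: Deep reasoning (default)

Note:

For compatibility with other protocols, passing none or minimal makes the model skip thinking; low and medium are mapped to high; xhigh is mapped to max.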
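The mapping in the note can be written out as a lookup, which makes it easy to see which of the seven accepted values actually behave differently. `effective_reasoning_effort` is a hypothetical helper, and the label `"off"` is this sketch's own name for "the model skips thinking".

```python
# Accepted value -> effective behavior, as described in the note above.
EFFECTIVE_EFFORT = {
    "none": "off",
    "minimal": "off",
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def effective_reasoning_effort(value: str = "max") -> str:
    """Return the effective level for a reasoning_effort value (default is max)."""
    try:
        return EFFECTIVE_EFFORT[value]
    except KeyError:
        raise ValueError(f"unsupported reasoning_effort: {value!r}") from None


for value in ("none", "low", "xhigh"):
    print(value, "->", effective_reasoning_effort(value))
```

In practice only three distinct behaviors exist: no thinking, high, and max.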

Request example

curl --location --request POST 'https://genaiapi-m2.cloudsway.net/v1/ai/{endpoint}/chat/completions' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
    -d '{
        "model": "glm-5.2",
        "messages": [
        {
            "role": "user",
            "content": "As a marketing expert, please create a catchy slogan for my product."
        }
        ],
        "thinking": {
            "type": "enabled"
        },
        "max_tokens": 65536,
        "temperature": 1.0
    }'
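For callers who prefer Python to curl, the same request can be assembled with the standard library. This is a sketch under stated assumptions: `build_chat_request` is a hypothetical helper, and `my-endpoint` and `YOUR_AK` are placeholders for your own endpoint name and access key. The snippet only builds the request; sending it is left as a comment.

```python
import json
import urllib.request

BASE_URL = "https://genaiapi-m2.cloudsway.net"


def build_chat_request(endpoint: str, access_key: str, body: dict) -> urllib.request.Request:
    """Assemble a POST request equivalent to the curl example (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/ai/{endpoint}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "model": "glm-5.2",
    "messages": [
        {"role": "user", "content": "As a marketing expert, please create a catchy slogan for my product."}
    ],
    "thinking": {"type": "enabled"},
    "max_tokens": 65536,
    "temperature": 1.0,
}
request = build_chat_request("my-endpoint", "YOUR_AK", body)  # placeholder values
print(request.get_method(), request.full_url)

# To actually send it (requires a valid endpoint and key):
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
```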

Response example

{
    "id": "chatcmpl-LucZfRaIraogqFUo0ieR6KhB",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm GLM, trained by Z.ai. How can I assist you today? Whether you have questions or just want to chat, I'm happy to help.",
                "reasoning_content": "Let me consider how to respond to this greeting thoughtfully.\n\nThe user has sent a simple \"Hi\" - this is likely the beginning of a conversation. I should respond in a way that's both welcoming and open-ended to encourage further interaction.\n\nI need to introduce myself and indicate my readiness to help. A warm, professional greeting would be appropriate here. I should also invite them to share what's on their mind or what they need assistance with.\n\nSince this is an initial greeting, I'll keep my response concise but friendly, making it clear that I'm here to help with whatever they might need."
            },
            "finish_reason": "stop",
            "native_finish_reason": "stop"
        }
    ],
    "created": 1782453513,
    "model": "MaaS_GLM_5.2_20260617",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 158,
        "total_tokens": 171,
        "completion_tokens_details": {
            "accepted_prediction_tokens": 0,
            "audio_tokens": 0,
            "image_tokens": 0,
            "reasoning_tokens": 121,
            "rejected_prediction_tokens": 0
        },
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0,
            "image_tokens": 0
        }
    }
}
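A response like the one above can be unpacked as follows. The field names are taken from the example; the one assumption, flagged in the code, is that `reasoning_tokens` is counted inside `completion_tokens`, which the numbers in the example suggest but this page does not state. The sample string is a trimmed copy of the example response.

```python
import json


def summarize_response(raw: str) -> dict:
    """Pull the answer, the reasoning text, and token counts out of a response body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    message = choice["message"]
    usage = data.get("usage", {})
    completion = usage.get("completion_tokens", 0)
    reasoning = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    return {
        "answer": message.get("content"),
        "reasoning": message.get("reasoning_content"),  # may be absent when thinking is off
        "finish_reason": choice.get("finish_reason"),
        "reasoning_tokens": reasoning,
        # Assumption: reasoning tokens are included in completion_tokens.
        "answer_tokens": completion - reasoning,
    }


sample = json.dumps({
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Hello! I'm GLM, trained by Z.ai. How can I assist you today?",
            "reasoning_content": "Let me consider how to respond to this greeting thoughtfully.",
        },
        "finish_reason": "stop",
    }],
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 158,
        "total_tokens": 171,
        "completion_tokens_details": {"reasoning_tokens": 121},
    },
})
summary = summarize_response(sample)
print(summary["finish_reason"], summary["reasoning_tokens"], summary["answer_tokens"])
```

Separating reasoning tokens from answer tokens helps when tuning max_tokens, since both are generated output.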

Unified-domain call

curl --location --request POST 'https://genaiapi-m2.cloudsway.net/v1/chat/completions' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "MaaS_GLM_5.2_20260617",
    "messages": [
        {
            "role": "user",
            "content": "Hi"
        }
    ]
}'
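The two URL forms shown on this page differ only in whether an endpoint segment is present. A small sketch of choosing between them follows; `chat_completions_url` is a hypothetical helper and `my-endpoint` is a placeholder.

```python
from typing import Optional

BASE_URL = "https://genaiapi-m2.cloudsway.net"


def chat_completions_url(endpoint: Optional[str] = None) -> str:
    """Return the endpoint-specific URL, or the unified-domain URL when no endpoint is given."""
    if endpoint:
        return f"{BASE_URL}/v1/ai/{endpoint}/chat/completions"
    return f"{BASE_URL}/v1/chat/completions"


print(chat_completions_url("my-endpoint"))  # placeholder endpoint name
print(chat_completions_url())
```

Note that the examples on this page pass `glm-5.2` as the model with the endpoint URL and `MaaS_GLM_5.2_20260617` with the unified URL, so the expected `model` value may depend on which form you call.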
