MaaS_GLM

Request Protocol

HTTPS

Request Header

Parameter Name | Value
Authorization | Bearer ${YOUR_AK}
Content-Type | application/json

Request URL

https://genaiapi-m2.cloudsway.net/v1/ai/{endpoint}/chat/completions
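As an illustration of the header and URL layout above, here is a minimal Python sketch that assembles the request without sending it. The helper name `build_chat_request`, the endpoint value, and the access-key placeholder are assumptions made for this example; the JSON content type follows the curl examples in this document.

```python
import json
import urllib.request

BASE_URL = "https://genaiapi-m2.cloudsway.net"


def build_chat_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions URL."""
    url = f"{BASE_URL}/v1/ai/{endpoint}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        # The curl examples in this document send a JSON body.
        "Content-Type": "application/json",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_chat_request(
        "my-endpoint",  # hypothetical endpoint name
        "YOUR_AK",      # placeholder access key
        {"model": "glm-5.2", "messages": [{"role": "user", "content": "Hi"}]},
    )
    print(req.get_method(), req.full_url)
    # Sending needs a real endpoint and key, so it is left commented out:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes the URL and headers easy to inspect before any network call is made.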

Request Body

Request Parameters

Parameter | Type | Default Value | Description
do_sample | Boolean | true | Whether to sample the output to increase diversity.
temperature | Float | Model-dependent | Controls the randomness of the output; higher values produce more random output.
top_p | Float | Model-dependent | Controls diversity through nucleus sampling. It is recommended to set either this or temperature, not both.
max_tokens | Integer | Model-dependent | Limits the maximum number of tokens generated per call.
stream | Boolean | false | Whether to return the response in streaming mode.
thinking | Object | {"type": "enabled"} | Whether to enable chain-of-thought deep thinking. Supported only by GLM-4.5 and above.
reasoning_effort | String | max | Controls the reasoning level of the model. One of max, xhigh, high, medium, low, minimal, none. Supported only by GLM-5.2 and above.

Parameter Details

do_sample

do_sample is a boolean value (true or false) used to determine whether to sample the model's output.

  • true (default): Randomly samples from the probability distribution over tokens, which increases the diversity and creativity of the text. Suitable for scenarios such as content creation and conversation.

  • false: Uses a greedy strategy, always selecting the next token with the highest probability. The output is highly deterministic, which suits scenarios that require precise and factual answers.

Best Practices:

  • When reproducible and deterministic output is required, set it to false.

  • When you want the model to generate more diverse and interesting content, set it to true, and use it in conjunction with temperature or top_p.
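The difference between the two modes can be sketched on a toy next-token distribution. This is only an illustration of greedy selection versus sampling, not the service's decoding code; the function name `pick_next_token` and the toy probabilities are made up for the example.

```python
import random


def pick_next_token(probs: dict, do_sample: bool, rng: random.Random) -> str:
    """Pick one token from a toy next-token distribution."""
    if not do_sample:
        # Greedy: always take the most probable token.
        return max(probs, key=probs.get)
    # Sampling: draw a token in proportion to its probability.
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy = {"blue": 0.6, "grey": 0.3, "green": 0.1}  # hypothetical distribution
    rng = random.Random(0)
    print(pick_next_token(toy, do_sample=False, rng=rng))  # greedy always picks "blue"
    print([pick_next_token(toy, do_sample=True, rng=rng) for _ in range(5)])
```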

temperature

The temperature parameter controls the randomness of the model output.

  • Lower values (e.g., 0.2): The probability distribution is sharper, and the output is more deterministic and conservative.

  • Higher values (e.g., 0.8): The probability distribution is flatter, and the output is more random and diverse.

Best Practices:

  • In scenarios that require rigor and factual accuracy (such as knowledge Q&A), it is recommended to use a lower temperature.

  • In scenarios that require creativity (such as content creation), you can try a higher temperature.

  • It is recommended to use only one of temperature and top_p.
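The sharpening and flattening effect can be seen with the standard temperature-scaled softmax, sketched below on made-up logits. This assumes the usual definition (logits divided by the temperature before normalizing); the document does not state how the service applies the parameter internally.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    for t in (0.2, 0.8):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

At the lower temperature almost all probability mass lands on the top token, while the higher temperature leaves noticeable mass on the alternatives.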

top_p

top_p (nucleus sampling) controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds a threshold.

  • Lower values (e.g., 0.2): Limit the sampling range and result in more deterministic outputs.

  • Higher values (e.g., 0.9): Expand the sampling range and output more diverse results.

Best Practices:

  • If you want to ensure content quality while achieving a certain degree of diversity, top_p is a good choice (recommended value 0.8-0.95).

  • It is generally not recommended to modify both temperature and top_p simultaneously.
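Nucleus sampling as described above can be sketched as a filter over a toy distribution: sort tokens by probability, keep the smallest prefix whose cumulative probability reaches top_p, and renormalize. The function name and the toy values are illustrative, and whether the boundary is treated as "reaches" or "strictly exceeds" is an assumption that varies between implementations.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most probable tokens until their cumulative probability reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:  # assumption: stop once the threshold is reached
            break
    total = sum(kept.values())
    # Renormalize so the surviving tokens form a valid distribution.
    return {token: p / total for token, p in kept.items()}


if __name__ == "__main__":
    toy = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical distribution
    print(nucleus_filter(toy, 0.2))  # only the top token survives
    print(nucleus_filter(toy, 0.7))  # the top two tokens survive
```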

max_tokens

max_tokens limits the maximum number of tokens generated by a single model call. GLM-4.6 supports a maximum output length of 128K tokens, while GLM-4.5 supports 96K tokens. It is recommended to set max_tokens to no less than 1024.

Tokens are the basic units of text; 1 token is typically about 0.75 English words or 1.5 Chinese characters. Setting an appropriate max_tokens controls response length and cost and avoids overly long outputs.

If the model completes its response before reaching the max_tokens limit, it ends naturally; if the limit is reached, the output may be truncated.

  • Function: Prevent the generation of overly long text and control API call costs.

  • Note: max_tokens limits the length of the generated content, excluding the input.

Best Practices:

  • Set max_tokens appropriately based on the application scenario. If a short answer is needed, it can be set to a smaller value (e.g., 50).
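To pick a value, the rule of thumb above (about 0.75 English words or 1.5 Chinese characters per token) can be turned into a rough estimator. This is a back-of-the-envelope sketch with a made-up function name; actual token counts depend on the model's tokenizer.

```python
import math

WORDS_PER_TOKEN = 0.75         # rule of thumb for English text
CHINESE_CHARS_PER_TOKEN = 1.5  # rule of thumb for Chinese text


def estimate_tokens(english_words: int = 0, chinese_chars: int = 0) -> int:
    """Roughly estimate how many tokens a text of the given size needs."""
    tokens = english_words / WORDS_PER_TOKEN + chinese_chars / CHINESE_CHARS_PER_TOKEN
    return math.ceil(tokens)


if __name__ == "__main__":
    # A 750-word English answer needs roughly 1000 tokens.
    print(estimate_tokens(english_words=750))
    # A 1500-character Chinese answer needs roughly 1000 tokens.
    print(estimate_tokens(chinese_chars=1500))
```

When thinking is enabled, it is probably wise to leave extra headroom on top of such an estimate, since the return example in this document reports reasoning tokens as part of the completion usage.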

Default max_tokens and supported maximum max_tokens for each model:

stream

stream is a boolean value used to control how the API responds.

  • false (default): Returns the complete response all at once, which is simple to implement but makes the caller wait for the full output.

  • true: Returns content in streaming (SSE) mode, which significantly improves the experience of real-time interactive applications.

Best Practices:

  • For applications such as chatbots and real-time code generation, it is highly recommended to set it to true.
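A client that sets stream to true has to read the SSE lines incrementally. The sketch below parses such lines into text fragments. It rests on an assumption this document does not confirm: that chunks follow the common OpenAI-style layout ("data: {json}" lines, a final "data: [DONE]", and text under choices[0].delta.content). Check the real stream before relying on it.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines (ASSUMED OpenAI-style chunk format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    # Hypothetical stream lines, written by hand for this example.
    sample = [
        'data: {"choices":[{"delta":{"content":"Hel"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"lo"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_stream_text(sample)))
```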

thinking

The thinking parameter controls whether the model enables chain of thought to perform deeper thinking and reasoning.

  • Type: Object

  • Supported models: GLM-4.5 and above

Attributes:

  • type (string):

      • enabled (default): Enables chain of thought. GLM-5.2, GLM-5.1, GLM-5, GLM-5-Turbo, GLM-5v-Turbo, GLM-4.6, GLM-4.6V, and GLM-4.5 let the model decide automatically whether to think; GLM-4.7 and GLM-4.5V always think.

      • disabled: Disables chain of thought.

Best Practices:

  • Enable it when the model needs to perform complex reasoning and planning.

  • For simple tasks, it can be turned off to obtain a faster response.
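A small sketch of how a client might toggle this object per request is shown below. The helper name `chat_payload` is made up for the example; the field names follow the request example in this document.

```python
def chat_payload(model: str, prompt: str, thinking: bool = True) -> dict:
    """Build a request body with the thinking object switched on or off."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }


if __name__ == "__main__":
    # Complex planning task: keep thinking enabled.
    print(chat_payload("glm-5.2", "Plan a three-step database migration."))
    # Simple lookup: disable thinking for a faster response.
    print(chat_payload("glm-5.2", "What is the capital of France?", thinking=False))
```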

reasoning_effort

The reasoning_effort parameter controls the reasoning level of the model when chain of thought is enabled.

  • Type: String

  • Supported models: GLM-5.2 and above

  • Supported values: max, xhigh, high, medium, low, minimal, none

      • high: Enhanced reasoning

      • max: Deep reasoning (default)

Note:

  • To maintain compatibility with other protocols, passing none or minimal disables thinking; low and medium are mapped to high; xhigh is mapped to max.
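The mapping in the note can be written out as a lookup, which may help when predicting what a given value will do. This is a client-side illustration of the documented behavior with a made-up function name; the actual mapping happens on the service side.

```python
from typing import Optional, Tuple


def resolve_reasoning_effort(value: str) -> Tuple[bool, Optional[str]]:
    """Return (thinking_enabled, effective_level) for a reasoning_effort value."""
    if value in ("none", "minimal"):
        return (False, None)    # thinking is disabled
    if value in ("low", "medium", "high"):
        return (True, "high")   # low and medium are mapped to high
    if value in ("xhigh", "max"):
        return (True, "max")    # xhigh is mapped to max
    raise ValueError(f"unsupported reasoning_effort: {value!r}")


if __name__ == "__main__":
    for level in ("max", "xhigh", "high", "medium", "low", "minimal", "none"):
        print(level, "->", resolve_reasoning_effort(level))
```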

Request Example

curl --location \
--request POST 'https://genaiapi-m2.cloudsway.net/v1/ai/{endpoint}/chat/completions' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
    -d '{
        "model": "glm-5.2",
        "messages": [
        {
            "role": "user",
            "content": "As a marketing expert, please create a catchy slogan for my product."
        }
        ],
        "thinking": {
            "type": "enabled"
        },
        "max_tokens": 65536,
        "temperature": 1.0
    }'

Return Example

{
    "id": "chatcmpl-LucZfRaIraogqFUo0ieR6KhB",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm GLM, trained by Z.ai. How can I assist you today? Whether you have questions or just want to chat, I'm happy to help.",
                "reasoning_content": "Let me consider how to respond to this greeting thoughtfully.\n\nThe user has sent a simple \"Hi\" - this is likely the beginning of a conversation. I should respond in a way that's both welcoming and open-ended to encourage further interaction.\n\nI need to introduce myself and indicate my readiness to help. A warm, professional greeting would be appropriate here. I should also invite them to share what's on their mind or what they need assistance with.\n\nSince this is an initial greeting, I'll keep my response concise but friendly, making it clear that I'm here to help with whatever they might need."
            },
            "finish_reason": "stop",
            "native_finish_reason": "stop"
        }
    ],
    "created": 1782453513,
    "model": "MaaS_GLM_5.2_20260617",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 158,
        "total_tokens": 171,
        "completion_tokens_details": {
            "accepted_prediction_tokens": 0,
            "audio_tokens": 0,
            "image_tokens": 0,
            "reasoning_tokens": 121,
            "rejected_prediction_tokens": 0
        },
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0,
            "image_tokens": 0
        }
    }
}
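The fields of the return example can be pulled apart as in the following sketch. The function name is made up, and the "answer_tokens" figure assumes that reasoning_tokens are counted inside completion_tokens, which the numbers in the example suggest but the document does not state.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the answer, the reasoning text, and token counts from a response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    completion_tokens = usage.get("completion_tokens", 0)
    reasoning_tokens = details.get("reasoning_tokens", 0)
    return {
        "answer": message.get("content"),
        "reasoning": message.get("reasoning_content"),
        "finish_reason": choice.get("finish_reason"),
        "reasoning_tokens": reasoning_tokens,
        # ASSUMPTION: reasoning tokens are included in completion_tokens.
        "answer_tokens": completion_tokens - reasoning_tokens,
    }


if __name__ == "__main__":
    # Trimmed copy of the return example, with shortened text fields.
    example = {
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!",
                        "reasoning_content": "Let me consider..."},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 13, "completion_tokens": 158, "total_tokens": 171,
                  "completion_tokens_details": {"reasoning_tokens": 121}},
    }
    print(summarize_completion(example))
```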

Unified Domain Name Call

curl --location --request POST 'https://genaiapi-m2.cloudsway.net/v1/chat/completions' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "MaaS_GLM_5.2_20260617",
    "messages": [
        {
            "role": "user",
            "content": "Hi"
        }
    ]
}'