MaaS_GLM

Request Protocol

HTTPS

Request Header

Parameter Name	Value
Authorization	Bearer ${YOUR_AK}
Content-Type	application/json

Request Body

Request Parameters

Parameter	Type	Default Value	Description
do_sample	Boolean	`true`	Whether to sample the output to increase diversity.
temperature	Float	Model-dependent	Controls the randomness of the output; higher values produce more random output.
top_p	Float	Model-dependent	Controls diversity through nucleus sampling. Adjust either this or `temperature`, not both.
max_tokens	Integer	Model-dependent	Limits the maximum number of tokens generated per call.
stream	Boolean	`false`	Whether to return the response in streaming mode.
thinking	Object	`{"type": "enabled"}`	Whether to enable chain-of-thought deep thinking. Supported only by `GLM-4.5` and above.
reasoning_effort	String	`max`	Controls the reasoning level of the model. Accepted values: `max` `xhigh` `high` `medium` `low` `minimal` `none`. Supported only by `GLM-5.2` and above.

Parameter Details

do_sample

do_sample is a boolean value (true or false) used to determine whether to sample the model's output.

true (default): Randomly sample based on the probability distribution of each token to increase the diversity and creativity of the text. Suitable for scenarios such as content creation and conversation.
false: Uses a greedy strategy, always selecting the next token with the highest probability. The output is highly deterministic, which suits scenarios requiring precise and factual answers.

Best Practices:

When reproducible and deterministic output is required, set it to false.
When you want the model to generate more diverse and interesting content, set it to true, and use it in conjunction with temperature or top_p.
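
The difference between the two modes can be illustrated with a toy next-token picker. This is only a sketch of the general idea, not the service's implementation; the function name `pick_next_token` and the probability table are made up for illustration.

```python
import random
from typing import Dict, Optional

def pick_next_token(probs: Dict[str, float], do_sample: bool,
                    rng: Optional[random.Random] = None) -> str:
    """Toy illustration: choose the next token from a probability table."""
    if not do_sample:
        # Greedy strategy: always take the single most probable token.
        return max(probs, key=probs.get)
    rng = rng or random.Random()
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # Sampling: draw one token in proportion to its probability.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical distribution over three candidate tokens.
probs = {"blue": 0.6, "grey": 0.3, "green": 0.1}
greedy_choice = pick_next_token(probs, do_sample=False)
sampled_choice = pick_next_token(probs, do_sample=True, rng=random.Random(0))
print(greedy_choice, sampled_choice)
```

With `do_sample=False` the same table always yields the same token, which is why greedy decoding is the reproducible option.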

temperature

The temperature parameter controls the randomness of the model output.

Lower values (e.g., 0.2): The probability distribution is sharper, and the output is more deterministic and conservative.
Higher values (e.g., 0.8): The probability distribution is flatter, and the output is more random and diverse.
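
The sharpening and flattening effect can be seen in the standard temperature-scaled softmax. The sketch below assumes that standard formulation; this page does not document how the service applies temperature internally, and the logits are invented for illustration.

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
sharp = softmax_with_temperature(logits, 0.2)
flat = softmax_with_temperature(logits, 0.8)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At the lower temperature almost all probability mass sits on the top-scoring token; at the higher temperature the remaining tokens keep a meaningful share.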

Best Practices:

In scenarios that require rigor and factual accuracy (such as knowledge Q&A), it is recommended to use a lower temperature.
In scenarios that require creativity (such as content creation), you can try a higher temperature.
It is recommended to use only one of temperature and top_p.

top_p

top_p (nucleus sampling) controls diversity by sampling from the smallest set of tokens whose cumulative probability exceeds a threshold.

Lower values (e.g., 0.2): Narrow the sampling range and produce more deterministic outputs.
Higher values (e.g., 0.9): Widen the sampling range and produce more diverse outputs.
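
The selection of the "smallest set of tokens" can be sketched as follows. This is a minimal illustration of nucleus filtering in general, not the service's code; the function name `nucleus_filter` and the distribution are hypothetical, and the cutoff is treated as "reaches or exceeds" the threshold.

```python
from typing import Dict

def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens form a valid distribution.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical distribution over four candidate tokens.
probs = {"blue": 0.5, "grey": 0.25, "green": 0.1875, "red": 0.0625}
narrow = nucleus_filter(probs, 0.2)
wide = nucleus_filter(probs, 0.9)
print(sorted(narrow), sorted(wide))
```

With `top_p=0.2` only the top token survives, so sampling becomes effectively deterministic; with `top_p=0.9` three of the four tokens remain candidates.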

Best Practices:

If you want to ensure content quality while achieving a certain degree of diversity, top_p is a good choice (recommended value 0.8-0.95).
It is generally not recommended to modify both temperature and top_p simultaneously.
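
One way to follow this recommendation in client code is a small helper that refuses to set both parameters. The helper name `build_sampling_params` is hypothetical, and it is stricter than the API itself, which only recommends (rather than requires) choosing one.

```python
from typing import Any, Dict, Optional

def build_sampling_params(temperature: Optional[float] = None,
                          top_p: Optional[float] = None,
                          do_sample: bool = True) -> Dict[str, Any]:
    """Build the sampling part of a request body, allowing only one knob."""
    if temperature is not None and top_p is not None:
        # Client-side policy, not an API rule: tune one parameter at a time.
        raise ValueError("set either temperature or top_p, not both")
    params: Dict[str, Any] = {"do_sample": do_sample}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    return params

factual = build_sampling_params(temperature=0.2)
creative = build_sampling_params(top_p=0.9)
print(factual, creative)
```

Omitted parameters are left out of the dictionary so that the model-dependent defaults apply.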

max_tokens

max_tokens limits the maximum number of tokens generated in a single model call. GLM-4.6 supports a maximum output length of 128K and GLM-4.5 supports a maximum output length of 96K; a value of at least 1024 is recommended.

Tokens are the basic units of text; 1 token is typically about 0.75 English words or 1.5 Chinese characters. Setting an appropriate max_tokens controls response length and cost and avoids overly long outputs. If the model completes its response before reaching the max_tokens limit, it ends naturally; if the limit is reached, the output may be truncated.

Function: Prevents the generation of overly long text and controls API call costs.
Note: max_tokens limits the length of the generated content, excluding the input.
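
The rule of thumb above (1 token is roughly 0.75 English words or 1.5 Chinese characters) can be turned into a rough budget estimate. The helper `estimate_tokens` is hypothetical and only as accurate as that ratio; actual counts depend on the model's tokenizer.

```python
import math

def estimate_tokens(english_words: int = 0, chinese_chars: int = 0) -> int:
    """Rough token estimate from the approximate ratios quoted in the text."""
    tokens = english_words / 0.75 + chinese_chars / 1.5
    return math.ceil(tokens)

# A 300-word English answer needs roughly 400 tokens of output budget.
english_budget = estimate_tokens(english_words=300)
chinese_budget = estimate_tokens(chinese_chars=150)
print(english_budget, chinese_budget)
```

In practice it is safer to leave headroom above the estimate so that the response is not truncated.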

Best Practices:

Set max_tokens appropriately based on the application scenario. If a short answer is required, it can be set to a smaller value (e.g., 50).

Default max_tokens and supported maximum max_tokens for each model:

stream

stream is a boolean value used to control how the API responds.

false (default): Returns the complete response all at once. This is simple to implement, but the client waits until generation has finished.
true: Returns content in streaming (SSE) mode, significantly enhancing the experience of real-time interactive applications.

Best Practices:

For applications such as chatbots and real-time code generation, it is strongly recommended to set it to true.
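
A streaming client reads the SSE response line by line. The sketch below parses `data:` lines into JSON objects. The `[DONE]` end marker and the `choices[0].delta.content` chunk shape are assumptions borrowed from common OpenAI-compatible APIs; this page does not show the streaming chunk format, so check a real response before relying on them.

```python
import json
from typing import Any, Dict, Iterable, Iterator

def iter_sse_json(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the JSON payload of each SSE `data:` line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        yield json.loads(payload)

# Hypothetical stream contents, shaped like an OpenAI-compatible response.
sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_json(sample_stream)
)
print(text)
```

In a real client, `lines` would be the decoded lines of the HTTP response body, and each piece of text can be shown to the user as soon as it arrives.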

thinking

The thinking parameter controls whether the model enables "Chain of Thought" for deeper thinking and reasoning.

Type: Object
Supported models: GLM-4.5 and above

Attributes:

type (string):
enabled (default): Enable chain of thought. GLM-5.2, GLM-5.1, GLM-5, GLM-5-Turbo, GLM-5v-Turbo, GLM-4.6, GLM-4.6V, and GLM-4.5 decide automatically whether to think; GLM-4.7 and GLM-4.5V always think.
disabled: Disable chain of thought.

Best Practices:

Enable it when the model needs to perform complex reasoning and planning.
For simple tasks, it can be turned off to obtain a faster response.

reasoning_effort

The reasoning_effort parameter controls the reasoning level of the model when "Chain of Thought" is enabled.

Type: String
Supported models: GLM-5.2 and above
Accepted values: max, xhigh, high, medium, low, minimal, none
high: Enhanced Reasoning
max: Deep Reasoning (default)

Note:

To maintain compatibility with other protocols, all seven values are accepted: passing none or minimal results in no thinking, low and medium are mapped to high, and xhigh is mapped to max.
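
The mapping described in this note can be written down as a lookup table, which is handy for logging what a request will actually do. The function name `effective_reasoning_effort` is hypothetical; the mapping itself is taken from the note above, with `None` standing for "no thinking".

```python
from typing import Optional

# Requested value -> level the model actually uses (None means no thinking).
_EFFECTIVE_EFFORT = {
    "none": None,
    "minimal": None,
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}

def effective_reasoning_effort(requested: str = "max") -> Optional[str]:
    """Return the reasoning level a requested value is mapped to."""
    if requested not in _EFFECTIVE_EFFORT:
        raise ValueError(f"unsupported reasoning_effort: {requested!r}")
    return _EFFECTIVE_EFFORT[requested]

for value in ("none", "low", "xhigh"):
    print(value, "->", effective_reasoning_effort(value))
```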

Request Example

/chat/completions

curl --location \
--request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/chat/completions' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
--data '{
    "model": "glm-5.2",
    "messages": [
        {
            "role": "user",
            "content": "As a marketing expert, please create a catchy slogan for my product."
        }
    ],
    "thinking": {
        "type": "enabled"
    },
    "max_tokens": 65536,
    "temperature": 1.0
}'

/v1/messages

curl --request POST \
  --url https://genaiapi.cloudsway.net/{endpoint}/v1/messages \
  --header 'Authorization: Bearer ${YOUR_AK}' \
  --header 'Content-Type: application/json' \
  --data '{
    "max_tokens": 5000,
    "messages": [
        {
            "role": "user",
            "content": "Analyze the major themes in Pride and Prejudice."
        }
    ]
}'

Return Example

{
    "id": "chatcmpl-LucZfRaIraogqFUo0ieR6KhB",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! I'm GLM, trained by Z.ai. How can I assist you today? Whether you have questions or just want to chat, I'm happy to help.",
                "reasoning_content": "Let me consider how to respond to this greeting thoughtfully.\n\nThe user has sent a simple \"Hi\" - this is likely the beginning of a conversation. I should respond in a way that's both welcoming and open-ended to encourage further interaction.\n\nI need to introduce myself and indicate my readiness to help. A warm, professional greeting would be appropriate here. I should also invite them to share what's on their mind or what they need assistance with.\n\nSince this is an initial greeting, I'll keep my response concise but friendly, making it clear that I'm here to help with whatever they might need."
            },
            "finish_reason": "stop",
            "native_finish_reason": "stop"
        }
    ],
    "created": 1782453513,
    "model": "MaaS_GLM_5.2_20260617",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 13,
        "completion_tokens": 158,
        "total_tokens": 171,
        "completion_tokens_details": {
            "accepted_prediction_tokens": 0,
            "audio_tokens": 0,
            "image_tokens": 0,
            "reasoning_tokens": 121,
            "rejected_prediction_tokens": 0
        },
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0,
            "image_tokens": 0
        }
    }
}
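
A client typically pulls the answer, the reasoning text, and the token counts out of this structure. The sketch below uses only field names visible in the example above; the helper name `summarize_response` is hypothetical, and the sample dictionary is a shortened stand-in for a real response.

```python
from typing import Any, Dict

def summarize_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the answer, reasoning text, and token counts from a response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage", {})
    details = usage.get("completion_tokens_details", {})
    reasoning_tokens = details.get("reasoning_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    return {
        "answer": message["content"],
        # reasoning_content may be missing when thinking is disabled.
        "reasoning": message.get("reasoning_content"),
        "finish_reason": choice["finish_reason"],
        "reasoning_tokens": reasoning_tokens,
        "answer_tokens": completion_tokens - reasoning_tokens,
    }

# Shortened stand-in for the example response, with the same token counts.
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello!",
                    "reasoning_content": "Greet the user."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 13, "completion_tokens": 158, "total_tokens": 171,
              "completion_tokens_details": {"reasoning_tokens": 121}},
}
summary = summarize_response(sample)
print(summary["answer"], summary["reasoning_tokens"], summary["answer_tokens"])
```

In the example, 121 of the 158 completion tokens are reasoning tokens, so only 37 belong to the visible answer.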

Unified Domain Name Call

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/chat/completions' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "MaaS_GLM_5.2_20260617",
    "messages": [
        {
            "role": "user",
            "content": "Hi"
        }
    ]
}'

curl --location 'https://genaiapi.cloudsway.net/v1/messages' \
--header 'Authorization: Bearer ${YOUR_AK}' \
--header 'Content-Type: application/json' \
--data '{
    "model": "MaaS_GLM_5.2_20260617",
    "max_tokens": 5000,
    "messages": [
        {
            "role": "user",
            "content": "Analyze the major themes in Pride and Prejudice."
        }
    ]
}'
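
The unified-domain chat call can also be made from Python's standard library. The sketch below mirrors the first curl command in this section; the function name `build_chat_request` and the key value are placeholders, and the network call itself is left commented out so the snippet runs without credentials.

```python
import json
import urllib.request

BASE_URL = "https://genaiapi.cloudsway.net"

def build_chat_request(access_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for the unified /v1/chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "YOUR_AK" is a placeholder; substitute a real access key before sending.
request = build_chat_request("YOUR_AK", "MaaS_GLM_5.2_20260617", "Hi")
print(request.get_method(), request.full_url)

# To send the request:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```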