MaaS_DB_Speech API

1. Interface Description

The interface address is wss://genaiapiws.cloudsway.net/ws/api/v1/tts/ws_binary

2. Authentication

Authentication uses a Bearer token. Add the headers "Authorization": "Bearer {YOUR_ACCESS_KEY}" and "ModelName": "MaaS_DB_Speech" to the request.
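As a minimal sketch, the two headers could be collected in a map before the connection is opened. The class and method names (`AuthHeaders`, `build`) are illustrative and not part of the API; the resulting map can be handed to whichever WebSocket client library is in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AuthHeaders {
    private AuthHeaders() {}

    /** Builds the two request headers described above. */
    static Map<String, String> build(String accessKey) {
        if (accessKey == null || accessKey.isEmpty()) {
            throw new IllegalArgumentException("accessKey must not be empty");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + accessKey);
        headers.put("ModelName", "MaaS_DB_Speech");
        return headers;
    }
}
```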

3. Request Method

Binary Protocol

Message Format

Field Description

Field (Size, unit: bit)	Description	Values
Protocol version (4)	Reserved for future protocol versions; keeps the client and server on the same version.	`0b0001` - Version 1 (currently the only version)
Header size (4)	The actual header size is `header size value x 4` bytes. The special value `0b1111` indicates a header of 60 bytes (15 x 4) or more, in which case a header extension field follows.	`0b0001` - Header size = 4 (1 x 4); `0b0010` - Header size = 8 (2 x 4); `0b1010` - Header size = 40 (10 x 4); `0b1110` - Header size = 56 (14 x 4); `0b1111` - Header size is 60 or larger (actual size defined in header extension)
Message type (4)	Defines the message type.	`0b0001` - Full client request; `0b1001` - Full server response (deprecated); `0b1011` - Audio-only server response (ACK); `0b1111` - Error message from server (e.g., wrong message type or unsupported serialization method)
Message type specific flags (4)	The meaning of the flags depends on the message type. See the message type descriptions below for details.
Message serialization method (4)	Defines how the payload is serialized. Note: this only applies to certain message types (e.g., the audio-only server response `0b1011` needs no serialization).	`0b0000` - No serialization (raw bytes); `0b0001` - JSON; `0b1111` - Custom type, defined in header extension. Note: currently only JSON is supported
Message compression (4)	Defines the compression method for the payload. The header and the payload size field (present only for some message types) are not compressed; payload size refers to the size of the payload after compression.	`0b0000` - No compression; `0b0001` - gzip; `0b1111` - Custom compression method, defined in header extension. Note: currently only no compression is supported
Reserved (8)	Reserved field; also serves as padding (bringing the total header size to 4 bytes).	`0x00` - Currently only 0
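The bit layout in the table can be illustrated with a small packing and unpacking helper. This is only a sketch for the basic 4-byte header without a header extension; the class name `TtsHeader` and its methods are hypothetical and not part of any SDK.

```java
final class TtsHeader {
    final int version;
    final int headerSize;      // in units of 4 bytes
    final int messageType;
    final int flags;
    final int serialization;
    final int compression;

    TtsHeader(int version, int headerSize, int messageType,
              int flags, int serialization, int compression) {
        this.version = nibble(version);
        this.headerSize = nibble(headerSize);
        this.messageType = nibble(messageType);
        this.flags = nibble(flags);
        this.serialization = nibble(serialization);
        this.compression = nibble(compression);
    }

    private static int nibble(int v) {
        if (v < 0 || v > 0x0f) {
            throw new IllegalArgumentException("field must fit in 4 bits: " + v);
        }
        return v;
    }

    /** Packs the six 4-bit fields and the reserved byte into 4 bytes. */
    byte[] encode() {
        return new byte[] {
            (byte) ((version << 4) | headerSize),
            (byte) ((messageType << 4) | flags),
            (byte) ((serialization << 4) | compression),
            0x00 // reserved
        };
    }

    /** Reads the six 4-bit fields back from the first three header bytes. */
    static TtsHeader decode(byte[] b) {
        return new TtsHeader(
            (b[0] & 0xff) >> 4, b[0] & 0x0f,
            (b[1] & 0xff) >> 4, b[1] & 0x0f,
            (b[2] & 0xff) >> 4, b[2] & 0x0f);
    }

    /** Actual header length in bytes (header size value x 4). */
    int headerBytes() {
        return headerSize * 4;
    }
}
```

For example, `new TtsHeader(1, 1, 0b0001, 0, 0b0001, 0).encode()` yields the bytes `0x11 0x10 0x10 0x00`: version 1, a 4-byte header, full client request, JSON, no compression.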

Detailed Description of Message Types

Currently, all TTS websocket requests use the full client request format, regardless of "query" or "submit".

Full client request

Header size is `0b0001` (i.e., 4 bytes, no header extension).
Message type is `0b0001`.
Message type specific flags are fixed at `0b0000`.
Message serialization method is `0b0001` (JSON). See section 4, Request Parameters, for the payload fields.
If gzip compression is used for the payload, the payload size is the size after compression.
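A full client request frame could be assembled as sketched below. The layout after the header (a 4-byte big-endian payload size followed by the JSON payload) is taken from the Java demo in section 7, not from a formal specification; `FullClientRequest` and `frame` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FullClientRequest {
    // 0x11: version 1 | header size 1 (4 bytes)
    // 0x10: full client request | flags 0
    // 0x10: JSON serialization | no compression
    // 0x00: reserved
    private static final byte[] HEADER = {0x11, 0x10, 0x10, 0x00};

    private FullClientRequest() {}

    /** Returns header + payload size (big-endian int) + UTF-8 JSON payload. */
    static byte[] frame(String json) {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(HEADER.length + 4 + payload.length);
        buf.put(HEADER).putInt(payload.length).put(payload);
        return buf.array();
    }
}
```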

Audio-only server response

Header size should be `0b0001`.
Message type is `0b1011`.
Message type specific flags can have the following values:
`0b0000` - No sequence number.
`0b0001` - Sequence number > 0.
`0b0010` or `0b0011` - Sequence number < 0, indicating the last message from the server, at which point the client should merge all audio segments (if there are multiple).
Message serialization method is `0b0000` (raw bytes).
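The sketch below parses one audio-only server response. It assumes, as the Java demo in section 7 does, that a frame with non-zero flags carries a 4-byte big-endian sequence number and a 4-byte payload size before the audio bytes. The class name `AudioFrame` is hypothetical.

```java
import java.nio.ByteBuffer;

final class AudioFrame {
    final int sequence;   // 0 when the frame carries no sequence number
    final boolean last;   // true when the sequence number is negative
    final byte[] audio;   // empty for a bare ACK

    private AudioFrame(int sequence, boolean last, byte[] audio) {
        this.sequence = sequence;
        this.last = last;
        this.audio = audio;
    }

    static AudioFrame parse(ByteBuffer frame) {
        // slice() gives an independent, big-endian view starting at the
        // current position, so the caller's buffer is left untouched.
        ByteBuffer buf = frame.slice();
        int headerSize = buf.get(0) & 0x0f;
        int messageType = (buf.get(1) & 0xff) >> 4;
        int flags = buf.get(1) & 0x0f;
        if (messageType != 0b1011) {
            throw new IllegalArgumentException("not an audio-only response: " + messageType);
        }
        if (flags == 0) {
            return new AudioFrame(0, false, new byte[0]); // ACK without audio
        }
        buf.position(headerSize * 4);
        int sequence = buf.getInt();
        int payloadSize = buf.getInt();
        byte[] audio = new byte[payloadSize];
        buf.get(audio);
        return new AudioFrame(sequence, sequence < 0, audio);
    }
}
```

A client would append `audio` from each frame to an output buffer and stop reading once `last` is true.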

4. Request Parameters

Field	Meaning	Level	Format	Required	Notes
user	User config	1	dict	✓
uid	User ID	2	string	✓	Can be any non-empty string, used for server-side logging
audio	Audio config	1	dict	✓
voice_type	Voice type	2	string	✓
emotion	Voice emotion	2	string		Set voice emotion. Example: "emotion": "angry" Note: Currently only some voices support emotion settings, and different voices support different emotion ranges.
enable_emotion	Enable voice emotion	2	bool		Set to true to enable voice emotion. Example: "enable_emotion": true
encoding	Audio encoding format	2	string		wav / pcm / ogg_opus / mp3, default is pcm. Note: wav doesn't support streaming
speed_ratio	Speech rate	2	float		[0.8,2], default is 1, usually one decimal place is sufficient
rate	Audio sample rate	2	int		Default is 24000, options are 8000, 16000
BitRate	Bit rate	2	int		Can be 16, 96, 128, etc.
explicit_language	Explicit language	2	string		Only read text in the specified language. If not given, normal Chinese-English mix. crosslingual: enables the multi-language frontend (zh, en, ja, es-mx, id, pt-br). zh: mainly Chinese, supports Chinese-English mix. en: English only. ja: Japanese only. es-mx: Mexican Spanish only. id: Indonesian only. pt-br: Brazilian Portuguese only.
context_language	Reference language	2	string		Provides reference language for the model. If not given, Western European languages use English. id: Western European languages use Indonesian. es: Western European languages use Mexican Spanish. pt: Western European languages use Brazilian Portuguese.
loudness_ratio	Volume adjustment	2	float		[0.5,2], default is 1, usually one decimal place is sufficient. 0.5 represents 0.5 times the original volume, 2 represents 2 times the original volume
request	Request config	1	dict	✓
reqid	Request ID	2	string	✓	Need to ensure unique value for each call, UUID recommended
text	Text	2	string	✓	Text for speech synthesis, limited to 1024 bytes (UTF-8 encoding)
text_type	Text type	2	string		Need to specify when using ssml, value should be "ssml"
silence_duration	End-of-sentence silence	2	float		Adds silence at the end of the audio, range 0~30000 ms. (Note: the silence mainly applies to the end of the whole input text, not the end of each sentence.) To use this parameter, enable_trailing_silence_audio = true must first be set under request
with_timestamp	Timestamp related	2	int / string		Pass 1 to enable. Returns timestamps for the text after text normalization (TN); for example, "2025" becomes "two thousand and twenty-five" or "two zero two five" depending on the semantics. Note: repeated punctuation or spaces in the original text are still processed but do not affect the continuity of timestamps (large model scenarios only). Background (how timestamps differ between small and large models): small models generate timestamps from the frontend model before synthesizing audio and map the text before and after TN, so they can return timestamps for the original text, retaining Arabic numerals and special symbols. Large models synthesize audio after semantic understanding of the input text and then derive timestamps by labeling the synthesized audio against the post-TN text; timestamps for any other text would not align with the audio, so large models return timestamps for the text after TN.
operation	Operation	2	string	✓	query (non-streaming; HTTP supports query only) / submit (streaming)
extra_param	Additional parameters	2	jsonstring
disable_markdown_filter		3	bool		Whether to enable markdown parsing and filtering. When true, markdown syntax is parsed and filtered, e.g., **hello** will be read as "hello". When false, no parsing or filtering is done, e.g., **hello** will be read as "star star 'hello' star star". Example: "disable_markdown_filter": true
enable_latex_tn		3	bool		Whether to read LaTeX formulas aloud; requires disable_markdown_filter to be set to true. Example: "enable_latex_tn": true
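Several limits from the table (text length in UTF-8 bytes, speed_ratio and loudness_ratio ranges) can be checked on the client before a request is sent. The helper below is a sketch under that assumption; `TtsParamCheck` and `validate` are hypothetical names, and the server remains the authority on what it accepts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TtsParamCheck {
    private TtsParamCheck() {}

    /** Returns a list of problems; an empty list means the values are within the documented limits. */
    static List<String> validate(String text, double speedRatio, double loudnessRatio) {
        List<String> problems = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            problems.add("text is required");
        } else if (text.getBytes(StandardCharsets.UTF_8).length > 1024) {
            // The limit is in bytes, so multi-byte characters count more than once.
            problems.add("text exceeds 1024 bytes in UTF-8");
        }
        if (speedRatio < 0.8 || speedRatio > 2.0) {
            problems.add("speed_ratio must be within [0.8, 2]");
        }
        if (loudnessRatio < 0.5 || loudnessRatio > 2.0) {
            problems.add("loudness_ratio must be within [0.5, 2]");
        }
        return problems;
    }
}
```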

Notes:

Character-level timestamp capability is now supported (not supported for ssml text type)
Pitch adjustment is not currently supported
Large model voice types support Chinese-English mix
Large model non-bidirectional streaming now supports LaTeX formulas
After successful websocket handshake, these Response headers will be returned

Request example:

{
    "user": {
        "uid": "uid123"
    },
    "audio": {
        "voice_type": "zh_male_M392_conversation_wvae_bigtts",
        "encoding": "mp3",
        "speed_ratio": 1.0
    },
    "request": {
        "reqid": "uuid",
        "text": "I love China",
        "operation": "query"
    }
}

5. Notes

A single websocket connection supports only one synthesis. For multiple syntheses, establish multiple connections.
The reqid parameter must be regenerated for each synthesis and must be unique (a version 4 UUID is recommended).
Streaming response is only available when the operation is set to submit.
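One simple way to satisfy the uniqueness requirement is to generate a new random UUID per synthesis, as the demo in section 7 also does. The wrapper name `ReqIds` is illustrative only.

```java
import java.util.UUID;

final class ReqIds {
    private ReqIds() {}

    /** Returns a fresh random (version 4) UUID string for the reqid field. */
    static String next() {
        return UUID.randomUUID().toString();
    }
}
```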

Return Code Description

Error Code	Error Description	Example	Suggested Action
3000	Correct request	Normal synthesis	Process normally
3001	Invalid request	Some parameter values are illegal, such as incorrect operation configuration	Check parameters
3003	Concurrency exceeded	Exceeds the concurrency threshold set online	Retry; switch offline if using sdk
3005	Backend service busy	Backend server load is high	Retry; switch offline if using sdk
3006	Service interrupted	Request with the same reqid after a request has completed/failed	Check parameters
3010	Text length exceeded	Single request exceeds the set text length threshold	Check parameters
3011	Invalid text	Parameters are incorrect or text is empty, text doesn't match the language, text only contains punctuation	Check parameters
3030	Processing timeout	Single request exceeds the service's maximum time limit	Retry or check text
3031	Processing error	Backend exception occurred	Retry; switch offline if using sdk
3032	Audio retrieval timeout	Backend network exception	Retry; switch offline if using sdk
3040	Backend link error	Backend network exception	Retry
3050	Voice does not exist	Check the voice_type code used	Check parameters
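The suggested actions in the table could be encoded in client code roughly as follows. This is a sketch: the `Action` categories are a simplification introduced here (for instance, 3030 is mapped to RETRY although the table also suggests checking the text), and `TtsReturnCodes` is a hypothetical name.

```java
final class TtsReturnCodes {
    enum Action { OK, CHECK_PARAMETERS, RETRY, UNKNOWN }

    private TtsReturnCodes() {}

    /** Maps a return code from the table above to a coarse client-side action. */
    static Action actionFor(int code) {
        switch (code) {
            case 3000:
                return Action.OK;
            case 3001:
            case 3006:
            case 3010:
            case 3011:
            case 3050:
                return Action.CHECK_PARAMETERS;
            case 3003:
            case 3005:
            case 3030:
            case 3031:
            case 3032:
            case 3040:
                return Action.RETRY;
            default:
                return Action.UNKNOWN; // code not listed in the table
        }
    }
}
```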

6. Voice List

Multi-Emotion

Voice Name	voice_type	Timestamp	Language	Supported Emotions
Beijing Young Man (Multi-emotion)	zh_male_beijingxiaoye_emo_v2_mars_bigtts	√	Chinese	Corresponding emotion parameters: angry, surprised, fear, excited, coldness, neutral
Gentle Girlfriend (Multi-emotion)	zh_female_roumeinvyou_emo_v2_mars_bigtts	√	Chinese	Corresponding emotion parameters: happy, sad, angry, surprised, fear, hate, excited, coldness, neutral
Sunny Youth (Multi-emotion)	zh_male_yangguangqingnian_emo_v2_mars_bigtts	√	Chinese	Corresponding emotion parameters: happy, sad, angry, fear, excited, coldness, neutral
Charming Girlfriend (Multi-emotion)	zh_female_meilinvyou_emo_v2_mars_bigtts	√	Chinese	Corresponding emotion parameters: sad, fear, neutral
Refreshing Sisi (Multi-emotion)	zh_female_shuangkuaisisi_emo_v2_mars_bigtts	√	Chinese, American English	Corresponding emotion parameters: happy, sad, angry, surprised, excited, coldness, neutral

General Scenarios

Voice Name	voice_type	Timestamp	Language
Cancan/Shiny	zh_female_cancan_mars_bigtts	√	Chinese, American English
Fresh Female Voice	zh_female_qingxinnvsheng_mars_bigtts	√	Chinese
Refreshing Sisi/Skye	zh_female_shuangkuaisisi_moon_bigtts	√	Chinese, American English
Warm Ahu/Alvin	zh_male_wennuanahu_moon_bigtts	√	Chinese, American English
Young Zixin/Brayan	zh_male_shaonianzixin_moon_bigtts	√	Chinese, American English
Intellectual Female Voice	zh_female_zhixingnvsheng_mars_bigtts	√	Chinese
Refreshing Male Voice	zh_male_qingshuangnanda_mars_bigtts	√	Chinese
Girl Next Door	zh_female_linjianvhai_moon_bigtts	√	Chinese
Knowledgeable Uncle	zh_male_yuanboxiaoshu_moon_bigtts	√	Chinese
Sunny Youth	zh_male_yangguangqingnian_moon_bigtts	√	Chinese
Sweet Xiaoyuan	zh_female_tianmeixiaoyuan_moon_bigtts	√	Chinese
Clear Zizi	zh_female_qingchezizi_moon_bigtts	√	Chinese
Explanatory Xiaoming	zh_male_jieshuoxiaoming_moon_bigtts	√	Chinese
Cheerful Sister	zh_female_kailangjiejie_moon_bigtts	√	Chinese
Boy Next Door	zh_male_linjiananhai_moon_bigtts	√	Chinese
Sweet Yueyue	zh_female_tianmeiyueyue_moon_bigtts	√	Chinese
Chicken Soup	zh_female_xinlingjitang_moon_bigtts	√	Chinese
Intellectual and Gentle	ICL_zh_female_zhixingwenwan_tob	√	Chinese
Warm and Considerate	ICL_zh_male_nuanxintitie_tob	√	Chinese
Gentle and Elegant	ICL_zh_female_wenrouwenya_tob	√	Chinese
Cheerful and Lively	ICL_zh_male_kailangqingkuai_tob	√	Chinese
Lively and Bright	ICL_zh_male_huoposhuanglang_tob	√	Chinese
Straightforward Guy	ICL_zh_male_shuaizhenxiaohuo_tob	√	Chinese
Gentle Brother	zh_male_wenrouxiaoge_mars_bigtts	√	Chinese
Smith	en_male_smith_mars_bigtts	√	British English
Anna	en_female_anna_mars_bigtts	√	British English
Adam	en_male_adam_mars_bigtts	√	American English
Sarah	en_female_sarah_mars_bigtts	√	Australian English
Dryw	en_male_dryw_mars_bigtts	√	Australian English

Multi-language

Voice Name	voice_type	Timestamp	Language
かずね（和音）/Javier or Álvaro	multi_male_jingqiangkanye_moon_bigtts	√	Japanese, Spanish
はるこ（晴子）/Esmeralda	multi_female_shuangkuaisisi_moon_bigtts	√	Japanese, Spanish
ひろし（広志）/Roberto	multi_male_wanqudashu_moon_bigtts	√	Japanese, Spanish
あけみ（朱美）	multi_female_gaolengyujie_moon_bigtts	√	Japanese
Amanda	en_female_amanda_mars_bigtts	√	American English
Jackson	en_male_jackson_mars_bigtts	√	American English

Fun Accents

Voice Name	voice_type	Timestamp	Language
Beijing Accent Kanye/Harmony	zh_male_jingqiangkanye_moon_bigtts	√	Chinese-Beijing accent, English
Taiwan Little He	zh_female_wanwanxiaohe_moon_bigtts	√	Chinese-Taiwan accent
Bay Area Uncle	zh_female_wanqudashu_moon_bigtts	√	Chinese-Cantonese accent
Cute Sichuan Girl	zh_female_daimengchuanmei_moon_bigtts	√	Chinese-Sichuan accent
Guangzhou Dege	zh_male_guozhoudege_moon_bigtts	√	Chinese-Cantonese accent
Beijing Young Man	zh_male_beijingxiaoye_moon_bigtts	√	Chinese-Beijing accent
Haoyu Brother	zh_male_haoyuxiaoge_moon_bigtts	√	Chinese-Qingdao accent
Guangxi Yuanzhou	zh_male_guangxiyuanzhou_moon_bigtts	√	Chinese-Guangxi accent
Changsha Meituo Jie'er	zh_female_meituojieer_moon_bigtts	√	Chinese-Changsha accent
Henan Yuzhou Zixuan	zh_male_yuzhouzixuan_moon_bigtts	√	Chinese-Henan accent

Role-playing

Voice Name	voice_type	Timestamp	Language
Cute Baby	zh_male_naiqimengwa_mars_bigtts	√	Chinese
Grandma	zh_female_popo_mars_bigtts	√	Chinese
Cool Sister	zh_female_gaolengyujie_moon_bigtts	√	Chinese
Arrogant Boss	zh_male_aojiaobazong_moon_bigtts	√	Chinese
Charming Girlfriend	zh_female_meilinvyou_moon_bigtts	√	Chinese
Late Night Podcaster	zh_male_shenyeboke_moon_bigtts	√	Chinese
Gentle Girlfriend	zh_female_sajiaonvyou_moon_bigtts	√	Chinese
Coquettish Junior	zh_female_yuanqinvyou_moon_bigtts	√	Chinese
Frail Girl	ICL_zh_female_bingruoshaonv_tob	√	Chinese
Lively Girl	ICL_zh_female_huoponvhai_tob	√	Chinese
Oriental Haoran	zh_male_dongfanghaoran_moon_bigtts	√	Chinese
Green Tea Brother	ICL_zh_male_lvchaxiaoge_tob	√	Chinese
Delicate Loli	ICL_zh_female_jiaoruoluoli_tob	√	Chinese
Cold and Distant	ICL_zh_male_lengdanshuli_tob	√	Chinese
Honest and Solid	ICL_zh_male_hanhoudunshi_tob	√	Chinese
Arrogant and Domineering	ICL_zh_male_aiqilingren_tob	√	Chinese
Lively and Mischievous	ICL_zh_female_huopodiaoman_tob	√	Chinese
Stubborn and Yandere	ICL_zh_male_guzhibingjiao_tob	√	Chinese
Coquettish and Clingy	ICL_zh_male_sajiaonianren_tob	√	Chinese
Arrogant and High-pitched	ICL_zh_female_aomanjiaosheng_tob	√	Chinese
Casual and Free-spirited	ICL_zh_male_xiaosasuixing_tob	√	Chinese
Cunning Young Master	ICL_zh_male_fuheigongzi_tob	√	Chinese
Mysterious and Enigmatic	ICL_zh_male_guiyishenmi_tob	√	Chinese
Elegant and Talented	ICL_zh_male_ruyacaijun_tob	√	Chinese
Yandere White Lotus	ICL_zh_male_bingjiaobailian_tob	√	Chinese
Upright Youth	ICL_zh_male_zhengzhiqingnian_tob	√	Chinese
Coquettish Queen	ICL_zh_female_jiaohannvwang_tob	√	Chinese
Yandere Cute Girl	ICL_zh_female_bingjiaomengmei_tob	√	Chinese
Inexperienced Young Man	ICL_zh_male_qingsenaigou_tob	√	Chinese
Innocent Junior	ICL_zh_male_chunzhenxuedi_tob	√	Chinese
Caring Senior Sister	ICL_zh_female_nuanxinxuejie_tob	√	Chinese
Cute Girl	ICL_zh_female_keainvsheng_tob	√	Chinese
Mature Sister	ICL_zh_female_chengshujiejie_tob	√	Chinese
Yandere Sister	ICL_zh_female_bingjiaojiejie_tob	√	Chinese
Indecisive Leader	ICL_zh_male_youroubangzhu_tob	√	Chinese
Indecisive Young Master	ICL_zh_male_yourougongzi_tob	√	Chinese
Charming Sister	ICL_zh_female_wumeiyujie_tob	√	Chinese
Mischievous Princess	ICL_zh_female_tiaopigongzhu_tob	√	Chinese
Tsundere Girlfriend	ICL_zh_female_aojiaonvyou_tob	√	Chinese
Caring Boyfriend	ICL_zh_male_tiexinnanyou_tob	√	Chinese
Young General	ICL_zh_male_shaonianjiangjun_tob	√	Chinese
Caring Girlfriend	ICL_zh_female_tiexinnvyou_tob	√	Chinese
Yandere Brother	ICL_zh_male_bingjiaogege_tob	√	Chinese
Top Student Male Classmate	ICL_zh_male_xuebanantongzhuo_tob	√	Chinese
Humorous Uncle	ICL_zh_male_youmoshushu_tob	√	Chinese
Sexy Sister	ICL_zh_female_xingganyujie_tob	√	Chinese
Tomboy	ICL_zh_female_jiaxiaozi_tob	√	Chinese
Cold Superior	ICL_zh_male_lengjunshangsi_tob	√	Chinese
Gentle Male Classmate	ICL_zh_male_wenrounantongzhuo_tob	√	Chinese
Yandere Younger Brother	ICL_zh_male_bingjiaodidi_tob	√	Chinese
Humorous Old Man	ICL_zh_male_youmodaye_tob	√	Chinese
Arrogant Young Master	ICL_zh_male_aomanshaoye_tob	√	Chinese
Mysterious Mage	ICL_zh_male_shenmifashi_tob	√	Chinese

Video Configuration

Voice Name	voice_type	Timestamp	Language
Kind Grandma	ICL_zh_female_heainainai_tob	√	Chinese
Neighbor Aunt	ICL_zh_female_linjuayi_tob	√	Chinese
Gentle Little Ya	zh_female_wenrouxiaoya_moon_bigtts	√	Chinese
Genius Child Voice	zh_male_tiancaitongsheng_mars_bigtts	√	Chinese
Monkey Brother	zh_male_sunwukong_mars_bigtts	√	Chinese
Bear Two	zh_male_xionger_mars_bigtts	√	Chinese
Peppa Pig	zh_female_peiqi_mars_bigtts	√	Chinese
Wu Zetian	zh_female_wuzetian_mars_bigtts	√	Chinese
Sister Gu	zh_female_gujie_mars_bigtts	√	Chinese
Cherry Ball	zh_female_yingtaowanzi_mars_bigtts	√	Chinese
Advertising Narration	zh_male_chunhui_mars_bigtts	√	Chinese
Children's Story	zh_female_shaoergushi_mars_bigtts	√	Chinese
Fourth Brother	zh_male_silang_mars_bigtts	√	Chinese
Magnetic Narrator (Morgan)	zh_male_jieshuonansheng_mars_bigtts	√	Chinese, American English
Chicken Soup Sister (Hope)	zh_female_jitangmeimei_mars_bigtts	√	Chinese, American English
Caring Female Voice (Candy)	zh_female_tiexinnvsheng_mars_bigtts	√	Chinese, American English
Playful Female Voice	zh_female_qiaopinvsheng_mars_bigtts	√	Chinese
Cute Girl (Cutey)	zh_female_mengyatou_mars_bigtts	√	Chinese, American English
Lazy Sound Male	zh_male_lanxiaoyang_mars_bigtts	√	Chinese
Bright Voice Cute Boy	zh_male_dongmanhaimian_mars_bigtts	√	Chinese

Audiobook Reading

Voice Name	voice_type	Timestamp	Language
Suspense Narration	zh_male_changtianyi_mars_bigtts	√	Chinese
Elegant Young Man	zh_male_ruyaqingnian_mars_bigtts	√	Chinese
Dominant Young Uncle	zh_male_baqiqingshu_mars_bigtts	√	Chinese
Grand Voice	zh_male_qingcang_mars_bigtts	√	Chinese
Energetic Young Man	zh_male_yangguangqingnian_mars_bigtts	√	Chinese
Ancient Style Young Gentleman	zh_female_gufengshaoyu_mars_bigtts	√	Chinese
Tender Lady	zh_female_wenroushunv_mars_bigtts	√	Chinese
Anti-rolling Young Man	zh_male_fanjuanqingnian_mars_bigtts	√	Chinese

Note:

The above Chinese voice types support mixed Chinese and English scenarios.
The voice types "Frail Girl," "Lively Girl," "Kind Grandma," and "Neighbor Aunt" do not currently support bidirectional streaming interface calls.

7. Example

package org.example.websocket.tts.demo;

import com.alibaba.fastjson.annotation.JSONField;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class TtsRequest {
    @JSONField(name = "user")
    private User user;
    @JSONField(name = "audio")
    private Audio audio;
    @JSONField(name = "request")
    private Request request;

    @Data
    @Builder
    public static class User {
        @JSONField(name = "uid")
        private String uid;
    }

    @Data
    @Builder
    public static class Audio {
        @JSONField(name = "voice_type")
        private String voiceType;
        @JSONField(name = "emotion")
        private String emotion;
        @JSONField(name = "enable_emotion")
        private Boolean enableEmotion;
        @JSONField(name = "encoding")
        private String encoding;
        @JSONField(name = "speed_ratio")
        private Double speedRatio;
        @JSONField(name = "rate")
        private Integer rate;
        @JSONField(name = "BitRate")
        private Integer BitRate;
        @JSONField(name = "explicit_language")
        private String explicitLanguage;
        @JSONField(name = "context_language")
        private String contextLanguage;
        @JSONField(name = "loudness_ratio")
        private Double loudnessRatio;
        @JSONField(name = "volume_ratio")
        private Double volumeRatio;
        @JSONField(name = "voice")
        private String voice;
        @JSONField(name = "pitch_ratio")
        private Double pitchRatio;
        @JSONField(name = "language")
        private String language;
    }

    @Data
    @Builder
    public static class Request {
        @JSONField(name = "reqid")
        private String reqid;
        @JSONField(name = "text")
        private String text;
        @JSONField(name = "text_type")
        private String textType;
        @JSONField(name = "silence_duration")
        private Double silenceDuration;
        @JSONField(name = "with_timestamp")
        private Integer withTimestamp;
        @JSONField(name = "operation")
        private String operation;
        @JSONField(name = "extra_param")
        private String extraParam;

    }
}

package org.example.websocket.tts;

import com.alibaba.fastjson.JSON;
import lombok.Getter;
import org.example.websocket.tts.demo.TtsRequest;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.framing.CloseFrame;
import org.java_websocket.handshake.ServerHandshake;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.*;
import java.math.BigInteger;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TtsWebsocketDemo {
    private static final Logger log = LoggerFactory.getLogger(TtsWebsocketDemo.class);
    // Interface address from section 1
    public static final String API_URL = "wss://genaiapiws.cloudsway.net/ws/api/v1/tts/ws_binary";

    public static void main(String[] args) throws Exception {
        String accessToken = ""; // set to YOUR_ACCESS_KEY before running

        TtsRequest ttsRequest;

        ttsRequest = TtsRequest.builder()
                .user(TtsRequest.User.builder()
                        .uid("dadadadwd1e12121rtdawq")
                        .build())
                .audio(TtsRequest.Audio.builder()
                        .encoding("mp3")
                        .voiceType("zh_female_qingchezizi_moon_bigtts")
                        .language("en")
                        .build())
                .request(TtsRequest.Request.builder()
                        .reqid(UUID.randomUUID().toString())
                        .operation("query")
                        .text("我爱中国")
                        .build())
                .build();

        String json = JSON.toJSONString(ttsRequest);

        log.info("request: {}", json);
        HashMap<String, String> authorization = new HashMap<>();
        authorization.put("Authorization", "Bearer " + accessToken);
        authorization.put("ModelName","MaaS_DB_Speech");
        TtsWebsocketClient ttsWebsocketClient = new TtsWebsocketClient(authorization);
        byte[] audio = ttsWebsocketClient.submit(ttsRequest);
        // try-with-resources closes the file even if the write fails
        try (FileOutputStream fos = new FileOutputStream("test4.mp3")) {
            fos.write(audio);
        }
        log.info("TTS done.");
    }



    public static class TtsWebsocketClient extends WebSocketClient {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        public TtsWebsocketClient(Map<String, String> authorization) {
            super(URI.create(API_URL), authorization);
        }

        public byte[] submit(TtsRequest ttsRequest) throws InterruptedException {
            String json = JSON.toJSONString(ttsRequest);
            log.info("request: {}", json);
            byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
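            // Header: 0x11 = protocol version 1 | header size 1 (4 bytes)
            //         0x10 = full client request | flags 0
            //         0x10 = JSON serialization | no compression
            //         0x00 = reserved
            byte[] header = {0x11, 0x10, 0x10, 0x00};
            // Frame layout: 4-byte header + 4-byte big-endian payload size + JSON payload
            ByteBuffer requestByte = ByteBuffer.allocate(8 + jsonBytes.length);
            requestByte.put(header).putInt(jsonBytes.length).put(jsonBytes);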

            this.connectBlocking();
            synchronized (this) {
                this.send(requestByte.array());
                wait();
                return this.buffer.toByteArray();
            }
        }

        @Override
        public void onMessage(ByteBuffer bytes) {
            log.info("received message:" + bytes.remaining() + " bytes");
            int protocolVersion = (bytes.get(0) & 0xff) >> 4;
            int headerSize = bytes.get(0) & 0x0f;
            int messageType = (bytes.get(1) & 0xff) >> 4;
            int messageTypeSpecificFlags = bytes.get(1) & 0x0f;
            int serializationMethod = (bytes.get(2) & 0xff) >> 4;
            int messageCompression = bytes.get(2) & 0x0f;
            int reserved = bytes.get(3) & 0xff;
            bytes.position(headerSize * 4);
            byte[] fourByte = new byte[4];
            if (messageType == 11) {
                // Audio-only server response
                log.info("received audio-only response.");
                if (messageTypeSpecificFlags == 0) {
                    // Ack without audio data
                } else {
                    bytes.get(fourByte, 0, 4);
                    int sequenceNumber = new BigInteger(fourByte).intValue();
                    bytes.get(fourByte, 0, 4);
                    int payloadSize = new BigInteger(fourByte).intValue();
                    byte[] payload = new byte[payloadSize];
                    bytes.get(payload, 0, payloadSize);
                    try {
                        this.buffer.write(payload);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                    if (sequenceNumber < 0) {
                        // received the last segment
                        this.close(CloseFrame.NORMAL, "received all audio data.");
                    }
                }
            } else if (messageType == 15) {
                // Error message from server
                bytes.get(fourByte, 0, 4);
                int code = new BigInteger(fourByte).intValue();
                bytes.get(fourByte, 0, 4);
                int messageSize = new BigInteger(fourByte).intValue();
                byte[] messageBytes = new byte[messageSize];
                bytes.get(messageBytes, 0, messageSize);
                String message = new String(messageBytes, StandardCharsets.UTF_8);
                throw new TtsException(code, message);
            } else {
                log.warn("Received unknown response message type: {}", messageType);
            }
        }

        @Override
        public void onOpen(ServerHandshake serverHandshake) {
            log.info("opened connection");
        }

        @Override
        public void onMessage(String message) {
            log.info("received message: " + message);
        }

        @Override
        public void onClose(int code, String reason, boolean remote) {
            log.info("Connection closed by {}, Code: {}, Reason: {}", (remote ? "remote" : "us"), code, reason);
            synchronized (this) {
                notify();
            }
        }

        @Override
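        public void onError(Exception e) {
            // Log the cause before closing so failures are not silently swallowed
            log.error("websocket error", e);
            close(CloseFrame.NORMAL, e.toString());
        }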
    }

    @Getter
    public static class TtsException extends RuntimeException {
        private final int code;
        private final String message;

        public TtsException(int code, String message) {
            super("code=" + code + ", message=" + message);
            this.code = code;
            this.message = message;
        }
    }
}