
MaaS_DB_Speech API

1. Interface Description

The WebSocket endpoint is wss://genaiapi.cloudsway.net/ws/api/v1/tts/ws_binary

2. Authentication

Authentication uses a Bearer token. Add the headers "Authorization": "Bearer {YOUR_ACCESS_KEY}" and "ModelName": "MaaS_DB_Speech" to the WebSocket handshake request.
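The sketch below shows one way to assemble the two handshake headers in Java. The class and method names are illustrative, not part of the API. The resulting map can be passed to the Java-WebSocket `WebSocketClient(URI, Map<String, String>)` constructor, as the demo in Section 7 does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the two handshake headers required by Section 2.
final class AuthHeaders {
    private AuthHeaders() {}

    /** Returns the headers to send with the WebSocket upgrade request. */
    static Map<String, String> forAccessKey(String accessKey) {
        if (accessKey == null || accessKey.isEmpty()) {
            throw new IllegalArgumentException("access key must not be empty");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + accessKey);
        headers.put("ModelName", "MaaS_DB_Speech");
        return headers;
    }
}
```

With the JDK's own client, the same pairs could instead be added one by one through `java.net.http.WebSocket.Builder.header(name, value)`.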

3. Request Method

Binary Protocol

Message Format

Field Description

  • Protocol version (4 bits): Allows for future protocol versions while keeping client and server in agreement on the version in use. Values: 0b0001 = version 1 (currently the only version).
  • Header size (4 bits): The actual header size is this value × 4 bytes. The special value 0b1111 means the header is 60 bytes (15 × 4) or larger, in which case a header extension field is present. Values:
      ◦ 0b0001 = 4 bytes (1 × 4)
      ◦ 0b0010 = 8 bytes (2 × 4)
      ◦ 0b1010 = 40 bytes (10 × 4)
      ◦ 0b1110 = 56 bytes (14 × 4)
      ◦ 0b1111 = 60 bytes or larger; the actual size is given in the header extension
  • Message type (4 bits): Defines the message type. Values:
      ◦ 0b0001 = full client request
      ◦ 0b1001 = full server response (deprecated)
      ◦ 0b1011 = audio-only server response (ACK)
      ◦ 0b1111 = error message from the server (e.g. wrong message type, unsupported serialization method)
  • Message type specific flags (4 bits): Their meaning depends on the message type; see "Detailed Description of Message Types" below.
  • Message serialization method (4 bits): Defines how the payload is serialized. It is only meaningful for some message types (e.g. the audio-only server response 0b1011 needs no serialization). Values:
      ◦ 0b0000 = no serialization (raw bytes)
      ◦ 0b0001 = JSON
      ◦ 0b1111 = custom type, defined in the header extension
    Note: currently only JSON is supported.
  • Message compression (4 bits): Defines how the payload is compressed. The header and the payload size field (whose presence depends on the message type) are never compressed, and the payload size refers to the size after compression. Values:
      ◦ 0b0000 = no compression
      ◦ 0b0001 = gzip
      ◦ 0b1111 = custom compression method, defined in the header extension
    Note: currently only uncompressed payloads are supported.
  • Reserved (8 bits): Reserved for future use; it also pads the header to a full 4 bytes. Value: currently always 0x00.
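As a sketch of the field layout above, the helper below packs the six 4-bit fields into the first three bytes (high nibble first) and writes the reserved byte as 0x00. `ProtocolHeader` is a hypothetical name, and header extensions (header size 0b1111) are not handled.

```java
// Hypothetical helper: packs/unpacks the fixed 4-byte header described above.
// Header extensions (header size 0b1111) are not supported by this sketch.
final class ProtocolHeader {
    final int version;       // 4 bits
    final int headerSize;    // 4 bits, in units of 4 bytes
    final int messageType;   // 4 bits
    final int flags;         // 4 bits, message type specific
    final int serialization; // 4 bits
    final int compression;   // 4 bits

    ProtocolHeader(int version, int headerSize, int messageType, int flags,
                   int serialization, int compression) {
        for (int v : new int[] {version, headerSize, messageType, flags, serialization, compression}) {
            if (v < 0 || v > 0x0f) {
                throw new IllegalArgumentException("field does not fit in 4 bits: " + v);
            }
        }
        this.version = version;
        this.headerSize = headerSize;
        this.messageType = messageType;
        this.flags = flags;
        this.serialization = serialization;
        this.compression = compression;
    }

    /** High nibble first in each byte; the reserved byte is always 0x00. */
    byte[] encode() {
        return new byte[] {
            (byte) ((version << 4) | headerSize),
            (byte) ((messageType << 4) | flags),
            (byte) ((serialization << 4) | compression),
            0x00
        };
    }

    /** Reads the first four bytes of a message. */
    static ProtocolHeader decode(byte[] b) {
        return new ProtocolHeader(
            (b[0] & 0xff) >> 4, b[0] & 0x0f,
            (b[1] & 0xff) >> 4, b[1] & 0x0f,
            (b[2] & 0xff) >> 4, b[2] & 0x0f);
    }

    /** Header length in bytes (value x 4). */
    int headerBytes() {
        return headerSize * 4;
    }
}
```

With version 1, header size 1, message type 0b0001, flags 0, JSON serialization and no compression, `encode()` yields 0x11 0x10 0x10 0x00, the same header bytes the Java demo in Section 7 sends.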

Detailed Description of Message Types

Currently, every TTS WebSocket request uses the full client request format, whether operation is "query" or "submit".

Full client request

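  • Protocol version is 0b0001 and header size is 0b0001 (4 bytes, no header extension).
  • Message type is 0b0001.
  • Message type specific flags are fixed at 0b0000.
  • Message serialization method is 0b0001 (JSON); the payload fields are described in Section 4.
  • The header is followed by a 4-byte big-endian payload size and then the payload. If the payload is compressed (e.g. with gzip), the payload size is the compressed size; note that only uncompressed payloads are currently supported.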
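A minimal sketch of that framing, assuming an uncompressed JSON payload as in the Java demo in Section 7 (the class name is illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames a JSON payload as a full client request.
// Layout: 4-byte header, 4-byte big-endian payload size, UTF-8 JSON payload (uncompressed).
final class FullClientRequest {
    // 0x11: version 1, header size 1; 0x10: full client request, flags 0;
    // 0x10: JSON serialization, no compression; 0x00: reserved
    private static final byte[] HEADER = {0x11, 0x10, 0x10, 0x00};

    private FullClientRequest() {}

    static byte[] frame(String json) {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer is big-endian by default, so putInt writes the size big-endian
        return ByteBuffer.allocate(HEADER.length + 4 + payload.length)
                .put(HEADER)
                .putInt(payload.length)
                .put(payload)
                .array();
    }
}
```

The returned array can be sent as a single binary WebSocket message.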

Audio-only server response

  • Header size should be 0b0001.
  • Message type is 0b1011.
  • Message type specific flags can take the following values:
      ◦ 0b0000: no sequence number.
      ◦ 0b0001: sequence number > 0.
      ◦ 0b0010 or 0b0011: sequence number < 0, marking the last message from the server; the client should then concatenate all audio segments (if there are several).
  • Message serialization method is 0b0000 (raw bytes).
  • When a sequence number is present, the header is followed by a 4-byte signed sequence number, a 4-byte payload size, and the raw audio payload.
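The sketch below parses one audio-only response into its sequence number and audio bytes. It assumes the byte layout read by the Java demo in Section 7 and ignores header extensions; `AudioChunk` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical parser for an audio-only server response (message type 0b1011).
// Assumed layout: header, then (if flags != 0) a signed 4-byte sequence number,
// a 4-byte payload size, and the raw audio bytes.
final class AudioChunk {
    final int sequenceNumber; // negative on the final segment
    final byte[] audio;

    private AudioChunk(int sequenceNumber, byte[] audio) {
        this.sequenceNumber = sequenceNumber;
        this.audio = audio;
    }

    boolean isLast() {
        return sequenceNumber < 0;
    }

    /** Returns null for a message without a sequence number (flags == 0b0000). */
    static AudioChunk parse(ByteBuffer msg) {
        int headerBytes = (msg.get(0) & 0x0f) * 4;
        int messageType = (msg.get(1) & 0xff) >> 4;
        int flags = msg.get(1) & 0x0f;
        if (messageType != 0b1011) {
            throw new IllegalArgumentException("not an audio-only response: type " + messageType);
        }
        if (flags == 0b0000) {
            return null;
        }
        ByteBuffer body = msg.duplicate(); // leave the caller's position untouched
        body.position(headerBytes);
        int seq = body.getInt();  // big-endian, signed
        int size = body.getInt();
        byte[] audio = new byte[size];
        body.get(audio);
        return new AudioChunk(seq, audio);
    }
}
```

A caller would append each non-null chunk's audio to a `ByteArrayOutputStream` in arrival order and stop once `isLast()` returns true.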

4. Request Parameters

Fields are nested by level (level 1 objects contain level 2 fields, and so on). Each entry gives the field name, its format, and its meaning.

app (dict): Application config
  • appid (string): Application ID; must be applied for.
  • token (string): Application token; any non-empty string.
  • cluster (string): Business cluster; set to volcano_tts.
user (dict): User config
  • uid (string): User ID; any non-empty string, used for server-side logging.
audio (dict): Audio config
  • voice_type (string): Voice type; see Section 6 for available values.
  • emotion (string): Voice emotion, e.g. "emotion": "angry". Only some voices support emotions, and the supported set differs per voice.
  • enable_emotion (bool): Set to true to enable voice emotion, e.g. "enable_emotion": true.
  • encoding (string): Audio encoding: wav, pcm, ogg_opus, or mp3 (default pcm). wav does not support streaming.
  • speed_ratio (float): Speech rate in [0.8, 2], default 1; one decimal place is usually sufficient.
  • rate (int): Audio sample rate; default 24000, and 8000 and 16000 are also accepted.
  • BitRate (int): Bit rate, e.g. 16, 96, 128.
  • explicit_language (string): Restricts reading to the specified language. If omitted, mixed Chinese and English text is read normally. Values:
      ◦ crosslingual: multi-language frontend (zh/en/ja/es-mx/id/pt-br)
      ◦ zh: mainly Chinese, supports Chinese-English mix
      ◦ en: English only
      ◦ ja: Japanese only
      ◦ es-mx: Mexican Spanish only
      ◦ id: Indonesian only
      ◦ pt-br: Brazilian Portuguese only
  • context_language (string): Reference language for the model. If omitted, Western European languages are read with English as the reference. Values:
      ◦ id: Western European languages use Indonesian
      ◦ es: Western European languages use Mexican Spanish
      ◦ pt: Western European languages use Brazilian Portuguese
  • loudness_ratio (float): Volume adjustment in [0.5, 2], default 1; 0.5 is half the original volume and 2 is twice the original volume. One decimal place is usually sufficient.
request (dict): Request config
  • reqid (string): Request ID; must be unique for every call (a UUID is recommended).
  • text (string): Text to synthesize, at most 1024 bytes in UTF-8.
  • text_type (string): Set to "ssml" when the text is SSML.
  • silence_duration (float): Adds silence, in the range 0 to 30000 ms, at the end of the input text (not after every sentence). To use it, enable_trailing_silence_audio must first be set to true under request.
  • with_timestamp (int or string): Pass 1 to return timestamps for the text after text normalization (TN). For example, depending on meaning, "2025" becomes "two thousand and twenty-five" or "two zero two five" after TN. Repeated punctuation or spaces in the original text are still processed but do not break timestamp continuity (large-model scenarios only). Small and large models differ here: small models generate timestamps with a frontend model before synthesizing audio and map the text before and after TN, so they return timestamps for the original pre-TN text (keeping Arabic numerals and special symbols). Large models synthesize audio after understanding the input semantically and then time-stamp the synthesized audio against the post-TN text; timestamps for the pre-TN text would not align with the audio, so large models return timestamps for the post-TN text.
  • operation (string): query (non-streaming; the only option over HTTP) or submit (streaming).
  • extra_param (JSON string): Additional parameters:
      ◦ disable_markdown_filter (bool): Whether to parse and filter Markdown. When true, Markdown syntax is filtered, e.g. **hello** is read as "hello". When false, it is read literally, e.g. "star star hello star star". Example: "disable_markdown_filter": true
      ◦ enable_latex_tn (bool): Whether LaTeX formulas are read aloud; requires disable_markdown_filter set to true. Example: "enable_latex_tn": true
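A hypothetical client-side pre-check for a few of the documented limits (text size, speed_ratio, loudness_ratio, encoding) might look like this. It is not part of the API, and the server remains the authority on what it accepts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical pre-check of a few limits from the parameter list above.
// It only catches obvious mistakes before a request is sent.
final class TtsParamCheck {
    private static final Set<String> ENCODINGS = Set.of("wav", "pcm", "ogg_opus", "mp3");

    private TtsParamCheck() {}

    /** Returns human-readable problems; an empty list means none were found. */
    static List<String> check(String text, double speedRatio, double loudnessRatio, String encoding) {
        List<String> problems = new ArrayList<>();
        int textBytes = text == null ? 0 : text.getBytes(StandardCharsets.UTF_8).length;
        if (textBytes == 0) {
            problems.add("text is empty");
        } else if (textBytes > 1024) {
            problems.add("text is " + textBytes + " UTF-8 bytes; the limit is 1024");
        }
        // Written as !(in range) so that NaN is also rejected
        if (!(speedRatio >= 0.8 && speedRatio <= 2.0)) {
            problems.add("speed_ratio must be in [0.8, 2]");
        }
        if (!(loudnessRatio >= 0.5 && loudnessRatio <= 2.0)) {
            problems.add("loudness_ratio must be in [0.5, 2]");
        }
        if (encoding == null || !ENCODINGS.contains(encoding)) {
            problems.add("encoding must be one of " + ENCODINGS);
        }
        return problems;
    }
}
```

Note that the 1024 limit is in bytes, not characters: a Chinese character typically takes 3 bytes in UTF-8, so roughly 341 such characters already reach the limit.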

Notes:

  1. Character-level timestamps are supported (not for the ssml text type).
  2. Pitch adjustment is not currently supported.
  3. Large-model voice types support mixed Chinese and English text.
  4. Large-model non-bidirectional streaming now supports LaTeX formulas.
  5. After a successful WebSocket handshake, the server returns a set of response headers.

Request example:
{
    "user": {
        "uid": "uid123"
    },
    "audio": {
        "voice_type": "zh_male_M392_conversation_wvae_bigtts",
        "encoding": "mp3",
        "speed_ratio": 1.0
    },
    "request": {
        "reqid": "uuid",
        "text": "I love China",
        "operation": "submit"
    }
}

5. Notes

  • Each WebSocket connection supports only one synthesis; open a new connection for each additional synthesis.
  • Generate a new, unique reqid for every synthesis (UUID v4 is recommended).
  • Set operation to submit.
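To illustrate these rules, the hypothetical helper below runs several syntheses one after another and generates a fresh reqid for each (`UUID.randomUUID()` produces a version 4 UUID). The `synthesizeOnce` callback is an assumption: it stands for code that opens a new WebSocket connection, sends one request with operation "submit", and returns the audio, for example by creating a new TtsWebsocketClient from the demo in Section 7 on every call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.BiFunction;

// Hypothetical helper: one connection and one fresh reqid per synthesis.
final class MultiSynthesis {
    private MultiSynthesis() {}

    /**
     * @param synthesizeOnce assumed callback (reqid, text) -> audio that opens a NEW
     *                       connection, sends a single request, and closes it
     */
    static List<byte[]> synthesizeAll(List<String> texts,
                                      BiFunction<String, String, byte[]> synthesizeOnce) {
        List<byte[]> results = new ArrayList<>();
        for (String text : texts) {
            String reqid = UUID.randomUUID().toString(); // never reuse a reqid (see code 3006)
            results.add(synthesizeOnce.apply(reqid, text));
        }
        return results;
    }
}
```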

Return Code Description

Each entry lists the error code, its description, an example cause, and the suggested action.

  • 3000 Success: normal synthesis. Action: process normally.
  • 3001 Invalid request: some parameter values are invalid, e.g. an incorrect operation value. Action: check parameters.
  • 3003 Concurrency exceeded: the request exceeds the configured online concurrency limit. Action: retry; if using the SDK, switch to offline mode.
  • 3005 Backend service busy: the backend server is under high load. Action: retry; if using the SDK, switch to offline mode.
  • 3006 Service interrupted: a request reused the reqid of a request that had already completed or failed. Action: check parameters.
  • 3010 Text length exceeded: a single request exceeds the text length limit. Action: check parameters.
  • 3011 Invalid text: parameters are incorrect, or the text is empty, does not match the language, or contains only punctuation. Action: check parameters.
  • 3030 Processing timeout: a single request exceeds the service's maximum processing time. Action: retry or check the text.
  • 3031 Processing error: a backend exception occurred. Action: retry; if using the SDK, switch to offline mode.
  • 3032 Audio retrieval timeout: backend network exception. Action: retry; if using the SDK, switch to offline mode.
  • 3040 Backend link error: backend network exception. Action: retry.
  • 3050 Voice does not exist: the specified voice_type is not recognized. Action: check the voice_type value used.
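One way to act on these codes programmatically is to map each one to its suggested action. The sketch below follows the "Action" column above; the enum and class names are hypothetical, and unknown codes are reported as UNKNOWN rather than guessed.

```java
import java.util.Map;

// Hypothetical mapping from the return codes above to their suggested action.
final class TtsReturnCodes {
    enum Action { OK, RETRY, CHECK_PARAMETERS, UNKNOWN }

    private static final Map<Integer, Action> ACTIONS = Map.ofEntries(
        Map.entry(3000, Action.OK),
        Map.entry(3001, Action.CHECK_PARAMETERS),
        Map.entry(3003, Action.RETRY),
        Map.entry(3005, Action.RETRY),
        Map.entry(3006, Action.CHECK_PARAMETERS),
        Map.entry(3010, Action.CHECK_PARAMETERS),
        Map.entry(3011, Action.CHECK_PARAMETERS),
        Map.entry(3030, Action.RETRY), // "retry or check the text"
        Map.entry(3031, Action.RETRY),
        Map.entry(3032, Action.RETRY),
        Map.entry(3040, Action.RETRY),
        Map.entry(3050, Action.CHECK_PARAMETERS));

    private TtsReturnCodes() {}

    static Action actionFor(int code) {
        return ACTIONS.getOrDefault(code, Action.UNKNOWN);
    }
}
```

For example, after the demo's `submit` throws a `TtsException`, `TtsReturnCodes.actionFor(e.getCode())` could decide whether to retry on a new connection with a new reqid.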

6. Voice List

Multi-Emotion

Voice Name voice_type Timestamp Language Supported Emotions
Beijing Young Man (Multi-emotion) zh_male_beijingxiaoye_emo_v2_mars_bigtts Chinese Corresponding emotion parameters: angry, surprised, fear, excited, coldness, neutral
Gentle Girlfriend (Multi-emotion) zh_female_roumeinvyou_emo_v2_mars_bigtts Chinese Corresponding emotion parameters: happy, sad, angry, surprised, fear, hate, excited, coldness, neutral
Sunny Youth (Multi-emotion) zh_male_yangguangqingnian_emo_v2_mars_bigtts Chinese Corresponding emotion parameters: happy, sad, angry, fear, excited, coldness, neutral
Charming Girlfriend (Multi-emotion) zh_female_meilinvyou_emo_v2_mars_bigtts Chinese Corresponding emotion parameters: sad, fear, neutral
Refreshing Sisi (Multi-emotion) zh_female_shuangkuaisisi_emo_v2_mars_bigtts Chinese, American English Corresponding emotion parameters: happy, sad, angry, surprised, excited, coldness, neutral
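Because each multi-emotion voice accepts a different set of emotions, a client might check the pair before setting "emotion" and "enable_emotion": true. The sketch below copies the table above into a lookup; `EmotionSupport` is a hypothetical name, and the table in this document remains the source of truth.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical lookup built from the multi-emotion voice table above.
final class EmotionSupport {
    private static final Map<String, Set<String>> EMOTIONS = Map.of(
        "zh_male_beijingxiaoye_emo_v2_mars_bigtts",
            Set.of("angry", "surprised", "fear", "excited", "coldness", "neutral"),
        "zh_female_roumeinvyou_emo_v2_mars_bigtts",
            Set.of("happy", "sad", "angry", "surprised", "fear", "hate", "excited", "coldness", "neutral"),
        "zh_male_yangguangqingnian_emo_v2_mars_bigtts",
            Set.of("happy", "sad", "angry", "fear", "excited", "coldness", "neutral"),
        "zh_female_meilinvyou_emo_v2_mars_bigtts",
            Set.of("sad", "fear", "neutral"),
        "zh_female_shuangkuaisisi_emo_v2_mars_bigtts",
            Set.of("happy", "sad", "angry", "surprised", "excited", "coldness", "neutral"));

    private EmotionSupport() {}

    /** True if the voice is listed above and supports the given emotion parameter. */
    static boolean supports(String voiceType, String emotion) {
        return EMOTIONS.getOrDefault(voiceType, Set.of()).contains(emotion);
    }
}
```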

General Scenarios

Voice Name voice_type Timestamp Language Supported Emotions
Cancan/Shiny zh_female_cancan_mars_bigtts Chinese, American English
Fresh Female Voice zh_female_qingxinnvsheng_mars_bigtts Chinese
Refreshing Sisi/Skye zh_female_shuangkuaisisi_moon_bigtts Chinese, American English
Warm Ahu/Alvin zh_male_wennuanahu_moon_bigtts Chinese, American English
Young Zixin/Brayan zh_male_shaonianzixin_moon_bigtts Chinese, American English
Intellectual Female Voice zh_female_zhixingnvsheng_mars_bigtts Chinese
Refreshing Male Voice zh_male_qingshuangnanda_mars_bigtts Chinese
Girl Next Door zh_female_linjianvhai_moon_bigtts Chinese
Knowledgeable Uncle zh_male_yuanboxiaoshu_moon_bigtts Chinese
Sunny Youth zh_male_yangguangqingnian_moon_bigtts Chinese
Sweet Xiaoyuan zh_female_tianmeixiaoyuan_moon_bigtts Chinese
Clear Zizi zh_female_qingchezizi_moon_bigtts Chinese
Explanatory Xiaoming zh_male_jieshuoxiaoming_moon_bigtts Chinese
Cheerful Sister zh_female_kailangjiejie_moon_bigtts Chinese
Boy Next Door zh_male_linjiananhai_moon_bigtts Chinese
Sweet Yueyue zh_female_tianmeiyueyue_moon_bigtts Chinese
Chicken Soup zh_female_xinlingjitang_moon_bigtts Chinese
Intellectual and Gentle ICL_zh_female_zhixingwenwan_tob Chinese
Warm and Considerate ICL_zh_male_nuanxintitie_tob Chinese
Gentle and Elegant ICL_zh_female_wenrouwenya_tob Chinese
Cheerful and Lively ICL_zh_male_kailangqingkuai_tob Chinese
Lively and Bright ICL_zh_male_huoposhuanglang_tob Chinese
Straightforward Guy ICL_zh_male_shuaizhenxiaohuo_tob Chinese
Gentle Brother zh_male_wenrouxiaoge_mars_bigtts Chinese
Smith en_male_smith_mars_bigtts British English
Anna en_female_anna_mars_bigtts British English
Adam en_male_adam_mars_bigtts American English
Sarah en_female_sarah_mars_bigtts Australian English
Dryw en_male_dryw_mars_bigtts Australian English

Multi-language

Voice Name voice_type Timestamp Language Supported Emotions
かずね(和音)/Javier or Álvaro multi_male_jingqiangkanye_moon_bigtts Japanese, Spanish
はるこ(晴子)/Esmeralda multi_female_shuangkuaisisi_moon_bigtts Japanese, Spanish
ひろし(広志)/Roberto multi_male_wanqudashu_moon_bigtts Japanese, Spanish
あけみ(朱美) multi_female_gaolengyujie_moon_bigtts Japanese
Amanda en_female_amanda_mars_bigtts American English
Jackson en_male_jackson_mars_bigtts American English

Fun Accents

Voice Name voice_type Timestamp Language Supported Emotions
Beijing Accent Kanye/Harmony zh_male_jingqiangkanye_moon_bigtts Chinese-Beijing accent, English
Taiwan Little He zh_female_wanwanxiaohe_moon_bigtts Chinese-Taiwan accent
Bay Area Uncle zh_female_wanqudashu_moon_bigtts Chinese-Cantonese accent
Cute Sichuan Girl zh_female_daimengchuanmei_moon_bigtts Chinese-Sichuan accent
Guangzhou Dege zh_male_guozhoudege_moon_bigtts Chinese-Cantonese accent
Beijing Young Man zh_male_beijingxiaoye_moon_bigtts Chinese-Beijing accent
Haoyu Brother zh_male_haoyuxiaoge_moon_bigtts Chinese-Qingdao accent
Guangxi Yuanzhou zh_male_guangxiyuanzhou_moon_bigtts Chinese-Guangxi accent
Changsha Meituo Jie'er zh_female_meituojieer_moon_bigtts Chinese-Changsha accent
Henan Yuzhou Zixuan zh_male_yuzhouzixuan_moon_bigtts Chinese-Henan accent

Role-playing

Voice Name voice_type Timestamp Language Supported Emotions
Cute Baby zh_male_naiqimengwa_mars_bigtts Chinese
Grandma zh_female_popo_mars_bigtts Chinese
Cool Sister zh_female_gaolengyujie_moon_bigtts Chinese
Arrogant Boss zh_male_aojiaobazong_moon_bigtts Chinese
Charming Girlfriend zh_female_meilinvyou_moon_bigtts Chinese
Late Night Podcaster zh_male_shenyeboke_moon_bigtts Chinese
Gentle Girlfriend zh_female_sajiaonvyou_moon_bigtts Chinese
Coquettish Junior zh_female_yuanqinvyou_moon_bigtts Chinese
Frail Girl ICL_zh_female_bingruoshaonv_tob Chinese
Lively Girl ICL_zh_female_huoponvhai_tob Chinese
Oriental Haoran zh_male_dongfanghaoran_moon_bigtts Chinese
Green Tea Brother ICL_zh_male_lvchaxiaoge_tob Chinese
Delicate Loli ICL_zh_female_jiaoruoluoli_tob Chinese
Cold and Distant ICL_zh_male_lengdanshuli_tob Chinese
Honest and Solid ICL_zh_male_hanhoudunshi_tob Chinese
Arrogant and Domineering ICL_zh_male_aiqilingren_tob Chinese
Lively and Mischievous ICL_zh_female_huopodiaoman_tob Chinese
Stubborn and Yandere ICL_zh_male_guzhibingjiao_tob Chinese
Coquettish and Clingy ICL_zh_male_sajiaonianren_tob Chinese
Arrogant and High-pitched ICL_zh_female_aomanjiaosheng_tob Chinese
Casual and Free-spirited ICL_zh_male_xiaosasuixing_tob Chinese
Cunning Young Master ICL_zh_male_fuheigongzi_tob Chinese
Mysterious and Enigmatic ICL_zh_male_guiyishenmi_tob Chinese
Elegant and Talented ICL_zh_male_ruyacaijun_tob Chinese
Yandere White Lotus ICL_zh_male_bingjiaobailian_tob Chinese
Upright Youth ICL_zh_male_zhengzhiqingnian_tob Chinese
Coquettish Queen ICL_zh_female_jiaohannvwang_tob Chinese
Yandere Cute Girl ICL_zh_female_bingjiaomengmei_tob Chinese
Inexperienced Young Man ICL_zh_male_qingsenaigou_tob Chinese
Innocent Junior ICL_zh_male_chunzhenxuedi_tob Chinese
Caring Senior Sister ICL_zh_female_nuanxinxuejie_tob Chinese
Cute Girl ICL_zh_female_keainvsheng_tob Chinese
Mature Sister ICL_zh_female_chengshujiejie_tob Chinese
Yandere Sister ICL_zh_female_bingjiaojiejie_tob Chinese
Indecisive Leader ICL_zh_male_youroubangzhu_tob Chinese
Indecisive Young Master ICL_zh_male_yourougongzi_tob Chinese
Charming Sister ICL_zh_female_wumeiyujie_tob Chinese
Mischievous Princess ICL_zh_female_tiaopigongzhu_tob Chinese
Tsundere Girlfriend ICL_zh_female_aojiaonvyou_tob Chinese
Caring Boyfriend ICL_zh_male_tiexinnanyou_tob Chinese
Young General ICL_zh_male_shaonianjiangjun_tob Chinese
Caring Girlfriend ICL_zh_female_tiexinnvyou_tob Chinese
Yandere Brother ICL_zh_male_bingjiaogege_tob Chinese
Top Student Male Classmate ICL_zh_male_xuebanantongzhuo_tob Chinese
Humorous Uncle ICL_zh_male_youmoshushu_tob Chinese
Sexy Sister ICL_zh_female_xingganyujie_tob Chinese
Tomboy ICL_zh_female_jiaxiaozi_tob Chinese
Cold Superior ICL_zh_male_lengjunshangsi_tob Chinese
Gentle Male Classmate ICL_zh_male_wenrounantongzhuo_tob Chinese
Yandere Younger Brother ICL_zh_male_bingjiaodidi_tob Chinese
Humorous Old Man ICL_zh_male_youmodaye_tob Chinese
Arrogant Young Master ICL_zh_male_aomanshaoye_tob Chinese
Mysterious Mage ICL_zh_male_shenmifashi_tob Chinese

Video Configuration

Voice Name voice_type Timestamp Language Supported Emotions
Kind Grandma ICL_zh_female_heainainai_tob Chinese
Neighbor Aunt ICL_zh_female_linjuayi_tob Chinese
Gentle Little Ya zh_female_wenrouxiaoya_moon_bigtts Chinese
Genius Child Voice zh_male_tiancaitongsheng_mars_bigtts Chinese
Monkey Brother zh_male_sunwukong_mars_bigtts Chinese
Bear Two zh_male_xionger_mars_bigtts Chinese
Peppa Pig zh_female_peiqi_mars_bigtts Chinese
Wu Zetian zh_female_wuzetian_mars_bigtts Chinese
Sister Gu zh_female_gujie_mars_bigtts Chinese
Cherry Ball zh_female_yingtaowanzi_mars_bigtts Chinese
Advertising Narration zh_male_chunhui_mars_bigtts Chinese
Children's Story zh_female_shaoergushi_mars_bigtts Chinese
Fourth Brother zh_male_silang_mars_bigtts Chinese
Magnetic Narrator (Morgan) zh_male_jieshuonansheng_mars_bigtts Chinese, American English
Chicken Soup Sister (Hope) zh_female_jitangmeimei_mars_bigtts Chinese, American English
Caring Female Voice (Candy) zh_female_tiexinnvsheng_mars_bigtts Chinese, American English
Playful Female Voice zh_female_qiaopinvsheng_mars_bigtts Chinese
Cute Girl (Cutey) zh_female_mengyatou_mars_bigtts Chinese, American English
Lazy Sound Male zh_male_lanxiaoyang_mars_bigtts Chinese
Bright Voice Cute Boy zh_male_dongmanhaimian_mars_bigtts Chinese

Audiobook Reading

Voice Name voice_type Timestamp Language Supported Emotions
Suspense Narration zh_male_changtianyi_mars_bigtts Chinese
Elegant Young Man zh_male_ruyaqingnian_mars_bigtts Chinese
Dominant Young Uncle zh_male_baqiqingshu_mars_bigtts Chinese
Grand Voice zh_male_qingcang_mars_bigtts Chinese
Energetic Young Man zh_male_yangguangqingnian_mars_bigtts Chinese
Ancient Style Young Lady zh_female_gufengshaoyu_mars_bigtts Chinese
Tender Lady zh_female_wenroushunv_mars_bigtts Chinese
Anti-involution Young Man zh_male_fanjuanqingnian_mars_bigtts Chinese

Note:

  1. The above Chinese voice types support mixed Chinese and English scenarios.
  2. The voice types "Frail Girl," "Lively Girl," "Kind Grandma," and "Neighbor Aunt" do not currently support the bidirectional streaming interface.

7. Example

// TtsRequest.java
package org.example.websocket.tts.demo;

import com.alibaba.fastjson.annotation.JSONField;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class TtsRequest {
    @JSONField(name = "app")
    private App app;
    @JSONField(name = "user")
    private User user;
    @JSONField(name = "audio")
    private Audio audio;
    @JSONField(name = "request")
    private Request request;

    @Data
    @Builder
    public static class App {
        @JSONField(name = "appid")
        private String appid;
        @JSONField(name = "token")
        private String token; // Currently has no effect; any non-empty string works.
        @JSONField(name = "cluster")
        private String cluster;
    }

    @Data
    @Builder
    public static class User {
        @JSONField(name = "uid")
        private String uid;
    }

    @Data
    @Builder
    public static class Audio {
        @JSONField(name = "voice_type")
        private String voiceType;
        @JSONField(name = "emotion")
        private String emotion;
        @JSONField(name = "enable_emotion")
        private Boolean enableEmotion;
        @JSONField(name = "encoding")
        private String encoding;
        @JSONField(name = "speed_ratio")
        private Double speedRatio;
        @JSONField(name = "rate")
        private Integer rate;
        @JSONField(name = "BitRate")
        private Integer bitRate;
        @JSONField(name = "explicit_language")
        private String explicitLanguage;
        @JSONField(name = "context_language")
        private String contextLanguage;
        @JSONField(name = "loudness_ratio")
        private Double loudnessRatio;
        // The fields below are not described in Section 4; leave them unset (null).
        // fastjson omits null fields by default. Pitch adjustment is not currently supported.
        @JSONField(name = "volume_ratio")
        private Double volumeRatio;
        @JSONField(name = "voice")
        private String voice;
        @JSONField(name = "pitch_ratio")
        private Double pitchRatio;
        @JSONField(name = "language")
        private String language;
    }

    @Data
    @Builder
    public static class Request {
        @JSONField(name = "reqid")
        private String reqid;
        @JSONField(name = "text")
        private String text;
        @JSONField(name = "text_type")
        private String textType;
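        @JSONField(name = "silence_duration")
        private Double silenceDuration; // in ms, 0 to 30000; requires enableTrailingSilenceAudio = true
        @JSONField(name = "enable_trailing_silence_audio")
        private Boolean enableTrailingSilenceAudio;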
        @JSONField(name = "with_timestamp")
        private Integer withTimestamp;
        @JSONField(name = "operation")
        private String operation;
        @JSONField(name = "extra_param")
        private String extraParam;

    }
}
// TtsWebsocketDemo.java
package org.example.websocket.tts;

import com.alibaba.fastjson.JSON;
import lombok.Getter;
import org.example.websocket.tts.demo.TtsRequest;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.framing.CloseFrame;
import org.java_websocket.handshake.ServerHandshake;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class TtsWebsocketDemo {
    private static final Logger log = LoggerFactory.getLogger(TtsWebsocketDemo.class);
    // WebSocket endpoint from Section 1
    public static final String API_URL = "wss://genaiapi.cloudsway.net/ws/api/v1/tts/ws_binary";

    public static void main(String[] args) throws Exception {
        String accessToken = ""; // fill in your access key

        TtsRequest ttsRequest = TtsRequest.builder()
                .user(TtsRequest.User.builder()
                        .uid("dadadadwd1e12121rtdawq")
                        .build())
                .audio(TtsRequest.Audio.builder()
                        .encoding("mp3")
                        .voiceType("zh_female_qingchezizi_moon_bigtts")
                        .build())
                .request(TtsRequest.Request.builder()
                        .reqid(UUID.randomUUID().toString()) // must be unique for every synthesis
                        .operation("submit")
                        .text("我爱中国") // "I love China"
                        .build())
                .build();

        // Handshake headers required for authentication (Section 2)
        Map<String, String> headers = new HashMap<>();
        headers.put("Authorization", "Bearer " + accessToken);
        headers.put("ModelName", "MaaS_DB_Speech");

        // A connection supports only one synthesis, so create a new client per request
        TtsWebsocketClient ttsWebsocketClient = new TtsWebsocketClient(headers);
        byte[] audio = ttsWebsocketClient.submit(ttsRequest);
        try (OutputStream fos = new FileOutputStream("test4.mp3")) {
            fos.write(audio);
        }
        log.info("TTS done, {} bytes written.", audio.length);
    }

    public static class TtsWebsocketClient extends WebSocketClient {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Counted down in onClose, whether the connection ends normally or with an error
        private final CountDownLatch closed = new CountDownLatch(1);
        // Set on the WebSocket thread; rethrown to the caller of submit()
        private volatile RuntimeException error;

        public TtsWebsocketClient(Map<String, String> headers) {
            super(URI.create(API_URL), headers);
        }

        public byte[] submit(TtsRequest ttsRequest) throws InterruptedException {
            String json = JSON.toJSONString(ttsRequest);
            log.info("request: {}", json);
            byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
            // 0x11: version 1, header size 1 (4 bytes); 0x10: full client request, flags 0;
            // 0x10: JSON serialization, no compression; 0x00: reserved
            byte[] header = {0x11, 0x10, 0x10, 0x00};
            // header + 4-byte payload size (ByteBuffer is big-endian by default) + payload
            ByteBuffer requestByte = ByteBuffer.allocate(header.length + 4 + jsonBytes.length);
            requestByte.put(header).putInt(jsonBytes.length).put(jsonBytes);

            if (!this.connectBlocking()) {
                throw new IllegalStateException("WebSocket connection failed", error);
            }
            this.send(requestByte.array());
            // Wait until the last audio segment has arrived (we then close) or an error occurs.
            // The 60-second limit is an arbitrary choice for this demo.
            if (!closed.await(60, TimeUnit.SECONDS)) {
                this.close(CloseFrame.NORMAL, "timeout");
                throw new IllegalStateException("TTS timed out");
            }
            if (error != null) {
                throw error;
            }
            return this.buffer.toByteArray();
        }

        @Override
        public void onMessage(ByteBuffer bytes) {
            log.info("received message: {} bytes", bytes.remaining());
            int headerSize = bytes.get(0) & 0x0f;               // in units of 4 bytes
            int messageType = (bytes.get(1) & 0xff) >> 4;
            int messageTypeSpecificFlags = bytes.get(1) & 0x0f;
            bytes.position(headerSize * 4);                      // skip the header
            if (messageType == 0b1011) {
                // Audio-only server response
                if (messageTypeSpecificFlags == 0) {
                    return; // ACK without audio data
                }
                int sequenceNumber = bytes.getInt();             // big-endian, signed
                int payloadSize = bytes.getInt();
                byte[] payload = new byte[payloadSize];
                bytes.get(payload);
                this.buffer.write(payload, 0, payloadSize);
                if (sequenceNumber < 0) {
                    // Last segment received
                    this.close(CloseFrame.NORMAL, "received all audio data.");
                }
            } else if (messageType == 0b1111) {
                // Error message from server: 4-byte code, 4-byte message size, UTF-8 message
                int code = bytes.getInt();
                int messageSize = bytes.getInt();
                byte[] messageBytes = new byte[messageSize];
                bytes.get(messageBytes);
                String message = new String(messageBytes, StandardCharsets.UTF_8);
                log.error("TTS error {}: {}", code, message);
                error = new TtsException(code, message);
                this.close(CloseFrame.NORMAL, "server error " + code);
            } else {
                log.warn("Received unknown response message type: {}", messageType);
            }
        }

        @Override
        public void onOpen(ServerHandshake serverHandshake) {
            log.info("opened connection");
        }

        @Override
        public void onMessage(String message) {
            log.info("received message: {}", message);
        }

        @Override
        public void onClose(int code, String reason, boolean remote) {
            log.info("Connection closed by {}, Code: {}, Reason: {}", (remote ? "remote" : "us"), code, reason);
            closed.countDown();
        }

        @Override
        public void onError(Exception e) {
            log.error("WebSocket error", e);
            if (error == null) {
                error = new RuntimeException(e);
            }
            close(CloseFrame.NORMAL, "client error");
        }
    }

    @Getter
    public static class TtsException extends RuntimeException {
        private final int code;
        private final String message;

        public TtsException(int code, String message) {
            super("code=" + code + ", message=" + message);
            this.code = code;
            this.message = message;
        }
    }
}