MaaS_DB_Speech API
1. Interface Description
The interface address is wss://genaiapi.cloudsway.net/ws/api/v1/tts/ws_binary
2. Authentication
Authentication uses a Bearer token. Add the headers "Authorization": "Bearer {YOUR_ACCESS_KEY}" and "ModelName": "MaaS_DB_Speech" to the WebSocket handshake request.
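As a minimal sketch of the two headers described above, the helper below builds them as a map that a WebSocket client can send during the handshake. The class name TtsAuthHeaders and the placeholder key "demo-key" are illustrative assumptions, not part of the API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (name is an assumption): builds the handshake headers
// named in the Authentication section.
final class TtsAuthHeaders {
    static Map<String, String> build(String accessKey) {
        if (accessKey == null || accessKey.isEmpty()) {
            throw new IllegalArgumentException("accessKey must not be empty");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + accessKey);
        headers.put("ModelName", "MaaS_DB_Speech");
        return headers;
    }

    public static void main(String[] args) {
        // "demo-key" is a placeholder, not a real credential.
        System.out.println(build("demo-key"));
    }
}
```

The resulting map can be passed to any client that accepts custom handshake headers, as the Java demo in section 7 does.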
3. Request Method
Binary Protocol
Field Description
| Field (size in bits) | Description | Values |
|---|---|---|
| Protocol version (4) | Different protocol versions may be used in the future, so this field keeps the client and server consistent in version. | 0b0001 - Version 1 (currently the only version) |
| Header size (4) | The actual header size is the header size value x 4 bytes. The special value 0b1111 indicates that the header size is greater than or equal to 60 bytes (15 x 4), meaning a header extension field follows. | 0b0001 - Header size = 4 (1 x 4); 0b0010 - Header size = 8 (2 x 4); 0b1010 - Header size = 40 (10 x 4); 0b1110 - Header size = 56 (14 x 4); 0b1111 - Header size is 60 or larger, with the actual size defined in the header extension |
| Message type (4) | Defines the message type. | 0b0001 - Full client request; 0b1001 - Full server response (deprecated); 0b1011 - Audio-only server response (ACK); 0b1111 - Error message from server (e.g., wrong message type, unsupported serialization method) |
| Message type specific flags (4) | The meaning of the flags depends on the message type. See the message type section for details. | |
| Message serialization method (4) | Defines the method for serializing the payload. Note: it only makes sense for certain message types (e.g., the audio-only server response 0b1011 needs no serialization). | 0b0000 - No serialization (raw bytes); 0b0001 - JSON; 0b1111 - Custom type, defined in the header extension. Note: currently only JSON is supported |
| Message compression (4) | Defines the compression method for the payload. The payload size field (if present, depending on the message type) is not compressed, and it gives the size of the payload after compression. The header is not compressed. | 0b0000 - No compression; 0b0001 - gzip; 0b1111 - Custom compression method, defined in the header extension. Note: currently only no compression is supported |
| Reserved (8) | Reserved field; also serves as padding that brings the entire header to 4 bytes. | 0x00 - Currently only 0 |
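The field layout above packs six 4-bit fields and one reserved byte into a 4-byte header. The following sketch shows one way to pack and unpack those fields with bit shifts; the class name TtsHeader and its method names are illustrative assumptions, and header extensions (header size 0b1111) are not handled.

```java
// Hypothetical helper (names are assumptions): packs and unpacks the basic
// 4-byte header described in the field table. Header extensions are ignored.
final class TtsHeader {
    final int version, headerSize, messageType, flags, serialization, compression;

    TtsHeader(int version, int headerSize, int messageType,
              int flags, int serialization, int compression) {
        this.version = nibble(version, "version");
        this.headerSize = nibble(headerSize, "headerSize");
        this.messageType = nibble(messageType, "messageType");
        this.flags = nibble(flags, "flags");
        this.serialization = nibble(serialization, "serialization");
        this.compression = nibble(compression, "compression");
    }

    // Each field is 4 bits wide, so only 0..15 is representable.
    private static int nibble(int v, String name) {
        if (v < 0 || v > 0x0f) {
            throw new IllegalArgumentException(name + " must fit in 4 bits: " + v);
        }
        return v;
    }

    // High nibble first within each byte; the 4th byte is the reserved 0x00.
    byte[] pack() {
        return new byte[] {
            (byte) ((version << 4) | headerSize),
            (byte) ((messageType << 4) | flags),
            (byte) ((serialization << 4) | compression),
            0x00
        };
    }

    static TtsHeader unpack(byte[] b) {
        if (b.length < 4) {
            throw new IllegalArgumentException("header needs at least 4 bytes");
        }
        return new TtsHeader((b[0] & 0xff) >> 4, b[0] & 0x0f,
                             (b[1] & 0xff) >> 4, b[1] & 0x0f,
                             (b[2] & 0xff) >> 4, b[2] & 0x0f);
    }

    // The header size field counts 4-byte units.
    int sizeInBytes() {
        return headerSize * 4;
    }

    public static void main(String[] args) {
        // Version 1, 4-byte header, full client request, no flags, JSON, no compression.
        byte[] bytes = new TtsHeader(1, 1, 0b0001, 0, 0b0001, 0).pack();
        System.out.printf("%02x %02x %02x %02x%n", bytes[0], bytes[1], bytes[2], bytes[3]);
    }
}
```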
Detailed Description of Message Types
Currently, all TTS websocket requests use the full client request format, regardless of "query" or "submit".
Full client request
- Header size is 0b0001 (i.e., 4 bytes, no header extension).
- Message type is 0b0001.
- Message type specific flags are fixed at 0b0000.
- Message serialization method is 0b0001 (JSON). Refer to the table above for the fields.
- If gzip compression is used for the payload, the payload size is the size after compression.
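A full client request frame can be assembled from the values listed above. The sketch below follows the Java demo in section 7, which writes the 4-byte header, then the payload size as a 4-byte big-endian integer, then the UTF-8 JSON payload. The class name FullClientRequestFrame is an illustrative assumption, and compression is not applied.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (name is an assumption): encodes an uncompressed
// full client request as header + payload size + JSON payload.
final class FullClientRequestFrame {
    // 0x11: version 1, header size 1 (4 bytes)
    // 0x10: message type 0b0001 (full client request), flags 0b0000
    // 0x10: serialization 0b0001 (JSON), compression 0b0000 (none)
    // 0x00: reserved
    private static final byte[] HEADER = {0x11, 0x10, 0x10, 0x00};

    static byte[] encode(String json) {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer is big-endian by default, so putInt writes the size
        // most significant byte first.
        return ByteBuffer.allocate(HEADER.length + 4 + payload.length)
                .put(HEADER)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    public static void main(String[] args) {
        byte[] frame = encode("{\"request\":{\"operation\":\"query\"}}");
        System.out.println("frame length: " + frame.length);
    }
}
```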
Audio-only server response
- Header size should be 0b0001.
- Message type is 0b1011.
- Message type specific flags can have the following values:
  - 0b0000 - No sequence number.
  - 0b0001 - Sequence number > 0.
  - 0b0010 or 0b0011 - Sequence number < 0, indicating the last message from the server, at which point the client should merge all audio segments (if there are multiple).
- Message serialization method is 0b0000 (raw bytes).
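The audio-only response can be decoded along the lines of the Java demo in section 7: when the flags are non-zero, the header is followed by a 4-byte sequence number, a 4-byte payload size, and the raw audio bytes. The sketch below assumes that layout and big-endian integers; the class name AudioFrame and its fields are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical parser (names are assumptions) for the audio-only server
// response, following the layout used by the demo in section 7.
final class AudioFrame {
    final Integer sequence; // null when the flags carry no sequence number
    final byte[] audio;     // empty for a plain ACK
    final boolean last;     // true when the sequence number is negative

    private AudioFrame(Integer sequence, byte[] audio, boolean last) {
        this.sequence = sequence;
        this.audio = audio;
        this.last = last;
    }

    static AudioFrame parse(ByteBuffer frame) {
        // Work on a view so the caller's position is left untouched.
        ByteBuffer buf = frame.duplicate().order(ByteOrder.BIG_ENDIAN);
        if (buf.remaining() < 4) {
            throw new IllegalArgumentException("frame shorter than the 4-byte header");
        }
        int start = buf.position();
        int headerBytes = (buf.get(start) & 0x0f) * 4;
        int messageType = (buf.get(start + 1) & 0xff) >> 4;
        int flags = buf.get(start + 1) & 0x0f;
        if (messageType != 0b1011) {
            throw new IllegalArgumentException("not an audio-only response: " + messageType);
        }
        if (flags == 0b0000) {
            return new AudioFrame(null, new byte[0], false); // ACK without audio
        }
        buf.position(start + headerBytes);
        int sequence = buf.getInt();
        int size = buf.getInt();
        if (size < 0 || size > buf.remaining()) {
            throw new IllegalArgumentException("invalid payload size: " + size);
        }
        byte[] audio = new byte[size];
        buf.get(audio);
        return new AudioFrame(sequence, audio, sequence < 0);
    }
}
```

A caller would append `audio` to a buffer for every frame and stop once `last` is true.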
4. Request Parameters
| Field | Meaning | Level | Format | Required | Notes |
|---|---|---|---|---|---|
| app | Application config | 1 | dict | ✓ | |
| appid | Application ID | 2 | string | ✓ | Needs to be applied for |
| token | Application token | 2 | string | ✓ | Can be any non-empty string |
| cluster | Business cluster | 2 | string | ✓ | volcano_tts |
| user | User config | 1 | dict | ✓ | |
| uid | User ID | 2 | string | ✓ | Can be any non-empty string; used for server-side logging |
| audio | Audio config | 1 | dict | ✓ | |
| voice_type | Voice type | 2 | string | ✓ | |
| emotion | Voice emotion | 2 | string | | Sets the voice emotion. Example: "emotion": "angry". Note: currently only some voices support emotion settings, and different voices support different emotion ranges. |
| enable_emotion | Enable voice emotion | 2 | bool | | Set to true to enable voice emotion. Example: "enable_emotion": true |
| encoding | Audio encoding format | 2 | string | | wav / pcm / ogg_opus / mp3; default is pcm. Note: wav does not support streaming |
| speed_ratio | Speech rate | 2 | float | | [0.8, 2], default is 1; one decimal place is usually sufficient |
| rate | Audio sample rate | 2 | int | | Default is 24000; other options are 8000 and 16000 |
| BitRate | Bit rate | 2 | int | | Can be 16, 96, 128, etc. |
| explicit_language | Explicit language | 2 | string | | Only reads text in the specified language. If not given: normal Chinese-English mix. crosslingual: enables the multi-language frontend (including zh/en/ja/es-mx/id/pt-br). zh: mainly Chinese, supports Chinese-English mix. en: English only. ja: Japanese only. es-mx: Mexican Spanish only. id: Indonesian only. pt-br: Brazilian Portuguese only. |
| context_language | Reference language | 2 | string | | Provides a reference language for the model. If not given: Western European languages use English. id: Western European languages use Indonesian. es: Western European languages use Mexican Spanish. pt: Western European languages use Brazilian Portuguese. |
| loudness_ratio | Volume adjustment | 2 | float | | [0.5, 2], default is 1; one decimal place is usually sufficient. 0.5 means 0.5 times the original volume, 2 means 2 times the original volume |
| request | Request config | 1 | dict | ✓ | |
| reqid | Request ID | 2 | string | ✓ | Must be unique for each call; a UUID is recommended |
| text | Text | 2 | string | ✓ | Text for speech synthesis, limited to 1024 bytes (UTF-8 encoding) |
| text_type | Text type | 2 | string | | Must be specified when using SSML; the value should be "ssml" |
| silence_duration | End-of-sentence silence | 2 | float | | Adds silence at the end of the sentence, range 0~30000 ms. (Note: the added silence applies to the end of the input text, not to the end of each sentence.) To use this parameter, enable_trailing_silence_audio = true must first be set under request |
| with_timestamp | Timestamp related | 2 | int or string | | Pass 1 to enable. Returns timestamps for the text after text normalization (TN). For example, for the input 2025, the text after TN is "two thousand and twenty-five" or "two zero two five", depending on semantics. Note: multiple punctuation marks or spaces in the original text are still processed but do not affect the continuity of timestamps (large model scenarios only). Additional note on how timestamps differ between small and large models: small models generate timestamps from the frontend model and then synthesize audio; the text before and after TN is mapped while processing timestamps, so small models can return timestamps for the original text before TN, retaining Arabic numerals and special symbols. Large models synthesize audio after semantic understanding of the input text and then produce timestamps by marking the synthesized audio against the text after TN. Timestamps for any other text would not align with the synthesized audio, so large models return timestamps for the text after TN. |
| operation | Operation | 2 | string | ✓ | query (non-streaming; http can only use query) / submit (streaming) |
| extra_param | Additional parameters | 2 | JSON string | | |
| disable_markdown_filter | | 3 | bool | | Whether to enable markdown parsing and filtering. When true, markdown syntax is parsed and filtered, e.g., "**hello**" is read as "hello". When false, no parsing or filtering is done, e.g., "**hello**" is read as "star star 'hello' star star". Example: "disable_markdown_filter": true |
| enable_latex_tn | | 3 | bool | | Whether to allow reading LaTeX formulas; requires disable_markdown_filter to be set to true. Example: "enable_latex_tn": true |
Notes:
- Character-level timestamp capability is now supported (not supported for ssml text type)
- Pitch adjustment is not currently supported
- Large model voice types support Chinese-English mix
- Large model non-bidirectional streaming now supports LaTeX formulas
- After a successful websocket handshake, response headers are returned by the server
Example request payload:
{
    "user": {
        "uid": "uid123"
    },
    "audio": {
        "voice_type": "zh_male_M392_conversation_wvae_bigtts",
        "encoding": "mp3",
        "speed_ratio": 1.0
    },
    "request": {
        "reqid": "uuid",
        "text": "I love China",
        "operation": "query"
    }
}
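Several limits in the parameter table (text length in UTF-8 bytes, the speed_ratio and loudness_ratio ranges, the allowed encoding and operation values) can be checked on the client before a request is sent. The following is a minimal sketch of such a check; the class name TtsParamCheck, the method signature, and the message strings are illustrative assumptions, and the server remains the authority on what it accepts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side check (names are assumptions) for the limits
// stated in the request parameter table.
final class TtsParamCheck {
    private static final List<String> ENCODINGS = Arrays.asList("wav", "pcm", "ogg_opus", "mp3");

    static List<String> validate(String text, double speedRatio, double loudnessRatio,
                                 String encoding, String operation) {
        List<String> problems = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            problems.add("text is empty");
        } else if (text.getBytes(StandardCharsets.UTF_8).length > 1024) {
            // The limit is in bytes, so CJK text reaches it after far fewer characters.
            problems.add("text exceeds 1024 UTF-8 bytes");
        }
        if (speedRatio < 0.8 || speedRatio > 2.0) {
            problems.add("speed_ratio outside [0.8, 2]");
        }
        if (loudnessRatio < 0.5 || loudnessRatio > 2.0) {
            problems.add("loudness_ratio outside [0.5, 2]");
        }
        if (!ENCODINGS.contains(encoding)) {
            problems.add("unknown encoding: " + encoding);
        }
        if (!"query".equals(operation) && !"submit".equals(operation)) {
            problems.add("operation must be query or submit");
        }
        if ("wav".equals(encoding) && "submit".equals(operation)) {
            problems.add("wav does not support streaming (submit)");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("I love China", 1.0, 1.0, "mp3", "query"));
    }
}
```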
5. Notes
- A single websocket connection only supports one synthesis. For multiple syntheses, multiple connections need to be established
- The reqid parameter must be regenerated for each synthesis and must be unique (a version 4 UUID is recommended)
- operation needs to be set to submit for streaming output (query is non-streaming, as listed in the request parameters)
Return Code Description
| Error Code | Error Description | Example | Suggested Action |
|---|---|---|---|
| 3000 | Correct request | Normal synthesis | Process normally |
| 3001 | Invalid request | Some parameter values are illegal, such as incorrect operation configuration | Check parameters |
| 3003 | Concurrency exceeded | Exceeds the concurrency threshold set online | Retry; switch offline if using sdk |
| 3005 | Backend service busy | Backend server load is high | Retry; switch offline if using sdk |
| 3006 | Service interrupted | Request with the same reqid after a request has completed/failed | Check parameters |
| 3010 | Text length exceeded | Single request exceeds the set text length threshold | Check parameters |
| 3011 | Invalid text | Parameters are incorrect or text is empty, text doesn't match the language, text only contains punctuation | Check parameters |
| 3030 | Processing timeout | Single request exceeds the service's maximum time limit | Retry or check text |
| 3031 | Processing error | Backend exception occurred | Retry; switch offline if using sdk |
| 3032 | Audio retrieval timeout | Backend network exception | Retry; switch offline if using sdk |
| 3040 | Backend link error | Backend network exception | Retry |
| 3050 | Voice does not exist | Check the voice_type code used | Check parameters |
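The suggested actions in the return code table fall into a few groups, which a client can use to decide whether to retry. The sketch below maps each documented code to one of those groups; the class name TtsReturnCodes and the Action enum are illustrative assumptions, and code 3030 ("Retry or check text") is placed in the retry group as a simplification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical mapping (names are assumptions) from the documented return
// codes to the suggested action column of the table.
final class TtsReturnCodes {
    enum Action { OK, CHECK_PARAMETERS, RETRY, UNKNOWN }

    private static final Set<Integer> RETRY =
            new HashSet<>(Arrays.asList(3003, 3005, 3030, 3031, 3032, 3040));
    private static final Set<Integer> CHECK_PARAMETERS =
            new HashSet<>(Arrays.asList(3001, 3006, 3010, 3011, 3050));

    static Action actionFor(int code) {
        if (code == 3000) {
            return Action.OK;
        }
        if (RETRY.contains(code)) {
            return Action.RETRY;
        }
        if (CHECK_PARAMETERS.contains(code)) {
            return Action.CHECK_PARAMETERS;
        }
        return Action.UNKNOWN; // code not listed in the table
    }

    public static void main(String[] args) {
        System.out.println(actionFor(3005));
    }
}
```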
6. Voice List
Multi-Emotion
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| Beijing Young Man (Multi-emotion) | zh_male_beijingxiaoye_emo_v2_mars_bigtts | √ | Chinese | angry, surprised, fear, excited, coldness, neutral |
| Gentle Girlfriend (Multi-emotion) | zh_female_roumeinvyou_emo_v2_mars_bigtts | √ | Chinese | happy, sad, angry, surprised, fear, hate, excited, coldness, neutral |
| Sunny Youth (Multi-emotion) | zh_male_yangguangqingnian_emo_v2_mars_bigtts | √ | Chinese | happy, sad, angry, fear, excited, coldness, neutral |
| Charming Girlfriend (Multi-emotion) | zh_female_meilinvyou_emo_v2_mars_bigtts | √ | Chinese | sad, fear, neutral |
| Refreshing Sisi (Multi-emotion) | zh_female_shuangkuaisisi_emo_v2_mars_bigtts | √ | Chinese, American English | happy, sad, angry, surprised, excited, coldness, neutral |
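Because each multi-emotion voice supports a different set of emotion values, a client may want to check the combination before sending a request. The lookup below is a sketch built from the Multi-Emotion voice list above; the class name VoiceEmotions and the method supports are illustrative assumptions, and the table in this document remains the source of truth if the voice list changes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lookup (names are assumptions) of the emotion values listed
// for each multi-emotion voice_type.
final class VoiceEmotions {
    private static final Map<String, Set<String>> SUPPORTED = new HashMap<>();

    static {
        put("zh_male_beijingxiaoye_emo_v2_mars_bigtts",
                "angry", "surprised", "fear", "excited", "coldness", "neutral");
        put("zh_female_roumeinvyou_emo_v2_mars_bigtts",
                "happy", "sad", "angry", "surprised", "fear", "hate", "excited", "coldness", "neutral");
        put("zh_male_yangguangqingnian_emo_v2_mars_bigtts",
                "happy", "sad", "angry", "fear", "excited", "coldness", "neutral");
        put("zh_female_meilinvyou_emo_v2_mars_bigtts",
                "sad", "fear", "neutral");
        put("zh_female_shuangkuaisisi_emo_v2_mars_bigtts",
                "happy", "sad", "angry", "surprised", "excited", "coldness", "neutral");
    }

    private static void put(String voiceType, String... emotions) {
        SUPPORTED.put(voiceType, new HashSet<>(Arrays.asList(emotions)));
    }

    // Returns false for voices that are not in the multi-emotion list.
    static boolean supports(String voiceType, String emotion) {
        Set<String> emotions = SUPPORTED.get(voiceType);
        return emotions != null && emotions.contains(emotion);
    }

    public static void main(String[] args) {
        System.out.println(supports("zh_female_meilinvyou_emo_v2_mars_bigtts", "sad"));
    }
}
```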
General Scenarios
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| Cancan/Shiny | zh_female_cancan_mars_bigtts | √ | Chinese, American English | |
| Fresh Female Voice | zh_female_qingxinnvsheng_mars_bigtts | √ | Chinese | |
| Refreshing Sisi/Skye | zh_female_shuangkuaisisi_moon_bigtts | √ | Chinese, American English | |
| Warm Ahu/Alvin | zh_male_wennuanahu_moon_bigtts | √ | Chinese, American English | |
| Young Zixin/Brayan | zh_male_shaonianzixin_moon_bigtts | √ | Chinese, American English | |
| Intellectual Female Voice | zh_female_zhixingnvsheng_mars_bigtts | √ | Chinese | |
| Refreshing Male Voice | zh_male_qingshuangnanda_mars_bigtts | √ | Chinese | |
| Girl Next Door | zh_female_linjianvhai_moon_bigtts | √ | Chinese | |
| Knowledgeable Uncle | zh_male_yuanboxiaoshu_moon_bigtts | √ | Chinese | |
| Sunny Youth | zh_male_yangguangqingnian_moon_bigtts | √ | Chinese | |
| Sweet Xiaoyuan | zh_female_tianmeixiaoyuan_moon_bigtts | √ | Chinese | |
| Clear Zizi | zh_female_qingchezizi_moon_bigtts | √ | Chinese | |
| Explanatory Xiaoming | zh_male_jieshuoxiaoming_moon_bigtts | √ | Chinese | |
| Cheerful Sister | zh_female_kailangjiejie_moon_bigtts | √ | Chinese | |
| Boy Next Door | zh_male_linjiananhai_moon_bigtts | √ | Chinese | |
| Sweet Yueyue | zh_female_tianmeiyueyue_moon_bigtts | √ | Chinese | |
| Chicken Soup | zh_female_xinlingjitang_moon_bigtts | √ | Chinese | |
| Intellectual and Gentle | ICL_zh_female_zhixingwenwan_tob | √ | Chinese | |
| Warm and Considerate | ICL_zh_male_nuanxintitie_tob | √ | Chinese | |
| Gentle and Elegant | ICL_zh_female_wenrouwenya_tob | √ | Chinese | |
| Cheerful and Lively | ICL_zh_male_kailangqingkuai_tob | √ | Chinese | |
| Lively and Bright | ICL_zh_male_huoposhuanglang_tob | √ | Chinese | |
| Straightforward Guy | ICL_zh_male_shuaizhenxiaohuo_tob | √ | Chinese | |
| Gentle Brother | zh_male_wenrouxiaoge_mars_bigtts | √ | Chinese | |
| Smith | en_male_smith_mars_bigtts | √ | British English | |
| Anna | en_female_anna_mars_bigtts | √ | British English | |
| Adam | en_male_adam_mars_bigtts | √ | American English | |
| Sarah | en_female_sarah_mars_bigtts | √ | Australian English | |
| Dryw | en_male_dryw_mars_bigtts | √ | Australian English | |
Multi-language
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| かずね(和音)/Javier or Álvaro | multi_male_jingqiangkanye_moon_bigtts | √ | Japanese, Spanish | |
| はるこ(晴子)/Esmeralda | multi_female_shuangkuaisisi_moon_bigtts | √ | Japanese, Spanish | |
| ひろし(広志)/Roberto | multi_male_wanqudashu_moon_bigtts | √ | Japanese, Spanish | |
| あけみ(朱美) | multi_female_gaolengyujie_moon_bigtts | √ | Japanese | |
| Amanda | en_female_amanda_mars_bigtts | √ | American English | |
| Jackson | en_male_jackson_mars_bigtts | √ | American English | |
Fun Accents
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| Beijing Accent Kanye/Harmony | zh_male_jingqiangkanye_moon_bigtts | √ | Chinese-Beijing accent, English | |
| Taiwan Little He | zh_female_wanwanxiaohe_moon_bigtts | √ | Chinese-Taiwan accent | |
| Bay Area Uncle | zh_female_wanqudashu_moon_bigtts | √ | Chinese-Cantonese accent | |
| Cute Sichuan Girl | zh_female_daimengchuanmei_moon_bigtts | √ | Chinese-Sichuan accent | |
| Guangzhou Dege | zh_male_guozhoudege_moon_bigtts | √ | Chinese-Cantonese accent | |
| Beijing Young Man | zh_male_beijingxiaoye_moon_bigtts | √ | Chinese-Beijing accent | |
| Haoyu Brother | zh_male_haoyuxiaoge_moon_bigtts | √ | Chinese-Qingdao accent | |
| Guangxi Yuanzhou | zh_male_guangxiyuanzhou_moon_bigtts | √ | Chinese-Guangxi accent | |
| Changsha Meituo Jie'er | zh_female_meituojieer_moon_bigtts | √ | Chinese-Changsha accent | |
| Henan Yuzhou Zixuan | zh_male_yuzhouzixuan_moon_bigtts | √ | Chinese-Henan accent | |
Role-playing
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| Cute Baby | zh_male_naiqimengwa_mars_bigtts | √ | Chinese | |
| Grandma | zh_female_popo_mars_bigtts | √ | Chinese | |
| Cool Sister | zh_female_gaolengyujie_moon_bigtts | √ | Chinese | |
| Arrogant Boss | zh_male_aojiaobazong_moon_bigtts | √ | Chinese | |
| Charming Girlfriend | zh_female_meilinvyou_moon_bigtts | √ | Chinese | |
| Late Night Podcaster | zh_male_shenyeboke_moon_bigtts | √ | Chinese | |
| Gentle Girlfriend | zh_female_sajiaonvyou_moon_bigtts | √ | Chinese | |
| Coquettish Junior | zh_female_yuanqinvyou_moon_bigtts | √ | Chinese | |
| Frail Girl | ICL_zh_female_bingruoshaonv_tob | √ | Chinese | |
| Lively Girl | ICL_zh_female_huoponvhai_tob | √ | Chinese | |
| Oriental Haoran | zh_male_dongfanghaoran_moon_bigtts | √ | Chinese | |
| Green Tea Brother | ICL_zh_male_lvchaxiaoge_tob | √ | Chinese | |
| Delicate Loli | ICL_zh_female_jiaoruoluoli_tob | √ | Chinese | |
| Cold and Distant | ICL_zh_male_lengdanshuli_tob | √ | Chinese | |
| Honest and Solid | ICL_zh_male_hanhoudunshi_tob | √ | Chinese | |
| Arrogant and Domineering | ICL_zh_male_aiqilingren_tob | √ | Chinese | |
| Lively and Mischievous | ICL_zh_female_huopodiaoman_tob | √ | Chinese | |
| Stubborn and Yandere | ICL_zh_male_guzhibingjiao_tob | √ | Chinese | |
| Coquettish and Clingy | ICL_zh_male_sajiaonianren_tob | √ | Chinese | |
| Arrogant and High-pitched | ICL_zh_female_aomanjiaosheng_tob | √ | Chinese | |
| Casual and Free-spirited | ICL_zh_male_xiaosasuixing_tob | √ | Chinese | |
| Cunning Young Master | ICL_zh_male_fuheigongzi_tob | √ | Chinese | |
| Mysterious and Enigmatic | ICL_zh_male_guiyishenmi_tob | √ | Chinese | |
| Elegant and Talented | ICL_zh_male_ruyacaijun_tob | √ | Chinese | |
| Yandere White Lotus | ICL_zh_male_bingjiaobailian_tob | √ | Chinese | |
| Upright Youth | ICL_zh_male_zhengzhiqingnian_tob | √ | Chinese | |
| Coquettish Queen | ICL_zh_female_jiaohannvwang_tob | √ | Chinese | |
| Yandere Cute Girl | ICL_zh_female_bingjiaomengmei_tob | √ | Chinese | |
| Inexperienced Young Man | ICL_zh_male_qingsenaigou_tob | √ | Chinese | |
| Innocent Junior | ICL_zh_male_chunzhenxuedi_tob | √ | Chinese | |
| Caring Senior Sister | ICL_zh_female_nuanxinxuejie_tob | √ | Chinese | |
| Cute Girl | ICL_zh_female_keainvsheng_tob | √ | Chinese | |
| Mature Sister | ICL_zh_female_chengshujiejie_tob | √ | Chinese | |
| Yandere Sister | ICL_zh_female_bingjiaojiejie_tob | √ | Chinese | |
| Indecisive Leader | ICL_zh_male_youroubangzhu_tob | √ | Chinese | |
| Indecisive Young Master | ICL_zh_male_yourougongzi_tob | √ | Chinese | |
| Charming Sister | ICL_zh_female_wumeiyujie_tob | √ | Chinese | |
| Mischievous Princess | ICL_zh_female_tiaopigongzhu_tob | √ | Chinese | |
| Tsundere Girlfriend | ICL_zh_female_aojiaonvyou_tob | √ | Chinese | |
| Caring Boyfriend | ICL_zh_male_tiexinnanyou_tob | √ | Chinese | |
| Young General | ICL_zh_male_shaonianjiangjun_tob | √ | Chinese | |
| Caring Girlfriend | ICL_zh_female_tiexinnvyou_tob | √ | Chinese | |
| Yandere Brother | ICL_zh_male_bingjiaogege_tob | √ | Chinese | |
| Top Student Male Classmate | ICL_zh_male_xuebanantongzhuo_tob | √ | Chinese | |
| Humorous Uncle | ICL_zh_male_youmoshushu_tob | √ | Chinese | |
| Sexy Sister | ICL_zh_female_xingganyujie_tob | √ | Chinese | |
| Tomboy | ICL_zh_female_jiaxiaozi_tob | √ | Chinese | |
| Cold Superior | ICL_zh_male_lengjunshangsi_tob | √ | Chinese | |
| Gentle Male Classmate | ICL_zh_male_wenrounantongzhuo_tob | √ | Chinese | |
| Yandere Younger Brother | ICL_zh_male_bingjiaodidi_tob | √ | Chinese | |
| Humorous Old Man | ICL_zh_male_youmodaye_tob | √ | Chinese | |
| Arrogant Young Master | ICL_zh_male_aomanshaoye_tob | √ | Chinese | |
| Mysterious Mage | ICL_zh_male_shenmifashi_tob | √ | Chinese | |
Video Configuration
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| Kind Grandma | ICL_zh_female_heainainai_tob | √ | Chinese | |
| Neighbor Aunt | ICL_zh_female_linjuayi_tob | √ | Chinese | |
| Gentle Little Ya | zh_female_wenrouxiaoya_moon_bigtts | √ | Chinese | |
| Genius Child Voice | zh_male_tiancaitongsheng_mars_bigtts | √ | Chinese | |
| Monkey Brother | zh_male_sunwukong_mars_bigtts | √ | Chinese | |
| Bear Two | zh_male_xionger_mars_bigtts | √ | Chinese | |
| Peppa Pig | zh_female_peiqi_mars_bigtts | √ | Chinese | |
| Wu Zetian | zh_female_wuzetian_mars_bigtts | √ | Chinese | |
| Sister Gu | zh_female_gujie_mars_bigtts | √ | Chinese | |
| Cherry Ball | zh_female_yingtaowanzi_mars_bigtts | √ | Chinese | |
| Advertising Narration | zh_male_chunhui_mars_bigtts | √ | Chinese | |
| Children's Story | zh_female_shaoergushi_mars_bigtts | √ | Chinese | |
| Fourth Brother | zh_male_silang_mars_bigtts | √ | Chinese | |
| Magnetic Narrator (Morgan) | zh_male_jieshuonansheng_mars_bigtts | √ | Chinese, American English | |
| Chicken Soup Sister (Hope) | zh_female_jitangmeimei_mars_bigtts | √ | Chinese, American English | |
| Caring Female Voice (Candy) | zh_female_tiexinnvsheng_mars_bigtts | √ | Chinese, American English | |
| Playful Female Voice | zh_female_qiaopinvsheng_mars_bigtts | √ | Chinese | |
| Cute Girl (Cutey) | zh_female_mengyatou_mars_bigtts | √ | Chinese, American English | |
| Lazy Sound Male | zh_male_lanxiaoyang_mars_bigtts | √ | Chinese | |
| Bright Voice Cute Boy | zh_male_dongmanhaimian_mars_bigtts | √ | Chinese | |
Audiobook Reading
| Voice Name | voice_type | Timestamp | Language | Supported Emotions |
|---|---|---|---|---|
| Suspense Narration | zh_male_changtianyi_mars_bigtts | √ | Chinese | |
| Elegant Young Man | zh_male_ruyaqingnian_mars_bigtts | √ | Chinese | |
| Dominant Young Uncle | zh_male_baqiqingshu_mars_bigtts | √ | Chinese | |
| Grand Voice | zh_male_qingcang_mars_bigtts | √ | Chinese | |
| Energetic Young Man | zh_male_yangguangqingnian_mars_bigtts | √ | Chinese | |
| Ancient Style Young Gentleman | zh_female_gufengshaoyu_mars_bigtts | √ | Chinese | |
| Tender Lady | zh_female_wenroushunv_mars_bigtts | √ | Chinese | |
| Anti-rolling Young Man | zh_male_fanjuanqingnian_mars_bigtts | √ | Chinese | |
Note:
- The above Chinese voice types support mixed Chinese and English scenarios.
- The voice types "Frail Girl," "Lively Girl," "Kind Grandma," and "Neighbor Aunt" do not currently support bidirectional streaming interface calls.
7. Example
// TtsRequest.java
package org.example.websocket.tts.demo;

import com.alibaba.fastjson.annotation.JSONField;
import lombok.Builder;
import lombok.Data;

@Data
@Builder
public class TtsRequest {
    @JSONField(name = "app")
    private App app;
    @JSONField(name = "user")
    private User user;
    @JSONField(name = "audio")
    private Audio audio;
    @JSONField(name = "request")
    private Request request;

    @Data
    @Builder
    public static class App {
        @JSONField(name = "appid")
        private String appid;
        @JSONField(name = "token")
        private String token; // Currently not in effect, using the default value is fine.
        @JSONField(name = "cluster")
        private String cluster;
    }

    @Data
    @Builder
    public static class User {
        @JSONField(name = "uid")
        private String uid;
    }

    @Data
    @Builder
    public static class Audio {
        @JSONField(name = "voice_type")
        private String voiceType;
        @JSONField(name = "emotion")
        private String emotion;
        @JSONField(name = "enable_emotion")
        private Boolean enableEmotion;
        @JSONField(name = "encoding")
        private String encoding;
        @JSONField(name = "speed_ratio")
        private Double speedRatio;
        @JSONField(name = "rate")
        private Integer rate;
        @JSONField(name = "BitRate")
        private Integer BitRate;
        @JSONField(name = "explicit_language")
        private String explicitLanguage;
        @JSONField(name = "context_language")
        private String contextLanguage;
        @JSONField(name = "loudness_ratio")
        private Double loudnessRatio;
        @JSONField(name = "volume_ratio")
        private Double volumeRatio;
        @JSONField(name = "voice")
        private String voice;
        @JSONField(name = "pitch_ratio")
        private Double pitchRatio;
        @JSONField(name = "language")
        private String language;
    }

    @Data
    @Builder
    public static class Request {
        @JSONField(name = "reqid")
        private String reqid;
        @JSONField(name = "text")
        private String text;
        @JSONField(name = "text_type")
        private String textType;
        @JSONField(name = "silence_duration")
        private Double silenceDuration;
        @JSONField(name = "with_timestamp")
        private Integer withTimestamp;
        @JSONField(name = "operation")
        private String operation;
        @JSONField(name = "extra_param")
        private String extraParam;
    }
}
// TtsWebsocketDemo.java
package org.example.websocket.tts;

import com.alibaba.fastjson.JSON;
import lombok.Getter;
import org.example.websocket.tts.demo.TtsRequest;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.framing.CloseFrame;
import org.java_websocket.handshake.ServerHandshake;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TtsWebsocketDemo {
    private static final Logger log = LoggerFactory.getLogger(TtsWebsocketDemo.class);
    // Interface address from section 1.
    public static final String API_URL = "wss://genaiapi.cloudsway.net/ws/api/v1/tts/ws_binary";

    public static void main(String[] args) throws Exception {
        String accessToken = ""; // fill in your access key
        TtsRequest ttsRequest = TtsRequest.builder()
                .user(TtsRequest.User.builder()
                        .uid("dadadadwd1e12121rtdawq")
                        .build())
                .audio(TtsRequest.Audio.builder()
                        .encoding("mp3")
                        .voiceType("zh_female_qingchezizi_moon_bigtts")
                        .language("en") // not listed in the request parameter table
                        .build())
                .request(TtsRequest.Request.builder()
                        .reqid(UUID.randomUUID().toString())
                        .operation("query")
                        .text("我爱中国") // "I love China"
                        .build())
                .build();

        // Headers required by section 2 (Authentication).
        Map<String, String> authorization = new HashMap<>();
        authorization.put("Authorization", "Bearer " + accessToken);
        authorization.put("ModelName", "MaaS_DB_Speech");

        TtsWebsocketClient ttsWebsocketClient = new TtsWebsocketClient(authorization);
        byte[] audio = ttsWebsocketClient.submit(ttsRequest);
        // try-with-resources closes the file even if the write fails.
        try (FileOutputStream fos = new FileOutputStream("test4.mp3")) {
            fos.write(audio);
        }
        log.info("TTS done.");
    }

    public static class TtsWebsocketClient extends WebSocketClient {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private boolean done;                 // guarded by synchronized (this)
        private volatile TtsException error;  // set when the server sends an error message

        public TtsWebsocketClient(Map<String, String> authorization) {
            super(URI.create(API_URL), authorization);
        }

        public byte[] submit(TtsRequest ttsRequest) throws InterruptedException {
            String json = JSON.toJSONString(ttsRequest);
            log.info("request: {}", json);
            byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
            // version 1 + header size 1 | full client request, no flags | JSON, no compression | reserved
            byte[] header = {0x11, 0x10, 0x10, 0x00};
            // 4-byte header + 4-byte payload size + payload
            ByteBuffer requestByte = ByteBuffer.allocate(8 + jsonBytes.length);
            requestByte.put(header).putInt(jsonBytes.length).put(jsonBytes);
            if (!this.connectBlocking()) {
                throw new IllegalStateException("websocket connection failed");
            }
            synchronized (this) {
                this.send(requestByte.array());
                // Loop to guard against spurious wakeups; onClose sets done.
                while (!done) {
                    wait();
                }
            }
            if (error != null) {
                throw error;
            }
            return this.buffer.toByteArray();
        }

        @Override
        public void onMessage(ByteBuffer bytes) {
            log.info("received message: {} bytes", bytes.remaining());
            // Byte 0: protocol version (high 4 bits), header size (low 4 bits).
            // Byte 1: message type (high 4 bits), message type specific flags (low 4 bits).
            int headerSize = bytes.get(0) & 0x0f;
            int messageType = (bytes.get(1) & 0xff) >> 4;
            int messageTypeSpecificFlags = bytes.get(1) & 0x0f;
            bytes.position(headerSize * 4);
            if (messageType == 11) {
                // Audio-only server response
                log.info("received audio-only response.");
                if (messageTypeSpecificFlags == 0) {
                    // Ack without audio data
                    return;
                }
                int sequenceNumber = bytes.getInt();
                int payloadSize = bytes.getInt();
                byte[] payload = new byte[payloadSize];
                bytes.get(payload, 0, payloadSize);
                try {
                    this.buffer.write(payload);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                if (sequenceNumber < 0) {
                    // received the last segment
                    this.close(CloseFrame.NORMAL, "received all audio data.");
                }
            } else if (messageType == 15) {
                // Error message from server: 4-byte code, 4-byte message size, UTF-8 message
                int code = bytes.getInt();
                int messageSize = bytes.getInt();
                byte[] messageBytes = new byte[messageSize];
                bytes.get(messageBytes, 0, messageSize);
                String message = new String(messageBytes, StandardCharsets.UTF_8);
                // Keep the error so submit() can rethrow it on the calling thread.
                this.error = new TtsException(code, message);
                this.close(CloseFrame.NORMAL, "server returned error " + code);
            } else {
                log.warn("Received unknown response message type: {}", messageType);
            }
        }

        @Override
        public void onOpen(ServerHandshake serverHandshake) {
            log.info("opened connection");
        }

        @Override
        public void onMessage(String message) {
            log.info("received message: {}", message);
        }

        @Override
        public void onClose(int code, String reason, boolean remote) {
            log.info("Connection closed by {}, Code: {}, Reason: {}", (remote ? "remote" : "us"), code, reason);
            synchronized (this) {
                done = true;
                notifyAll();
            }
        }

        @Override
        public void onError(Exception e) {
            log.error("websocket error", e);
            close(CloseFrame.NORMAL, e.toString());
        }
    }

    @Getter
    public static class TtsException extends RuntimeException {
        private final int code;
        private final String message;

        public TtsException(int code, String message) {
            super("code=" + code + ", message=" + message);
            this.code = code;
            this.message = message;
        }
    }
}