MaaS HL Speech
1. T2A v2 (Synchronous Speech Generation)
URL
https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2
Request Method
post
Content-Type
application/json
Input Parameters
Field Name | Type | Required | Description |
---|---|---|---|
text | string | Yes | The text to be synthesized; must be shorter than 10,000 characters. Use line breaks to mark paragraph breaks. To insert a pause, add <#x#> between words, where x is the pause length in seconds (0.01-99.99, at most two decimal places). A pause marker must sit between two pronounceable text segments, and consecutive pause markers are not allowed. |
voice_setting | object | No | Parameters to control the speech rate, volume, pitch, etc. of the generated voice. |
audio_setting | object | No | Parameters to control the sample rate, bit rate, and audio format of the generated voice. |
pronunciation_dict | object | No | Parameters to control pronunciation replacements, such as tone adjustments and substitute readings for specific words or symbols. |
timber_weights | array | No | Parameters to control voice timbre mixing. Either timber_weights or voice_id must be provided. Up to 4 timbres can be mixed, each with a weight in [1, 100]. |
stream | bool | No | Whether to stream. Default is false (not enabled). |
language_boost | string | No | Default is null. Enhances the recognition ability for specified minority languages and dialects. Setting it can improve speech performance in specified minority language/dialect scenarios. If the minority language type is not clear, "auto" can be selected, and the model will independently judge the minority language type. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto' |
subtitle_enable | bool | No | Switch to control whether to enable subtitle service. Default is false. |
output_format | string | No | Controls the form of the output. Optional values are url and hex. Default is hex. This parameter only takes effect in non-streaming scenarios; streaming scenarios only return hex. |
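The pause-marker rules for the text field can be sketched in Python. The helper names below (`pause`, `join_with_pauses`, `has_consecutive_pauses`) are hypothetical and not part of the API; they only format and check the `<#x#>` syntax described in the table.

```python
import re

MAX_TEXT_LENGTH = 10000  # text must be shorter than this, per the table above


def pause(seconds: float) -> str:
    """Return a <#x#> marker; x must be within 0.01-99.99 seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"  # two decimal places at most


def join_with_pauses(segments, seconds: float) -> str:
    """Join non-empty segments with exactly one marker between each pair."""
    cleaned = [s.strip() for s in segments if s.strip()]
    text = pause(seconds).join(cleaned)
    if len(text) >= MAX_TEXT_LENGTH:
        raise ValueError("text must be shorter than 10000 characters")
    return text


def has_consecutive_pauses(text: str) -> bool:
    """Detect two markers with nothing but whitespace between them."""
    return re.search(r"#>\s*<#", text) is not None
```

Because markers are only placed between non-empty segments, the joined text never starts or ends with a pause.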
voice_setting Parameters
Field Name | Type | Required | Description |
---|---|---|---|
speed | Float | No | Speech rate range [0.5, 2], default value is 1.0; the larger the value, the faster the speech rate. |
vol | Float | No | Volume range (0, 10], default value is 1.0; the larger the value, the higher the volume. |
pitch | int | No | Pitch range [-12, 12], default value is 0. 0 means output with original timbre, and the value must be an integer. |
voice_id | string | Yes | The requested timbre ID. Required alternatively with timber_weights. Supports two types: system timbres (IDs) and replicated timbres (IDs). The system timbres (IDs) are as follows: Green youth voice: male-qn-qingse; Elite youth voice: male-qn-jingying; Domineering youth voice: male-qn-badao; College student voice: male-qn-daxuesheng; Girl voice: female-shaonv; Royal sister voice: female-yujie; Mature female voice: female-chengshu; Sweet female voice: female-tianmei; Male host: presenter_male; Female host: presenter_female; Male audiobook 1: audiobook_male_1; Male audiobook 2: audiobook_male_2; Female audiobook 1: audiobook_female_1; Female audiobook 2: audiobook_female_2; Green youth voice-beta: male-qn-qingse-jingpin; Elite youth voice-beta: male-qn-jingying-jingpin; Domineering youth voice-beta: male-qn-badao-jingpin; College student voice-beta: male-qn-daxuesheng-jingpin; Girl voice-beta: female-shaonv-jingpin; Royal sister voice-beta: female-yujie-jingpin; Mature female voice-beta: female-chengshu-jingpin; Sweet female voice-beta: female-tianmei-jingpin; Smart boy: clever_boy; Cute boy: cute_boy; Cute girl: lovely_girl; Cartoon Zhu Xiaoqi: cartoon_pig; Tsundere younger brother: bingjiao_didi; Handsome boyfriend: junlang_nanyou; Pure schoolboy: chunzhen_xuedi; Cold senior: lengdan_xiongzhang; Domineering young master: badao_shaoye; Sweet Xiaoling: tianxin_xiaoling; Playful cute girl: qiaopi_mengmei; Charming royal sister: wumei_yujie; Coquettish schoolgirl: diadia_xuemei; Elegant senior sister: danya_xuejie; Santa Claus: Santa_Claus; Grinch: Grinch; Rudolph: Rudolph; Arnold: Arnold; Charming Santa: Charming_Santa; Charming Lady: Charming_Lady; Sweet Girl: Sweet_Girl; Cute Elf: Cute_Elf; Attractive Girl: Attractive_Girl; Serene Woman: Serene_Woman |
emotion | string | No | Controls the emotion of the synthesized speech; currently supports 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral; parameter range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"] |
latex_read | bool | No | Controls whether LaTeX formulas are read aloud. Default is false. Note: 1. Formulas in the request must be enclosed in $$; 2. Any "\" inside a formula must be escaped as "\\". Example: The basic formula of derivative is $$\\frac{d}{dx}(x^n) = nx^{n-1}$$ |
english_normalization | bool | No | This parameter supports English text normalization, which can improve performance in digital reading scenarios but will slightly increase latency. If not provided, the default value is false. |
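As a sketch, the documented voice_setting ranges can be checked on the client before a request is sent. The function `make_voice_setting` is a hypothetical helper, not part of the API, and it covers only a subset of the fields in the table.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def make_voice_setting(voice_id, speed=1.0, vol=1.0, pitch=0, emotion=None):
    """Build a voice_setting object, enforcing the ranges listed above."""
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be in [0.5, 2]")
    if not 0 < vol <= 10:
        raise ValueError("vol must be in (0, 10]")
    if isinstance(pitch, bool) or not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer in [-12, 12]")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")

    setting = {"voice_id": voice_id, "speed": speed, "vol": vol, "pitch": pitch}
    if emotion is not None:
        setting["emotion"] = emotion  # omitted when not requested
    return setting
```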
audio_setting Parameters
Field Name | Type | Required | Description |
---|---|---|---|
sample_rate | int | No | Sample rate range [8000, 16000, 22050, 24000, 32000, 44100], default is 32000. |
bitrate | int | No | Bit rate range [32000, 64000, 128000, 256000], default is 128000, which is only valid for mp3 format audio. |
format | string | No | Generated audio format, default is mp3, optional range [mp3, pcm, flac, wav]. |
channel | int | No | Number of channels, default is 1 (mono), optional values are 1 (mono) or 2 (stereo). |
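The allowed audio_setting values lend themselves to a small lookup-based check. This is a minimal sketch with a hypothetical helper name; leaving out bitrate for non-mp3 formats is a choice made here because the table says bitrate only applies to mp3, not a documented requirement.

```python
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
BITRATES = (32000, 64000, 128000, 256000)
FORMATS = ("mp3", "pcm", "flac", "wav")


def make_audio_setting(sample_rate=32000, bitrate=128000, fmt="mp3", channel=1):
    """Build an audio_setting object from the documented value sets."""
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if channel not in (1, 2):
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")

    setting = {"sample_rate": sample_rate, "format": fmt, "channel": channel}
    if fmt == "mp3":
        # bitrate is only valid for mp3, so it is validated and sent only then
        if bitrate not in BITRATES:
            raise ValueError(f"bitrate must be one of {BITRATES}")
        setting["bitrate"] = bitrate
    return setting
```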
pronunciation_dict Parameters
Field Name | Type | Required | Description |
---|---|---|---|
tone | list | No | Words or symbols that need special annotation, paired with their phonetic notation. Entries adjust a tone or substitute another reading, in the following format: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]. Tones are represented by numbers: first tone (yinping) is 1, second tone (yangping) is 2, third tone (shangsheng) is 3, fourth tone (qusheng) is 4, and neutral tone is 5. |
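The tone entry format shown above can be generated rather than typed by hand. The helper `tone_entry` below is hypothetical; it only assembles strings in the documented "word/notation" shape.

```python
VALID_TONES = (1, 2, 3, 4, 5)  # 5 is the neutral tone


def tone_entry(word, replacement):
    """Format one pronunciation_dict.tone entry.

    replacement is either a plain string (substitute reading) or a
    sequence of (pinyin_syllable, tone_number) pairs.
    """
    if isinstance(replacement, str):
        return f"{word}/{replacement}"
    parts = []
    for syllable, tone in replacement:
        if tone not in VALID_TONES:
            raise ValueError(f"tone must be one of {VALID_TONES}")
        parts.append(f"({syllable}{tone})")
    return f"{word}/" + "".join(parts)


pronunciation_dict = {
    "tone": [
        tone_entry("达菲", [("da", 2), ("fei", 1)]),
        tone_entry("omg", "oh my god"),
    ]
}
```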
timber_weights Parameters
Field Name | Type | Required | Description |
---|---|---|---|
voice_id | string | No | Requested timbre id. Must be filled in synchronously with the weight parameter. |
weight | int | No | Range [1, 100] weight, must be filled in synchronously with voice_id. A maximum of 4 timbre mixtures are supported, with integer values. The higher the proportion of a single timbre, the more similar the synthesized timbre. |
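A timbre mix is a list of voice_id/weight pairs. The sketch below uses a hypothetical helper, `make_timber_weights`, to enforce the limits from the table: at most 4 timbres, integer weights in [1, 100].

```python
def make_timber_weights(mix):
    """Turn {voice_id: weight} into the timber_weights array."""
    if not 1 <= len(mix) <= 4:
        raise ValueError("between 1 and 4 timbres can be mixed")
    entries = []
    for voice_id, weight in mix.items():
        if isinstance(weight, bool) or not isinstance(weight, int):
            raise ValueError("weight must be an integer")
        if not 1 <= weight <= 100:
            raise ValueError("weight must be in [1, 100]")
        entries.append({"voice_id": voice_id, "weight": weight})
    return entries
```

For example, `make_timber_weights({"presenter_male": 70, "audiobook_male_1": 30})` yields two entries in which the first voice dominates the blend.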
Output Parameters
Field Name | Parameter Type | Description |
---|---|---|
data | object | The returned data object. It may be null, so check for null before use. |
trace_id | string | The ID of this session, used to help locate problems during consultation/feedback. |
extra_info | object | Related additional information. |
base_resp | object | If the request fails, the corresponding error status code and details. |
data Parameters
Field Name | Type | Description |
---|---|---|
audio | string | The synthesized audio segment, encoded in hex, generated according to the input-defined format (mp3/pcm/flac). |
subtitle_file | string | The download link of the synthesized subtitle, which corresponds to the audio file, accurate to the sentence (no more than 50 words), in milliseconds, in JSON format. |
status | int | Current audio stream status: 1 means synthesizing, 2 means synthesis completed. |
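Since audio arrives as a hex string, it has to be converted back to bytes before it can be played or saved. A minimal sketch, with hypothetical helper names, that also guards against a null data object:

```python
from pathlib import Path


def decode_audio(data):
    """Convert the hex-encoded data.audio field into raw audio bytes."""
    if data is None:
        raise ValueError("response contains no data object")
    return bytes.fromhex(data["audio"])


def save_audio(data, path):
    """Write the decoded audio to path and return the number of bytes written."""
    return Path(path).write_bytes(decode_audio(data))
```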
extra_info Parameters
Field Name | Type | Description |
---|---|---|
audio_length | long | Audio duration, accurate to milliseconds. |
audio_sample_rate | long | Sample rate. |
audio_size | long | Audio size, in bytes. |
bitrate | long | Bit rate. |
audio_format | string | Format of the generated audio file, with values in mp3/pcm/flac. |
audio_channel | long | Number of audio channels generated: 1 for mono, 2 for stereo. |
invisible_character_ratio | double | Proportion of illegal characters in the input. If it does not exceed 10% (inclusive), the audio is generated normally and the ratio is returned, so the returned value is at most 0.1; if it exceeds 10%, an error is reported. |
usage_characters | long | Billing character count, the number of billing characters for this speech generation. |
base_resp Parameters
Field Name | Type | Description |
---|---|---|
status_code | int64 | Status code. 1000: Unknown error; 1001: Timeout; 1002: Rate limited; 1004: Authentication failed; 1039: TPM rate limited; 1042: Illegal characters exceed 10%; 2013: Input format information is abnormal. |
status_msg | string | Status details. |
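The status codes above can be mapped to an exception on the client side. This sketch rests on two assumptions that the table does not state: that a status_code of 0 (or a missing base_resp) means success for this interface, and that timeouts and rate limits are the codes worth retrying. `T2AError` and `raise_for_status` are hypothetical names.

```python
STATUS_MEANINGS = {
    1000: "Unknown error",
    1001: "Timeout",
    1002: "Rate limited",
    1004: "Authentication failed",
    1039: "TPM rate limited",
    1042: "Illegal characters exceed 10%",
    2013: "Input format information is abnormal",
}
RETRYABLE = {1001, 1002, 1039}  # assumption: transient conditions only


class T2AError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = code in RETRYABLE


def raise_for_status(response):
    """Raise T2AError when base_resp carries a non-zero status code."""
    base = response.get("base_resp") or {}
    code = base.get("status_code", 0)  # assumption: 0 means success
    if code != 0:
        message = base.get("status_msg") or STATUS_MEANINGS.get(code, "unrecognized status code")
        raise T2AError(code, message)
```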
curl
curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}' \
--data-raw '{"text": "我爱中国", "voice_setting": {"voice_id": "audiobook_male_1"}}'
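A request to the t2a_v2 URL can also be assembled with the Python standard library. This is a sketch only: `build_t2a_request` is a hypothetical helper, the Bearer header follows the other curl samples in this document, and the payload carries just the text and a voice_id. Sending is left commented out so that the block runs without network access.

```python
import json
import urllib.request

BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"


def build_t2a_request(endpoint, key, text, voice_id):
    """Prepare (but do not send) a synchronous speech generation request."""
    payload = {
        "text": text,
        "voice_setting": {"voice_id": voice_id},
        "stream": False,  # non-streaming, the documented default
    }
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}/hailuo/tts/t2a_v2",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


# To send it:
# with urllib.request.urlopen(build_t2a_request("my-endpoint", "KEY", "Hello", "presenter_male")) as resp:
#     body = json.loads(resp.read())
```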
2. T2A Large v2 (Asynchronous Long Text Speech Generation)
1. File Upload
URL
https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/file/upload
Request Method
post
Content-Type
multipart/form-data
Input Parameters
Field Name | Type | Required | Description |
---|---|---|---|
purpose | string | Yes | Purpose of the file. Supported value: t2a_async_input, for files used when creating speech generation tasks; txt and zip formats are supported. |
file | file | Yes | File |
Output Parameters
Field Name | Type | Description |
---|---|---|
file | object | File information object, containing detailed information of the file. |
base_resp | object | Basic response information object, containing status code and status message. |
file Parameters
Field Name | Type | Description |
---|---|---|
file_id | int64 | Unique identifier of the file. |
bytes | int | File size, in bytes. |
created_at | int64 | Creation time of the file, in Unix timestamp format. |
filename | string | File name. |
purpose | string | Purpose of the file, currently t2a_async_input. |
base_resp Parameters
Field Name | Type | Description |
---|---|---|
status_code | int64 | Status code, 0 indicates success. |
status_msg | string | Status message, success indicates the request is successful. |
curl
curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/file/upload' \
--header 'Authorization: Bearer {KEY}' \
--form 'purpose="t2a_async_input"' \
--form 'file=@"/C:/Users/xxx/Desktop/FJC/文档.txt"'
2. Create Asynchronous Task
URL
https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_async_v2
Request Method
post
Content-Type
application/json
Input Parameters
Field Name | Type | Required | Description |
---|---|---|---|
text | string | No | Text to be synthesized, with a maximum of 50,000 characters. Either text or text_file_id must be provided. |
text_file_id | long | No | The file_id returned by the file upload interface, identifying the text file to synthesize. Either text_file_id or text must be provided. Supported formats are txt and zip; the format is verified automatically after submission. 1. txt: must be shorter than 100,000 characters. To insert a pause, add <#x#> between words, where x is the pause length in seconds (0.01-99.99, at most two decimal places). A pause marker must sit between two pronounceable text segments, and consecutive pause markers are not allowed; 2. zip: a package that contains only txt files or only json files (all files in the archive must share one format). A json file may contain three fields, ["title", "content", "extra"], representing the title, main text, and author respectively. Three results are generated, one per field, for a total of 9 files in one folder. If a field is missing or empty, its files are not generated. |
voice_setting | object | No | Parameters to control the speech rate, volume, pitch, etc. of the generated voice. |
audio_setting | object | No | Parameters to control the sample rate, bit rate, and audio format of the generated voice. |
pronunciation_dict | object | No | Parameters to control pronunciation replacements, such as tone adjustments and substitute readings for specific words or symbols. |
language_boost | string | No | Default is null. Enhances the recognition ability for specified minority languages and dialects. Setting it can improve speech performance in specified minority language/dialect scenarios. If the minority language type is not clear, "auto" can be selected, and the model will independently judge the minority language type. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto' |
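The zip input described for text_file_id can be produced in memory with Python's zipfile module. This is a sketch under assumptions: the helper name and the file stems are hypothetical, and UTF-8 encoding for the json files is assumed because the table does not specify one.

```python
import io
import json
import zipfile

ALLOWED_FIELDS = {"title", "content", "extra"}


def build_json_zip(articles):
    """Pack {file_stem: {"title": ..., "content": ..., "extra": ...}} into zip bytes.

    Every member is a .json file, so the archive holds a single format.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for stem, fields in articles.items():
            unknown = set(fields) - ALLOWED_FIELDS
            if unknown:
                raise ValueError(f"unsupported fields: {sorted(unknown)}")
            # str data is stored UTF-8 encoded (assumed to be acceptable)
            archive.writestr(f"{stem}.json", json.dumps(fields, ensure_ascii=False))
    return buffer.getvalue()
```

The returned bytes can be written to a .zip file and uploaded through the file upload interface with purpose set to t2a_async_input.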
voice_setting Parameters
Field Name | Type | Required | Description |
---|---|---|---|
speed | Float | No | Speech rate range [0.5, 2], default value is 1.0; the larger the value, the faster the speech rate. |
vol | Float | No | Volume range (0, 10], default value is 1.0; the larger the value, the higher the volume. |
pitch | int | No | Pitch range [-12, 12], default value is 0. 0 means output with original timbre, and the value must be an integer. |
voice_id | string | Yes | The requested timbre ID, supporting two types: system timbres (IDs) and replicated timbres (IDs). The system timbres (IDs) are as follows: Green youth voice: male-qn-qingse; Elite youth voice: male-qn-jingying; Domineering youth voice: male-qn-badao; College student voice: male-qn-daxuesheng; Girl voice: female-shaonv; Royal sister voice: female-yujie; Mature female voice: female-chengshu; Sweet female voice: female-tianmei; Male host: presenter_male; Female host: presenter_female; Male audiobook 1: audiobook_male_1; Male audiobook 2: audiobook_male_2; Female audiobook 1: audiobook_female_1; Female audiobook 2: audiobook_female_2; Green youth voice-beta: male-qn-qingse-jingpin; Elite youth voice-beta: male-qn-jingying-jingpin; Domineering youth voice-beta: male-qn-badao-jingpin; College student voice-beta: male-qn-daxuesheng-jingpin; Girl voice-beta: female-shaonv-jingpin; Royal sister voice-beta: female-yujie-jingpin; Mature female voice-beta: female-chengshu-jingpin; Sweet female voice-beta: female-tianmei-jingpin; Smart boy: clever_boy; Cute boy: cute_boy; Cute girl: lovely_girl; Cartoon Zhu Xiaoqi: cartoon_pig |
emotion | string | No | Controls the emotion of the synthesized speech; currently supports 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral; parameter range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"] |
english_normalization | bool | No | This parameter supports English text normalization, which can improve performance in digital reading scenarios but will slightly increase latency. If not provided, the default value is false. |
audio_setting Parameters
Field Name | Type | Required | Description |
---|---|---|---|
sample_rate | int | No | Sample rate range [8000, 16000, 22050, 24000, 32000, 44100], default is 32000. |
bitrate | int | No | Bit rate range [32000, 64000, 128000, 256000], default is 128000, which is only valid for mp3 format audio. |
format | string | No | Generated audio format, default is mp3, optional range [mp3, pcm, flac, wav]. |
channel | int | No | Number of channels, default is 1 (mono), optional values are 1 (mono) or 2 (stereo). |
pronunciation_dict Parameters
Field Name | Type | Required | Description |
---|---|---|---|
tone | list | No | Words or symbols that need special annotation, paired with their phonetic notation. Entries adjust a tone or substitute another reading, in the following format: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]. Tones are represented by numbers: first tone (yinping) is 1, second tone (yangping) is 2, third tone (shangsheng) is 3, fourth tone (qusheng) is 4, and neutral tone is 5. |
Output Parameters
Field Name | Parameter Type | Description |
---|---|---|
taskId | long | Asynchronous task id |
curl
curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_async_v2' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}' \
--data-raw '{
"text": "我爱中国",
"language_boost": "auto",
"voice_setting": {
"voice_id": "audiobook_male_1",
"speed": 1,
"vol": 1,
"pitch": 1
},
"pronunciation_dict": {
"tone": [
"草地/(cao3)(di1)"
]
},
"audio_setting": {
"audio_sample_rate": 32000,
"bitrate": 128000,
"format": "mp3",
"channel": 2
}
}'
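Because text and text_file_id are alternatives, a client may want to reject payloads that carry both or neither before calling the interface. The helper below is a hypothetical sketch of that check; any extra keyword arguments are passed through unchanged.

```python
MAX_ASYNC_TEXT = 50000  # documented maximum for the text field


def make_async_payload(voice_setting, text=None, text_file_id=None, **options):
    """Build the create-task body with exactly one text source."""
    if (text is None) == (text_file_id is None):
        raise ValueError("provide exactly one of text or text_file_id")
    payload = {"voice_setting": voice_setting}
    if text is not None:
        if len(text) > MAX_ASYNC_TEXT:
            raise ValueError("text exceeds 50,000 characters")
        payload["text"] = text
    else:
        payload["text_file_id"] = text_file_id
    payload.update(options)  # e.g. audio_setting, language_boost
    return payload
```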
3. Get Asynchronous Task Status
URL
https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/task/{taskId}
Request Method
get
Input Parameters
Field Name | Type | Required | Description |
---|---|---|---|
taskId | string | Yes | taskId returned by the create asynchronous task interface |
Output Parameters
Field Name | Type | Description |
---|---|---|
taskId | string | Unique identifier of the task. |
status | string | Task status, such as SUCCESS, Fail, Processing, Preparing, Queueing. |
fileId | string | Unique identifier of the file. |
curl
curl --location --request GET 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/task/1915360414669643778' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}'
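Checking the task status is normally done in a polling loop. The sketch below keeps the HTTP call out of the loop: `fetch_status` is a caller-supplied function (hypothetical) that performs the GET request and returns the parsed JSON. Status values are compared case-insensitively because the table lists them in mixed case, and the interval and timeout defaults are arbitrary choices rather than documented values.

```python
import time


def wait_for_task(fetch_status, task_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the task reaches SUCCESS or Fail, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        status = str(result.get("status", "")).lower()
        if status == "success":
            return result  # carries fileId for the download step
        if status == "fail":
            raise RuntimeError(f"task {task_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status or 'unknown'} after {timeout}s")
        sleep(interval)  # Processing, Preparing, Queueing: keep waiting
```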
4. File Download
URL
https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?taskId=19153409980342...&fileId=261877976...
Request Method
get
Input Parameters
Field Name | Type | Required | Description |
---|---|---|---|
taskId | string | Yes | taskId returned by the get asynchronous task status interface |
fileId | string | Yes | fileId returned by the get asynchronous task status interface |
Output Parameters
Field Name | Type | Description |
---|---|---|
fileId | string | Unique identifier of the file. |
bytes | int | File size, in bytes. |
createdAt | int64 | Creation time of the file, in Unix timestamp format. |
filename | string | File name, including extension. |
purpose | string | Purpose of the file, such as t2a_async. |
mediaUrl | string | File download address, a complete URL with access permission signature. |
expireTime | int64 | Expiration time of the file, in Unix timestamp format, indicating the time when the file's download link becomes invalid. |
curl
curl --location --request GET 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?taskId=1915340998034292737&fileId=261877976617219' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}'
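The retrieve URL and the expiry check can be sketched as follows. Both helper names are hypothetical, and the comparison assumes expireTime is expressed in seconds; the table only says "Unix timestamp", so a millisecond value would need to be divided by 1000 first.

```python
import time
from urllib.parse import urlencode

BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"


def retrieve_url(endpoint, task_id, file_id):
    """Build the file download URL from the taskId and fileId of a finished task."""
    query = urlencode({"taskId": task_id, "fileId": file_id})
    return f"{BASE_URL}/{endpoint}/hailuo/files/retrieve?{query}"


def link_expired(file_info, now=None):
    """Return True once mediaUrl can no longer be used.

    Assumption: expireTime is a Unix timestamp in seconds.
    """
    current = time.time() if now is None else now
    return current >= file_info["expireTime"]
```

When the link has expired, calling the retrieve interface again is the natural way to obtain a fresh mediaUrl, although the document does not state this explicitly.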