MaaS HL Speech

1. T2A v2 (Synchronous Speech Generation)

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2

Request Method

post

Content-Type

application/json

Input Parameters

Field Name Type Required Description
text string Yes The text to be synthesized, limited to fewer than 10,000 characters. Use line breaks to mark paragraph breaks. To insert a custom pause in the speech, add <#x#> between words, where x is the pause length in seconds (0.01-99.99, at most two decimal places). A pause marker must sit between two pronounceable pieces of text, and consecutive pause markers are not allowed.
voice_setting object No Parameters to control the speech rate, volume, pitch, etc. of the generated voice.
audio_setting object No Parameters to control the sample rate, bit rate, and audio format of the generated voice.
pronunciation_dict object No Parameters to control pronunciation replacement for specific words and symbols.
timber_weights array No Parameters to control voice timbre mixing. Either timber_weights or voice_id must be provided. A maximum of 4 timbres can be mixed, each with a weight in the range [1, 100].
stream bool No Whether to stream. Default is false (not enabled).
language_boost string No Default is null. Enhances the recognition ability for specified minority languages and dialects. Setting it can improve speech performance in specified minority language/dialect scenarios. If the minority language type is not clear, "auto" can be selected, and the model will independently judge the minority language type.
Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'
subtitle_enable bool No Switch to control whether to enable subtitle service. Default is false.
output_format string No Controls the form of the output. Optional values are url and hex. Default is hex. This parameter only takes effect in non-streaming scenarios; streaming scenarios only support returning hex format.
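
The pause syntax described for the text field can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names pause and check_pauses are hypothetical, and "pronounceable text" is approximated here as any non-whitespace text.

```python
import re

# Matches a pause marker such as <#0.5#> or <#12.25#> (at most two decimals).
_PAUSE = re.compile(r"<#(\d+(?:\.\d{1,2})?)#>")


def pause(seconds: float) -> str:
    """Return a pause marker for the given duration in seconds."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def check_pauses(text: str) -> None:
    """Raise ValueError if a pause marker is out of range or misplaced."""
    for match in _PAUSE.finditer(text):
        if not 0.01 <= float(match.group(1)) <= 99.99:
            raise ValueError(f"pause out of range: {match.group(0)}")
    # split() interleaves text pieces with the captured numbers,
    # so every second element is a piece of text.
    pieces = _PAUSE.split(text)[::2]
    # Assumption: any non-whitespace piece counts as pronounceable text.
    if len(pieces) > 1 and any(not piece.strip() for piece in pieces):
        raise ValueError("each pause must sit between two pieces of text")
```

For example, "Hello" + pause(1.5) + "world" passes the check, while two markers in a row or a marker at the very start of the text are rejected.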

voice_setting Parameters

Field Name Type Required Description
speed Float No Speech rate range [0.5, 2], default value is 1.0; the larger the value, the faster the speech rate.
vol Float No Volume range (0, 10], default value is 1.0; the larger the value, the higher the volume.
pitch int No Pitch range [-12, 12], default value is 0. 0 means output with original timbre, and the value must be an integer.
voice_id string Yes The requested timbre ID. Either voice_id or timber_weights must be provided. Two types are supported: system timbres (IDs) and replicated timbres (IDs).
The system timbres (IDs) are as follows:
Green youth voice: male-qn-qingse;
Elite youth voice: male-qn-jingying;
Domineering youth voice: male-qn-badao;
College student voice: male-qn-daxuesheng;
Girl voice: female-shaonv;
Royal sister voice: female-yujie;
Mature female voice: female-chengshu;
Sweet female voice: female-tianmei;
Male host: presenter_male;
Female host: presenter_female;
Male audiobook 1: audiobook_male_1;
Male audiobook 2: audiobook_male_2;
Female audiobook 1: audiobook_female_1;
Female audiobook 2: audiobook_female_2;
Green youth voice-beta: male-qn-qingse-jingpin;
Elite youth voice-beta: male-qn-jingying-jingpin;
Domineering youth voice-beta: male-qn-badao-jingpin;
College student voice-beta: male-qn-daxuesheng-jingpin;
Girl voice-beta: female-shaonv-jingpin;
Royal sister voice-beta: female-yujie-jingpin;
Mature female voice-beta: female-chengshu-jingpin;
Sweet female voice-beta: female-tianmei-jingpin;
Smart boy: clever_boy;
Cute boy: cute_boy;
Cute girl: lovely_girl;
Cartoon Zhu Xiaoqi: cartoon_pig;
Tsundere younger brother: bingjiao_didi;
Handsome boyfriend: junlang_nanyou;
Pure schoolboy: chunzhen_xuedi;
Cold senior: lengdan_xiongzhang;
Domineering young master: badao_shaoye;
Sweet Xiaoling: tianxin_xiaoling;
Playful cute girl: qiaopi_mengmei;
Charming royal sister: wumei_yujie;
Coquettish schoolgirl: diadia_xuemei;
Elegant senior sister: danya_xuejie;
Santa Claus: Santa_Claus;
Grinch: Grinch;
Rudolph: Rudolph;
Arnold: Arnold;
Charming Santa: Charming_Santa;
Charming Lady: Charming_Lady;
Sweet Girl: Sweet_Girl;
Cute Elf: Cute_Elf;
Attractive Girl: Attractive_Girl;
Serene Woman: Serene_Woman
emotion string No Controls the emotion of the synthesized speech; currently supports 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral; parameter range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]
latex_read bool No Controls whether LaTeX formulas are read aloud. Default is false. Note: 1. Formulas in the request must be enclosed in $$; 2. Any "\" in a formula in the request must be escaped as "\\". Example: The basic formula of derivative is $$\\frac{d}{dx}(x^n) = nx^{n-1}$$
english_normalization bool No This parameter supports English text normalization, which can improve performance in digital reading scenarios but will slightly increase latency. If not provided, the default value is false.
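
The ranges in the voice_setting table can be enforced before a request is sent. This is a sketch under assumptions: the function name voice_setting and its error messages are hypothetical, and only the fields and ranges listed above are checked.

```python
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def voice_setting(voice_id, speed=1.0, vol=1.0, pitch=0, emotion=None):
    """Build a voice_setting object, rejecting values outside the documented ranges."""
    if not 0.5 <= speed <= 2:
        raise ValueError("speed must be in [0.5, 2]")
    if not 0 < vol <= 10:
        raise ValueError("vol must be in (0, 10]")
    # pitch must be an integer; bool is excluded because it subclasses int.
    if isinstance(pitch, bool) or not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer in [-12, 12]")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    setting = {"voice_id": voice_id, "speed": speed, "vol": vol, "pitch": pitch}
    if emotion is not None:
        setting["emotion"] = emotion
    return setting
```

Calling voice_setting("male-qn-qingse", speed=1.2, emotion="happy") returns a dictionary that can be placed under the voice_setting key of the request body.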

audio_setting Parameters

Field Name Type Required Description
sample_rate int No Sample rate range [8000, 16000, 22050, 24000, 32000, 44100], default is 32000.
bitrate int No Bit rate range [32000, 64000, 128000, 256000], default is 128000, which is only valid for mp3 format audio.
format string No Generated audio format, default is mp3, optional range [mp3, pcm, flac, wav].
channel int No Number of channels, default is 1 (mono), optional values are 1 (mono) or 2 (stereo).

pronunciation_dict Parameters

Field Name Type Required Description
tone list No Replace words, symbols that need special annotation and their corresponding phonetic notations.
Replace pronunciation (adjust tone/replace pronunciation of other characters) in the following format: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]. Tones are represented by numbers: first tone (yinping) is 1, second tone (yangping) is 2, third tone (shangsheng) is 3, fourth tone (qusheng) is 4, and neutral tone is 5.
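
The tone entries follow a simple "word/replacement" pattern, so they can be generated rather than typed by hand. A small sketch, assuming the format shown in the example above; tone_entry and alias_entry are hypothetical helper names.

```python
def tone_entry(word, syllables):
    """Build an entry such as "达菲/(da2)(fei1)" from (pinyin, tone) pairs."""
    for pinyin, tone in syllables:
        if tone not in (1, 2, 3, 4, 5):
            raise ValueError(f"tone for {pinyin!r} must be 1-5, got {tone}")
    return word + "/" + "".join(f"({pinyin}{tone})" for pinyin, tone in syllables)


def alias_entry(word, spoken_as):
    """Build an entry that reads one string as another, such as "omg/oh my god"."""
    return f"{word}/{spoken_as}"


pronunciation_dict = {
    "tone": [
        tone_entry("燕少飞", [("yan", 4), ("shao", 3), ("fei", 1)]),
        alias_entry("omg", "oh my god"),
    ]
}
```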

timber_weights Parameters

Field Name Type Required Description
voice_id string No Requested timbre id. Must be filled in synchronously with the weight parameter.
weight int No Range [1, 100] weight, must be filled in synchronously with voice_id. A maximum of 4 timbre mixtures are supported, with integer values. The higher the proportion of a single timbre, the more similar the synthesized timbre.
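
A timbre mix is a list of voice_id/weight pairs, so the two documented limits (at most 4 timbres, integer weights in [1, 100]) are easy to check up front. The sketch below assumes each array element is an object with exactly the two fields from the table; the function name timber_weights is hypothetical.

```python
def timber_weights(mix):
    """Turn {voice_id: weight} into the timber_weights array, validating limits."""
    if not 1 <= len(mix) <= 4:
        raise ValueError("between 1 and 4 timbres can be mixed")
    result = []
    for voice_id, weight in mix.items():
        if isinstance(weight, bool) or not isinstance(weight, int) or not 1 <= weight <= 100:
            raise ValueError(f"weight for {voice_id!r} must be an integer in [1, 100]")
        result.append({"voice_id": voice_id, "weight": weight})
    return result
```

For instance, timber_weights({"male-qn-qingse": 70, "presenter_male": 30}) yields a two-element array in which the first timbre dominates.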

Output Parameters

Field Name Type Description
data object The returned data object, which may be null and needs to be checked for non-null.
trace_id string The ID of this session, used to help locate problems during consultation/feedback.
extra_info object Related additional information.
base_resp object If the request fails, the corresponding error status code and details.

data Parameters

Field Name Type Description
audio string The synthesized audio segment, encoded in hex, generated according to the input-defined format (mp3/pcm/flac).
subtitle_file string Download link for the subtitle file that corresponds to the synthesized audio. Subtitles are in JSON format, segmented by sentence (no more than 50 words each), with timestamps in milliseconds.
status int Current audio stream status: 1 means synthesizing, 2 means synthesis completed.

extra_info Parameters

Field Name Type Description
audio_length long Audio duration, accurate to milliseconds.
audio_sample_rate long Sample rate.
audio_size long Audio size, in bytes.
bitrate long Bit rate.
audio_format string Format of the generated audio file, with values in mp3/pcm/flac.
audio_channel long Number of audio channels generated: 1 for mono, 2 for stereo.
invisible_character_ratio double Proportion of illegal characters. If illegal characters make up no more than 10% of the text (10% included), the audio is generated normally and this proportion is returned, so the value never exceeds 0.1; above 10%, an error is reported.
usage_characters long Billing character count, the number of billing characters for this speech generation.

base_resp Parameters

Field Name Type Description
status_code int64 Status code. 1000: Unknown error; 1001: Timeout; 1002: Rate limited; 1004: Authentication failed; 1039: TPM rate limited; 1042: Illegal characters exceed 10%; 2013: Input format information is abnormal.
status_msg string Status details.
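
Handling a non-streaming response comes down to three steps: check base_resp, check that data is not null, and decode the hex string. The sketch below makes two assumptions that the tables above do not state for this endpoint: that status_code 0 means success (as documented for the file upload interface) and that output_format is hex. The function name extract_audio is hypothetical.

```python
# Status codes as listed in the base_resp table.
ERRORS = {
    1000: "Unknown error",
    1001: "Timeout",
    1002: "Rate limited",
    1004: "Authentication failed",
    1039: "TPM rate limited",
    1042: "Illegal characters exceed 10%",
    2013: "Input format information is abnormal",
}


def extract_audio(response):
    """Return the decoded audio bytes from a parsed T2A v2 JSON response."""
    base = response.get("base_resp") or {}
    code = base.get("status_code", 0)
    if code != 0:  # assumption: 0 indicates success
        reason = ERRORS.get(code, base.get("status_msg", "unrecognized status"))
        raise RuntimeError(f"{code}: {reason} (trace_id={response.get('trace_id')})")
    data = response.get("data")
    if data is None or not data.get("audio"):
        raise RuntimeError("response contains no audio data")
    return bytes.fromhex(data["audio"])
```

The returned bytes can be written directly to a file whose extension matches the requested audio format.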

curl

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}' \
--data-raw '{
    "text": "我爱中国",
    "voice_setting": {"voice_id": "audiobook_male_1"},
    "audio_setting": {"format": "mp3"}
}'
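
A synchronous request to the t2a_v2 URL given at the top of this section could also be assembled in Python. This is a sketch using only the standard library; the function name build_t2a_request is hypothetical, and the payload carries just a few of the documented fields.

```python
import json
import urllib.request


def build_t2a_request(endpoint, api_key, text, voice_id):
    """Build (but do not send) a POST request for synchronous speech generation."""
    url = f"https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2"
    payload = {
        "text": text,
        "stream": False,
        "voice_setting": {"voice_id": voice_id},
        "audio_setting": {"format": "mp3"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request) and the body parsed with json.load.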

2. T2A Large v2 (Asynchronous Long Text Speech Generation)

1. File Upload

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/file/upload

Request Method

post

Content-Type

multipart/form-data

Input Parameters

Field Name Type Required Description
purpose string Yes Purpose of the file. Values and supported formats are as follows: t2a_async_input: files used when creating speech generation tasks, supporting documents in txt and zip formats;
file file Yes File

Output Parameters

Field Name Type Description
file object File information object, containing detailed information of the file.
base_resp object Basic response information object, containing status code and status message.

file Parameters

Field Name Type Description
file_id int64 Unique identifier of the file.
bytes int File size, in bytes.
created_at int64 Creation time of the file, in Unix timestamp format.
filename string File name.
purpose string Purpose of the file, currently t2a_async_input.

base_resp Parameters

Field Name Type Description
status_code int64 Status code, 0 indicates success.
status_msg string Status message, success indicates the request is successful.

curl

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/file/upload' \
--header 'Authorization: Bearer {KEY}' \
--form 'purpose="t2a_async_input"' \
--form 'file=@"/C:/Users/xxx/Desktop/FJC/文档.txt"'
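
The two form fields sent by the curl command above (purpose and file) can be encoded by hand when no HTTP library with multipart support is available. The following sketch builds the request body only; encode_upload is a hypothetical helper, and the application/octet-stream part type is an assumption rather than something this interface documents.

```python
import uuid


def encode_upload(filename, content, purpose="t2a_async_input"):
    """Return (body, content_type) for a multipart/form-data file upload."""
    boundary = uuid.uuid4().hex
    parts = [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="purpose"',
        b"",
        purpose.encode("utf-8"),
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type must be sent as the Content-Type header, alongside the Authorization header, when the body is posted to the upload URL.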

2. Create Asynchronous Task

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_async_v2

Request Method

post

Content-Type

application/json

Input Parameters

Field Name Type Required Description
text string No Text to be synthesized, with a maximum limit of 50,000 characters. Required alternatively with "text_file_id".
text_file_id long No file_id returned by the file upload interface. ID of the text file to be synthesized. A single txt file has a length limit of <100,000 characters, supporting txt and zip formats. Required alternatively with "text". The format will be automatically verified after input.
1. txt: length limit <100,000 characters (to control the interval time in the speech, add <#x#> between words, where x is in seconds, supporting 0.01-99.99s with a maximum of two decimal places). It supports customizing the speech interval between texts to achieve the effect of customizing text speech pause time. Note that the text interval time must be set between two pronounceable texts, and multiple consecutive time intervals cannot be set;
2. zip: upload as a package containing only txt files or only json files (all files in the archive must share one format). A json file can have three fields: ["title", "content", "extra"], representing title, main text, and author respectively. A set of results is generated for each of the three fields, for a total of 9 files in one folder. If a field does not exist or its content is empty, the corresponding file will not be generated.
voice_setting object No Parameters to control the speech rate, volume, pitch, etc. of the generated voice.
audio_setting object No Parameters to control the sample rate, bit rate, and audio format of the generated voice.
pronunciation_dict object No Parameters to control pronunciation replacement for specific words and symbols.
language_boost string No Default is null. Enhances the recognition ability for specified minority languages and dialects. Setting it can improve speech performance in specified minority language/dialect scenarios. If the minority language type is not clear, "auto" can be selected, and the model will independently judge the minority language type.
Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'
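
For the zip variant of text_file_id, the archive can be produced in memory from a list of chapters. This is an illustrative sketch: build_zip and the chapter_001.json naming scheme are hypothetical, and only the three documented json fields are written.

```python
import io
import json
import zipfile

FIELDS = ("title", "content", "extra")


def build_zip(chapters):
    """Pack each chapter dict into its own json file and return the zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, chapter in enumerate(chapters, start=1):
            # Keep only the documented fields and drop empty ones.
            doc = {key: chapter[key] for key in FIELDS if chapter.get(key)}
            archive.writestr(
                f"chapter_{index:03d}.json",  # hypothetical file name
                json.dumps(doc, ensure_ascii=False),
            )
    return buffer.getvalue()
```

The resulting bytes can be uploaded through the file upload interface with purpose set to t2a_async_input.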

voice_setting Parameters

Field Name Type Required Description
speed Float No Speech rate range [0.5, 2], default value is 1.0; the larger the value, the faster the speech rate.
vol Float No Volume range (0, 10], default value is 1.0; the larger the value, the higher the volume.
pitch int No Pitch range [-12, 12], default value is 0. 0 means output with original timbre, and the value must be an integer.
voice_id string Yes The requested timbre ID, supporting two types: system timbres (IDs) and replicated timbres (IDs).
The system timbres (IDs) are as follows:
Green youth voice: male-qn-qingse;
Elite youth voice: male-qn-jingying;
Domineering youth voice: male-qn-badao;
College student voice: male-qn-daxuesheng;
Girl voice: female-shaonv;
Royal sister voice: female-yujie;
Mature female voice: female-chengshu;
Sweet female voice: female-tianmei;
Male host: presenter_male;
Female host: presenter_female;
Male audiobook 1: audiobook_male_1;
Male audiobook 2: audiobook_male_2;
Female audiobook 1: audiobook_female_1;
Female audiobook 2: audiobook_female_2;
Green youth voice-beta: male-qn-qingse-jingpin;
Elite youth voice-beta: male-qn-jingying-jingpin;
Domineering youth voice-beta: male-qn-badao-jingpin;
College student voice-beta: male-qn-daxuesheng-jingpin;
Girl voice-beta: female-shaonv-jingpin;
Royal sister voice-beta: female-yujie-jingpin;
Mature female voice-beta: female-chengshu-jingpin;
Sweet female voice-beta: female-tianmei-jingpin;
Smart boy: clever_boy;
Cute boy: cute_boy;
Cute girl: lovely_girl;
Cartoon Zhu Xiaoqi: cartoon_pig
emotion string No Controls the emotion of the synthesized speech; currently supports 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral; parameter range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]
english_normalization bool No This parameter supports English text normalization, which can improve performance in digital reading scenarios but will slightly increase latency. If not provided, the default value is false.

audio_setting Parameters

Field Name Type Required Description
sample_rate int No Sample rate range [8000, 16000, 22050, 24000, 32000, 44100], default is 32000.
bitrate int No Bit rate range [32000, 64000, 128000, 256000], default is 128000, which is only valid for mp3 format audio.
format string No Generated audio format, default is mp3, optional range [mp3, pcm, flac, wav].
channel int No Number of channels, default is 1 (mono), optional values are 1 (mono) or 2 (stereo).

pronunciation_dict Parameters

Field Name Type Required Description
tone list No Replace words, symbols that need special annotation and their corresponding phonetic notations.
Replace pronunciation (adjust tone/replace pronunciation of other characters) in the following format: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]. Tones are represented by numbers: first tone (yinping) is 1, second tone (yangping) is 2, third tone (shangsheng) is 3, fourth tone (qusheng) is 4, and neutral tone is 5.

Output Parameters

Field Name Type Description
taskId long Asynchronous task id

curl

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_async_v2' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}' \
--data-raw '{
    "text": "我爱中国",
    "language_boost": "auto",
    "voice_setting": {
        "voice_id": "audiobook_male_1",
        "speed": 1,
        "vol": 1,
        "pitch": 1
    },
    "pronunciation_dict": {
        "tone": [
            "草地/(cao3)(di1)"
        ]
    },
    "audio_setting": {
        "audio_sample_rate": 32000,
        "bitrate": 128000,
        "format": "mp3",
        "channel": 2
    }
}'

3. Get Asynchronous Task Status

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/task/{taskId}

Request Method

get

Input Parameters

Field Name Type Required Description
taskId string Yes taskId returned by the create asynchronous task interface

Output Parameters

Field Name Type Description
taskId string Unique identifier of the task.
status string Task status, such as SUCCESS, Fail, Processing, Preparing, Queueing.
fileId string Unique identifier of the file.

curl

curl --location --request GET 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/task/1915360414669643778' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}'
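
Because the task is asynchronous, a client typically polls the status interface until the task finishes. The loop below is a sketch under assumptions: fetch_status is a caller-supplied function that performs the GET request and returns the parsed JSON, status values are compared case-insensitively, and the interval and timeout defaults are arbitrary.

```python
import time


def wait_for_task(fetch_status, task_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the task succeeds; raise on failure or when the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        status = str(result.get("status", "")).upper()
        if status == "SUCCESS":
            return result  # contains taskId and fileId for the download step
        if status == "FAIL":
            raise RuntimeError(f"task {task_id} failed")
        # Processing, Preparing and Queueing all mean "keep waiting".
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.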

4. File Download

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?taskId=19153409980342...&fileId=261877976...

Request Method

get

Input Parameters

Field Name Type Required Description
taskId string Yes taskId returned by the get asynchronous task status interface
fileId string Yes fileId returned by the get asynchronous task status interface

Output Parameters

Field Name Type Description
fileId string Unique identifier of the file.
bytes int File size, in bytes.
createdAt int64 Creation time of the file, in Unix timestamp format.
filename string File name, including extension.
purpose string Purpose of the file, such as t2a_async.
mediaUrl string File download address, a complete URL with access permission signature.
expireTime int64 Expiration time of the file, in Unix timestamp format, indicating the time when the file's download link becomes invalid.

curl

curl --location --request GET 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?taskId=1915340998034292737&fileId=261877976617219' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}'
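
The retrieve URL and the expiry check on the returned link can be wrapped in two small helpers. This is a sketch: retrieve_url and link_expired are hypothetical names, and expireTime is assumed to be a Unix timestamp in seconds, which the table above does not specify.

```python
import time
from urllib.parse import urlencode


def retrieve_url(endpoint, task_id, file_id):
    """Build the file retrieval URL with properly encoded query parameters."""
    query = urlencode({"taskId": task_id, "fileId": file_id})
    return f"https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?{query}"


def link_expired(file_info, now=None):
    """Return True once the mediaUrl in a retrieve response is no longer valid."""
    current = time.time() if now is None else now
    # Assumption: expireTime is expressed in seconds, not milliseconds.
    return current >= file_info["expireTime"]
```

If the link has expired, calling the retrieve interface again should return a freshly signed mediaUrl.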