MaaS HL Speech

1. T2A v2 (Synchronous Speech Generation)

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2

Request Method

post

Content-Type

application/json

Input Parameters

Field Name	Type	Required	Description
text	string	Yes	The text to synthesize; must be fewer than 10,000 characters. Use line breaks to mark paragraph breaks. To insert a pause, place <#x#> between two words, where x is the pause length in seconds (0.01 to 99.99, at most two decimal places). A pause marker must sit between two pronounceable pieces of text, and consecutive pause markers are not allowed.
voice_setting	object	No	Parameters to control the speech rate, volume, pitch, etc. of the generated voice.
audio_setting	object	No	Parameters to control the sample rate, bit rate, and audio format of the generated voice.
pronunciation_dict	object	No	Custom pronunciation rules for specific words or symbols (see pronunciation_dict Parameters).
timber_weights	array	No	Weights for mixing multiple timbres. Required if voice_setting.voice_id is not provided. Each element pairs a voice_id with a weight; up to 4 timbres can be mixed, each with an integer weight in [1, 100].
stream	bool	No	Whether to stream. Default is false (not enabled).
language_boost	string	No	Improves recognition of the specified less common language or dialect, which can improve synthesis quality in that language. Default is null. If the language is not known in advance, set "auto" to let the model detect it. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'
subtitle_enable	bool	No	Switch to control whether to enable subtitle service. Default is false.
output_format	string	No	Form of the returned audio. Supported values: url, hex. Default is hex. Only effective for non-streaming requests; streaming requests always return hex.
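The pause syntax in text can be generated programmatically. The following is a minimal bash sketch, assuming only the <#x#> marker format described above; the function name add_pause is hypothetical and not part of the API. It rejects durations outside 0.01 to 99.99 seconds or with more than two decimal places, and refuses to place a marker next to another marker or at either end of the text.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join two text segments with a <#x#> pause marker.
# Enforces the documented limits: 0.01 to 99.99 seconds, at most two decimals.
add_pause() {
  local left="$1" seconds="$2" right="$3"
  local re='^[0-9]{1,2}(\.[0-9]{1,2})?$'

  # Shape check: one or two integer digits, optionally one or two decimals.
  if ! [[ "$seconds" =~ $re ]]; then
    echo "invalid pause duration: $seconds" >&2
    return 1
  fi
  # Range check (rejects 0 and 0.00); awk handles the decimal comparison.
  if ! awk -v s="$seconds" 'BEGIN { exit !(s >= 0.01 && s <= 99.99) }'; then
    echo "pause out of range: $seconds" >&2
    return 1
  fi
  # Text is required on both sides, and neither side may already end or
  # start with a marker (consecutive markers are not allowed).
  if [[ -z "$left" || -z "$right" || "$left" == *'#>' || "$right" == '<#'* ]]; then
    echo "a pause must sit between two pieces of text" >&2
    return 1
  fi
  printf '%s<#%s#>%s\n' "$left" "$seconds" "$right"
}

# Usage: add_pause "Good morning" 0.5 "everyone"   # Good morning<#0.5#>everyone
```

The check only ensures the neighbouring segments are non-empty; whether they are actually pronounceable is still up to the caller.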

voice_setting Parameters

Field Name	Type	Required	Description
speed	Float	No	Speech rate range [0.5, 2], default value is 1.0; the larger the value, the faster the speech rate.
vol	Float	No	Volume range (0, 10], default value is 1.0; the larger the value, the higher the volume.
pitch	int	No	Pitch range [-12, 12], default value is 0. 0 means output with original timbre, and the value must be an integer.
voice_id	string	Conditional	The requested timbre ID. Required unless timber_weights is provided. Two kinds of timbre are supported: system timbres and cloned timbres. System timbre IDs: Green youth voice: male-qn-qingse; Elite youth voice: male-qn-jingying; Domineering youth voice: male-qn-badao; College student voice: male-qn-daxuesheng; Girl voice: female-shaonv; Royal sister voice: female-yujie; Mature female voice: female-chengshu; Sweet female voice: female-tianmei; Male host: presenter_male; Female host: presenter_female; Male audiobook 1: audiobook_male_1; Male audiobook 2: audiobook_male_2; Female audiobook 1: audiobook_female_1; Female audiobook 2: audiobook_female_2; Green youth voice-beta: male-qn-qingse-jingpin; Elite youth voice-beta: male-qn-jingying-jingpin; Domineering youth voice-beta: male-qn-badao-jingpin; College student voice-beta: male-qn-daxuesheng-jingpin; Girl voice-beta: female-shaonv-jingpin; Royal sister voice-beta: female-yujie-jingpin; Mature female voice-beta: female-chengshu-jingpin; Sweet female voice-beta: female-tianmei-jingpin; Smart boy: clever_boy; Cute boy: cute_boy; Cute girl: lovely_girl; Cartoon Zhu Xiaoqi: cartoon_pig; Yandere younger brother: bingjiao_didi; Handsome boyfriend: junlang_nanyou; Pure schoolboy: chunzhen_xuedi; Cold senior: lengdan_xiongzhang; Domineering young master: badao_shaoye; Sweet Xiaoling: tianxin_xiaoling; Playful cute girl: qiaopi_mengmei; Charming royal sister: wumei_yujie; Coquettish schoolgirl: diadia_xuemei; Elegant senior sister: danya_xuejie; Santa Claus: Santa_Claus; Grinch: Grinch; Rudolph: Rudolph; Arnold: Arnold; Charming Santa: Charming_Santa; Charming Lady: Charming_Lady; Sweet Girl: Sweet_Girl; Cute Elf: Cute_Elf; Attractive Girl: Attractive_Girl; Serene Woman: Serene_Woman
emotion	string	No	Controls the emotion of the synthesized speech; currently supports 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral; parameter range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]
latex_read	bool	No	Whether to read LaTeX formulas aloud. Default is false. Notes: 1. Formulas in the request must be enclosed in $$; 2. Every "\" in a formula must be escaped as "\\" in the request. Example: The basic formula of the derivative is $$\\frac{d}{dx}(x^n) = nx^{n-1}$$
english_normalization	bool	No	Enables English text normalization, which improves the reading of numbers at the cost of slightly higher latency. Default is false.
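Because speed and vol are decimals while pitch must be an integer, a client-side check can catch out-of-range values before a request is sent. This is a minimal bash sketch; build_voice_setting is a hypothetical helper that validates against the ranges in the table above and prints the voice_setting JSON object. It does not replace server-side validation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate voice_setting values against the documented
# ranges and print the JSON object. The service still validates on its side.
build_voice_setting() {
  local voice_id="$1" speed="$2" vol="$3" pitch="$4"
  local num_re='^[0-9]+(\.[0-9]+)?$'
  local int_re='^-?(0|[1-9][0-9]?)$'

  # speed must be in [0.5, 2] and vol in (0, 10].
  if ! [[ "$speed" =~ $num_re && "$vol" =~ $num_re ]] ||
     ! awk -v s="$speed" -v v="$vol" 'BEGIN { exit !(s >= 0.5 && s <= 2 && v > 0 && v <= 10) }'; then
    echo "speed must be in [0.5, 2] and vol in (0, 10]" >&2
    return 1
  fi
  # pitch must be an integer in [-12, 12].
  if ! [[ "$pitch" =~ $int_re ]] || (( pitch < -12 || pitch > 12 )); then
    echo "pitch must be an integer in [-12, 12]" >&2
    return 1
  fi
  printf '{"voice_id":"%s","speed":%s,"vol":%s,"pitch":%s}\n' \
    "$voice_id" "$speed" "$vol" "$pitch"
}

# Usage: build_voice_setting male-qn-qingse 1.0 1.0 0
```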

audio_setting Parameters

Field Name	Type	Required	Description
sample_rate	int	No	Sample rate. Supported values: [8000, 16000, 22050, 24000, 32000, 44100]. Default is 32000.
bitrate	int	No	Bit rate. Supported values: [32000, 64000, 128000, 256000]. Default is 128000. Only applies to mp3 audio.
format	string	No	Audio format. Supported values: [mp3, pcm, flac, wav]. Default is mp3.
channel	int	No	Number of channels, default is 1 (mono), optional values are 1 (mono) or 2 (stereo).
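The audio_setting values are drawn from fixed sets, so a case statement per field is enough to validate them. A minimal bash sketch under these assumptions: build_audio_setting is hypothetical, it uses the field names from the table above, and it omits bitrate for non-mp3 formats because bitrate only applies to mp3. Note that the asynchronous curl example later in this document sends audio_sample_rate rather than sample_rate, so check which name your endpoint accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check audio_setting against the supported values
# and print the JSON object. bitrate is only sent for mp3 in this sketch.
build_audio_setting() {
  local sample_rate="$1" bitrate="$2" format="$3" channel="$4"
  case "$sample_rate" in
    8000|16000|22050|24000|32000|44100) ;;
    *) echo "unsupported sample_rate: $sample_rate" >&2; return 1 ;;
  esac
  case "$bitrate" in
    32000|64000|128000|256000) ;;
    *) echo "unsupported bitrate: $bitrate" >&2; return 1 ;;
  esac
  case "$format" in
    mp3|pcm|flac|wav) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  case "$channel" in
    1|2) ;;
    *) echo "channel must be 1 or 2" >&2; return 1 ;;
  esac
  if [[ "$format" == mp3 ]]; then
    printf '{"sample_rate":%s,"bitrate":%s,"format":"%s","channel":%s}\n' \
      "$sample_rate" "$bitrate" "$format" "$channel"
  else
    printf '{"sample_rate":%s,"format":"%s","channel":%s}\n' \
      "$sample_rate" "$format" "$channel"
  fi
}

# Usage: build_audio_setting 32000 128000 mp3 1
```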

pronunciation_dict Parameters

Field Name	Type	Required	Description
tone	list	No	Pronunciation overrides for words or symbols that need special annotation. Each entry has the form "text/pronunciation", where the pronunciation is either pinyin syllables with tone numbers or replacement text, for example: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]. Tone numbers: 1 = first tone (yinping), 2 = second tone (yangping), 3 = third tone (shangsheng), 4 = fourth tone (qusheng), 5 = neutral tone.
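Pinyin overrides follow a fixed pattern, one parenthesized syllable with a tone number per character. A minimal bash sketch, assuming that pattern; tone_entry is a hypothetical helper, and it only accepts lowercase ASCII letters followed by a tone number from 1 to 5.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a pinyin override such as "达菲/(da2)(fei1)".
# Each syllable must be lowercase letters followed by a tone number 1 to 5.
tone_entry() {
  local text="$1"; shift
  local re='^[a-z]+[1-5]$' syllable out=""
  if [[ -z "$text" || $# -eq 0 ]]; then
    echo "usage: tone_entry TEXT SYLLABLE..." >&2
    return 1
  fi
  for syllable in "$@"; do
    if ! [[ "$syllable" =~ $re ]]; then
      echo "invalid syllable: $syllable" >&2
      return 1
    fi
    out+="($syllable)"
  done
  printf '%s/%s\n' "$text" "$out"
}

# Usage: tone_entry 草地 cao3 di1   # 草地/(cao3)(di1)
# Plain replacement entries such as "omg/oh my god" need no helper.
```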

timber_weights Parameters

Field Name	Type	Required	Description
voice_id	string	No	Requested timbre ID. Must be provided together with weight.
weight	int	No	Integer weight in [1, 100]; must be provided together with voice_id. Up to 4 timbres can be mixed. The larger a timbre's share of the total weight, the more closely the synthesized voice resembles it.
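The timber_weights array can be assembled from simple voice_id:weight pairs. This bash sketch is hypothetical: build_timber_weights is not part of the API, and the element layout (an object with voice_id and weight members) is inferred from the table above. It enforces the documented limits of 1 to 4 timbres and integer weights in [1, 100].

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "voice_id:weight" pairs into a timber_weights array.
# Enforces the documented limits: 1 to 4 timbres, integer weights in [1, 100].
build_timber_weights() {
  local re='^([^:]+):([0-9]{1,3})$' pair voice weight json=""
  if (( $# < 1 || $# > 4 )); then
    echo "between 1 and 4 timbres can be mixed" >&2
    return 1
  fi
  for pair in "$@"; do
    if ! [[ "$pair" =~ $re ]]; then
      echo "expected voice_id:weight, got: $pair" >&2
      return 1
    fi
    voice="${BASH_REMATCH[1]}"
    weight=$(( 10#${BASH_REMATCH[2]} ))   # base 10, so "070" is not read as octal
    if (( weight < 1 || weight > 100 )); then
      echo "weight must be in [1, 100]: $pair" >&2
      return 1
    fi
    # Prepend a comma to every element except the first.
    json+="${json:+,}{\"voice_id\":\"$voice\",\"weight\":$weight}"
  done
  printf '[%s]\n' "$json"
}

# Usage: build_timber_weights male-qn-qingse:70 female-shaonv:30
```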

Output Parameters

Field Name	Parameter Type	Description
data	object	The returned data object, which may be `null` and needs to be checked for non-null.
trace_id	string	ID of this request, used to locate problems when contacting support or giving feedback.
extra_info	object	Related additional information.
base_resp	object	If the request fails, the corresponding error status code and details.

data Parameters

Field Name	Type	Description
audio	string	The synthesized audio segment, encoded in hex, generated according to the input-defined format (mp3/pcm/flac).
subtitle_file	string	Download link for the subtitle file that matches the audio. The file is JSON, with sentence-level timestamps in milliseconds; each sentence is at most 50 characters.
status	int	Current audio stream status: 1 means synthesizing, 2 means synthesis completed.
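When output_format is hex (the default), data.audio must be decoded back to bytes before it can be played. Where xxd is installed, `xxd -r -p` does this directly; the following bash sketch needs only bash and sed. The helper name hex_to_file is hypothetical, and extracting data.audio from the JSON response is left to the caller (for example with jq).

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the hex string from data.audio to a binary file.
hex_to_file() {
  local hex="$1" out="$2"
  # Reject odd lengths and non-hex characters before decoding.
  if (( ${#hex} % 2 != 0 )) || [[ "$hex" == *[!0-9A-Fa-f]* ]]; then
    echo "data.audio is not a valid hex string" >&2
    return 1
  fi
  # Rewrite each byte "ab" as the escape "\xab"; bash's printf %b then emits
  # the raw bytes (NUL bytes included) without a trailing newline.
  printf '%b' "$(printf '%s' "$hex" | sed 's/../\\x&/g')" > "$out"
}

# Usage: hex_to_file "$AUDIO_HEX" output.mp3
```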

extra_info Parameters

Field Name	Type	Description
audio_length	long	Audio duration, accurate to milliseconds.
audio_sample_rate	long	Sample rate.
audio_size	long	Audio size, in bytes.
bitrate	long	Bit rate.
audio_format	string	Format of the generated audio file, with values in mp3/pcm/flac.
audio_channel	long	Number of audio channels generated: 1 for mono, 2 for stereo.
invisible_character_ratio	double	Proportion of illegal characters in the input. If it is at most 10% (0.1), the audio is generated normally and this ratio is returned; above 10%, the request fails with an error.
usage_characters	long	Billing character count, the number of billing characters for this speech generation.

base_resp Parameters

Field Name	Type	Description
status_code	int64	Status code. 1000: Unknown error; 1001: Timeout; 1002: Rate limited; 1004: Authentication failed; 1039: TPM rate limited; 1042: Illegal characters exceed 10%; 2013: Input format information is abnormal.
status_msg	string	Status details.
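Scripts usually need to turn base_resp.status_code into a message and decide whether to retry. A minimal bash sketch based on the codes listed above; the function names are hypothetical, and treating 1001, 1002, and 1039 as retryable is an assumption of this sketch, not documented behaviour.

```bash
#!/usr/bin/env bash
# Hypothetical helper: describe a base_resp.status_code from the table above.
describe_status() {
  case "$1" in
    1000) echo "unknown error" ;;
    1001) echo "timeout" ;;
    1002) echo "rate limited" ;;
    1004) echo "authentication failed" ;;
    1039) echo "TPM rate limited" ;;
    1042) echo "illegal characters exceed 10%" ;;
    2013) echo "abnormal input format" ;;
    *)    echo "unlisted status code: $1" ;;
  esac
}

# Assumption: timeouts and rate limits may succeed after a delay, while
# authentication and input errors need a fix on the caller's side.
is_retryable() {
  case "$1" in
    1001|1002|1039) return 0 ;;
    *) return 1 ;;
  esac
}
```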

curl

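curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_v2' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}' \
--data-raw '{
    "text": "我爱中国",
    "stream": false,
    "voice_setting": {
        "voice_id": "audiobook_male_1",
        "speed": 1,
        "vol": 1,
        "pitch": 0
    },
    "audio_setting": {
        "sample_rate": 32000,
        "bitrate": 128000,
        "format": "mp3",
        "channel": 1
    }
}'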
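For scripted use, the T2A v2 request can be wrapped in a function that reads the JSON body from a file. A minimal bash sketch: hailuo_url and t2a_v2 are hypothetical names, ENDPOINT and KEY stand for your gateway endpoint name and API key, and only standard curl options are used. HTTP errors make curl exit non-zero because of --fail; if the service reports some errors only in base_resp, those still have to be checked in the saved response.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the gateway URL for an endpoint and API path.
hailuo_url() {
  # $1: endpoint name, $2: path below /hailuo/
  printf 'https://genaiapi.cloudsway.net/v1/ai/%s/hailuo/%s\n' "$1" "$2"
}

# Hypothetical helper: POST a JSON body file to T2A v2 and save the response.
t2a_v2() {
  # $1: endpoint, $2: API key, $3: JSON request body file, $4: response file
  local endpoint="$1" key="$2" body_file="$3" out="$4"
  curl --silent --show-error --fail \
    --request POST "$(hailuo_url "$endpoint" tts/t2a_v2)" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $key" \
    --data-binary "@$body_file" \
    --output "$out"
}

# Usage: t2a_v2 "$ENDPOINT" "$KEY" request.json response.json
```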

2. T2A Large v2 (Asynchronous Long Text Speech Generation)

1. File Upload

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/file/upload

Request Method

post

Content-Type

multipart/form-data

Input Parameters

Field Name	Type	Required	Description
purpose	string	Yes	Purpose of the file. Supported value: t2a_async_input, for text files used to create asynchronous speech generation tasks (txt and zip formats are supported).
file	file	Yes	The file to upload.

Output Parameters

Field Name	Type	Description
file	object	File information object, containing detailed information of the file.
base_resp	object	Basic response information object, containing status code and status message.

file Parameters

Field Name	Type	Description
file_id	int64	Unique identifier of the file.
bytes	int	File size, in bytes.
created_at	int64	Creation time of the file, in Unix timestamp format.
filename	string	File name.
purpose	string	Purpose of the file, currently `t2a_async_input`.

base_resp Parameters

Field Name	Type	Description
status_code	int64	Status code, `0` indicates success.
status_msg	string	Status message, `success` indicates the request is successful.

curl

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/file/upload' \
--header 'Authorization: Bearer {KEY}' \
--form 'purpose="t2a_async_input"' \
--form 'file=@"/C:/Users/xxx/Desktop/FJC/文档.txt"'
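The file_id from the upload response is needed as text_file_id when creating the asynchronous task. A minimal bash sketch, assuming the response shape in the file Parameters table; upload_text_file and extract_file_id are hypothetical, and the grep-based extraction is a crude stand-in for `jq -r '.file.file_id'` where jq is available. curl's --form sets the multipart/form-data content type automatically.

```bash
#!/usr/bin/env bash
# Hypothetical helper: upload a txt or zip file and print file.file_id.
# An empty result means the upload failed or the response had no file_id.
upload_text_file() {
  # $1: endpoint, $2: API key, $3: path to the txt or zip file
  curl --silent --show-error --fail \
    --request POST "https://genaiapi.cloudsway.net/v1/ai/$1/hailuo/file/upload" \
    --header "Authorization: Bearer $2" \
    --form 'purpose=t2a_async_input' \
    --form "file=@$3" | extract_file_id
}

# Crude extraction without jq: take the first "file_id" member, keep its digits.
extract_file_id() {
  grep -o '"file_id"[^,}]*' | head -n 1 | tr -dc '0-9'
}

# Usage: file_id=$(upload_text_file "$ENDPOINT" "$KEY" ./document.txt)
```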

2. Create Asynchronous Task

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_async_v2

Request Method

post

Content-Type

application/json

Input Parameters

Field Name	Type	Required	Description
text	string	No	Text to synthesize, at most 50,000 characters. Exactly one of text and text_file_id must be provided.
text_file_id	long	No	file_id returned by the File Upload endpoint, identifying the text file to synthesize. Exactly one of text and text_file_id must be provided. Supported formats are txt and zip; the format is validated automatically. 1. txt: fewer than 100,000 characters. To insert a pause, place <#x#> between two words, where x is the pause length in seconds (0.01 to 99.99, at most two decimal places); a pause marker must sit between two pronounceable pieces of text, and consecutive pause markers are not allowed. 2. zip: an archive containing only txt files or only json files (all files in one archive must share the same format). A json file can contain three fields, "title", "content", and "extra", holding the title, main text, and author respectively. Each field produces three result files, so up to 9 files are placed in one folder; a field that is missing or empty produces no files.
voice_setting	object	No	Parameters to control the speech rate, volume, pitch, etc. of the generated voice.
audio_setting	object	No	Parameters to control the sample rate, bit rate, and audio format of the generated voice.
pronunciation_dict	object	No	Custom pronunciation rules for specific words or symbols (see pronunciation_dict Parameters).
language_boost	string	No	Improves recognition of the specified less common language or dialect, which can improve synthesis quality in that language. Default is null. If the language is not known in advance, set "auto" to let the model detect it. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'
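Since text and text_file_id are mutually exclusive and inline text is capped at 50,000 characters, a script can decide which field to send before building the body. A minimal bash sketch; check_text_source is hypothetical. Note that ${#text} counts characters only under a UTF-8 locale; under the C locale it counts bytes, which overestimates the length of Chinese text.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a request should use text or text_file_id.
# Prints the field name to use, or fails with a message on stderr.
check_text_source() {
  local text="$1" file_id="$2"
  local id_re='^[0-9]+$'
  if [[ -n "$text" && -n "$file_id" ]] || [[ -z "$text" && -z "$file_id" ]]; then
    echo "provide exactly one of text or text_file_id" >&2
    return 1
  fi
  if [[ -n "$file_id" ]]; then
    [[ "$file_id" =~ $id_re ]] || { echo "text_file_id must be numeric" >&2; return 1; }
    echo text_file_id
  elif (( ${#text} > 50000 )); then
    echo "text exceeds 50,000 characters; upload it as a file instead" >&2
    return 1
  else
    echo text
  fi
}
```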

voice_setting Parameters

Field Name	Type	Required	Description
speed	Float	No	Speech rate range [0.5, 2], default value is 1.0; the larger the value, the faster the speech rate.
vol	Float	No	Volume range (0, 10], default value is 1.0; the larger the value, the higher the volume.
pitch	int	No	Pitch range [-12, 12], default value is 0. 0 means output with original timbre, and the value must be an integer.
voice_id	string	Yes	The requested timbre ID. Two kinds of timbre are supported: system timbres and cloned timbres. System timbre IDs: Green youth voice: male-qn-qingse; Elite youth voice: male-qn-jingying; Domineering youth voice: male-qn-badao; College student voice: male-qn-daxuesheng; Girl voice: female-shaonv; Royal sister voice: female-yujie; Mature female voice: female-chengshu; Sweet female voice: female-tianmei; Male host: presenter_male; Female host: presenter_female; Male audiobook 1: audiobook_male_1; Male audiobook 2: audiobook_male_2; Female audiobook 1: audiobook_female_1; Female audiobook 2: audiobook_female_2; Green youth voice-beta: male-qn-qingse-jingpin; Elite youth voice-beta: male-qn-jingying-jingpin; Domineering youth voice-beta: male-qn-badao-jingpin; College student voice-beta: male-qn-daxuesheng-jingpin; Girl voice-beta: female-shaonv-jingpin; Royal sister voice-beta: female-yujie-jingpin; Mature female voice-beta: female-chengshu-jingpin; Sweet female voice-beta: female-tianmei-jingpin; Smart boy: clever_boy; Cute boy: cute_boy; Cute girl: lovely_girl; Cartoon Zhu Xiaoqi: cartoon_pig
emotion	string	No	Controls the emotion of the synthesized speech; currently supports 7 emotions: happy, sad, angry, fearful, disgusted, surprised, neutral; parameter range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]
english_normalization	bool	No	Enables English text normalization, which improves the reading of numbers at the cost of slightly higher latency. Default is false.

audio_setting Parameters

Field Name	Type	Required	Description
sample_rate	int	No	Sample rate. Supported values: [8000, 16000, 22050, 24000, 32000, 44100]. Default is 32000.
bitrate	int	No	Bit rate. Supported values: [32000, 64000, 128000, 256000]. Default is 128000. Only applies to mp3 audio.
format	string	No	Audio format. Supported values: [mp3, pcm, flac, wav]. Default is mp3.
channel	int	No	Number of channels, default is 1 (mono), optional values are 1 (mono) or 2 (stereo).

pronunciation_dict Parameters

Field Name	Type	Required	Description
tone	list	No	Pronunciation overrides for words or symbols that need special annotation. Each entry has the form "text/pronunciation", where the pronunciation is either pinyin syllables with tone numbers or replacement text, for example: ["燕少飞/(yan4)(shao3)(fei1)", "达菲/(da2)(fei1)", "omg/oh my god"]. Tone numbers: 1 = first tone (yinping), 2 = second tone (yangping), 3 = third tone (shangsheng), 4 = fourth tone (qusheng), 5 = neutral tone.

Output Parameters

Field Name	Parameter Type	Description
taskId	long	Asynchronous task id

curl

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/t2a_async_v2' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}' \
--data-raw '{
    "text": "我爱中国",
    "language_boost": "auto",
    "voice_setting": {
        "voice_id": "audiobook_male_1",
        "speed": 1,
        "vol": 1,
        "pitch": 1
    },
    "pronunciation_dict": {
        "tone": [
            "草地/(cao3)(di1)"
        ]
    },
    "audio_setting": {
        "audio_sample_rate": 32000,
        "bitrate": 128000,
        "format": "mp3",
        "channel": 2
    }
}'
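A script that creates the task needs to keep the returned taskId for polling. A minimal bash sketch; create_async_task and extract_task_id are hypothetical, and the extraction assumes a top-level taskId member as in the Output Parameters table. The ID is kept as text on purpose: IDs such as 1915360414669643778 exceed 2^53, so tools that parse JSON numbers as doubles may round them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit a t2a_async_v2 request body and print taskId.
create_async_task() {
  # $1: endpoint, $2: API key, $3: JSON request body file
  curl --silent --show-error --fail \
    --request POST "https://genaiapi.cloudsway.net/v1/ai/$1/hailuo/tts/t2a_async_v2" \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $2" \
    --data-binary "@$3" | extract_task_id
}

# Crude extraction without jq; works whether taskId is a JSON number or string.
extract_task_id() {
  grep -o '"taskId"[^,}]*' | head -n 1 | tr -dc '0-9'
}

# Usage: task_id=$(create_async_task "$ENDPOINT" "$KEY" request.json)
```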

3. Get Asynchronous Task Status

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/task/{taskId}

Request Method

get

Input Parameters

Field Name	Type	Required	Description
taskId	string	Yes	Task ID returned by the Create Asynchronous Task endpoint, passed as the {taskId} path segment.

Output Parameters

Field Name	Type	Description
taskId	string	Unique identifier of the task.
status	string	Task status. Possible values include `SUCCESS`, `Fail`, `Processing`, `Preparing`, and `Queueing`.
fileId	string	Unique identifier of the output file; pass it, together with taskId, to the File Download endpoint.

curl

curl --location --request GET 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/tts/task/1915360414669643778' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}'
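Asynchronous tasks are typically polled until they leave the in-progress states. A minimal bash sketch; fetch_status, extract_status, and poll_task are hypothetical, the polling interval and attempt limit are caller choices rather than documented values, and status strings are matched exactly as listed in the table above (adjust if the service returns different casing). Any status other than the in-progress ones or `SUCCESS` stops polling with a failure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the current status string of a task.
fetch_status() {
  # $1: endpoint, $2: API key, $3: taskId
  curl --silent --show-error --fail \
    "https://genaiapi.cloudsway.net/v1/ai/$1/hailuo/tts/task/$3" \
    --header "Authorization: Bearer $2" | extract_status
}

# Crude extraction without jq: value of the first "status" string member.
extract_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# poll_task INTERVAL MAX_ATTEMPTS COMMAND [ARGS...]
poll_task() {
  local interval="$1" max_attempts="$2" attempt status
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status="$("$@")" || return 1
    case "$status" in
      SUCCESS) printf '%s\n' "$status"; return 0 ;;
      Processing|Preparing|Queueing) sleep "$interval" ;;
      *) printf '%s\n' "$status"; return 1 ;;   # Fail or an unexpected value
    esac
  done
  echo "still running after $max_attempts attempts" >&2
  return 1
}

# Usage: poll_task 10 60 fetch_status "$ENDPOINT" "$KEY" "$TASK_ID"
```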

4. File Download

URL

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?taskId=19153409980342...&fileId=261877976...

Request Method

get

Input Parameters

Field Name	Type	Required	Description
taskId	string	Yes	taskId returned by the get asynchronous task status interface
fileId	string	Yes	fileId returned by the get asynchronous task status interface

Output Parameters

Field Name	Type	Description
fileId	string	Unique identifier of the file.
bytes	int	File size, in bytes.
createdAt	int64	Creation time of the file, in Unix timestamp format.
filename	string	File name, including extension.
purpose	string	Purpose of the file, such as `t2a_async`.
mediaUrl	string	File download address, a complete URL with access permission signature.
expireTime	int64	Expiration time of the file, in Unix timestamp format, indicating the time when the file's download link becomes invalid.

curl

curl --location --request GET 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/hailuo/files/retrieve?taskId=1915340998034292737&fileId=261877976617219' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {KEY}'
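The final step resolves the signed mediaUrl and fetches the file before expireTime passes. A minimal bash sketch; extract_media_url and download_result are hypothetical, curl's --get with --data-urlencode builds the taskId and fileId query string, and unescaping \u0026 and \/ covers the case where the service escapes those characters in JSON (an assumption, not documented behaviour).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first "mediaUrl" string from a JSON response,
# undoing the JSON escapes \u0026 and \/ in case the service uses them.
extract_media_url() {
  grep -o '"mediaUrl"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed -e 's/^"mediaUrl"[[:space:]]*:[[:space:]]*"//' -e 's/"$//' \
        -e 's/\\u0026/\&/g' -e 's|\\/|/|g'
}

# Hypothetical helper: look up the signed link, then download the result file.
download_result() {
  # $1: endpoint, $2: API key, $3: taskId, $4: fileId, $5: output file
  local url
  url="$(curl --silent --show-error --fail --get \
    "https://genaiapi.cloudsway.net/v1/ai/$1/hailuo/files/retrieve" \
    --data-urlencode "taskId=$3" --data-urlencode "fileId=$4" \
    --header "Authorization: Bearer $2" | extract_media_url)"
  if [[ -z "$url" ]]; then
    echo "no mediaUrl in the response" >&2
    return 1
  fi
  # The signed link stops working at expireTime, so fetch it right away.
  curl --silent --show-error --fail --location --output "$5" "$url"
}

# Usage: download_result "$ENDPOINT" "$KEY" "$TASK_ID" "$FILE_ID" result.tar
```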