Text-to-Speech API
Version History
Version | Date | Changes |
---|---|---|
v1.2 | 2024-08-23 | Added the API documentation for the MaaS ASpeech/OSpeech models |
v1.1 | 2024-07-31 | 1. Modified the request path 2. ElevenLabs Text-to-Speech now returns request-id in the response header |
v1.0 | 2024-07-29 | Initial release |
MaaS ASpeech/OSpeech
Request Method:
POST
Request Path:
{endpoint}/audio/speech
Request Header
Parameter | Description | Example |
---|---|---|
Authorization | Access key, sent as Bearer ${AccessKey} | Bearer xxxxxx |
Request Body
Parameter | Description | Example |
---|---|---|
model | The model being used | tts-1 |
input | The text input that needs to be converted to audio | how are you |
voice | The voice used for audio generation. Possible values: "alloy", "echo", "fable", "onyx", "nova", "shimmer" | alloy |
response_format | The audio format. Possible values: "mp3", "opus", "aac", "flac", "wav", "pcm" | mp3 |
speed | The audio speed, in the range 0.25 to 4.0 | 1.0 |
Response Body
File stream
Sample Request
MaaS ASpeech
curl --location '{endpoint}/audio/speech' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data '{
"input":"<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>my voice is my passport verify me</voice></speak>",
"response_format":"audio-16khz-128kbitrate-mono-mp3"
}'
MaaS OSpeech
curl --location '{endpoint}/audio/speech' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data '{
"input":"hi,what is your name?",
"voice":"alloy",
"speed":1.0,
"response_format":"mp3"
}'
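The OSpeech sample above could be assembled programmatically along the following lines. This is a minimal sketch using only the Python standard library; the function name `build_ospeech_request`, its arguments, and the client-side validation are illustrative assumptions rather than part of the API. The allowed values are copied from the Request Body table, and the payload mirrors the fields of the curl sample.

```python
import json
import urllib.request

# Allowed values copied from the Request Body table.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_ospeech_request(endpoint, access_key, text, voice="alloy",
                          speed=1.0, response_format="mp3"):
    """Build (without sending) a POST request to {endpoint}/audio/speech.

    Hypothetical helper: validation happens client-side before any call.
    """
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    # Same fields as the curl sample for MaaS OSpeech.
    payload = {
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` and writing `response.read()` to a file would store the returned file stream.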
Get MaaS ASpeech Regional Language List
Request Method:
GET
Request Path:
{endpointPath}/cognitiveservices/voices/list
Request Header
Parameter | Description | Example |
---|---|---|
Authorization | Access key, sent as Bearer ${AccessKey} | Bearer xxxxxx |
Return Parameters
Object Array
Object Parameters:
Parameter | Description | Example |
---|---|---|
Name | Voice Full Name | Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural) |
DisplayName | Display Name | Adri |
LocalName | Local Name | Adri |
ShortName | Short Name | af-ZA-AdriNeural |
Gender | Voice Gender | Female |
Locale | Locale | af-ZA |
LocaleName | Locale Language Name | Afrikaans (South Africa) |
SampleRateHertz | Sampling Rate | 48000 |
VoiceType | Voice Type | Neural |
Status | Status | GA |
WordsPerMinute | Words Per Minute | 147 |
Request Example
curl --location '{endpointPath}/cognitiveservices/voices/list' \
--header 'Authorization: Bearer xxxx'
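Once the object array has been fetched, a client would typically filter it and use one entry to build the SSML input shown in the MaaS ASpeech sample. The sketch below assumes the response has already been parsed into a list of dictionaries with the field names from the table above; `voices_for_locale`, `build_ssml`, and `SAMPLE_VOICES` are hypothetical names, and using `ShortName` as the SSML `name` attribute is inferred from the ASpeech sample rather than stated by the API.

```python
from xml.sax.saxutils import escape, quoteattr

# One entry using the example values from the Object Parameters table.
SAMPLE_VOICES = [
    {
        "DisplayName": "Adri",
        "ShortName": "af-ZA-AdriNeural",
        "Gender": "Female",
        "Locale": "af-ZA",
        "VoiceType": "Neural",
        "Status": "GA",
    }
]


def voices_for_locale(voices, locale, gender=None):
    """Return the voices matching a locale and, optionally, a gender."""
    return [
        v for v in voices
        if v.get("Locale") == locale
        and (gender is None or v.get("Gender") == gender)
    ]


def build_ssml(text, voice):
    """Wrap text in an SSML document for the given voice entry.

    Assumption: ShortName is the value expected in the `name` attribute.
    """
    lang = quoteattr(voice["Locale"])
    return (
        f"<speak version='1.0' xml:lang={lang}>"
        f"<voice xml:lang={lang} xml:gender={quoteattr(voice['Gender'])} "
        f"name={quoteattr(voice['ShortName'])}>"
        f"{escape(text)}</voice></speak>"
    )
```

The text is escaped so that characters such as `&` and `<` do not break the SSML markup.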
MaaS-nar shortTextToSpeech
Request Method:
POST
Request Path:
/tts-n/text-to-speech/{responseFormat}
pathVariables
Parameter | Description | Example |
---|---|---|
responseFormat | Audio format (mp3/m4a) | mp3 |
queryParams
Parameter | Description | Example |
---|---|---|
voice | Voice option | Yifei |
voice-speed | Speech speed. Possible values: fast, normal, slow, or a number between 0.3 and 2 | normal |
voice-volume | Volume. Possible values: x-loud, loud, standard, soft, x-soft, normalized | loud |
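The path variable and query parameters above combine into a single request URL. The following sketch shows one way to build and validate it; `build_short_tts_url` is a hypothetical helper, and treating `voice-speed` and `voice-volume` as optional is an assumption, since the tables do not say which parameters are required.

```python
from urllib.parse import urlencode

SPEED_NAMES = {"fast", "normal", "slow"}
VOLUMES = {"x-loud", "loud", "standard", "soft", "x-soft", "normalized"}


def build_short_tts_url(endpoint_path, response_format, voice,
                        speed=None, volume=None):
    """Build the shortTextToSpeech URL with validated query parameters."""
    if response_format not in {"mp3", "m4a"}:
        raise ValueError("responseFormat must be mp3 or m4a")

    params = {"voice": voice}
    if speed is not None:
        if speed not in SPEED_NAMES:
            # Anything that is not a named speed must be a number in 0.3-2.
            try:
                numeric = float(speed)
            except (TypeError, ValueError):
                raise ValueError(f"invalid voice-speed: {speed!r}") from None
            if not 0.3 <= numeric <= 2:
                raise ValueError("numeric voice-speed must be within 0.3-2")
        params["voice-speed"] = speed
    if volume is not None:
        if volume not in VOLUMES:
            raise ValueError(f"invalid voice-volume: {volume!r}")
        params["voice-volume"] = volume

    base = endpoint_path.rstrip("/")
    return f"{base}/tts-n/text-to-speech/{response_format}?{urlencode(params)}"
```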
Request Header
Parameter | Description | Example |
---|---|---|
Accept | Fixed value: application/octet-stream | application/octet-stream |
Authorization | Access key, sent as Bearer ${AccessKey} | Bearer RWYhq1NsLPAMmieux0Gd |
Content-Type | Possible values: text/plain, application/x-www-form-urlencoded, text/vtt, application/x-subrip, text/srt | text/plain |
The request body format must match the Content-Type in the header.
Request body
- Sending a UTF-8 string in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data 'hello'
- Sending a URL-encoded string in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data '%E4%BD%A0%E5%A5%BD%E5%95%8A'
- Sending a UTF-8 text file in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data '@test.txt'
- Sending a VTT file in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/vtt' \
--data '@sing-song_2024-07-29_103928.vtt'
- Sending an SRT file in the body (Method 1):
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-subrip' \
--data '@sing-song_2024-07-29_103928.srt'
- Sending an SRT file in the body (Method 2):
```
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/srt' \
--data '@sing-song_2024-07-29_103928.srt'
```
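Because the Content-Type must correspond to the body format, a client that uploads files can derive the header from the file extension. The sketch below is one possible mapping based on the examples above; `short_tts_headers` and the extension table are illustrative assumptions, not part of the API.

```python
import os

# Extension-to-Content-Type mapping taken from the request examples.
# For .srt the API also accepts text/srt (Method 2).
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".vtt": "text/vtt",
    ".srt": "application/x-subrip",
}


def short_tts_headers(path, access_key):
    """Return request headers for sending the given file as the body."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"no known Content-Type for extension {ext!r}")
    return {
        # Fixed value for shortTextToSpeech, per the Request Header table.
        "Accept": "application/octet-stream",
        "Authorization": f"Bearer {access_key}",
        "Content-Type": CONTENT_TYPES[ext],
    }
```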
Response headers
Parameter | Description | Example |
---|---|---|
x-duration-seconds | Duration of the audio in seconds | 3 |
Response Body
File stream
MaaS-nar longTextToSpeech
API Flow
- Call the longTextToSpeech API to obtain the statusUrl.
- Poll the statusUrl (recommended interval of 5-10s) to get the task result.
- If the task has completed successfully, download the audio file using the URL provided in the result field.
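The flow above can be sketched as a small polling loop. This example assumes the statusUrl returns a JSON object with the `finished`, `succeeded`, `result`, and `message` fields documented for the task result; the function names, the timeout, and the injectable `fetch` and `sleep` parameters are illustrative choices, not part of the API.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a URL and parse the body as JSON (statusUrl needs no auth)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_audio_url(status_url, fetch=fetch_json, interval=5.0,
                       timeout=600.0, sleep=time.sleep):
    """Poll statusUrl until the task finishes; return the download URL.

    Raises RuntimeError if generation failed and TimeoutError if the
    task did not finish within `timeout` seconds (an assumed limit).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(status_url)
        if status.get("finished"):
            if status.get("succeeded"):
                return status["result"]
            raise RuntimeError(status.get("message") or "audio generation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        # Recommended polling interval is 5-10 seconds.
        sleep(interval)
```

Since the download URL is documented as valid for 10 minutes, the audio should be fetched promptly after this function returns.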
Request Method:
POST
Request Path:
/tts-n/text-to-speech/{responseFormat}
Path Variables
Parameter | Description | Example |
---|---|---|
responseFormat | Audio format: mp3/m4a/wav | wav |
Query Parameters
Parameter | Description | Example |
---|---|---|
voice | Voice option | Yifei |
voice-speed | Speech speed. Available options: fast, normal, slow, or a number between 0.3 and 2 | normal |
voice-volume | Volume level. Available options: x-loud, loud, standard, soft, x-soft, or normalized | loud |
Request Headers
Parameter | Description | Example |
---|---|---|
Authorization | Access key, sent as Bearer ${AccessKey} | Bearer RWYhq1NsLPAMmieux0Gd |
Content-Type | Available options: text/plain, application/x-www-form-urlencoded, text/vtt, application/x-subrip, text/srt | text/plain |
Do not set the Accept header.
Request body format must match the Content-Type specified in the header.
Request Body
- Sending a UTF-8 string in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data 'hello'
- Sending a URL-encoded string in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data '%E4%BD%A0%E5%A5%BD%E5%95%8A'
- Sending a UTF-8 text file in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data '@test.txt'
- Sending a VTT file in the body:
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/vtt' \
--data '@sing-song_2024-07-29_103928.vtt'
- Sending an SRT file in the body (Method 1):
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-subrip' \
--data '@sing-song_2024-07-29_103928.srt'
- Sending an SRT file in the body (Method 2):
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/srt' \
--data '@sing-song_2024-07-29_103928.srt'
Response
Parameter | Description | Example |
---|---|---|
statusUrl | URL to obtain the task execution result. It can be accessed directly through a GET request without any authorization information. Recommended polling interval is 5-10s. | |
taskId | Task ID | 1 |
Getting the task result through statusUrl
Parameter | Description | Type |
---|---|---|
finished | Indicates whether the task has finished. True means the polling should be stopped. | Boolean |
percent | Progress of audio generation, ranging from 0 to 100. | Integer |
succeeded | Indicates whether the audio was generated successfully. | Boolean |
result | URL to download the audio if it has been successfully generated. This URL is valid for 10 minutes. | String |
message | Reason for audio generation failure. | String |
durationInSeconds | Duration of the audio in seconds, rounded to the nearest whole second. | Integer |
MaaS-Ele TextToSpeech
Request Method:
POST
Request Path:
/tts-e/text-to-speech/{voice_id}
pathVariables
Parameter | Description | Example |
---|---|---|
voice_id | Voice ID, see Appendix for details | EXAVITQu4vr4xnSDxMaL |
queryParams
Parameter | Description | Example |
---|---|---|
enable_logging | Privacy mode switch. true (default): non-privacy mode; false: privacy mode | true |
optimize_streaming_latency | Latency optimization (deprecated parameter). Available values: 1-4 | 1 |
output_format | Output format. Available values: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128 (default), mp3_44100_192, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000 | mp3_44100_128 |
Request Headers
Parameter | Description | Example |
---|---|---|
Authorization | Access key, sent as Bearer ${AccessKey} | Bearer RWYhq1NsLPAMmieux0Gd |
Request Body
Parameter | Description | Example |
---|---|---|
text | Text (required) | how are you |
model_id | Model ID. Available values: eleven_monolingual_v1 (default), eleven_multilingual_v2, eleven_turbo_v2_5, eleven_turbo_v2, eleven_multilingual_v1 | eleven_monolingual_v1 |
language_code | Language code in ISO 639-1 format; currently only supported by eleven_turbo_v2_5 | en |
voice_settings | Voice settings | {"stability":0,"similarity_boost":1.0} |
voice_settings.stability | Stability | 0 |
voice_settings.similarity_boost | Similarity Boost | 1.0 |
voice_settings.style | Voice Style | 0 |
voice_settings.use_speaker_boost | Use speaker boost: true (default) or false | true |
pronunciation_dictionary_locators | object[]. List of pronunciation dictionary locators; up to 3 are supported | "pronunciation_dictionary_locators": [{"pronunciation_dictionary_id": " |
pronunciation_dictionary_locators.pronunciation_dictionary_id | Pronunciation Dictionary ID | 123 |
pronunciation_dictionary_locators.version_id | Version ID | 123 |
seed | Deterministic Sampling | 123 |
previous_text | Previous Text Content | hi |
next_text | Next Text Content | how are you |
previous_request_ids | string[]. List of previous sample request IDs | ["xx","xxx"] |
next_request_ids | string[]. List of next sample request IDs | ["xx","xxx"] |
Response Headers
Parameter | Description | Example |
---|---|---|
character-cost | Character Size | 333 |
request-id | Request ID | 12342wqwqe |
Response Body
File stream
Request Example:
curl --location '${endpointPath}/tts-e/text-to-speech/{voice_id}' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/json' \
--data '{
"text": "hi",
"voice_settings": {
"stability": 0,
"similarity_boost": 1.0
}
}'
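The same request could be built in Python roughly as follows. This is a sketch using only the standard library; `build_ele_request` and its arguments are hypothetical, and only a subset of the documented body fields is included, matching the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_ele_request(endpoint_path, access_key, voice_id, text,
                      stability=0, similarity_boost=1.0, output_format=None):
    """Build (without sending) a POST request for MaaS-Ele TextToSpeech."""
    url = (endpoint_path.rstrip("/")
           + "/tts-e/text-to-speech/"
           + quote(voice_id, safe=""))
    if output_format is not None:
        # output_format is a query parameter, not part of the JSON body.
        url += "?" + urlencode({"output_format": output_format})

    body = {
        "text": text,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

After sending the request with `urllib.request.urlopen`, the `character-cost` and `request-id` values can be read from `response.headers`, and `response.read()` returns the audio stream.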
Appendix
- Available voice_id
name | voice_id |
---|---|
Sarah | EXAVITQu4vr4xnSDxMaL |
Laura | FGY2WhTYpPnrIDTdsKH5 |
Charlie | IKne3meq5aSn9XLyUdCD |
George | JBFqnCBsd6RMkjVDRZzb |
Callum | N2lVS1w4EtoT3dr4eOWO |
Liam | TX3LPaxmHKxFdv7VOQHJ |
Charlotte | XB0fDUnXU5powFXDhCwa |
Alice | Xb7hH8MSUJpSbSDYk0k2 |
Matilda | XrExE9yKIg1WjnnlVkGX |
Will | bIHbv24MWmeRgasZH58o |
Jessica | cgSgspJ2msm6clMCkdW9 |
Eric | cjVigY5qzO86Huf0OWal |
Chris | iP95p4xoKVk53GoZ742B |
Brian | nPczCjzI2devNBz1zQrb |
Daniel | onwK4e9ZLuTAKqWW03F9 |
Lily | pFZP5JQG7iQjIQuC4Bku |
Bill | pqHfZKP75CvOlQylNhV4 |