Text-to-Speech API

Version History

Version	Date	Changes
v1.2	2024-08-23	Added the API documentation for the MaaS ASpeech/OSpeech model
v1.1	2024-07-31	1. Modified the request path 2. ElevenLabs Text-to-Speech added request-id to the response header
v1.0	2024-07-29	Initial release

MaaS ASpeech/OSpeech

Request Method:

POST

Request Path:

{endpoint}/audio/speech

Request header

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer xxxxxx

Request Body

Parameter	Description	Example
model	The model being used	tts-1
input	The text input that needs to be converted to audio	how are you
voice	The voice used for audio generation. Possible values: "alloy", "echo", "fable", "onyx", "nova", "shimmer"	alloy
response_format	The audio format. Possible values: "mp3", "opus", "aac", "flac", "wav", "pcm"	mp3
speed	The audio speed, in the range 0.25 to 4.0.	1.0

Response Body

File stream

Sample Request

MaaS ASpeech

curl --location '{endpoint}/audio/speech' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data '{
    "input":"<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>my voice is my passport verify me</voice></speak>",
    "response_format":"audio-16khz-128kbitrate-mono-mp3"
}'
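
The ASpeech sample above embeds an SSML document in a JSON string, which forces heavy shell quoting. As a minimal sketch, the same body could be assembled in Python so that escaping is handled by the standard library. The helper names build_ssml and build_aspeech_body are hypothetical, and the optional xml:gender attribute from the sample is left out for brevity.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Wrap plain text in the <speak>/<voice> envelope shown in the curl sample."""
    return (
        f'<speak version="1.0" xml:lang={quoteattr(lang)}>'
        f"<voice xml:lang={quoteattr(lang)} name={quoteattr(voice_name)}>"
        f"{escape(text)}"  # escapes &, < and > so the text cannot break the XML
        "</voice></speak>"
    )


def build_aspeech_body(ssml: str, response_format: str) -> str:
    """Serialize the JSON request body; json.dumps takes care of quote escaping."""
    return json.dumps({"input": ssml, "response_format": response_format})
```

The resulting string can be sent as the POST body to {endpoint}/audio/speech with the same Authorization and Content-Type headers as in the curl sample.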

MaaS OSpeech

curl --location '{endpoint}/audio/speech' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data '{
    "input":"hi,what is your name?",
    "voice":"alloy",
    "speed":1.0,
    "response_format":"mp3"
}'
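
For callers who prefer to check parameters before sending, the OSpeech request could be built along these lines. This is a sketch only: build_ospeech_request is a hypothetical helper, the allowed values are copied from the Request Body table, and the model default of tts-1 is taken from the table's example column (the curl sample omits model).

```python
import json
import urllib.request

# Allowed values copied from the Request Body table above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_ospeech_request(endpoint, access_key, text, voice="alloy",
                          response_format="mp3", speed=1.0, model="tts-1"):
    """Validate the parameters and return an unsent POST request."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{endpoint}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response body is the audio file stream and can be written to disk as bytes.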

Get MaaS ASpeech Regional Language List

Request Method:

GET

Request Path:

{endpointPath}/cognitiveservices/voices/list

Request Header

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer xxxxxx

Return Parameters

Object Array

Object Parameters:

Parameter	Description	Example
Name	Voice Full Name	Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)
DisplayName	Display Name	Adri
LocalName	Local Name	Adri
ShortName	Short Name	af-ZA-AdriNeural
Gender	Voice Gender	Female
Locale	Locale	af-ZA
LocaleName	Locale Language Name	Afrikaans (South Africa)
SampleRateHertz	Sampling Rate	48000
VoiceType	Voice Type	Neural
Status	Status	GA
WordsPerMinute	Words Per Minute	147

Request Example

curl --location '{endpointPath}/cognitiveservices/voices/list' \
--header 'Authorization: Bearer xxxx'
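
Once the voice list has been fetched, picking voices for a locale is a simple filter over the returned objects. The sketch below assumes the response is a JSON array whose objects carry the fields listed in the table above; voices_for_locale is a hypothetical helper and SAMPLE_VOICES reuses the table's example row.

```python
import json

# One entry built from the example values in the Object Parameters table.
SAMPLE_VOICES = json.loads("""
[
  {
    "Name": "Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)",
    "DisplayName": "Adri",
    "LocalName": "Adri",
    "ShortName": "af-ZA-AdriNeural",
    "Gender": "Female",
    "Locale": "af-ZA",
    "LocaleName": "Afrikaans (South Africa)",
    "SampleRateHertz": "48000",
    "VoiceType": "Neural",
    "Status": "GA",
    "WordsPerMinute": "147"
  }
]
""")


def voices_for_locale(voices, locale, gender=None):
    """Return the ShortName of every voice matching the locale (and gender, if given)."""
    return [
        v["ShortName"]
        for v in voices
        if v["Locale"] == locale and (gender is None or v["Gender"] == gender)
    ]
```

The ShortName value has the same shape as the name attribute in the ASpeech sample request (for example en-US-AvaMultilingualNeural).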

MaaS-nar shortTextToSpeech

Request Method:

POST

Request Path:

/tts-n/text-to-speech/{responseFormat}

pathVariables

Parameter	Description	Example
responseFormat	Audio format (mp3/m4a)	mp3

queryParams

Parameter	Description	Example
voice	Voice name (see possible values)	Yifei
voice-speed	Speech speed. Possible values: fast, normal, slow, or a number between 0.3 and 2	normal
voice-volume	Volume. Possible values: x-loud, loud, standard, soft, x-soft, normalized	loud

Request header

Parameter	Description	Example
Accept	Fixed value: application/octet-stream	application/octet-stream
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer RWYhq1NsLPAMmieux0Gd
Content-Type	Possible values: text/plain, application/x-www-form-urlencoded, text/vtt, application/x-subrip, text/srt	text/plain

The request body format must match the Content-Type header.

Request body

Each example below pairs a Content-Type with a body in the matching format.

Sending the UTF-8 string in the body:

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data 'hello'

Sending a URL-encoded string in the body:

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data '%E4%BD%A0%E5%A5%BD%E5%95%8A'

Sending a UTF-8 text file in the body (--data-binary keeps the file's line breaks, which --data would strip):

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data-binary '@test.txt'

Sending a VTT file in the body:

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/vtt' \
--data-binary '@sing-song_2024-07-29_103928.vtt'

Sending an SRT file in the body (Method 1):

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-subrip' \
--data-binary '@sing-song_2024-07-29_103928.srt'

Sending an SRT file in the body (Method 2):

```
curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/srt' \
--data-binary '@sing-song_2024-07-29_103928.srt'
```
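
The curl examples above differ only in Content-Type and body, so a small helper can cover all of them. The following is a sketch under stated assumptions: content_type_for and build_short_tts_request are hypothetical names, and the mapping from file extension to Content-Type is an illustration derived from the examples, not something the API prescribes.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Extension-to-Content-Type mapping inferred from the curl examples (an assumption).
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".vtt": "text/vtt",
    ".srt": "application/x-subrip",
}


def content_type_for(filename: str) -> str:
    """Pick the Content-Type that matches a file's extension."""
    return CONTENT_TYPES[Path(filename).suffix.lower()]


def build_short_tts_request(endpoint_path, access_key, body: bytes, content_type,
                            response_format="mp3", query=None):
    """Return an unsent POST request; query holds voice, voice-speed, voice-volume."""
    url = f"{endpoint_path}/tts-n/text-to-speech/{response_format}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/octet-stream",  # fixed value for the short API
            "Authorization": f"Bearer {access_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

For a file body, read it with Path(name).read_bytes() so that line breaks reach the server unchanged, and pass content_type_for(name) as the Content-Type.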

Response headers

Parameter	Description	Example
x-duration-seconds	Duration of the audio in seconds	3

Response Body

File stream

MaaS-nar longTextToSpeech

API Flow

Call the longTextToSpeech API to obtain the statusUrl.
Poll the statusUrl (recommended interval of 5-10s) to get the task result.
If the task has completed successfully, download the audio file using the URL provided in the result field.

Request Method:

POST

Request Path:

/tts-n/text-to-speech/{responseFormat}

Path Variables

Parameter	Description	Example
responseFormat	Audio format: mp3/m4a/wav	wav

Query Parameters

Parameter	Description	Example
voice	Voice name (see available options)	Yifei
voice-speed	Speech speed Available options: fast, normal, slow, or a number between 0.3 and 2	normal
voice-volume	Volume level Available options: x-loud, loud, standard, soft, x-soft, or normalized	loud

Request Headers

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer RWYhq1NsLPAMmieux0Gd
Content-Type	Available options: text/plain, application/x-www-form-urlencoded, text/vtt, application/x-subrip, text/srt	text/plain

Do not set the Accept header.

Request body format must match the Content-Type specified in the header.

Request Body

Each example below pairs a Content-Type with a body in the matching format.

Sending a UTF-8 string in the body:

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data 'hello'

Sending a URL-encoded string in the body:

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data '%E4%BD%A0%E5%A5%BD%E5%95%8A'

Sending a UTF-8 text file in the body (--data-binary keeps the file's line breaks, which --data would strip):

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/plain' \
--data-binary '@test.txt'

Sending a VTT file in the body:

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/vtt' \
--data-binary '@sing-song_2024-07-29_103928.vtt'

Sending an SRT file in the body (Method 1):

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-subrip' \
--data-binary '@sing-song_2024-07-29_103928.srt'

Sending an SRT file in the body (Method 2):

curl --location '${endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/srt' \
--data-binary '@sing-song_2024-07-29_103928.srt'

Response

Parameter	Description	Example
statusUrl	URL to obtain the task execution result. It can be accessed directly through a GET request without any authorization information. Recommended polling interval is 5-10s.
taskId	Task ID	1

Getting the task result through statusUrl

Parameter	Description	Type
finished	Indicates whether the task has finished. True means the polling should be stopped.	Boolean
percent	Progress of audio generation, ranging from 0 to 100.	Integer
succeeded	Indicates whether the audio was generated successfully.	Boolean
result	URL to download the audio if it has been successfully generated. This URL is valid for 10 minutes.	String
message	Reason for audio generation failure.	String
durationInSeconds	Duration of the audio in seconds, rounded to the nearest whole second.	Integer
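
The polling step of the flow can be sketched as a loop over the status fields listed above. This is an illustration, not a reference client: it assumes the statusUrl returns a JSON object with those fields, and the names fetch_status and wait_for_audio_url, the max_polls limit, and the injectable fetch and sleep parameters are all hypothetical.

```python
import json
import time
import urllib.request


def fetch_status(status_url: str) -> dict:
    """GET the statusUrl (no authorization needed) and decode the JSON reply."""
    with urllib.request.urlopen(status_url) as resp:
        return json.load(resp)


def wait_for_audio_url(status_url, fetch=fetch_status, interval=5.0,
                       max_polls=120, sleep=time.sleep):
    """Poll until the task finishes; return the download URL or raise on failure."""
    for _ in range(max_polls):
        status = fetch(status_url)
        if status.get("finished"):
            if status.get("succeeded"):
                return status["result"]  # download URL, valid for 10 minutes
            raise RuntimeError(status.get("message") or "audio generation failed")
        sleep(interval)  # the recommended polling interval is 5-10 seconds
    raise TimeoutError(f"task still running after {max_polls} polls")
```

Because the result URL expires after 10 minutes, the download should follow promptly once the function returns. Passing fetch and sleep as parameters keeps the loop testable without network access.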

MaaS-Ele TextToSpeech

Request Method:

POST

Request Path:

/tts-e/text-to-speech/{voice_id}

pathVariables

Parameter	Description	Example
voice_id	Voice ID, see Appendix for details	EXAVITQu4vr4xnSDxMaL

queryParams

Parameter	Description	Example
enable_logging	Privacy mode switch. true (default): non-privacy mode; false: privacy mode	true
optimize_streaming_latency	Latency optimization (deprecated parameter). Available values: 1-4	1
output_format	Output format. Available values: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128 (default), mp3_44100_192, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000	mp3_44100_128

Request Headers

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer RWYhq1NsLPAMmieux0Gd

Request Body

Parameter	Description	Example
text	Text (required)	how are you
model_id	Model ID. Available values: eleven_monolingual_v1 (default), eleven_multilingual_v2, eleven_turbo_v2_5, eleven_turbo_v2, eleven_multilingual_v1	eleven_monolingual_v1
language_code	Language code (ISO 639-1), currently only supported by eleven_turbo_v2_5	en
voice_settings	Voice settings	{"stability":0,"similarity_boost":1.0}
voice_settings.stability	Stability	0
voice_settings.similarity_boost	Similarity Boost	1.0
voice_settings.style	Voice Style	0
voice_settings.use_speaker_boost	Use speaker boost: true (default) or false	true
pronunciation_dictionary_locators	object[] List of pronunciation dictionary locators, supports up to 3	"pronunciation_dictionary_locators": [{"pronunciation_dictionary_id": "","version_id": ""}]
pronunciation_dictionary_locators.pronunciation_dictionary_id	Pronunciation Dictionary ID	123
pronunciation_dictionary_locators.version_id	Version ID	123
seed	Deterministic Sampling	123
previous_text	Previous Text Content	hi
next_text	Next Text Content	how are you
previous_request_ids	string[] List of previous sample request IDs	["xx","xxx"]
next_request_ids	string[] List of next sample request IDs	["xx","xxx"]

Response Headers

Parameter	Description	Example
character-cost	Character Size	333
request-id	Request ID	12342wqwqe

Response Body

File stream

Request Example:

curl --location '${endpointPath}/tts-e/text-to-speech/{voice_id}' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/json' \
--data '{
  "text": "hi",
  "voice_settings": {
    "stability": 0,
    "similarity_boost": 1.0
  }
}'
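
The same request can be assembled programmatically, which makes it easier to enforce the constraints from the tables above (text is required, and at most 3 pronunciation dictionary locators are supported). A minimal sketch follows; build_ele_request is a hypothetical helper, and only a subset of the body parameters is covered.

```python
import json
import urllib.parse
import urllib.request


def build_ele_request(endpoint_path, access_key, voice_id, text,
                      voice_settings=None, dictionary_locators=None,
                      output_format=None, model_id=None):
    """Return an unsent POST request for /tts-e/text-to-speech/{voice_id}."""
    if not text:
        raise ValueError("text is required")
    if dictionary_locators is not None and len(dictionary_locators) > 3:
        raise ValueError("at most 3 pronunciation dictionary locators are supported")

    # Optional fields are only sent when the caller provides them.
    body = {"text": text}
    if model_id is not None:
        body["model_id"] = model_id
    if voice_settings is not None:
        body["voice_settings"] = voice_settings
    if dictionary_locators is not None:
        body["pronunciation_dictionary_locators"] = list(dictionary_locators)

    url = f"{endpoint_path}/tts-e/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    if output_format is not None:
        url += "?" + urllib.parse.urlencode({"output_format": output_format})

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

After sending the request with urllib.request.urlopen, the character-cost and request-id values can be read from the response headers, and the body is the audio file stream.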

Appendix

Available voice_id

name	voice_id
Sarah	EXAVITQu4vr4xnSDxMaL
Laura	FGY2WhTYpPnrIDTdsKH5
Charlie	IKne3meq5aSn9XLyUdCD
George	JBFqnCBsd6RMkjVDRZzb
Callum	N2lVS1w4EtoT3dr4eOWO
Liam	TX3LPaxmHKxFdv7VOQHJ
Charlotte	XB0fDUnXU5powFXDhCwa
Alice	Xb7hH8MSUJpSbSDYk0k2
Matilda	XrExE9yKIg1WjnnlVkGX
Will	bIHbv24MWmeRgasZH58o
Jessica	cgSgspJ2msm6clMCkdW9
Eric	cjVigY5qzO86Huf0OWal
Chris	iP95p4xoKVk53GoZ742B
Brian	nPczCjzI2devNBz1zQrb
Daniel	onwK4e9ZLuTAKqWW03F9
Lily	pFZP5JQG7iQjIQuC4Bku
Bill	pqHfZKP75CvOlQylNhV4
