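Text-to-Audio API Documentation

Version History

Version	Date	Changes
v1.2	2024-08-23	1. Added API documentation for the MaaS ASpeech/OSpeech models
v1.1	2024-07-31	1. Changed the request path 2. Added request-id to the response headers of ElevenLabs Text-to-Speech
v1.0	2024-07-29	Initial version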

MaaS ASpeech/OSpeech

Request method:

POST

Request path:

https://genaiapi.cloudsway.net/v1/ai/{endpoint}/audio/speech

Request headers

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer xxxxxx

Request body

Parameter	Description	Example
model	Model to use	MaaS ASpeech
input	Text to convert to audio	how are you
voice	Voice used to generate the audio. Allowed values: "alloy", "echo", "fable", "onyx", "nova", "shimmer"	alloy
response_format	Audio format. Allowed values: "mp3", "opus", "aac", "flac", "wav", "pcm"	mp3
speed	Audio speed, from 0.25 to 4.0	1.0

Response

File stream

Request examples

MaaS ASpeech

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/audio/speech' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data '{
    "input":"<speak version='\''1.0'\'' xml:lang='\''en-US'\''><voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-AvaMultilingualNeural'\''>my voice is my passport verify me</voice></speak>",
    "response_format":"audio-16khz-128kbitrate-mono-mp3"
}'

MaaS OSpeech

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpoint}/audio/speech' \
--header 'Authorization: Bearer xxxx' \
--header 'Content-Type: application/json' \
--data '{
    "input":"hi,what is your name?",
    "voice":"alloy",
    "speed":1.0,
    "response_format":"mp3"
}'
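
The OSpeech call above can also be made from a script. The following is a minimal Python sketch, not an official client: the function names `build_speech_request` and `synthesize_to_file` are hypothetical, the base URL is copied from the curl samples, and the allowed values and the speed range come from the request body table. It uses only the standard library.

```python
import json
import urllib.request

# Assumption: base URL copied from the curl samples in this document.
BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(endpoint, access_key, text, voice="alloy",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) a POST request for the OSpeech audio/speech API."""
    if voice not in VOICES:
        raise ValueError(f"unsupported voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    body = {"input": text, "voice": voice,
            "speed": speed, "response_format": response_format}
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path, timeout=60):
    """Send the request and write the returned file stream to out_path."""
    with urllib.request.urlopen(request, timeout=timeout) as resp, \
            open(out_path, "wb") as out:
        while chunk := resp.read(8192):
            out.write(chunk)
```

A call such as `synthesize_to_file(build_speech_request("{endpoint}", "xxxx", "hi,what is your name?"), "out.mp3")` would then save the audio, with the placeholders replaced by real values.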

Get the MaaS ASpeech regional voice list

Request method:

GET

Request path:

{endpointPath}/cognitiveservices/voices/list

Request headers

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer xxxxxx

Response

An array of objects

Object fields:

Parameter	Description	Example
Name	Full voice name	Microsoft Server Speech Text to Speech Voice (af-ZA, AdriNeural)
DisplayName	Display name	Adri
LocalName	Local name	Adri
ShortName	Short name	af-ZA-AdriNeural
Gender	Voice gender	Female
Locale	Locale	af-ZA
LocaleName	Locale name	Afrikaans (South Africa)
SampleRateHertz	Sample rate in Hz	48000
VoiceType	Voice type	Neural
Status	Status	GA
WordsPerMinute	Words spoken per minute	147

Request example

curl --location '{endpointPath}/cognitiveservices/voices/list' \
--header 'Authorization: Bearer xxxx'
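
Since the response is an array of voice objects, a caller will usually filter it to find a ShortName to use. The sketch below is one way to do that in Python, assuming the response body is a JSON array with the fields listed above; `fetch_voices` and `short_names` are hypothetical helper names, and `endpoint_path` stands for the value of the {endpointPath} placeholder.

```python
import json
import urllib.request


def fetch_voices(endpoint_path, access_key, timeout=30):
    """GET the voice list and parse it (assumed to be a JSON array of objects)."""
    req = urllib.request.Request(
        f"{endpoint_path}/cognitiveservices/voices/list",
        headers={"Authorization": f"Bearer {access_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def short_names(voices, locale, gender=None):
    """Return the ShortName of every voice matching locale (and gender, if given)."""
    return [
        v["ShortName"] for v in voices
        if v.get("Locale") == locale
        and (gender is None or v.get("Gender") == gender)
    ]
```

For example, `short_names(fetch_voices(endpoint_path, access_key), "af-ZA", gender="Female")` would list the female af-ZA voices.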

MaaS-nar shortTextToSpeech

Request method

POST

Request path

/tts-n/text-to-speech/{responseFormat}

pathVariables

Parameter	Description	Example
responseFormat	Audio format: mp3 or m4a	mp3

queryParams

Parameter	Description	Example
voice	Voice to use	Yifei
voice-speed	Speech rate: fast, normal, slow, or a number between 0.3 and 2	normal
voice-volume	Volume: x-loud, loud, standard, soft, x-soft, or normalized	loud

Request headers

Parameter	Description	Example
Accept	Fixed value: application/octet-stream	application/octet-stream
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer RWYhq1NsLPAMmieux0Gd
Content-type	One of: text/plain, application/x-www-form-urlencoded, text/vtt, application/x-subrip, text/srt	text/plain

The request body format must match the Content-type header.

Request body

Body as a UTF-8 string

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-type: text/plain' \
--data 'hello'

Body as a URL-encoded string

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data '%E4%BD%A0%E5%A5%BD%E5%95%8A'

Body as a UTF-8 text file

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-type: text/plain' \
--data-binary '@test.txt'

Body as a VTT file

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/vtt' \
--data-binary '@sing-song_2024-07-29_103928.vtt'

Body as an SRT file

Option 1

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-subrip' \
--data-binary '@sing-song_2024-07-29_103928.srt'

Option 2

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Accept: application/octet-stream' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/srt' \
--data-binary '@sing-song_2024-07-29_103928.srt'

Response headers

Parameter	Description	Example
x-duration-seconds	Audio duration in seconds	3

Response

File stream
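
To show how the path variable, query parameters, and headers of shortTextToSpeech fit together, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a reference client: `build_short_tts_request` and `send_short_tts` are hypothetical names, and the base URL is copied from the curl samples. The body is passed as bytes and must match the chosen Content-type.

```python
import urllib.parse
import urllib.request

# Assumption: base URL copied from the curl samples in this document.
BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"


def build_short_tts_request(endpoint_path, access_key, body,
                            content_type="text/plain", response_format="mp3",
                            voice=None, voice_speed=None, voice_volume=None):
    """Build (but do not send) a shortTextToSpeech request; body is bytes."""
    optional = {"voice": voice, "voice-speed": voice_speed,
                "voice-volume": voice_volume}
    # Only send the query parameters the caller actually set.
    params = {k: v for k, v in optional.items() if v is not None}
    url = f"{BASE_URL}/{endpoint_path}/tts-n/text-to-speech/{response_format}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": "application/octet-stream",
                 "Authorization": f"Bearer {access_key}",
                 "Content-Type": content_type},
        method="POST",
    )


def send_short_tts(request, out_path, timeout=60):
    """Send the request, save the file stream, return x-duration-seconds."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
        return resp.headers.get("x-duration-seconds")
```

For a subtitle file, the caller would read it in binary mode and pass `content_type="text/vtt"` or `content_type="application/x-subrip"` accordingly.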

MaaS-nar longTextToSpeech

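Call flow

1. Call the longTextToSpeech API to obtain a statusUrl.
2. Poll the statusUrl (every 5-10 s is recommended) to get the task result.
3. Once the task has finished successfully, download the audio from the URL in the result field of the response.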

Request method

POST

Request path

/tts-n/text-to-speech/{responseFormat}

pathVariables

Parameter	Description	Example
responseFormat	Audio format: mp3, m4a, or wav	wav

queryParams

Parameter	Description	Example
voice	Voice to use	Yifei
voice-speed	Speech rate: fast, normal, slow, or a number between 0.3 and 2	normal
voice-volume	Volume: x-loud, loud, standard, soft, x-soft, or normalized	loud

Request headers

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer RWYhq1NsLPAMmieux0Gd
Content-type	One of: text/plain, application/x-www-form-urlencoded, text/vtt, application/x-subrip, text/srt	text/plain

Do not set the Accept header.

The request body format must match the Content-type header.

Request body

Body as a UTF-8 string

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-type: text/plain' \
--data 'hello'

Body as a URL-encoded string

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data '%E4%BD%A0%E5%A5%BD%E5%95%8A'

Body as a UTF-8 text file

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-type: text/plain' \
--data-binary '@test.txt'

Body as a VTT file

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/vtt' \
--data-binary '@sing-song_2024-07-29_103928.vtt'

Body as an SRT file

Option 1

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/x-subrip' \
--data-binary '@sing-song_2024-07-29_103928.srt'

Option 2

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-n/text-to-speech/mp3' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: text/srt' \
--data-binary '@sing-song_2024-07-29_103928.srt'

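Response

Parameter	Description	Example
statusUrl	URL for retrieving the task result. It can be requested directly with GET and needs no authorization; polling every 5-10 s is recommended
taskId	Task ID	1

Retrieving the result via statusUrl

Field	Description	Type
finished	Whether the task has ended. true means polling should stop	Boolean
percent	Audio generation progress, from 0 to 100	Integer
succeeded	Whether the audio was generated successfully	Boolean
result	URL for downloading the audio once generation succeeds; valid for 10 minutes	String
message	Reason the audio generation failed	String
durationInSeconds	Audio duration in seconds, rounded to the nearest whole second	Integer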
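
The polling step of the longTextToSpeech flow can be sketched as follows. This is a minimal Python illustration, assuming the statusUrl returns a JSON object whose top-level keys are the fields in the table above and that result is the download URL as a string. The names `fetch_status` and `wait_for_audio_url`, the 5 second default interval, and the `max_polls` limit are choices made for this sketch, not part of the API.

```python
import json
import time
import urllib.request


def fetch_status(status_url, timeout=30):
    """GET statusUrl (no authorization needed) and parse the JSON body."""
    with urllib.request.urlopen(status_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_audio_url(status_url, fetch=fetch_status, interval=5.0,
                       max_polls=120, sleep=time.sleep):
    """Poll until finished is true; return the result URL or raise on failure."""
    for _ in range(max_polls):
        status = fetch(status_url)
        if status.get("finished"):
            if status.get("succeeded"):
                return status["result"]
            # message carries the failure reason when generation fails.
            raise RuntimeError(status.get("message") or "audio generation failed")
        sleep(interval)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

Because the result URL is valid for only 10 minutes, the audio should be downloaded promptly once `wait_for_audio_url` returns. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.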

MaaS-Ele TextToSpeech

Request method

POST

Request path

/tts-e/text-to-speech/{voice_id}

pathVariables

Parameter	Description	Example
voice_id	Voice ID; see the appendix	EXAVITQu4vr4xnSDxMaL

queryParams

Parameter	Description	Example
enable_logging	Controls privacy mode. true (default): non-private mode; false: private mode	true
optimize_streaming_latency	Latency optimization (deprecated). Allowed values: 1-4	1
output_format	Output format. Allowed values: mp3_22050_32, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128 (default), mp3_44100_192, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000	mp3_44100_128

Request headers

Parameter	Description	Example
Authorization	Access key, sent as Bearer ${AccessKey}	Bearer RWYhq1NsLPAMmieux0Gd

Request body

Parameter	Description	Example
text	Text to synthesize (required)	how are you
model_id	Model ID. Allowed values: eleven_monolingual_v1 (default), eleven_multilingual_v2, eleven_turbo_v2_5, eleven_turbo_v2, eleven_multilingual_v1	eleven_monolingual_v1
language_code	Language code (ISO 639-1); currently only eleven_turbo_v2_5 supports this parameter	en
voice_settings	Voice settings	{"stability":0,"similarity_boost":1.0}
voice_settings.stability	Stability	0
voice_settings.similarity_boost	Similarity boost	1.0
voice_settings.style	Voice style	0
voice_settings.use_speaker_boost	Whether to use speaker boost: true (default) or false	true
pronunciation_dictionary_locators	object[]. List of pronunciation dictionary locators; at most 3 are supported	"pronunciation_dictionary_locators": [{"pronunciation_dictionary_id": "","version_id": ""}]
pronunciation_dictionary_locators.pronunciation_dictionary_id	Pronunciation dictionary ID	123
pronunciation_dictionary_locators.version_id	Version ID	123
seed	Seed for deterministic sampling	123
previous_text	Text that precedes the current request	hi
next_text	Text that follows the current request	how are you
previous_request_ids	string[]. Request IDs of the preceding samples	["xx","xxx"]
next_request_ids	string[]. Request IDs of the following samples	["xx","xxx"]

Response headers

Parameter	Description	Example
character-cost	Character cost of the request	333
request-id	Request ID	12342wqwqe

Response

File stream

Request example

curl --location 'https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/tts-e/text-to-speech/{voice_id}' \
--header 'Authorization: Bearer ${AccessKey}' \
--header 'Content-Type: application/json' \
--data '{
  "text": "hi",
  "voice_settings": {
    "stability": 0,
    "similarity_boost": 1.0
  }
}'
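
The same request can be assembled in Python. The sketch below is illustrative only: `build_ele_request` and `synthesize` are hypothetical names, the base URL is copied from the curl sample, and only a few of the optional parameters are wired through. Optional fields are omitted from the body unless the caller sets them, so the service defaults apply.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL copied from the curl sample in this document.
BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"


def build_ele_request(endpoint_path, access_key, voice_id, text,
                      voice_settings=None, model_id=None, output_format=None):
    """Build (but do not send) a POST to /tts-e/text-to-speech/{voice_id}."""
    body = {"text": text}  # text is the only required body field
    if model_id is not None:
        body["model_id"] = model_id
    if voice_settings is not None:
        body["voice_settings"] = voice_settings
    url = f"{BASE_URL}/{endpoint_path}/tts-e/text-to-speech/{voice_id}"
    if output_format is not None:
        url += "?" + urllib.parse.urlencode({"output_format": output_format})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request, out_path, timeout=60):
    """Send the request, save the file stream, return the documented headers."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
        return {"character-cost": resp.headers.get("character-cost"),
                "request-id": resp.headers.get("request-id")}
```

Keeping request construction separate from sending makes it easy to inspect the URL and JSON body before any characters are billed.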

Appendix

Available voice_id values

name	voice_id
Sarah	EXAVITQu4vr4xnSDxMaL
Laura	FGY2WhTYpPnrIDTdsKH5
Charlie	IKne3meq5aSn9XLyUdCD
George	JBFqnCBsd6RMkjVDRZzb
Callum	N2lVS1w4EtoT3dr4eOWO
Liam	TX3LPaxmHKxFdv7VOQHJ
Charlotte	XB0fDUnXU5powFXDhCwa
Alice	Xb7hH8MSUJpSbSDYk0k2
Matilda	XrExE9yKIg1WjnnlVkGX
Will	bIHbv24MWmeRgasZH58o
Jessica	cgSgspJ2msm6clMCkdW9
Eric	cjVigY5qzO86Huf0OWal
Chris	iP95p4xoKVk53GoZ742B
Brian	nPczCjzI2devNBz1zQrb
Daniel	onwK4e9ZLuTAKqWW03F9
Lily	pFZP5JQG7iQjIQuC4Bku
Bill	pqHfZKP75CvOlQylNhV4