Speech-to-Text API Documentation

Version history

Version	Date	Changes
v1.1	2024-12-03	Added API documentation for three products: MaaS-AFast-asr, MaaS-Arealtime-asr and MaaS-ASpeech-Translation
v1.0	2024-08-29	Initial version

MaaS Whisper

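Common information

Parameter	Description	Example
basePath	Base URL for calling the MaaS API, including the fixed /v1/ai path	https://genaiapi.cloudsway.net/v1/ai
endpointPath	Randomly generated path segment for calling the MaaS API	RkBOAlaWzKcubSji
AccessKey	Access key for calling the MaaS API	RWxxxxxxxx0Gd

With the example values above, requests to the model are sent under https://genaiapi.cloudsway.net/v1/ai/RkBOAlaWzKcubSji, followed by the operation path (for example /audio/transcriptions).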
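As a minimal sketch, the URL assembly can be written as a small POSIX shell function. It assumes the operation path (for example audio/transcriptions) is simply appended to basePath and endpointPath with single slashes; the function name maas_url is hypothetical and not part of the API.

```shell
# Hypothetical helper: join basePath, endpointPath and an operation path,
# tolerating a trailing slash on basePath and a leading slash on the path.
maas_url() {
  base=${1%/}      # drop one trailing slash from basePath, if present
  endpoint=$2      # random endpointPath issued for the deployment
  op=${3#/}        # drop one leading slash from the operation path
  printf '%s/%s/%s\n' "$base" "$endpoint" "$op"
}

maas_url https://genaiapi.cloudsway.net/v1/ai RkBOAlaWzKcubSji audio/transcriptions
# → https://genaiapi.cloudsway.net/v1/ai/RkBOAlaWzKcubSji/audio/transcriptions
```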

Request method

POST

Request path

{basePath}/{endpointPath}/audio/transcriptions

Request headers

Parameter	Description	Example
Authorization	Bearer followed by the AccessKey: Bearer ${AccessKey}	Bearer RWxxxxxxxx0Gd

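Request body

Field	Type	Required	Description	Example
file	File	Yes	Audio file	mp3, mp4, mpeg, mpga, m4a, wav or webm; maximum file size 25 MB
prompt	String	No	Optional text to guide the model	"Generate a video of a sunset over the ocean."
response_format	String	No	Format of the result returned by the model	json, verbose_json
temperature	String	No	Sampling temperature	A value between 0 and 1
language	String	No	Language of the audio file	"en" (English), "zh" (Chinese), "es" (Spanish), etc.
timestamp_granularities	String	No	Timestamp granularity	"none": no timestamps; "word": a timestamp for each word; "sentence": a timestamp for each sentence

Response

Field	Type	Description	Example
text	String	Transcribed text of the audio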

Example

Request

curl --request POST \
--url https://genaiapi.cloudsway.net/v1/ai/RkBOAlaWzKcubSji/audio/transcriptions \
--header 'Accept: */*' \
--header "Authorization: Bearer ${AccessKey}" \
--header 'Connection: keep-alive' \
--form 'prompt=A poetic description of early morning, including words like dawn, quiet, mist, and possibility' \
--form response_format=verbose_json \
--form temperature=0.1 \
--form language=en \
--form timestamp_granularities=none \
--form 'file=@xx.wav'

Response

{
  "text": "In this ancient town, plum blossoms bloom silently. The white petals are like snow, falling on the branches, welcoming the cold winter. The fragrance of the flowers is elegant, and it touches the heart, as if it is the scent of time. In this ancient town, plum blossoms bloom silently. Every plum blossom is a small miracle, which blooms in the coldness of life. They are not afraid of the cold, they are firm, symbolizing hope and rebirth. The blooming of plum blossoms is like the praise of nature for life, warming everyone's heart. Each blossom is a small miracle, symbolizing hope and rebirth. Standing under the plum trees, it is as if you can hear the rain of years. Flowers bloom and fall, spring and autumn come. Plum blossoms witness the turning of time, and witness people's joy and sorrow. They are the guardians of memory, quietly preserving the story of this town. Standing under the plum tree, one can almost hear the whispers of time. Plum blossoms are not just a plant, but also a spiritual symbol. It teaches us to keep hope in adversity, to find warmth in the cold winter. Every year's blooming is a praise of life, a hope for the future. Plum blossoms teach us to keep hope alive in adversity. Let's cherish the beauty before us and embrace every moment of life bravely. Let's cherish the beauty before us and embrace every moment of life bravely."
}
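The request above can be wrapped in a small POSIX shell function. This is a sketch, not an official client: the function name whisper_transcribe and the MAAS_URL variable (basePath/endpointPath) are hypothetical, and response_format=json and temperature=0 are illustrative picks from the documented values. It also shows two curl details that are easy to get wrong: the Authorization header must be in double quotes so the shell expands ${AccessKey}, and the file field needs the @ prefix so curl uploads the file contents rather than the file name.

```shell
# Hypothetical wrapper around the Whisper transcription request.
# Expects shell variables AccessKey and MAAS_URL (basePath/endpointPath).
# $1 audio file path, $2 optional language code such as en or zh.
whisper_transcribe() {
  audio=$1
  language=${2:-}
  # Double quotes expand AccessKey; "file=@path" uploads the file contents.
  set -- --request POST \
    --url "${MAAS_URL%/}/audio/transcriptions" \
    --header "Authorization: Bearer ${AccessKey}" \
    --form "file=@${audio}" \
    --form "response_format=json" \
    --form "temperature=0"
  # Send the language field only when one was given.
  if [ -n "$language" ]; then
    set -- "$@" --form "language=${language}"
  fi
  # curl --form sets Content-Type: multipart/form-data with a boundary itself.
  curl "$@"
}

# Usage (performs a real network call):
# AccessKey=RWxxxxxxxx0Gd
# MAAS_URL=https://genaiapi.cloudsway.net/v1/ai/RkBOAlaWzKcubSji
# whisper_transcribe xx.wav en
```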

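Audio translation to English

Request method

POST

Request path

{basePath}/{endpointPath}/audio/translation

Request headers

Parameter	Description	Example
Authorization	Bearer followed by the AccessKey: Bearer ${AccessKey}	Bearer RWxxxxxxxx0Gd

Request body

Field	Type	Required	Description	Example
file	File	Yes	Audio file	mp3, mp4, mpeg, mpga, m4a, wav or webm; maximum file size 25 MB
prompt	String	No	Optional text to guide the model	"Generate a video of a sunset over the ocean."
response_format	String	No	Format of the result returned by the model	json, verbose_json
temperature	String	No	Sampling temperature	A value between 0 and 1

Response

Field	Type	Description	Example
text	String	English translation of the audio

Example

Request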

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/RkBOAcubSji/audio/translation' \
--header "Authorization: Bearer ${AccessKey}" \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--form 'prompt=A poetic description of early morning, including words like dawn, quiet, mist, and possibility' \
--form 'response_format=verbose_json' \
--form 'temperature=0.1' \
--form 'timestamp_granularities=none' \
--form 'file=@xxx.wav'

Response

{
  "text": "In this ancient town, plum blossoms bloom silently. The white petals are like snow, falling on the branches, welcoming the cold winter. The fragrance of the flowers is elegant, and it touches the heart, as if it is the scent of time. In this ancient town, plum blossoms bloom silently. Every plum blossom is a small miracle, which blooms in the coldness of life. They are not afraid of the cold, they are firm, symbolizing hope and rebirth. The blooming of plum blossoms is like the praise of nature for life, warming everyone's heart. Each blossom is a small miracle, symbolizing hope and rebirth. Standing under the plum trees, it is as if you can hear the rain of years. Flowers bloom and fall, spring and autumn come. Plum blossoms witness the turning of time, and witness people's joy and sorrow. They are the guardians of memory, quietly preserving the story of this town. Standing under the plum tree, one can almost hear the whispers of time. Plum blossoms are not just a plant, but also a spiritual symbol. It teaches us to keep hope in adversity, to find warmth in the cold winter. Every year's blooming is a praise of life, a hope for the future. Plum blossoms teach us to keep hope alive in adversity. Let's cherish the beauty before us and embrace every moment of life bravely. Let's cherish the beauty before us and embrace every moment of life bravely."
}

MaaS-AFast-asr

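Common information

Parameter	Description	Example
basePath	Base URL for calling the MaaS API	https://genaiapi.cloudsway.net
endpointPath	Randomly generated path segment for calling the MaaS API	LPUqHEAjfonOmohV
AccessKey	Access key for calling the MaaS API	RWxxxxxxxx0Gd

Using the example values above, the full URL of the fast transcription endpoint is

https://genaiapi.cloudsway.net/v1/ai/LPUqHEAjfonOmohV/speechtotext/transcriptions:transcribe?api-version=2024-11-15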

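Request method

POST

Request path

{basePath}/v1/ai/{endpointPath}/speechtotext/transcriptions:transcribe?api-version=2024-11-15

Request headers

Parameter	Required	Description
Authorization	Yes	Bearer followed by the AccessKey: Bearer ${AccessKey}, e.g. Bearer RWxxxxxxxx0Gd

Query parameters

Parameter	Required	Description
api-version	Yes	Fixed value: 2024-11-15

Request form data

Parameter	Required	Type	Description
audio	Yes	File	Audio file to transcribe
definition	No	JSON string	Configuration options

definition

Parameter	Required	Description
channels	No	A list of zero-based indices of the channels to transcribe separately. Up to two channels are supported unless diarization is enabled. By default, the fast transcription API merges all input channels into a single channel and then transcribes it. If you do not want this, you can transcribe each channel independently without merging. To transcribe the channels of a stereo audio file separately, specify `[0,1]`, `[0]` or `[1]`; otherwise the stereo audio is merged into mono and only a single channel is transcribed. If the audio is stereo and diarization is enabled, `channels` cannot be set to `[0,1]`, because the speech service does not support diarization across multiple channels. For mono audio, the `channels` property is ignored and the audio is always transcribed as a single channel.
diarization	No	Diarization configuration. Diarization is the process of recognizing and separating speakers within one audio channel. For example, specify `"diarization": {"maxSpeakers": 2, "enabled": true}`. The transcription output then includes a `speaker` entry for each transcribed phrase (for example `"speaker": 0` or `"speaker": 1`).
locales	No, but recommended if you know the expected language	A list of locales that should match the expected language(s) of the audio to transcribe. If you know the locale of the audio file, specifying it improves transcription accuracy and minimizes latency. If a single locale is specified, that locale is used for transcription. If you are unsure of the language, you can specify several locales; the more precise the candidate list, the more accurate language identification is likely to be. If no locale is specified, or the language of the audio is not among those specified, the speech service tries to identify the language; if it cannot, an error is returned. Supported locales: de-DE, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR and zh-CN.
profanityFilterMode	No	How profanity in the recognition results is handled. Accepted values are `None` (disables profanity filtering), `Masked` (replaces profanity with asterisks), `Removed` (removes all profanity from the results) and `Tags` (adds profanity tags). The default is `Masked`.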
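The definition field is a JSON string, which makes quoting on the command line error-prone. The sketch below assembles it from a few arguments and rejects two combinations the table rules out: a profanityFilterMode outside the four accepted values, and channels [0,1] together with diarization. It is a hypothetical helper (build_definition is not part of the API); it covers only locales, channels, diarization and profanityFilterMode, performs no JSON escaping (so it assumes locale codes contain only letters and hyphens), and does not check whether the audio is actually stereo.

```shell
# Hypothetical helper that builds the `definition` form field.
# $1 locales, comma-separated (e.g. zh-CN,en-US); empty to omit
# $2 channels as a JSON array (e.g. [0]); empty to omit
# $3 maxSpeakers for diarization; 0 disables diarization
# $4 profanityFilterMode: None, Masked, Removed or Tags (default Masked)
build_definition() {
  locales=$1
  channels=$2
  speakers=${3:-0}
  profanity=${4:-Masked}
  case $profanity in
    None|Masked|Removed|Tags) ;;
    *) echo "invalid profanityFilterMode: $profanity" >&2; return 1 ;;
  esac
  # Diarization is not supported across multiple channels.
  if [ "$speakers" -gt 0 ] && [ "$(printf '%s' "$channels" | tr -d ' ')" = "[0,1]" ]; then
    echo "channels [0,1] cannot be combined with diarization" >&2
    return 1
  fi
  json="{"
  if [ -n "$locales" ]; then
    # zh-CN,en-US becomes "zh-CN","en-US" (no JSON escaping is done)
    quoted=$(printf '%s' "$locales" | sed 's/,/","/g')
    json="$json\"locales\":[\"$quoted\"],"
  fi
  if [ -n "$channels" ]; then
    json="$json\"channels\":$channels,"
  fi
  if [ "$speakers" -gt 0 ]; then
    json="$json\"diarization\":{\"maxSpeakers\":$speakers,\"enabled\":true},"
  fi
  printf '%s"profanityFilterMode":"%s"}\n' "$json" "$profanity"
}

# Builds a definition equivalent to the one in the request example below:
build_definition zh-CN '[0]' 2 Masked
# → {"locales":["zh-CN"],"channels":[0],"diarization":{"maxSpeakers":2,"enabled":true},"profanityFilterMode":"Masked"}
```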

Request example

curl --request POST \
  --url 'https://genaiapi.cloudsway.net/v1/ai/qyBrSaFJYWcM/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
  --header "Authorization: Bearer ${AccessKey}" \
  --form 'audio=@path/to/your/audio/file' \
  --form 'definition={
    "channels": [0],
    "locales": ["zh-CN"],
    "diarization": {
      "maxSpeakers": 2,
      "enabled": true
    },
    "profanityFilterMode": "Masked"
  }'
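The same request can be packaged as a reusable shell function. This is a hedged sketch: the function name fast_transcribe is hypothetical, and the host and api-version are taken from the example above. It sends definition with curl's --form-string, which passes the value literally, so characters such as ; or a leading @ inside the JSON are not interpreted by curl.

```shell
# Hypothetical wrapper for the fast transcription request (MaaS-AFast-asr).
# $1 endpointPath, $2 audio file, $3 optional definition JSON string.
# The access key is read from the AccessKey shell variable.
fast_transcribe() {
  endpoint=$1
  audio=$2
  definition=${3:-}
  url="https://genaiapi.cloudsway.net/v1/ai/${endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15"
  set -- --request POST --url "$url" \
    --header "Authorization: Bearer ${AccessKey}" \
    --form "audio=@${audio}"
  # definition is optional; --form-string sends the JSON text as-is.
  if [ -n "$definition" ]; then
    set -- "$@" --form-string "definition=${definition}"
  fi
  curl "$@"
}

# Usage (performs a real network call):
# AccessKey=RWxxxxxxxx0Gd
# fast_transcribe qyBrSaFJYWcM path/to/your/audio/file '{"locales":["zh-CN"],"profanityFilterMode":"Masked"}'
```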

Response

Field	Type	Description
durationMilliseconds	Integer	Total duration of the audio file, in milliseconds.
combinedPhrases	Array	List of combined phrases
phrases	Array	Details of each phrase

combinedPhrases

Field	Type	Description
text	String	Combined text of the phrases.

phrases

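Field	Type	Description
speaker	Integer	Speaker identifier when diarization is enabled (for example 0 or 1); may be null.
offsetMilliseconds	Integer	Start offset of the phrase in the audio, in milliseconds.
durationMilliseconds	Integer	Duration of the phrase, in milliseconds.
text	String	Text of the phrase.
words	Array	Details of each word in the phrase
locale	String	Locale identifier of the phrase.
confidence	Float	Recognition confidence of the phrase.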

words

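Field	Type	Description
text	String	Text of the word.
offsetMilliseconds	Integer	Start offset of the word from the beginning of the audio, in milliseconds.
durationMilliseconds	Integer	Duration of the word, in milliseconds.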

Response example

{
    "durationMilliseconds": 1920,
    "combinedPhrases": [
        {
            "text": "Hello，我是谁啊？"
        }
    ],
    "phrases": [
        {
            "speaker": null,
            "offsetMilliseconds": 160,
            "durationMilliseconds": 1440,
            "text": "Hello，我是谁啊？",
            "words": [
                {
                    "text": "Hello，",
                    "offsetMilliseconds": 160,
                    "durationMilliseconds": 560
                },
                {
                    "text": "我",
                    "offsetMilliseconds": 720,
                    "durationMilliseconds": 240
                },
                {
                    "text": "是",
                    "offsetMilliseconds": 960,
                    "durationMilliseconds": 160
                },
                {
                    "text": "谁",
                    "offsetMilliseconds": 1120,
                    "durationMilliseconds": 240
                },
                {
                    "text": "啊？",
                    "offsetMilliseconds": 1360,
                    "durationMilliseconds": 240
                }
            ],
            "locale": "zh-CN",
            "confidence": 0.7978613
        }
    ]
}
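Offsets and durations in the response are plain millisecond integers, and an end time is offsetMilliseconds + durationMilliseconds. In the example above, the phrase ends at 160 + 1440 = 1600 ms, which is also where its last word ends (1360 + 240). A minimal sketch for printing such values as mm:ss.mmm; the helper name ms_to_clock is hypothetical.

```shell
# Hypothetical helper: format a millisecond value from the response
# (offsetMilliseconds or durationMilliseconds) as mm:ss.mmm.
ms_to_clock() {
  ms=$1
  printf '%02d:%02d.%03d\n' $((ms / 60000)) $((ms / 1000 % 60)) $((ms % 1000))
}

# Last word of the example above: offset 1360 ms, duration 240 ms.
ms_to_clock 1360               # start → 00:01.360
ms_to_clock $((1360 + 240))    # end   → 00:01.600
```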

Supported audio files

Files must be smaller than 25 MB. Supported formats:

WAV
MP3
OPUS/OGG
FLAC
WMA
AAC
ALAW in a WAV container
MULAW in a WAV container
AMR
WebM
M4A
SPEEX
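A minimal pre-upload check for the limits above, as a sketch. It only inspects the file extension and size, so it cannot tell whether a .wav file really contains ALAW or MULAW data. Assumptions not stated in this document: .spx is the SPEEX extension, .opus and .ogg cover OPUS/OGG, and 25 MB means 25 * 1024 * 1024 bytes. The function name check_fast_asr_file is hypothetical.

```shell
# Hypothetical pre-upload check: known extension and size below 25 MB.
check_fast_asr_file() {
  f=$1
  [ -f "$f" ] || { echo "not a file: $f" >&2; return 1; }
  # Compare the extension case-insensitively.
  ext=$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    wav|mp3|opus|ogg|flac|wma|aac|amr|webm|m4a|spx) ;;
    *) echo "unsupported extension: .$ext" >&2; return 1 ;;
  esac
  size=$(wc -c < "$f" | tr -d ' ')
  if [ "$size" -ge $((25 * 1024 * 1024)) ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```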

MaaS-Arealtime-asr

Request protocol

HTTP

Request headers

Parameter	Type	Description
Authorization	string	Authentication: Bearer ${AccessKey}

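Request path:

https://genaiapi.cloudsway.net/v1/ai/{endpointPath}/audio/recognize

Request form data:

Parameter	Type	Description
file	file	Audio file to recognize; the audio must not exceed 30 s. Only WAV is currently supported.
recognitionLanguages	string	Candidate languages of the audio, comma-separated, e.g. en-US,es-MX
timeout	string	Timeout for the pause between utterances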
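Because the audio must not exceed 30 s, it can help to check the duration locally before uploading. This sketch computes a WAV file's duration as (file size - 44) / byte rate, which only holds for a canonical PCM WAV file with a 44-byte header; files with extra header chunks give a slightly wrong result. It also assumes a little-endian machine, since od -t u4 reads in host byte order. The helper names are hypothetical.

```shell
# Hypothetical helpers for the 30-second limit of MaaS-Arealtime-asr.
# Assumes a canonical PCM WAV: 44-byte header, byte rate stored as a
# little-endian 32-bit integer at offset 28, audio data after byte 44.
wav_duration_ms() {
  f=$1
  byte_rate=$(od -An -t u4 -j 28 -N 4 "$f" | tr -d ' ')
  if [ -z "$byte_rate" ] || [ "$byte_rate" -eq 0 ]; then
    echo "cannot read byte rate; not a PCM WAV file?" >&2
    return 1
  fi
  size=$(wc -c < "$f" | tr -d ' ')
  echo $(( (size - 44) * 1000 / byte_rate ))
}

check_wav_30s() {
  ms=$(wav_duration_ms "$1") || return 1
  if [ "$ms" -gt 30000 ]; then
    echo "audio is ${ms} ms long; the limit is 30 s" >&2
    return 1
  fi
}
```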

Request example

curl --location 'https://genaiapi.cloudsway.net/v1/ai/QEnOdcLVKmTCP/audio/recognize' \
--header "Authorization: Bearer ${AccessKey}" \
--header 'Connection: keep-alive' \
--form 'file=@path/to/your/audio/file' \
--form 'recognitionLanguages="en-US,es-MX"' \
--form 'timeout="100"'

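Response:

Parameter	Type	Description
text	string	Recognized text
language	string	Recognized language of the audio
duration	int	Duration of the recognized audio, in units of 100 ns
durationInSeconds	int	Duration of the recognized audio, in seconds

Example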

{
    "text": "Cuando abrió los ojos por la mañana fue porque una joven empleada doméstica había entrado en su habitación para encender el fuego.",
    "language": "es-MX",
    "duration": 65200000,
    "durationInSeconds": 7
}
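The example is consistent with duration being expressed in 100-nanosecond units, as the ASpeech-Translation section states for its own duration field: 65200000 * 100 ns = 6.52 s. In all three examples in this document, durationInSeconds equals that value rounded up to a whole second (6.52 s to 7, 28.64 s to 29, 26.04 s to 27). The helpers below convert accordingly; the rounding rule is inferred from these examples rather than documented, and the function names are hypothetical.

```shell
# Hypothetical helpers for duration fields given in 100 ns units
# (10,000,000 units = 1 second).
ticks_to_ms() {
  echo $(( $1 / 10000 ))
}

# Round up to whole seconds; inferred from the examples, not documented.
ticks_to_ceil_seconds() {
  echo $(( ($1 + 9999999) / 10000000 ))
}

ticks_to_ms 65200000             # → 6520
ticks_to_ceil_seconds 65200000   # → 7, same as durationInSeconds above
```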

Additional information

Supported languages

Language	Locale (BCP-47)
Arabic	ar-AE ar-BH ar-DZ ar-EG ar-IQ ar-JO ar-KW ar-LY ar-MA ar-OM ar-QA ar-SA ar-SY ar-YE
Danish	da-DK
Dutch	nl-NL
English	en-AU
Estonian	et-EE
Finnish	fi-FI
French	fr-CA fr-FR
German	de-DE
Greek	el-GR
Gujarati	gu-IN
Hebrew	he-IL
Hindi	hi-IN
Hungarian	hu-HU
Indonesian	id-ID
Bengali	bn-IN
Bulgarian	bg-BG
Catalan	ca-ES
Chinese	zh-CN zh-HK zh-TW
Croatian	hr-HR
Czech	cs-CZ
Irish	ga-IE
Italian	it-IT
Japanese	ja-JP
Kannada	kn-IN
Malayalam	ml-IN
Korean	ko-KR
Latvian	lv-LV
Lithuanian	lt-LT
Maltese	mt-MT
Marathi	mr-IN
Norwegian	nb-NO
Polish	pl-PL
Portuguese	pt-BR pt-PT
Romanian	ro-RO
Russian	ru-RU
Slovak	sk-SK
Slovenian	sl-SI
Spanish	es-AR es-BO es-CL es-CO es-CR es-CU es-DO es-EC es-SV es-GQ es-GT es-HN es-MX es-NI es-PA es-PY es-PE es-PR es-ES es-UY es-US es-VE
Swedish	sv-SE
Tamil	ta-IN
Telugu	te-IN
Thai	th-TH
Turkish	tr-TR
Ukrainian	uk-UA
Vietnamese	vi-VN

MaaS-ASpeech-Translation

Request method

POST

Request path

{basePath}/v1/ai/{endpointPath}/audio/realtime/translation

Request headers

Parameter	Description	Example
Authorization	Bearer followed by the AccessKey: Bearer ${AccessKey}	Bearer RWxxxxxxxx0Gd
Host	Host name of the service	genaiapi.cloudsway.net

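Request parameters (form data)

Field	Type	Required	Description	Example
targetLanguages	String	Yes	Target languages for translation, comma-separated	"en-US,ja"
file	File	Yes	Audio file of up to 30 s; for longer files, only the first 30 s are transcribed and translated	"C:\Users\zhcn_continuous_mode_sample.wav"
recognitionLanguages	String	Yes	Candidate recognition languages, comma-separated. If more than one is given, hasRecognize must be true; otherwise the first language is used for recognition	"zh-CN,en-US"
hasRecognize	String	No	Whether to identify the spoken language among recognitionLanguages; defaults to false	"true"
SegmentationSilenceTimeoutMs	String	No	Silence timeout used for segmentation, in milliseconds; defaults to 2000	"1000"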
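Per the table above, several recognitionLanguages only take effect when hasRecognize is true; otherwise the first one is used. A hypothetical wrapper (speech_translate is not part of the API) can therefore set the flag automatically whenever the comma-separated list contains more than one entry. The host is taken from the request example below, and SegmentationSilenceTimeoutMs is left at its default.

```shell
# Hypothetical wrapper for MaaS-ASpeech-Translation.
# $1 endpointPath, $2 audio file, $3 targetLanguages (comma-separated),
# $4 recognitionLanguages (comma-separated). Reads the AccessKey shell variable.
speech_translate() {
  endpoint=$1
  audio=$2
  targets=$3
  sources=$4
  # More than one candidate language only takes effect with hasRecognize=true.
  case $sources in
    *,*) has_recognize=true ;;
    *)   has_recognize=false ;;
  esac
  curl --request POST \
    --url "https://genaiapi.cloudsway.net/v1/ai/${endpoint}/audio/realtime/translation" \
    --header "Authorization: Bearer ${AccessKey}" \
    --form "file=@${audio}" \
    --form "targetLanguages=${targets}" \
    --form "recognitionLanguages=${sources}" \
    --form "hasRecognize=${has_recognize}"
}

# Usage (performs a real network call):
# AccessKey=RWxxxxxxxx0Gd
# speech_translate YAzGCqie path/to/your/audio/file en-US,ja zh-CN,en-US
```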

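Response

Field	Type	Description	Example
text	String	Recognized text in the original language	"Good morning, Steve. Good morning, Katie. ..."
translations	Object	Translation results, with the translated text keyed by target language	{"ja": "おはようございます、スティーブ。おはようございます、ケイティ。..."}
language	String	Original language of the audio	"en-US"
duration	Integer	Audio duration, in units of 100 ns	286400000
resultId	String	Unique identifier of the result	"5518458c7dec4003b9281662d9c763a7"
durationInSeconds	Integer	Audio duration, in seconds	29

Example

Request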

curl --location --request POST 'https://genaiapi.cloudsway.net/v1/ai/YAzGCqie/audio/realtime/translation' \
--header "Authorization: Bearer ${AccessKey}" \
--header 'Connection: keep-alive' \
--form 'targetLanguages="en-US,ja"' \
--form 'file=@path/to/your/audio/file' \
--form 'recognitionLanguages="zh-CN,en-US"' \
--form 'hasRecognize="true"' \
--form 'SegmentationSilenceTimeoutMs="1000"'

Response

{
  "text": "秋天总是那么那么富有诗意，树叶渐渐变红街道旁的银杏树开始落叶，人们穿上厚重的外套，享受着凉爽的秋风。黄昏时分，夕阳洒在街道上，给忙碌的一天增添了一抹温暖。无论是散步还是小憩，这个季节总能带来宁静和满足。",
  "translations": {
    "en-US": "Autumn is always so poetic, the leaves are turning red, the ginkgo trees along the streets are starting to lose their leaves, and people are wearing heavy coats and enjoying the cool autumn breeze. At dusk, the setting sun shines on the streets, adding a touch of warmth to a busy day. Whether it's a walk or a nap, this season always brings tranquility and fulfillment.",
    "ja": "秋はいつもとても詩的で、葉は赤く色づき、通り沿いのイチョウの木は葉を失い始め、人々は厚手のコートを着て涼しい秋の風を楽しんでいます。 夕暮れ時には、夕日が通りを照らし、忙しい一日に暖かさを加えます。 散歩でも昼寝でも、この季節はいつも静けさと充実感をもたらします。"
  },
  "language": "zh-CN",
  "duration": 260400000,
  "resultId": "ad03ee3a708e435dbe0ee808bb68f918",
  "durationInSeconds": 27
}