MaaS_Ele

MaaS_Ele_scribe_v2

Request protocol

HTTP

Header

Parameter	Type	Required	Description
`Content-Type`	string	Yes	Fixed to `application/json`
`Authorization`	string	Yes	`Bearer {your_api_key}`

Request URL

POST https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/stt-e/speech-to-text

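Query parameters

Name	Type	Required/Optional	Description
enable_logging	boolean	Optional, defaults to `true`	When enable_logging is set to false, the request uses zero-retention mode, so logging and transcript storage are unavailable for this request. Zero-retention mode is available only to enterprise customers.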
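As a minimal sketch of how the query parameter attaches to the request URL, the helper below builds the speech-to-text URL and appends `enable_logging=false` only when logging is turned off. The function name `build_stt_url` and the endpoint path `my-endpoint` are hypothetical placeholders, not part of the API.

```python
from urllib.parse import quote, urlencode

# Host and route copied from the request URL in this section.
BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"


def build_stt_url(endpoint_path, enable_logging=True):
    """Return the speech-to-text URL; add enable_logging only when it is False."""
    url = f"{BASE_URL}/{quote(endpoint_path, safe='')}/stt-e/speech-to-text"
    if not enable_logging:
        # The default is true, so the parameter is only needed to opt out.
        url += "?" + urlencode({"enable_logging": "false"})
    return url


# "my-endpoint" is a placeholder for {Your EndpointPath}.
print(build_stt_url("my-endpoint", enable_logging=False))
```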

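Request parameters

Name	Type	Required/Optional	Description
file	file	Optional	The file to transcribe. All major audio and video formats are supported. One of the file or cloud_storage_url parameters must be provided. The file size must be less than 3.0 GB.
language_code	string or null	Optional	An ISO-639-1 or ISO-639-3 language code corresponding to the language of the audio file. Supplying it when known in advance can sometimes improve transcription performance. Defaults to null, in which case the language is predicted automatically.
tag_audio_events	boolean	Optional, defaults to `true`	Whether to tag audio events such as (laughter) and (footsteps) in the transcription.
num_speakers	integer or null	Optional, `1 - 32`	The maximum number of speakers in the uploaded file. Helps predict who speaks when. At most 32 speakers can be predicted. Defaults to null, in which case the number of speakers is set to the maximum the model supports.
timestamps_granularity	enum	Optional, defaults to `word`	The granularity of the timestamps in the transcription. 'word' provides word-level timestamps and 'character' provides character-level timestamps per word. Allowed values: none, word, character
diarize	boolean	Optional, defaults to `false`	Whether to annotate which speaker is currently speaking in the uploaded file.
diarization_threshold	double or null	Optional, `0.1 - 0.4`	The threshold applied during speaker diarization. A higher value means a lower chance of one speaker being split into two different speakers but a higher chance of two different speakers being merged into one (fewer total speakers predicted). A lower value means a higher chance of one speaker being split into two but a lower chance of two speakers being merged into one (more total speakers predicted). Can only be set when diarize=True and num_speakers=None. Defaults to None, in which case a threshold is chosen based on the model_id (usually 0.22).
additional_formats	list of objects	Optional	A list of additional formats to export the transcript to.
file_format	enum	Optional, defaults to `other`	The format of the input audio. Options are 'pcm_s16le_16' or 'other'. For `pcm_s16le_16`, the input audio must be 16-bit PCM at a 16 kHz sample rate, single channel (mono), little-endian. Latency is lower than when passing an encoded waveform. Allowed values: pcm_s16le_16, other
cloud_storage_url	string or null	Optional; officially deprecated	The HTTPS URL of the file to transcribe. One of the file or cloud_storage_url parameters must be provided. The file must be accessible over HTTPS and smaller than 2 GB. Any valid HTTPS URL is accepted, including URLs from cloud storage providers (AWS S3, Google Cloud Storage, Cloudflare R2, etc.), CDNs, or any other HTTPS source. URLs can be pre-signed or include authentication tokens in query parameters.
temperature	double or null	Optional, `0 - 2`	Controls the randomness of the transcription output. Accepts values between 0.0 and 2.0; higher values give more diverse and less deterministic results. If omitted, a temperature is chosen based on the selected model, usually 0.
seed	integer or null	Optional, `0 - 2147483647`	If specified, the system makes a best effort to sample deterministically, so that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Must be an integer between 0 and 2147483647.
use_multi_channel	boolean	Optional, defaults to `false`	Whether the audio file contains multiple channels, each with a single speaker. When enabled, each channel is transcribed independently and the results are combined. Each word in the response includes a "channel_index" field indicating which channel it was spoken on. At most 5 channels are supported.
no_verbatim	boolean	Optional, defaults to `false`	Supported only by the scribe_v2 model. If true, the transcription omits filler words, false starts, and non-speech sounds.
entity_detection	string or list of strings	Optional	Supported only by the scribe_v2 model. Detects entities in the transcript. Can be "all" to detect all entities, a single entity type or category string, or a list of entity types/categories. Categories include "pii", "phi", "pci", "other", and "offensive_language". See the official documentation for the specific accepted values. When enabled, detected entities are returned in the entities field with their text, type, and character positions. Using this parameter incurs additional cost.
entity_redaction	string or list of strings	Optional	Supported only by the scribe_v2 model. Redacts entities from the transcript text. Accepts the same format as entity_detection: "all", categories (such as "pii", "phi"), or specific entity types. Must be a subset of entity_detection. When enabled, the values of matching entries in entities are not returned.
entity_redaction_mode	string	Optional (redacted, entity_type, enumerated_entity_type), defaults to `enumerated_entity_type`	Supported only by the scribe_v2 model. How redacted entities are formatted. 'redacted' replaces them with {REDACT}, 'entity_type' replaces them with {ENTITY_TYPE}, and 'enumerated_entity_type' replaces them with {ENTITY_TYPE_N}, where N enumerates each occurrence. Used only when entity_redaction is set.
keyterms	list of strings	Optional	Supported only by the scribe_v2 model. A list of keyterms used to bias the transcription. Keyterms are words or phrases you want the model to recognize more accurately. There can be at most 1000 keyterms. Each keyterm must be shorter than 50 characters and contain at most 5 words (after normalization). For example: ["hello", "world", "technical term"]. Using this parameter incurs additional cost. When more than 100 keyterms are provided, a minimum billable duration of 20 seconds applies per request.
source_url	string	Optional	The URL of the audio or video file to transcribe. Supports hosted video or audio files, YouTube video URLs, TikTok video URLs, and other video hosting services.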
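Several of the constraints in the table can be checked on the client before uploading a large file. The following is a rough sketch, not an official validator: the function name `check_stt_params` is hypothetical, and counting keyterm words by splitting on whitespace is an assumption, since the normalization step is not specified here.

```python
def check_stt_params(params):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []

    num_speakers = params.get("num_speakers")
    if num_speakers is not None and not 1 <= num_speakers <= 32:
        problems.append("num_speakers must be between 1 and 32")

    threshold = params.get("diarization_threshold")
    if threshold is not None:
        if not 0.1 <= threshold <= 0.4:
            problems.append("diarization_threshold must be between 0.1 and 0.4")
        # Only allowed together with diarize=true and an unset num_speakers.
        if not params.get("diarize") or num_speakers is not None:
            problems.append("diarization_threshold requires diarize=true and no num_speakers")

    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0 and 2")

    seed = params.get("seed")
    if seed is not None and not 0 <= seed <= 2147483647:
        problems.append("seed must be between 0 and 2147483647")

    keyterms = params.get("keyterms") or []
    if len(keyterms) > 1000:
        problems.append("at most 1000 keyterms are allowed")
    for term in keyterms:
        if len(term) >= 50:
            problems.append(f"keyterm must be shorter than 50 characters: {term!r}")
        # Assumption: whitespace splitting approximates the normalized word count.
        if len(term.split()) > 5:
            problems.append(f"keyterm has more than 5 words: {term!r}")
    return problems


print(check_stt_params({"diarize": True, "diarization_threshold": 0.22}))
print(check_stt_params({"num_speakers": 40}))
```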

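About scribe_v2 entity detection

Observed behavior when testing the upstream vendor API:

When the request includes entity_detection, entity detection is triggered and the character-cost response header reflects the additional cost.
When the request includes entity_redaction but not entity_detection, entity detection is still triggered, but the character-cost response header shows no additional cost.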

additional_formats

Name	Type	Required/Optional	Description
format	string	Required	Allowed values: segmented_json, docx, pdf, txt, html, srt
max_characters_per_line	integer or null	Optional, defaults to `100`	Supported only when the format is txt or srt.
include_speakers	boolean	Optional, defaults to `true`
include_timestamps	boolean	Optional, defaults to `true`
segment_on_silence_longer_than_s	double or null	Optional
max_segment_duration_s	double or null	Optional
max_segment_chars	integer or null	Optional
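The sketch below shows one way to assemble the additional_formats value from the fields in the table. It assumes the list is sent as a JSON-encoded string in a multipart form field, which this page does not state; the helper name `additional_formats_field` is hypothetical.

```python
import json

ALLOWED_FORMATS = {"segmented_json", "docx", "pdf", "txt", "html", "srt"}


def additional_formats_field(formats):
    """Validate export format objects and return them as a JSON string."""
    for item in formats:
        if item.get("format") not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {item.get('format')!r}")
        # max_characters_per_line is documented only for txt and srt.
        if "max_characters_per_line" in item and item["format"] not in {"txt", "srt"}:
            raise ValueError("max_characters_per_line applies only to txt or srt")
    # Assumption: the form field carries the list as JSON text.
    return json.dumps(formats)


field = additional_formats_field([
    {"format": "srt", "max_characters_per_line": 42, "include_speakers": False},
    {"format": "docx"},
])
print(field)
```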

Request example

curl 'https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/stt-e/speech-to-text' \
-H 'Authorization: Bearer {Your AK}' \
-F 'file=@/path/to/audio.mp3' \
-F 'tag_audio_events=true'

Response example

Single-channel response

{
    "words": [
        {
            "start": 0.159,
            "type": "word",
            "logprob": 0.0,
            "end": 0.359,
            "text": "A"
        },
        {
            "start": 0.359,
            "type": "spacing",
            "logprob": 0.0,
            "end": 0.36,
            "text": " "
        },
        {
            "start": 0.36,
            "type": "word",
            "logprob": 0.0,
            "end": 0.679,
            "text": "shared"
        },
        {
            "start": 0.679,
            "type": "spacing",
            "logprob": 0.0,
            "end": 0.679,
            "text": " "
        },
        {
            "start": 0.679,
            "type": "word",
            "logprob": 0.0,
            "end": 0.919,
            "text": "goal"
        },
        {
            "start": 0.919,
            "type": "spacing",
            "logprob": 0.0,
            "end": 0.959,
            "text": " "
        },
        {
            "start": 0.959,
            "type": "word",
            "logprob": 0.0,
            "end": 1.039,
            "text": "is"
        },
        {
            "start": 1.039,
            "type": "spacing",
            "logprob": 0.0,
            "end": 1.059,
            "text": " "
        },
        {
            "start": 1.059,
            "type": "word",
            "logprob": 0.0,
            "end": 1.159,
            "text": "the"
        },
        {
            "start": 1.159,
            "type": "spacing",
            "logprob": 0.0,
            "end": 1.179,
            "text": " "
        },
        {
            "start": 1.179,
            "type": "word",
            "logprob": 0.0,
            "end": 1.519,
            "text": "heartbeat"
        },
        {
            "start": 1.519,
            "type": "spacing",
            "logprob": 0.0,
            "end": 1.539,
            "text": " "
        },
        {
            "start": 1.539,
            "type": "word",
            "logprob": 0.0,
            "end": 1.659,
            "text": "of"
        },
        {
            "start": 1.659,
            "type": "spacing",
            "logprob": 0.0,
            "end": 1.659,
            "text": " "
        },
        {
            "start": 1.659,
            "type": "word",
            "logprob": 0.0,
            "end": 2.199,
            "text": "teamwork."
        }
    ],
    "language_code": "eng",
    "transcription_id": "DH4a9n1brE06R8BUQPEx",
    "language_probability": 0.9907509684562683,
    "text": "A shared goal is the heartbeat of teamwork."
}
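A short sketch of reading this response shape: concatenating every entry in `words` (including the `spacing` entries) reproduces `text`, and the first and last timestamps give the spoken duration. The sample below is a trimmed copy of the response above; the function name `summarize` is hypothetical.

```python
import json

# Trimmed copy of the single-channel sample response.
sample = json.loads("""
{
  "words": [
    {"start": 0.159, "type": "word", "logprob": 0.0, "end": 0.359, "text": "A"},
    {"start": 0.359, "type": "spacing", "logprob": 0.0, "end": 0.36, "text": " "},
    {"start": 0.36, "type": "word", "logprob": 0.0, "end": 0.679, "text": "shared"}
  ],
  "language_code": "eng",
  "text": "A shared"
}
""")


def summarize(response):
    """Rebuild the transcript from word entries and report basic statistics."""
    words = response["words"]
    spoken = [w for w in words if w["type"] == "word"]
    duration = words[-1]["end"] - words[0]["start"] if words else 0.0
    return {
        "rebuilt_text": "".join(w["text"] for w in words),
        "word_count": len(spoken),
        "duration_s": round(duration, 3),
    }


summary = summarize(sample)
print(summary)
```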

Multi-channel response

{
  "transcripts": [
    {
      "language_code": "en",
      "language_probability": 0.98,
      "text": "Hello from channel one.",
      "words": [
        {
          "text": "Hello",
          "start": 0,
          "end": 0.5,
          "type": "word",
          "speaker_id": "speaker_0",
          "logprob": -0.124
        }
      ]
    },
    {
      "language_code": "en",
      "language_probability": 0.97,
      "text": "Greetings from channel two.",
      "words": [
        {
          "text": "Greetings",
          "start": 0.1,
          "end": 0.7,
          "type": "word",
          "speaker_id": "speaker_1",
          "logprob": -0.156
        }
      ]
    }
  ]
}
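To read a multi-channel result as a single timeline, the per-channel word lists can be flattened and sorted by start time. This is a sketch under one assumption: when a word carries no `channel_index`, the position of its transcript in `transcripts` is used as the channel number. The function name `merge_channels` is hypothetical.

```python
def merge_channels(response):
    """Flatten per-channel words into one list ordered by start time."""
    merged = []
    for position, transcript in enumerate(response["transcripts"]):
        for word in transcript["words"]:
            # Assumption: fall back to the transcript's position as the channel.
            channel = word.get("channel_index", position)
            merged.append({**word, "channel_index": channel})
    merged.sort(key=lambda w: w["start"])
    return merged


# Reduced copy of the multi-channel sample response.
sample = {
    "transcripts": [
        {"text": "Hello from channel one.",
         "words": [{"text": "Hello", "start": 0, "end": 0.5, "type": "word",
                    "speaker_id": "speaker_0", "logprob": -0.124}]},
        {"text": "Greetings from channel two.",
         "words": [{"text": "Greetings", "start": 0.1, "end": 0.7, "type": "word",
                    "speaker_id": "speaker_1", "logprob": -0.156}]},
    ]
}

timeline = merge_channels(sample)
for word in timeline:
    print(word["channel_index"], word["start"], word["text"])
```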

scribe_v2 entity detection request example

curl 'https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/stt-e/speech-to-text' \
-H 'Authorization: Bearer {Your AK}' \
-F 'file=@/path/to/audio.mp3' \
-F 'entity_detection=pii' \
-F 'entity_redaction=name'

scribe_v2 entity detection response example

{
    "language_code": "eng",
    "language_probability": 0.9869821071624756,
    "text": "My name is {NAME_0}. My date of birth is the 12th of July 1987, and my credit card number is 4242-4242-4242-4242.",
    "words": [
        { "text": "My", "start": 0.099, "end": 0.259, "type": "word", "logprob": 0.0 },
        { "text": " ", "start": 0.259, "end": 0.299, "type": "spacing", "logprob": 0.0 },
        { "text": "name", "start": 0.299, "end": 0.42, "type": "word", "logprob": 0.0 },
        ...
    ],
    "transcription_id": "Y2ZX8AxHUzTPCIualYiE",
    "entities": [
        {
            "text": "{NAME_0}",
            "entity_type": "name",
            "start_char": 11,
            "end_char": 15
        },
        {
            "text": "12th of July 1987",
            "entity_type": "dob",
            "start_char": 41,
            "end_char": 58
        },
        {
            "text": "4242-4242-4242-4242",
            "entity_type": "credit_card",
            "start_char": 89,
            "end_char": 108
        }
    ]
}
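In this sample, `start_char` and `end_char` appear to index the unredacted transcript: the name spans 4 characters (11 to 15) while its placeholder `{NAME_0}` is 8 characters long. Under that assumption, which is inferred from the sample rather than stated by the API, the sketch below shifts each offset so it points into the returned `text`. The function name `locate_in_returned_text` is hypothetical.

```python
def locate_in_returned_text(entities):
    """Map entity offsets onto the returned (possibly redacted) text.

    Assumes start_char/end_char refer to the unredacted transcript, so every
    redaction placeholder shifts the entities that follow it.
    """
    shift = 0
    located = []
    for entity in sorted(entities, key=lambda e: e["start_char"]):
        start = entity["start_char"] + shift
        end = start + len(entity["text"])
        located.append((entity["entity_type"], start, end))
        original_length = entity["end_char"] - entity["start_char"]
        shift += len(entity["text"]) - original_length
    return located


text = ("My name is {NAME_0}. My date of birth is the 12th of July 1987, "
        "and my credit card number is 4242-4242-4242-4242.")
entities = [
    {"text": "{NAME_0}", "entity_type": "name", "start_char": 11, "end_char": 15},
    {"text": "12th of July 1987", "entity_type": "dob", "start_char": 41, "end_char": 58},
    {"text": "4242-4242-4242-4242", "entity_type": "credit_card",
     "start_char": 89, "end_char": 108},
]

for entity_type, start, end in locate_in_returned_text(entities):
    print(entity_type, start, end, text[start:end])
```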

MaaS_Ele_voice_clones

Request URL

POST https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/elevenlabs/voices/add

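Request parameters

Name	Type	Required/Optional	Description
name	string	Required	The name that identifies this voice. It is displayed in the website's dropdown menu.
files	files	Required	A list of audio recording files to use for voice cloning.
remove_background_noise	boolean	Optional, defaults to `false`	Whether to remove background noise from the uploaded recordings.
description	string or null	Optional	A description of the voice.
labels	string or null	Optional	A serialized dictionary of labels for the voice.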
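Because `labels` is a serialized dictionary, a client has to encode it before placing it in the form. The sketch below prepares the non-file form fields; it assumes JSON is the expected serialization, and the helper name `voice_add_fields` and the label keys are hypothetical. The `files` parts would be attached separately by whichever HTTP client is used.

```python
import json


def voice_add_fields(name, description=None, labels=None, remove_background_noise=False):
    """Build the text fields of the voice-clone form (file parts are added separately)."""
    if not name:
        raise ValueError("name is required")
    fields = {
        "name": name,
        # Form values are strings, so booleans are written as true/false.
        "remove_background_noise": "true" if remove_background_noise else "false",
    }
    if description is not None:
        fields["description"] = description
    if labels is not None:
        # Assumption: the serialized dictionary is JSON text.
        fields["labels"] = json.dumps(labels)
    return fields


fields = voice_add_fields("girl-voice-1", labels={"accent": "british", "age": "young"})
print(fields)
```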

Request example

curl 'https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/elevenlabs/voices/add' \
-H 'Authorization: Bearer {Your AK}' \
-F 'name=girl-voice-1' \
-F 'files=@/path/to/sample.mp3' \
-F 'remove_background_noise=false'

Response example

{
  "voice_id": "c38kUX8pkfYO2kHyqfFy",
  "requires_verification": false
}

MaaS_Ele_tts_v3/MaaS_Ele_tts_v1

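Request URL

POST https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/tts-e/text-to-speech/{voice_id}

Path parameters

Name	Type	Required/Optional	Description
voice_id	string	Required	The ID of the voice to use.

Query parameters

Name	Type	Required/Optional	Description
enable_logging	boolean	Optional, defaults to `true`	When enable_logging is set to false, the request uses zero-retention mode, so history features, including request stitching, are unavailable for this request. Zero-retention mode is available only to enterprise customers.
optimize_streaming_latency (Deprecated)	integer or null	Optional, defaults to None	Enables latency optimizations at some cost in quality. The best achievable latency varies by model. Possible values: 0 - default mode (no latency optimizations); 1 - normal latency optimizations (about 50% of the possible latency improvement of option 3); 2 - strong latency optimizations (about 75% of the possible latency improvement of option 3); 3 - maximum latency optimizations; 4 - maximum latency optimizations with the text normalizer also turned off for further latency savings (best latency, but numbers and dates may be mispronounced).
output_format	enum	Optional, defaults to `mp3_44100_128`	The output format of the generated audio, written as codec_sample_rate_bitrate. For example, MP3 with a 22.05 kHz sample rate at 32 kbps is mp3_22050_32. MP3 at 192 kbps requires a Creator tier subscription or above. PCM at a 44.1 kHz sample rate requires a Pro tier subscription or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs. 21 enum values are supported: mp3_22050_32 mp3_24000_48 mp3_44100_32 mp3_44100_64 mp3_44100_96 mp3_44100_128 mp3_44100_192 pcm_8000 pcm_16000 pcm_22050 pcm_24000 pcm_32000 pcm_44100 pcm_48000 ulaw_8000 alaw_8000 opus_48000_32 opus_48000_64 opus_48000_96 opus_48000_128 opus_48000_192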
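The codec_sample_rate_bitrate naming can be taken apart mechanically, which is handy when choosing a file extension or configuring a player. A minimal sketch follows; the function name `parse_output_format` is hypothetical, and values without a bitrate component, such as `pcm_16000`, yield `None` for the bitrate.

```python
def parse_output_format(value):
    """Split an output_format value into (codec, sample_rate_hz, bitrate_kbps)."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate_hz = int(parts[1])
    # PCM, u-law and A-law values carry no bitrate component.
    bitrate_kbps = int(parts[2]) if len(parts) == 3 else None
    return codec, sample_rate_hz, bitrate_kbps


print(parse_output_format("mp3_22050_32"))
print(parse_output_format("pcm_16000"))
```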

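Request parameters

Name	Type	Required/Optional	Description
text	string	Required	The text that will be converted into speech.
model_id	string	Optional	The model ID.
language_code	string or null	Optional	A language code (ISO 639-1) used to force the model to use a given language and to apply text normalization. An error is returned if the model does not support the provided language code.
voice_settings	object or null	Optional	Voice settings that override the stored settings for the given voice. They apply only to the given request.
pronunciation_dictionary_locators	list of objects or null	Optional	A list of pronunciation dictionary locators (id, version_id) to apply to the text. They are applied in order. Each request may include up to 3 locators.
seed	integer or null	Optional	If specified, the system makes a best effort to sample deterministically, so that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Must be an integer between 0 and 4294967295.
previous_text	string or null	Optional	The text that came before the text of the current request. Can be used to improve speech continuity when concatenating multiple generations, or to influence speech continuity in the current generation.
next_text	string or null	Optional	The text that comes after the text of the current request. Can be used to improve speech continuity when concatenating multiple generations, or to influence speech continuity in the current generation.
previous_request_ids	list of strings or null	Optional	A list of request IDs of samples generated before this generation. Can be used to improve speech continuity when splitting a large task into multiple requests. Works best when the same model is used across generations. If both previous_text and previous_request_ids are sent, previous_text is ignored. At most 3 request IDs can be sent.
next_request_ids	list of strings or null	Optional	A list of request IDs of samples that come after this generation. next_request_ids is especially useful for maintaining speech continuity when regenerating a sample that has audio quality issues. For example, if you have generated 3 speech clips and want to improve clip 2, passing the request ID of clip 3 as a next_request_id (and that of clip 1 as a previous_request_id) helps maintain the natural flow of the combined speech. Works best when the same model is used across generations. If both next_text and next_request_ids are sent, next_text is ignored. At most 3 request IDs can be sent.
apply_text_normalization	enum	Optional, defaults to `auto`	Controls text normalization with three modes: 'auto', 'on', and 'off'. With 'auto', the system decides automatically whether to apply text normalization (for example, spelling out numbers). With 'on', text normalization is always applied; with 'off', it is skipped. Allowed values: auto, on, off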
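When a long text is split across several requests, `previous_text` and `next_text` give each request its neighbors as context. The sketch below builds one JSON body per chunk; the function name `stitched_bodies` is hypothetical, and how the text is chunked is left to the caller.

```python
def stitched_bodies(chunks, model_id=None):
    """Build one request body per chunk, passing neighbors as previous/next text."""
    bodies = []
    for index, chunk in enumerate(chunks):
        body = {"text": chunk}
        if model_id is not None:
            body["model_id"] = model_id
        if index > 0:
            body["previous_text"] = chunks[index - 1]
        if index < len(chunks) - 1:
            body["next_text"] = chunks[index + 1]
        bodies.append(body)
    return bodies


bodies = stitched_bodies([
    "A shared goal is the heartbeat of teamwork.",
    "Without it, effort scatters.",
    "With it, every step counts.",
])
for body in bodies:
    print(sorted(body))
```

Using `previous_request_ids` and `next_request_ids` instead would require sending the requests in order and collecting each request ID, which this sketch does not attempt.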

voice_settings

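Name	Type	Required/Optional	Description
stability	double or null	Optional, `0 - 1`, defaults to `0.5`	Determines how stable the voice is and how much randomness there is between generations. Lower values introduce a broader emotional range for the voice. Higher values can produce a monotonous voice with limited emotion.
use_speaker_boost	boolean or null	Optional, defaults to `true`	Boosts similarity to the original speaker. This setting requires a slightly higher computational load, which in turn increases latency.
similarity_boost	double or null	Optional, `0 - 1`, defaults to `0.75`	Determines how closely the AI should adhere to the original voice when attempting to replicate it.
style	double or null	Optional, defaults to `0`	Determines the style exaggeration of the voice. This setting attempts to amplify the style of the original speaker. A non-zero value consumes additional computational resources and may increase latency.
speed	double or null	Optional, defaults to `1`	Adjusts the speed of the voice. A value of 1.0 is the default speed; values below 1.0 slow the speech down and values above 1.0 speed it up.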
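A small sketch of building a `voice_settings` object from the documented defaults, with range checks only for the two fields whose range is given in the table (`stability` and `similarity_boost`). The helper name `make_voice_settings` is hypothetical.

```python
# Defaults as listed in the voice_settings table.
DEFAULTS = {
    "stability": 0.5,
    "use_speaker_boost": True,
    "similarity_boost": 0.75,
    "style": 0.0,
    "speed": 1.0,
}


def make_voice_settings(**overrides):
    """Merge overrides into the documented defaults and check the 0-1 ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown voice_settings keys: {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides}
    for key in ("stability", "similarity_boost"):
        if not 0.0 <= settings[key] <= 1.0:
            raise ValueError(f"{key} must be between 0 and 1")
    return settings


print(make_voice_settings(stability=0.3, speed=1.1))
```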

Request example

curl 'https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/tts-e/text-to-speech/{voice_id}?enable_logging=false' \
-H 'Authorization: Bearer {Your AK}' \
-H 'Content-Type: application/json' \
-d '{
    "text": "A shared goal is the heartbeat of teamwork."
}'
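The same request can be prepared in Python with only the standard library. This sketch builds the request without sending it; `build_tts_request`, `my-endpoint`, and `YOUR_AK` are hypothetical placeholders, and the commented-out send step assumes the response body is the generated audio.

```python
import json
import urllib.request


def build_tts_request(endpoint_path, voice_id, api_key, text, enable_logging=True):
    """Prepare (but do not send) a text-to-speech POST request."""
    url = (
        "https://genaiapi.cloudsway.net/v1/ai/"
        f"{endpoint_path}/tts-e/text-to-speech/{voice_id}"
    )
    if not enable_logging:
        url += "?enable_logging=false"
    return urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request(
    "my-endpoint",            # placeholder for {Your EndpointPath}
    "Xb7hH8MSUJpSbSDYk0k2",   # Alice, from the appendix
    "YOUR_AK",                # placeholder for {Your AK}
    "A shared goal is the heartbeat of teamwork.",
    enable_logging=False,
)
print(request.get_method(), request.full_url)

# Sending is left commented out; it assumes the body is audio bytes:
# with urllib.request.urlopen(request) as response:
#     with open("speech.mp3", "wb") as out:
#         out.write(response.read())
```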

MaaS_Ele_eleven_multilingual_sts_v2

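Request URL

POST https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/eleven-labs/voice-changer/{voice_id}

Path parameters

Name	Type	Required/Optional	Description
voice_id	string	Required	The ID of the voice to use.

Query parameters

Name	Type	Required/Optional	Description
enable_logging	boolean	Optional, defaults to `true`	When enable_logging is set to false, the request uses zero-retention mode, so history features, including request stitching, are unavailable for this request. Zero-retention mode is available only to enterprise customers.
optimize_streaming_latency (Deprecated)	integer or null	Optional, defaults to None	Enables latency optimizations at some cost in quality. The best achievable latency varies by model. Possible values: 0 - default mode (no latency optimizations); 1 - normal latency optimizations (about 50% of the possible latency improvement of option 3); 2 - strong latency optimizations (about 75% of the possible latency improvement of option 3); 3 - maximum latency optimizations; 4 - maximum latency optimizations with the text normalizer also turned off for further latency savings (best latency, but numbers and dates may be mispronounced).
output_format	enum	Optional, defaults to `mp3_44100_128`	The output format of the generated audio, written as codec_sample_rate_bitrate. For example, MP3 with a 22.05 kHz sample rate at 32 kbps is mp3_22050_32. MP3 at 192 kbps requires a Creator tier subscription or above. PCM at a 44.1 kHz sample rate requires a Pro tier subscription or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs. 21 enum values are supported: mp3_22050_32 mp3_24000_48 mp3_44100_32 mp3_44100_64 mp3_44100_96 mp3_44100_128 mp3_44100_192 pcm_8000 pcm_16000 pcm_22050 pcm_24000 pcm_32000 pcm_44100 pcm_48000 ulaw_8000 alaw_8000 opus_48000_32 opus_48000_64 opus_48000_96 opus_48000_128 opus_48000_192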

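Request parameters

Name	Type	Required/Optional	Description
audio	file	Required	The audio file whose content and emotion will control the generated speech.
model_id	string	Optional, defaults to `eleven_english_sts_v2`	The identifier of the model to use; available models can be queried with GET /v1/models. The model must support speech-to-speech, which can be checked with its can_do_voice_conversion property.
voice_settings	string	Optional	Voice settings that override the stored settings for the given voice. They apply only to the given request. Must be sent as a JSON-encoded string.
seed	integer	Optional	If specified, the system makes a best effort to sample deterministically, so that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed. Must be an integer between 0 and 4294967295.
remove_background_noise	boolean	Optional, defaults to `false`	If enabled, background noise is removed from the audio input using the audio isolation model. Applies only to the voice changer.
file_format	enum	Optional	The format of the input audio. Options are 'pcm_s16le_16' or 'other'. For `pcm_s16le_16`, the input audio must be 16-bit PCM at a 16 kHz sample rate, single channel (mono), little-endian. Latency is lower than when passing an encoded waveform. Allowed values: pcm_s16le_16, other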
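Since this endpoint takes `voice_settings` as a JSON-encoded string rather than an object, the form fields need one extra encoding step. A minimal sketch, with the hypothetical helper name `voice_changer_fields`; the `audio` file part would be attached separately by the HTTP client.

```python
import json


def voice_changer_fields(voice_settings=None, seed=None,
                         remove_background_noise=False, model_id=None):
    """Build the text fields of the voice-changer form (audio is added separately)."""
    fields = {
        "remove_background_noise": "true" if remove_background_noise else "false",
    }
    if voice_settings is not None:
        # The table requires a JSON-encoded string, not a nested object.
        fields["voice_settings"] = json.dumps(voice_settings)
    if seed is not None:
        if not 0 <= seed <= 4294967295:
            raise ValueError("seed must be between 0 and 4294967295")
        fields["seed"] = str(seed)
    if model_id is not None:
        fields["model_id"] = model_id
    return fields


fields = voice_changer_fields(
    voice_settings={"stability": 0.4, "similarity_boost": 0.8},
    seed=12345,
    remove_background_noise=True,
)
print(fields)
```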

Response

The generated audio file.

Request example

curl 'https://genaiapi.cloudsway.net/v1/ai/{Your EndpointPath}/eleven-labs/voice-changer/{voice_id}?enable_logging=false' \
    -H 'Authorization: Bearer {Your AK}' \
    -H "Content-Type: multipart/form-data" \
    -F "audio=@/path/to/input.mp3" \
    -F "remove_background_noise=true" \
    -F "seed=12345"

Appendix

Available voice_id values

Name	Voice ID	Description
Alice	Xb7hH8MSUJpSbSDYk0k2	Clear and engaging, friendly woman with a British accent suitable for e-learning.
Adam	pNInz6obpgDQGcFmaJgB	A bright tenor pitch that immediately cuts through. The delivery is brash and openly confident, speaking with unwavering certainty and a slightly aggressive self-assurance.
Bella	hpp4J3VqNfWAUOO0d1Us	This voice is warm, bright, and professional, characterized by a Standard American accent and a polished, narrative quality. It features a medium-high pitch with crisp diction and a deliberate, rhythmic pace that makes it highly intelligible and engaging for long-form listening.
Bill	pqHfZKP75CvOlQylNhV4	Friendly and comforting voice ready to narrate your stories
Brian	nPczCjzI2devNBz1zQrb	Middle-aged man with a resonant and comforting tone. Great for narrations and advertisements.
Callum	N2lVS1w4EtoT3dr4eOWO	Deceptively gravelly, yet unsettling edge.
Charlie	IKne3meq5aSn9XLyUdCD	A young Australian male with a confident and energetic voice.
Chris	iP95p4xoKVk53GoZ742B	Natural and real, this down-to-earth voice is great across many use-cases.
Daniel	onwK4e9ZLuTAKqWW03F9	A strong voice perfect for delivering a professional broadcast or news story.
Eric	cjVigY5qzO86Huf0OWal	A smooth tenor pitch from a man in his 40s - perfect for agentic use cases.
George	JBFqnCBsd6RMkjVDRZzb	Warm resonance that instantly captivates listeners. British male, storytelling
Harry	SOYHLrjzK2X1ezoPC6cr	An animated warrior ready to charge forward. Male, American, characters
Jessica	cgSgspJ2msm6clMCkdW9	Young and popular, this playful American female voice is perfect for trendy content.
Laura	FGY2WhTYpPnrIDTdsKH5	This young adult female voice delivers sunny enthusiasm with a quirky attitude.
Liam	TX3LPaxmHKxFdv7VOQHJ	A young adult with energy and warmth - suitable for reels and shorts.
Lily	pFZP5JQG7iQjIQuC4Bku	Velvety British female voice delivers news and narrations with warmth and clarity.
Matilda	XrExE9yKIg1WjnnlVkGX	A professional woman with a pleasing alto pitch. Suitable for many use cases.
River	SAz9YHcvj6GT2YYXdXww	A relaxed, neutral voice ready for narrations or conversational projects
Roger	CwhRBWXzGAHq8TQ4Fs17	Easy going and perfect for casual conversations. Male, American
Sarah	EXAVITQu4vr4xnSDxMaL	Young adult woman with a confident and warm, mature quality and a reassuring, professional tone.
Will	bIHbv24MWmeRgasZH58o	Conversational and laid back.
