MaaS_FishAudio
Request Protocol
HTTPS
Text-to-Speech
Request URL
POST https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/tts
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
- Content-Type: supports application/json and application/msgpack
Note: When using application/msgpack, the request body must be serialized to msgpack format before it is sent.
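To make the note above concrete, the following is a minimal sketch of what serializing a request body to msgpack means at the byte level. The `pack` helper is hypothetical and covers only the value types a request body is likely to contain; in practice a maintained MessagePack library would normally be used instead of hand-written encoding.

```python
import struct

def pack(obj):
    """Encode a small subset of Python values as MessagePack bytes."""
    if obj is None:
        return b"\xc0"                                   # nil
    if isinstance(obj, bool):                            # check before int: bool subclasses int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return struct.pack("B", obj)                 # positive fixint
        if -32 <= obj < 0:
            return struct.pack("b", obj)                 # negative fixint
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)      # uint 32
        return b"\xd3" + struct.pack(">q", obj)          # int 64
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)          # float 64
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) <= 31:
            return struct.pack("B", 0xA0 | len(data)) + data      # fixstr
        return b"\xdb" + struct.pack(">I", len(data)) + data      # str 32
    if isinstance(obj, (bytes, bytearray)):
        return b"\xc6" + struct.pack(">I", len(obj)) + bytes(obj)  # bin 32
    if isinstance(obj, (list, tuple)):
        head = (struct.pack("B", 0x90 | len(obj)) if len(obj) <= 15
                else b"\xdd" + struct.pack(">I", len(obj)))        # fixarray / array 32
        return head + b"".join(pack(item) for item in obj)
    if isinstance(obj, dict):
        head = (struct.pack("B", 0x80 | len(obj)) if len(obj) <= 15
                else b"\xdf" + struct.pack(">I", len(obj)))        # fixmap / map 32
        return head + b"".join(pack(k) + pack(v) for k, v in obj.items())
    raise TypeError(f"unsupported type: {type(obj).__name__}")
```

Because msgpack has a native binary type, raw audio bytes can be placed directly in the body, for example `pack({"text": "Hello", "format": "mp3"})` or a dictionary whose values include `bytes`.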
Body
| Parameter Name | Field Type | Is it required? | Default Value | Description |
|---|---|---|---|---|
| text | string | Yes | - | Text to be synthesized into speech |
| temperature | number | No | 0.7 | 0\~1, the higher the value, the more random; the lower the value, the more stable |
| top_p | number | No | 0.7 | 0\~1, nucleus sampling, controls diversity |
| references | object[] | No | - | Array of ReferenceAudio objects |
| references.audio | file | Yes | - | Raw audio byte data. Supported formats: WAV, MP3, FLAC. For best results, use 10-30 seconds of clear speech with minimal background noise. |
| references.text | string | Yes | - | Accurate transcription of what is spoken in the audio sample. Accuracy is crucial for the quality of voice cloning. |
| reference_id | string | No | - | Voice Model ID |
| prosody | object | No | - | Output speed and volume adjustment |
| prosody.speed | number | No | 1.0 | Speed. Valid range: 0.5 to 2.0. 1.0 = normal speed, 0.5 = half speed, 2.0 = double speed |
| prosody.volume | number | No | 0 | Volume, measured in decibels (dB). 0 = no change, positive values = louder, negative values = quieter |
| prosody.normalize_loudness | boolean | No | true | Normalize the output loudness so that perceived volume is more consistent. Only applies to S2-Pro. |
| chunk_length | int | No | 300 | Size of text segment for processing. 100 \<= x \<= 300 |
| normalize | boolean | No | true | Normalize English and Chinese text, which improves stability when reading numbers. |
| format | string | No | mp3 | Output audio format: wav, pcm, mp3, or opus |
| sample_rate | int | No | - | Audio sample rate in Hz. When omitted, the default for the chosen format is used (44100 Hz for most formats, 48000 Hz for opus). |
| mp3_bitrate | int | No | 128 | MP3 bit rate in kbps. Allowed values: 64, 128, 192. Only applies when format is mp3. |
| opus_bitrate | int | No | -1000 | Opus bit rate in bps. Allowed values: -1000, 24000, 32000, 48000, 64000. -1000 means automatic. Only applies when format is opus. |
| latency | string | No | normal | Trade-off between latency and quality: low, normal, balanced, representing minimum latency, optimal quality, and balance respectively |
| max_new_tokens | int | No | 1024 | Maximum number of audio tokens that can be generated for each text segment. |
| repetition_penalty | number | No | 1.2 | Penalty applied to repeated audio patterns. Values above 1.0 reduce repetition. |
| min_chunk_length | int | No | 50 | Minimum number of characters before splitting into a new chunk. 0 \<= x \<= 100 |
| condition_on_previous_chunks | boolean | No | true | Use the previous audio as context to maintain voice consistency. |
| early_stop_threshold | number | No | 1 | Early stopping threshold for batch processing: 0 \<= x \<= 1 |
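The ranges and allowed values in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper, not part of the service: the names `RANGES`, `ALLOWED`, and `validate_tts_params` are assumptions, and the limits are copied from the table above.

```python
# Limits copied from the parameter table; the helper itself is an illustrative client-side check.
RANGES = {
    "temperature": (0, 1),
    "top_p": (0, 1),
    "chunk_length": (100, 300),
    "min_chunk_length": (0, 100),
    "early_stop_threshold": (0, 1),
}
ALLOWED = {
    "format": {"wav", "pcm", "mp3", "opus"},
    "latency": {"low", "normal", "balanced"},
    "mp3_bitrate": {64, 128, 192},
    "opus_bitrate": {-1000, 24000, 32000, 48000, 64000},
}

def validate_tts_params(params):
    """Return a list of problems found in a TTS request body; an empty list means none."""
    problems = []
    if not params.get("text"):
        problems.append("text is required")
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    speed = params.get("prosody", {}).get("speed")
    if speed is not None and not (0.5 <= speed <= 2.0):
        problems.append("prosody.speed must be between 0.5 and 2.0")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every invalid field at once.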
Both single-speaker and multi-speaker synthesis are supported. For a single speaker, select the voice in one of two ways:
- Specify the voice model using reference_id
- Upload your own reference audio using references (a ReferenceAudio array)
For multiple speakers, use one of the following:
- reference_id: an array of voice model IDs, e.g., ["speaker-0-id", "speaker-1-id"]. text: use speaker tags such as <|speaker:0|> and <|speaker:1|> to mark speaker changes, e.g., <|speaker:0|>Hello!<|speaker:1|>Hello there!
- references: a 2D array in which each inner array holds the references for one speaker. reference_id: an array of identifiers (for zero-shot use, these can be any strings)
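The speaker-tag convention described above can be assembled programmatically. This is a minimal sketch for the variant that uses an array of voice model IDs; the function name `build_multi_speaker_payload` and the `(speaker_index, line)` input shape are assumptions made for illustration.

```python
def build_multi_speaker_payload(turns, voice_ids):
    """Build a request body for multi-speaker synthesis.

    turns: list of (speaker_index, line) pairs in speaking order.
    voice_ids: one voice model ID per speaker, indexed by speaker number.
    """
    for speaker, _ in turns:
        if not 0 <= speaker < len(voice_ids):
            raise ValueError(f"no voice model ID for speaker {speaker}")
    # Prefix every line with its <|speaker:N|> tag and concatenate.
    text = "".join(f"<|speaker:{speaker}|>{line}" for speaker, line in turns)
    return {"text": text, "reference_id": list(voice_ids)}
```

For example, `build_multi_speaker_payload([(0, "Hello!"), (1, "Hello there!")], ["speaker-0-id", "speaker-1-id"])` yields the tagged text shown in the list above.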
Response Parameters
The response body is the synthesized audio data.
Request Example
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/tts' \
--header 'Authorization: Bearer {YOUR_AK}' \
--header 'Content-Type: application/json' \
--data '{
"text": "Hello! Welcome to Fish Audio.",
"reference_id": "model-id",
"temperature": 0.7,
"top_p": 0.7,
"prosody": {
"speed": 1,
"volume": 0,
"normalize_loudness": true
},
"chunk_length": 300,
"normalize": true,
"format": "mp3",
"sample_rate": 44100,
"mp3_bitrate": 128,
"latency": "normal",
"max_new_tokens": 1024,
"repetition_penalty": 1.2,
"min_chunk_length": 50,
"condition_on_previous_chunks": true,
"early_stop_threshold": 1
}'
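The same request can be issued from Python using only the standard library. This is a sketch under stated assumptions: the helper names `build_tts_request` and `synthesize_to_file` are illustrative, and the endpoint path and access key are placeholders you must supply.

```python
import json
import urllib.request

BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"

def build_tts_request(endpoint_path, api_key, payload):
    """Build (but do not send) the POST request for the text-to-speech endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint_path}/fishaudio/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def synthesize_to_file(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

A typical call would be `synthesize_to_file(build_tts_request("your-endpoint", "YOUR_AK", {"text": "Hello! Welcome to Fish Audio.", "reference_id": "model-id", "format": "mp3"}), "output.mp3")`. Building and sending are separated so the request can be inspected before any network call is made.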
Response Example
Speech-to-Text
Request URL
POST https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/asr
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
- Content-Type: supports multipart/form-data and application/msgpack
Body
| Parameter Name | Field Type | Is it required? | Default Value | Description |
|---|---|---|---|---|
| audio | file | Yes | - | Audio file to be converted to text |
| language | string | No | null | Specify the language of the audio. If left blank, it will be automatically detected. |
| ignore_timestamps | boolean | No | true | Whether to omit precise timestamps. When set to false, exact timestamps are returned; for audio shorter than 30 seconds this increases latency |
Response Parameters
| Parameter Name | Field Type | Description |
|---|---|---|
| text | string | Transcribed text |
| duration | number | Audio duration (seconds) |
| segments | object[] | Array of segments, each with its text and timestamps |
| segments.text | string | Text of this segment |
| segments.start | number | Start time of this segment (seconds) |
| segments.end | number | End time of this segment (seconds) |
Request Example
# form-data
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/asr' \
--header 'Authorization: Bearer {YOUR_AK}' \
--form 'audio=@"/path/to/audio.mp3"' \
--form 'language="zh"' \
--form 'ignore_timestamps=false'
# msgpack
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/asr' \
--header 'Authorization: Bearer {YOUR_AK}' \
--header 'Content-Type: application/msgpack' \
--data-binary '@request.msgpack'
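The form-data variant can also be built in Python without third-party packages. The sketch below encodes the multipart body by hand; `encode_multipart` and `build_asr_request` are hypothetical helper names, and the generic `application/octet-stream` part type is an assumption rather than a documented requirement.

```python
import uuid
import urllib.request

def encode_multipart(fields, files):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: {name: str}; files: {name: (filename, content_bytes, content_type)}.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    for name, (filename, content, content_type) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            content,
        ]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"

def build_asr_request(endpoint_path, api_key, audio_bytes, filename,
                      language=None, ignore_timestamps=True):
    """Build (but do not send) the POST request for the speech-to-text endpoint."""
    fields = {"ignore_timestamps": "true" if ignore_timestamps else "false"}
    if language is not None:
        fields["language"] = language
    body, content_type = encode_multipart(
        fields, {"audio": (filename, audio_bytes, "application/octet-stream")}
    )
    return urllib.request.Request(
        url=f"https://genaiapi.cloudsway.net/v1/ai/{endpoint_path}/fishaudio/v1/asr",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

The returned request can be passed to `urllib.request.urlopen`, and the JSON response decoded with `json.load`.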
Response Example
{
"text": "你好,欢迎使用Fish Audio语音转文字服务。",
"duration": 5.2,
"segments": [
{
"text": "你好,",
"start": 0.0,
"end": 0.8
},
{
"text": "欢迎使用Fish Audio",
"start": 0.8,
"end": 3.5
},
{
"text": "语音转文字服务。",
"start": 3.5,
"end": 5.2
}
]
}
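A response of this shape is straightforward to post-process. As a small illustration, the hypothetical helper below turns the `segments` array into one timestamped line per segment; only the field names come from the response table.

```python
def format_segments(asr_response):
    """Return one '[start - end] text' line per segment of a parsed ASR response."""
    return [
        f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}"
        for seg in asr_response.get("segments", [])
    ]
```

Using `.get("segments", [])` keeps the helper safe for responses that carry no segment list.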
Create Voice Model
Request URL
POST https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
- Content-Type: supports multipart/form-data and application/msgpack
Body
| Parameter Name | Field Type | Is it required? | Default Value | Description |
|---|---|---|---|---|
| type | string | Yes | - | Model type, fixed as "tts" |
| title | string | Yes | - | Model title or name |
| train_mode | string | Yes | - | Training mode, fixed as "fast", which means the model can be used immediately after creation |
| voices | file[] | Yes | - | Array of audio files used to tune the model. Supported formats: WAV, MP3, FLAC |
| visibility | string | No | public | Model visibility: public (shown on the discovery page), unlist (accessible to anyone with the link), private (visible only to the creator) |
| description | string | No | - | Model description |
| cover_image | file | No | - | Model cover image, required when visibility is public |
| texts | string[] | No | - | Transcripts corresponding to voices. If omitted, ASR is run on the audio |
| tags | string[] | No | - | Model tags |
| enhance_audio_quality | boolean | No | true | Whether to enhance audio quality |
| generate_sample | boolean | No | false | Whether to generate default text samples |
Response Parameters
| Parameter Name | Field Type | Description |
|---|---|---|
| _id | string | Model ID |
| type | string | Model Type (svc/tts) |
| title | string | Model Title |
| description | string | Model Description |
| cover_image | string | Cover Image URL |
| train_mode | string | Training Mode (fast/full) |
| state | string | Model Status (created/training/trained/failed) |
| tags | string[] | tag array |
| samples | object[] | Sample Array |
| samples.title | string | Sample Title |
| samples.text | string | Sample text content |
| samples.task_id | string | Task ID |
| samples.audio | string | Audio URL |
| created_at | string | Creation Time |
| updated_at | string | Update Time |
| languages | string[] | Language Array |
| visibility | string | Visibility (public/unlist/private) |
| lock_visibility | boolean | Is visibility locked? |
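| dmca_taken_down | boolean | May be null |
| default_text | string | Default Text |
| quality | object | Quality assessment; may be null |
| quality.created_at | string | Quality Assessment Creation Time |
| quality.updated_at | string | Quality Assessment Update Time |
| quality.audios | object[] | Audio Quality Assessment Array |
| quality.audios.filename | string | File Name |
| quality.audios.duration_ms | number | Duration (milliseconds) |
| quality.audios.language | string | Language |
| quality.audios.quality | object | Quality Rating Object |
| quality.audios.quality.{key} | number | |
| quality.audios.quality_passed | boolean | Whether it has passed the quality inspection |
| quality.audios.quality_reason | string | Reasons for Quality Assessment |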
| like_count | integer | Number of likes |
| mark_count | integer | Number of bookmarks |
| shared_count | integer | Share Count |
| task_count | integer | Number of Tasks |
| author | object | Author Information |
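| author._id | string | Author ID |
| author.nickname | string | Author's Nickname |
| author.avatar | string | Author Avatar URL |
| unliked | boolean | Is it not liked? |
| liked | boolean | Has it been liked? |
| marked | boolean | Is it already bookmarked? |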
Request Example
# multipart/form-data
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model' \
--header 'Authorization: Bearer {YOUR_AK}' \
--form 'type="tts"' \
--form 'title="我的声音模型"' \
--form 'train_mode="fast"' \
--form 'voices=@"/path/to/audio1.wav"' \
--form 'voices=@"/path/to/audio2.wav"' \
--form 'texts="这是第一段音频的文本"' \
--form 'texts="这是第二段音频的文本"' \
--form 'visibility="private"' \
--form 'tags=["中文","女声"]' \
--form 'enhance_audio_quality=true'
# msgpack
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model' \
--header 'Authorization: Bearer {YOUR_AK}' \
--header 'Content-Type: application/msgpack' \
--data-binary '@request.msgpack'
Response Example
{
"_id": "model-abc123",
"type": "tts",
"title": "我的声音模型",
"description": "",
"cover_image": "",
"train_mode": "fast",
"state": "created",
"tags": ["中文", "女声"],
"samples": [],
"created_at": "2024-01-01T00:00:00Z",
"updated_at": "2024-01-01T00:00:00Z",
"languages": ["zh"],
"visibility": "private",
"lock_visibility": false,
"dmca_taken_down": null,
"default_text": "",
"quality": null,
"like_count": 0,
"mark_count": 0,
"shared_count": 0,
"task_count": 0,
"unliked": false,
"liked": false,
"marked": false,
"author": {
"_id": "user-xyz789",
"nickname": "用户名",
"avatar": "https://example.com/avatar.jpg"
}
}
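Once a model has been created, its ID can be fed into a text-to-speech request. The sketch below assumes that the `_id` returned here is the value expected by the TTS `reference_id` parameter (both are described as the model ID); the helper name `tts_payload_for_model` is hypothetical.

```python
def tts_payload_for_model(model, text):
    """Build a TTS request body that uses a model object returned by the model endpoints."""
    # A model whose training failed cannot be expected to synthesize speech.
    if model.get("state") == "failed":
        raise ValueError(f"model {model.get('_id')} failed training")
    # Assumption: the model's _id is what the TTS endpoint accepts as reference_id.
    return {"text": text, "reference_id": model["_id"]}
```

With the response above, `tts_payload_for_model(response, "Hello")` would produce a body that references `model-abc123`.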
Get Model List
Request URL
GET https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
Query Parameters
| Parameter Name | Field Type | Is it required? | Default Value | Description |
|---|---|---|---|---|
| page_size | integer | No | 10 | Items per page, minimum value is 1 |
| page_number | integer | No | 1 | Page number, minimum value is 1 |
| title | string | No | null | Filter models by title |
| tag | string[] | No | null | Filter models by tag; accepts a single tag or an array of tags |
| self | boolean | No | false | If true, only return models created by the current user |
| author_id | string | No | null | Filter by author ID. Ignored when self is true |
| language | string[] | No | null | Filter models by language |
| title_language | string[] | No | null | Filter models by title language |
| sort_by | string | No | score | Sort order: score (rating), task_count (number of tasks), created_at (creation time) |
Response Parameters
| Parameter Name | Field Type | Description |
|---|---|---|
| total | integer | Total number of models |
| items | object[] | Model List Array |
| items._id | string | Model ID |
| items.type | string | Model Type (svc/tts) |
| items.title | string | Model Title |
| items.state | string | Model Status (created/training/trained/failed) |
| items.tags | string[] | tag array |
| items.created_at | string | Creation Time |
| items.updated_at | string | Update Time |
| items.visibility | string | Visibility (public/unlist/private) |
| items.like_count | integer | Number of likes |
| items.mark_count | integer | Number of bookmarks |
| items.shared_count | integer | Share Count |
| items.task_count | integer | Number of Tasks |
| items.author | AuthorEntity | Author Information |
| items.author._id | string | Author ID |
| items.author.nickname | string | Author's Nickname |
| items.author.avatar | string | Author Avatar URL |
| items.description | string | Model Description |
| items.cover_image | string | Cover Image URL |
| items.train_mode | string | Training Mode (fast/full) |
| items.samples | object[] | Sample Array |
| items.samples.title | string | Sample Title |
| items.samples.text | string | Sample text content |
| items.samples.task_id | string | Task ID |
| items.samples.audio | string | Audio URL |
| items.languages | string[] | Language Array |
| items.lock_visibility | boolean | Is visibility locked? |
| items.dmca_taken_down | boolean | May be null |
| items.default_text | string | Default Text |
| items.quality | object | Quality assessment; may be null |
| items.quality.audios | object[] | Audio Quality Assessment Array |
| items.quality.audios.filename | string | File Name |
| items.quality.audios.duration_ms | number | Duration (milliseconds) |
| items.quality.audios.language | string | Language |
| items.quality.audios.quality | object | Quality Rating Object |
| items.quality.audios.quality.{key} | number | |
| items.quality.audios.quality_passed | boolean | Whether it has passed the quality inspection |
| items.quality.audios.quality_reason | string | Reasons for Quality Assessment |
| items.quality.created_at | string | Quality Assessment Creation Time |
| items.quality.updated_at | string | Quality Assessment Update Time |
| items.unliked | boolean | Is it not liked? |
| items.liked | boolean | Has it been liked? |
| items.marked | boolean | Is it already bookmarked? |
| has_more | boolean | Whether more results are available; may be null |
Request Example
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model?page_size=10&page_number=1&sort_by=created_at' \
--header 'Authorization: Bearer {YOUR_AK}'
Response Example
{
"total": 25,
"items": [
{
"_id": "model-abc123",
"type": "tts",
"title": "我的声音模型",
"description": "这是一个测试模型",
"cover_image": "https://example.com/cover.jpg",
"train_mode": "fast",
"state": "trained",
"tags": ["中文", "女声"],
"samples": [
{
"title": "样本1",
"text": "你好世界",
"task_id": "task-001",
"audio": "https://example.com/sample1.mp3"
}
],
"created_at": "2024-01-01T00:00:00Z",
"updated_at": "2024-01-02T00:00:00Z",
"languages": ["zh"],
"visibility": "public",
"lock_visibility": false,
"dmca_taken_down": false,
"default_text": "",
"quality": null,
"like_count": 10,
"mark_count": 5,
"shared_count": 3,
"task_count": 100,
"unliked": false,
"liked": true,
"marked": false,
"author": {
"_id": "user-xyz789",
"nickname": "用户名",
"avatar": "https://example.com/avatar.jpg"
}
}
],
"has_more": true
}
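The `page_number`, `page_size`, and `has_more` fields support simple page-by-page iteration. The following is a sketch with hypothetical helper names; repeating the key for array-valued filters such as `tag` is an assumption about how the service parses query strings, and `fetch_page` stands in for whatever function actually performs the HTTP call.

```python
from urllib.parse import urlencode

def model_list_url(endpoint_path, **query):
    """Build the Get Model List URL. None values are dropped; lists repeat the key."""
    params = {
        key: (str(value).lower() if isinstance(value, bool) else value)
        for key, value in query.items()
        if value is not None
    }
    base = f"https://genaiapi.cloudsway.net/v1/ai/{endpoint_path}/fishaudio/model"
    return f"{base}?{urlencode(params, doseq=True)}" if params else base

def iter_models(fetch_page, page_size=10):
    """Yield every model, requesting pages until has_more is false or null.

    fetch_page(page_number, page_size) must return the parsed JSON response.
    """
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        yield from page.get("items", [])
        if not page.get("has_more"):
            break
        page_number += 1
```

Passing the fetch function in as a parameter keeps the paging logic independent of the HTTP client and easy to exercise with canned responses.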
Get Model Details
Request URL
GET https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{id}
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
Path Parameters
| Parameter Name | Field Type | Is it required? | Description |
|---|---|---|---|
| id | string | Yes | Model ID |
Response Parameters
| Parameter Name | Field Type | Description |
|---|---|---|
| _id | string | Model ID |
| type | string | Model Type (svc/tts) |
| title | string | Model Title |
| state | string | Model Status (created/training/trained/failed) |
| tags | string[] | tag array |
| created_at | string | Creation Time |
| updated_at | string | Update Time |
| visibility | string | Visibility (public/unlist/private) |
| like_count | integer | Number of likes |
| mark_count | integer | Number of bookmarks |
| shared_count | integer | Share Count |
| task_count | integer | Number of Tasks |
| author | AuthorEntity | Author Information |
| author._id | string | Author ID |
| author.nickname | string | Author's Nickname |
| author.avatar | string | Author Avatar URL |
| description | string | Model Description |
| cover_image | string | Cover Image URL |
| train_mode | string | Training Mode (fast/full) |
| samples | object[] | Sample Array |
| samples.title | string | Sample Title |
| samples.text | string | Sample text content |
| samples.task_id | string | Task ID |
| samples.audio | string | Audio URL |
| languages | string[] | Language Array |
| lock_visibility | boolean | Is visibility locked? |
| dmca_taken_down | boolean | May be null |
| default_text | string | Default Text |
| quality | object | Quality assessment; may be null |
| quality.audios | object[] | Audio Quality Assessment Array |
| quality.audios.filename | string | File Name |
| quality.audios.duration_ms | number | Duration (milliseconds) |
| quality.audios.language | string | Language |
| quality.audios.quality | object | Quality Rating Object |
| quality.audios.quality.{key} | number | |
| quality.audios.quality_passed | boolean | Whether it has passed the quality inspection |
| quality.audios.quality_reason | string | Reasons for Quality Assessment |
| quality.created_at | string | Quality Assessment Creation Time |
| quality.updated_at | string | Quality Assessment Update Time |
| unliked | boolean | Is it not liked? |
| liked | boolean | Have you liked it? |
| marked | boolean | Is it already bookmarked? |
Request Example
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{model_id}' \
--header 'Authorization: Bearer {YOUR_AK}'
Response Example
{
"_id": "model-abc123",
"type": "tts",
"title": "我的声音模型",
"description": "这是一个测试模型",
"cover_image": "https://example.com/cover.jpg",
"train_mode": "fast",
"state": "trained",
"tags": ["中文", "女声"],
"samples": [
{
"title": "样本1",
"text": "你好世界",
"task_id": "task-001",
"audio": "https://example.com/sample1.mp3"
}
],
"created_at": "2024-01-01T00:00:00Z",
"updated_at": "2024-01-02T00:00:00Z",
"languages": ["zh"],
"visibility": "public",
"lock_visibility": false,
"dmca_taken_down": false,
"default_text": "",
"quality": {
"audios": [
{
"filename": "sample1.wav",
"duration_ms": 5000,
"language": "zh",
"quality": {"score": 0.95},
"quality_passed": true,
"quality_reason": ""
}
],
"created_at": "2024-01-01T00:00:00Z",
"updated_at": "2024-01-01T00:00:00Z"
},
"like_count": 10,
"mark_count": 5,
"shared_count": 3,
"task_count": 100,
"unliked": false,
"liked": true,
"marked": false,
"author": {
"_id": "user-xyz789",
"nickname": "用户名",
"avatar": "https://example.com/avatar.jpg"
}
}
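The `quality` object makes it possible to see which training audios did not pass the quality inspection. A minimal sketch, assuming the response has already been parsed into a dictionary; the helper name `failed_quality_audios` is hypothetical.

```python
def failed_quality_audios(model):
    """Return (filename, reason) for each audio that did not pass the quality check."""
    quality = model.get("quality") or {}   # quality may be null in the response
    return [
        (audio["filename"], audio.get("quality_reason", ""))
        for audio in quality.get("audios", [])
        if not audio.get("quality_passed")
    ]
```

For the response above the result is an empty list, since the only audio has `quality_passed` set to true.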
Update Model
Request URL
PATCH https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{id}
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
- Content-Type: supports multipart/form-data
Path Parameters
| Parameter Name | Field Type | Is it required? | Description |
|---|---|---|---|
| id | string | Yes | Model ID |
Body Parameters
| Parameter Name | Field Type | Is it required? | Default Value | Description |
|---|---|---|---|---|
| title | string | No | null | Model Title |
| description | string | No | null | Model Description |
| cover_image | file | No | null | Model cover image |
| visibility | string | No | null | Visibility: public/unlist/private |
| tags | string[] | No | null | Model tags, as an array or a single string |
Response Parameters
Successful Response (200): No Return Content
Request Example
curl --location --request PATCH 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/model-abc123' \
--header 'Authorization: Bearer {YOUR_AK}' \
--form 'cover_image=@"/path/to/new-cover.jpg"' \
--form 'visibility="private"'
Response Example
Successful Response (200): No Return Content
Delete Model
Request URL
DELETE https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{id}
Request Parameters
Header
- Authorization: Bearer \<YOUR_AK>
Path Parameters
| Parameter Name | Field Type | Is it required? | Description |
|---|---|---|---|
| id | string | Yes | Model ID |
Response Parameters
Successful Response (200): No Return Content
Request Example
curl --location --request DELETE 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/model-abc123' \
--header 'Authorization: Bearer {YOUR_AK}'
Response Example
Successful Response (200): No Return Content
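Because a successful delete returns no body, a client only needs to look at the status code. The sketch below uses the Python standard library; `build_delete_request` and `delete_model` are hypothetical helper names, and treating every HTTP error status as a plain failure is a simplifying assumption.

```python
import urllib.error
import urllib.parse
import urllib.request

def build_delete_request(endpoint_path, api_key, model_id):
    """Build (but do not send) the DELETE request for a model."""
    safe_id = urllib.parse.quote(model_id, safe="")   # keep the ID within one path segment
    return urllib.request.Request(
        url=f"https://genaiapi.cloudsway.net/v1/ai/{endpoint_path}/fishaudio/model/{safe_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )

def delete_model(request):
    """Send the request; return True on HTTP 200 and False on an HTTP error status."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

A call such as `delete_model(build_delete_request("your-endpoint", "YOUR_AK", "model-abc123"))` mirrors the curl example above.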