MaaS_FishAudio

Request Protocol

HTTPS

Text-to-Speech

Request URL

POST https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/tts

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

  • Content-Type: supports application/json and application/msgpack

Note: When using application/msgpack, the request body must be converted to msgpack format for transmission

Body

| Parameter Name | Field Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | - | Text to be synthesized into speech. |
| temperature | number | No | 0.7 | Range 0 to 1. Higher values give more random output; lower values give more stable output. |
| top_p | number | No | 0.7 | Range 0 to 1. Nucleus sampling; controls diversity. |
| references | object[] | No | - | Array of ReferenceAudio objects. |
| references.audio | file | Yes | - | Raw audio byte data. Supported formats: WAV, MP3, FLAC. For best results, use 10 to 30 seconds of clear speech with minimal background noise. |
| references.text | string | Yes | - | Accurate transcription of the audio sample. Accuracy is crucial for the quality of voice cloning. |
| reference_id | string | No | - | Voice model ID. |
| prosody | object | No | - | Output speed and volume adjustment. |
| prosody.speed | number | No | 1.0 | Speed. Effective range: 0.5 to 2.0. 1.0 = normal speed, 0.5 = half speed, 2.0 = double speed. |
| prosody.volume | number | No | 0 | Volume adjustment in decibels (dB). 0 = no change, positive values = louder, negative values = quieter. |
| prosody.normalize_loudness | boolean | No | true | Normalize the output loudness so that perceived volume is more consistent. Only applicable to S2-Pro. |
| chunk_length | int | No | 300 | Size of each text segment for processing. 100 <= x <= 300. |
| normalize | boolean | No | true | Normalize English and Chinese text, which also improves the stability of numbers. |
| format | string | No | mp3 | Output audio format: wav, pcm, mp3, opus. |
| sample_rate | int | No | - | Audio sampling rate in Hz. When empty, the default for the format is used (44100 Hz for most formats, 48000 Hz for opus). |
| mp3_bitrate | int | No | 128 | MP3 bit rate in kbps. Allowed values: 64, 128, 192. Only applicable when format is mp3. |
| opus_bitrate | int | No | -1000 | Opus bit rate in bps. Allowed values: -1000, 24000, 32000, 48000, 64000. -1000 means automatic. Only applicable when format is opus. |
| latency | string | No | normal | Trade-off between latency and quality: low (minimum latency), normal (optimal quality), balanced (a balance of the two). |
| max_new_tokens | int | No | 1024 | Maximum number of audio tokens that can be generated for each text segment. |
| repetition_penalty | number | No | 1.2 | Penalty for repeated audio patterns. Values above 1.0 reduce repetition. |
| min_chunk_length | int | No | 50 | Minimum number of characters before splitting into a new chunk. 0 <= x <= 100. |
| condition_on_previous_chunks | boolean | No | true | Use the previous audio as context to maintain voice consistency. |
| early_stop_threshold | number | No | 1 | Early stopping threshold for batch processing. 0 <= x <= 1. |
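
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: `build_tts_body` and its constants are hypothetical names, and the limits are copied from the parameter list above.

```python
import json

# Ranges copied from the parameter list above; the helper itself is hypothetical.
_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "chunk_length": (100, 300),
    "min_chunk_length": (0, 100),
    "early_stop_threshold": (0.0, 1.0),
}
_FORMATS = {"wav", "pcm", "mp3", "opus"}


def build_tts_body(text, **options):
    """Return a JSON string for the TTS request body, checking documented ranges."""
    if not text:
        raise ValueError("text is required")
    for name, (low, high) in _RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if options.get("format", "mp3") not in _FORMATS:
        raise ValueError("format must be one of wav, pcm, mp3, opus")
    speed = options.get("prosody", {}).get("speed", 1.0)
    if not 0.5 <= speed <= 2.0:
        raise ValueError("prosody.speed must be between 0.5 and 2.0")
    # Only the fields the caller set are sent; omitted fields fall back to defaults.
    return json.dumps({"text": text, **options}, ensure_ascii=False)


body = build_tts_body(
    "Hello! Welcome to Fish Audio.",
    reference_id="model-id",
    format="mp3",
    prosody={"speed": 1.0, "volume": 0},
)
print(body)
```

Sending only the fields that differ from the defaults keeps the request small; the resulting string can be used as the `--data` payload with `Content-Type: application/json`.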

Both single-speaker and multi-speaker synthesis are supported. For a single speaker, select the voice in one of two ways:

  • Specify a voice model using reference_id

  • Upload your own reference audio using references (a ReferenceAudio array)

For multiple speakers:

  • reference_id: An array of voice model IDs, e.g., ["speaker-0-id", "speaker-1-id"]

  • text: Use speaker tags <|speaker:0|>, <|speaker:1|>, etc., to indicate speaker changes, e.g., <|speaker:0|>Hello!<|speaker:1|>Hello there!

  • references: A 2D array where each inner array contains the references for one speaker. In this case reference_id is an array of identifiers (for zero-shot use, these can be any strings)
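
As an illustration of the multi-speaker form, the sketch below assembles the tagged text and the reference_id array from a list of dialogue turns. `build_dialogue_body` is a hypothetical helper, and the model IDs are the placeholder values used above.

```python
import json


def build_dialogue_body(turns, speaker_model_ids):
    """Build a multi-speaker request body.

    turns: list of (speaker_index, line) pairs.
    speaker_model_ids: one voice model ID per speaker, indexed from 0.
    """
    parts = []
    for speaker, line in turns:
        if not 0 <= speaker < len(speaker_model_ids):
            raise ValueError(f"no voice model for speaker {speaker}")
        # Each turn is prefixed with its speaker tag, as described above.
        parts.append(f"<|speaker:{speaker}|>{line}")
    return {"text": "".join(parts), "reference_id": list(speaker_model_ids)}


body = build_dialogue_body(
    [(0, "Hello!"), (1, "Hello there!")],
    ["speaker-0-id", "speaker-1-id"],
)
print(json.dumps(body))
```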

Response Parameters

The return value is audio data

Request Example

curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/tts' \
--header 'Authorization: Bearer {YOUR_AK}' \
--header 'Content-Type: application/json' \
--data '{
    "text": "Hello! Welcome to Fish Audio.",
    "reference_id": "model-id",
    "temperature": 0.7,
    "top_p": 0.7,
    "prosody": {
      "speed": 1,
      "volume": 0,
      "normalize_loudness": true
    },
    "chunk_length": 300,
    "normalize": true,
    "format": "mp3",
    "sample_rate": 44100,
    "mp3_bitrate": 128,
    "latency": "normal",
    "max_new_tokens": 1024,
    "repetition_penalty": 1.2,
    "min_chunk_length": 50,
    "condition_on_previous_chunks": true,
    "early_stop_threshold": 1
}' \
--output output.mp3

Response Example

The return value is audio data.

Speech to Text

Request URL

POST https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/asr

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

  • Content-Type: supports multipart/form-data and application/msgpack

Body

| Parameter Name | Field Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| audio | file | Yes | - | Audio file to be converted to text. |
| language | string | No | null | Language of the audio. If left blank, it is detected automatically. |
| ignore_timestamps | boolean | No | true | Whether to skip precise timestamps in the result. Setting this to false returns precise timestamps, which increases latency for audio shorter than 30 seconds. |

Response Parameters

| Parameter Name | Field Type | Description |
| --- | --- | --- |
| text | string | Transcribed text content |
| duration | number | Audio duration (seconds) |
| segments | object[] | Array of segments, each containing a piece of text and its timestamps |
| segments.text | string | Text content of this segment |
| segments.start | number | Start time of this segment (seconds) |
| segments.end | number | End time of this segment (seconds) |

Request Example

# form-data 
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/asr' \
--header 'Authorization: Bearer {YOUR_AK}' \
--form 'audio=@"/path/to/audio.mp3"' \
--form 'language="zh"' \
--form 'ignore_timestamps=false'

# msgpack
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/v1/asr' \
--header 'Authorization: Bearer {YOUR_AK}' \
--header 'Content-Type: application/msgpack' \
--data-binary '@request.msgpack'

Response Example

{
  "text": "你好,欢迎使用Fish Audio语音转文字服务。",
  "duration": 5.2,
  "segments": [
    {
      "text": "你好,",
      "start": 0.0,
      "end": 0.8
    },
    {
      "text": "欢迎使用Fish Audio",
      "start": 0.8,
      "end": 3.5
    },
    {
      "text": "语音转文字服务。",
      "start": 3.5,
      "end": 5.2
    }
  ]
}
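
Because each segment carries start and end times in seconds, the response maps naturally onto subtitle formats. The sketch below converts a response of the shape shown above into SRT text; `to_srt_time` and `segments_to_srt` are hypothetical helper names, and SRT is chosen only as a familiar target format.

```python
def to_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response):
    """Turn the `segments` array of a speech-to-text response into SRT text."""
    blocks = []
    for index, seg in enumerate(response.get("segments", []), start=1):
        timing = f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text']}")
    return "\n\n".join(blocks)


example = {
    "text": "Hello, welcome.",
    "duration": 3.5,
    "segments": [
        {"text": "Hello,", "start": 0.0, "end": 0.8},
        {"text": "welcome.", "start": 0.8, "end": 3.5},
    ],
}
print(segments_to_srt(example))
```

Segments are only useful for this purpose when the request sets ignore_timestamps to false.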

Create Voice Model

Request URL

POST https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

  • Content-Type: supports multipart/form-data and application/msgpack

Body

| Parameter Name | Field Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| type | string | Yes | - | Model type, fixed as "tts". |
| title | string | Yes | - | Model title or name. |
| train_mode | string | Yes | - | Training mode, fixed as "fast", which means the model can be used immediately after creation. |
| voices | file[] | Yes | - | Array of audio files for model tuning. Supported formats: WAV, MP3, FLAC. |
| visibility | string | No | public | Model visibility: public (publicly displayed on the discovery page), unlist (accessible to anyone via a link), private (visible only to the creator). |
| description | string | No | - | Model description. |
| cover_image | file | No | - | Model cover image. Required when visibility is public. |
| texts | string[] | No | - | Text content corresponding to voices. If not specified, ASR is performed on the audio. |
| tags | string[] | No | - | Model tags. |
| enhance_audio_quality | boolean | No | true | Whether to enhance audio quality. |
| generate_sample | boolean | No | false | Whether to generate default text samples. |
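
Two of the constraints above are easy to miss: cover_image becomes required when visibility is public (the default), and texts corresponds to voices. The sketch below collects the form parts and checks both before anything is uploaded. `build_model_fields` is a hypothetical helper; the one-to-one pairing of texts with voices is an assumption drawn from the description, and file values are left as paths for whichever HTTP client sends the multipart request.

```python
import json

ALLOWED_VISIBILITY = {"public", "unlist", "private"}


def build_model_fields(title, voice_paths, texts=None, visibility="public",
                       cover_image_path=None, tags=None):
    """Return (name, value) form parts for the create-model request."""
    if not voice_paths:
        raise ValueError("voices needs at least one audio file")
    if visibility not in ALLOWED_VISIBILITY:
        raise ValueError(f"unknown visibility: {visibility}")
    if visibility == "public" and cover_image_path is None:
        raise ValueError("cover_image is required when visibility is public")
    # Assumption: when texts is given, it pairs one-to-one with voices.
    if texts is not None and len(texts) != len(voice_paths):
        raise ValueError("texts must match voices in length")

    fields = [("type", "tts"), ("title", title), ("train_mode", "fast"),
              ("visibility", visibility)]
    fields += [("voices", path) for path in voice_paths]  # one file part per audio
    fields += [("texts", text) for text in (texts or [])]
    if cover_image_path is not None:
        fields.append(("cover_image", cover_image_path))
    if tags:
        # Mirrors the request example, which sends tags as a JSON array string.
        fields.append(("tags", json.dumps(tags, ensure_ascii=False)))
    return fields


fields = build_model_fields(
    "My voice model",
    ["/path/to/audio1.wav", "/path/to/audio2.wav"],
    texts=["Text of the first clip", "Text of the second clip"],
    visibility="private",
)
for name, value in fields:
    print(name, value)
```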

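Response Parameters

| Parameter Name | Field Type | Description |
| --- | --- | --- |
| _id | string | Model ID |
| type | string | Model type (svc/tts) |
| title | string | Model title |
| description | string | Model description |
| cover_image | string | Cover image URL |
| train_mode | string | Training mode (fast/full) |
| state | string | Model status (created/training/trained/failed) |
| tags | string[] | Tag array |
| samples | object[] | Sample array |
| samples.title | string | Sample title |
| samples.text | string | Sample text content |
| samples.task_id | string | Task ID |
| samples.audio | string | Audio URL |
| created_at | string | Creation time |
| updated_at | string | Update time |
| languages | string[] | Language array |
| visibility | string | Visibility (public/unlist/private) |
| lock_visibility | boolean | Whether visibility is locked |
| dmca_taken_down | boolean | May be null |
| default_text | string | Default text |
| quality | object | May be null |
| quality.created_at | string | Quality assessment creation time |
| quality.updated_at | string | Quality assessment update time |
| quality.audios | object[] | Audio quality assessment array |
| quality.audios.filename | string | File name |
| quality.audios.duration_ms | number | Duration (milliseconds) |
| quality.audios.language | string | Language |
| quality.audios.quality | object | Quality rating object |
| quality.audios.quality.{key} | number | - |
| quality.audios.quality_passed | boolean | Whether the audio passed the quality inspection |
| quality.audios.quality_reason | string | Reason for the quality assessment |
| like_count | integer | Number of likes |
| mark_count | integer | Number of bookmarks |
| shared_count | integer | Number of shares |
| task_count | integer | Number of tasks |
| author | object | Author information |
| author._id | string | Author ID |
| author.nickname | string | Author's nickname |
| author.avatar | string | Author avatar URL |
| unliked | boolean | Whether the model is unliked |
| liked | boolean | Whether the model is liked |
| marked | boolean | Whether the model is bookmarked |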

Request Example

# multipart/form-data
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model' \
--header 'Authorization: Bearer {YOUR_AK}' \
--form 'type="tts"' \
--form 'title="我的声音模型"' \
--form 'train_mode="fast"' \
--form 'voices=@"/path/to/audio1.wav"' \
--form 'voices=@"/path/to/audio2.wav"' \
--form 'texts="这是第一段音频的文本"' \
--form 'texts="这是第二段音频的文本"' \
--form 'visibility="private"' \
--form 'tags=["中文","女声"]' \
--form 'enhance_audio_quality=true'

# msgpack
curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model' \
--header 'Authorization: Bearer {YOUR_AK}' \
--header 'Content-Type: application/msgpack' \
--data-binary '@request.msgpack'

Response Example

{
  "_id": "model-abc123",
  "type": "tts",
  "title": "我的声音模型",
  "description": "",
  "cover_image": "",
  "train_mode": "fast",
  "state": "created",
  "tags": ["中文", "女声"],
  "samples": [],
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-01T00:00:00Z",
  "languages": ["zh"],
  "visibility": "private",
  "lock_visibility": false,
  "dmca_taken_down": null,
  "default_text": "",
  "quality": null,
  "like_count": 0,
  "mark_count": 0,
  "shared_count": 0,
  "task_count": 0,
  "unliked": false,
  "liked": false,
  "marked": false,
  "author": {
    "_id": "user-xyz789",
    "nickname": "用户名",
    "avatar": "https://example.com/avatar.jpg"
  }
}

Get Model List

Request URL

GET https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

Query Parameters

| Parameter Name | Field Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| page_size | integer | No | 10 | Items per page. Minimum value is 1. |
| page_number | integer | No | 1 | Page number. Minimum value is 1. |
| title | string | No | null | Filter models by title. |
| tag | string[] | No | null | Filter models by tag. Accepts a single tag or an array of tags. |
| self | boolean | No | false | If true, only models created by the current user are returned. |
| author_id | string | No | null | Filter by author ID. Ignored when self is true. |
| language | string[] | No | null | Filter models by language. |
| title_language | string[] | No | null | Filter models by title language. |
| sort_by | string | No | score | Sort order: score (rating), task_count (number of tasks), created_at (creation time). |
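
The query string can be assembled with the standard library, as in the sketch below. `build_model_list_query` is a hypothetical helper. Encoding array filters as repeated keys (`tag=a&tag=b`) and booleans as lowercase `true`/`false` are assumptions; this document does not state the expected encoding, so confirm it against the service before relying on it.

```python
from urllib.parse import urlencode

SORT_FIELDS = {"score", "task_count", "created_at"}


def build_model_list_query(**filters):
    """Build the query string for the model list request from keyword filters."""
    params = {}
    for name, value in filters.items():
        if value is None:
            continue  # unset filters are omitted so the defaults apply
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        params[name] = value
    for name in ("page_size", "page_number"):
        if name in params and params[name] < 1:
            raise ValueError(f"{name} must be at least 1")
    if params.get("sort_by", "score") not in SORT_FIELDS:
        raise ValueError("sort_by must be score, task_count, or created_at")
    # doseq=True expands list values into repeated keys (assumed array encoding).
    return urlencode(params, doseq=True)


print(build_model_list_query(page_size=10, page_number=1, sort_by="created_at"))
```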

Response Parameters

| Parameter Name | Field Type | Description |
| --- | --- | --- |
| total | integer | Total number of models |
| items | object[] | Array of models |
| items._id | string | Model ID |
| items.type | string | Model type (svc/tts) |
| items.title | string | Model title |
| items.state | string | Model status (created/training/trained/failed) |
| items.tags | string[] | Tag array |
| items.created_at | string | Creation time |
| items.updated_at | string | Update time |
| items.visibility | string | Visibility (public/unlist/private) |
| items.like_count | integer | Number of likes |
| items.mark_count | integer | Number of bookmarks |
| items.shared_count | integer | Number of shares |
| items.task_count | integer | Number of tasks |
| items.author | AuthorEntity | Author information |
| items.author._id | string | Author ID |
| items.author.nickname | string | Author's nickname |
| items.author.avatar | string | Author avatar URL |
| items.description | string | Model description |
| items.cover_image | string | Cover image URL |
| items.train_mode | string | Training mode (fast/full) |
| items.samples | object[] | Sample array |
| items.samples.title | string | Sample title |
| items.samples.text | string | Sample text content |
| items.samples.task_id | string | Task ID |
| items.samples.audio | string | Audio URL |
| items.languages | string[] | Language array |
| items.lock_visibility | boolean | Whether visibility is locked |
| items.dmca_taken_down | boolean | May be null |
| items.default_text | string | Default text |
| items.quality | object | May be null |
| items.quality.audios | object[] | Audio quality assessment array |
| items.quality.audios.filename | string | File name |
| items.quality.audios.duration_ms | number | Duration (milliseconds) |
| items.quality.audios.language | string | Language |
| items.quality.audios.quality | object | Quality rating object |
| items.quality.audios.quality.{key} | number | - |
| items.quality.audios.quality_passed | boolean | Whether the audio passed the quality inspection |
| items.quality.audios.quality_reason | string | Reason for the quality assessment |
| items.quality.created_at | string | Quality assessment creation time |
| items.quality.updated_at | string | Quality assessment update time |
| items.unliked | boolean | Whether the model is unliked |
| items.liked | boolean | Whether the model is liked |
| items.marked | boolean | Whether the model is bookmarked |
| has_more | boolean | May be null |

Request Example

curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model?page_size=10&page_number=1&sort_by=created_at' \
--header 'Authorization: Bearer {YOUR_AK}'

Response Example

{
  "total": 25,
  "items": [
    {
      "_id": "model-abc123",
      "type": "tts",
      "title": "我的声音模型",
      "description": "这是一个测试模型",
      "cover_image": "https://example.com/cover.jpg",
      "train_mode": "fast",
      "state": "trained",
      "tags": ["中文", "女声"],
      "samples": [
        {
          "title": "样本1",
          "text": "你好世界",
          "task_id": "task-001",
          "audio": "https://example.com/sample1.mp3"
        }
      ],
      "created_at": "2024-01-01T00:00:00Z",
      "updated_at": "2024-01-02T00:00:00Z",
      "languages": ["zh"],
      "visibility": "public",
      "lock_visibility": false,
      "dmca_taken_down": false,
      "default_text": "",
      "quality": null,
      "like_count": 10,
      "mark_count": 5,
      "shared_count": 3,
      "task_count": 100,
      "unliked": false,
      "liked": true,
      "marked": false,
      "author": {
        "_id": "user-xyz789",
        "nickname": "用户名",
        "avatar": "https://example.com/avatar.jpg"
      }
    }
  ],
  "has_more": true
}
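
The has_more flag in the response can drive a simple pagination loop. The sketch below keeps the HTTP call out of the picture: `iter_models` takes any callable that returns one page in the shape shown above, and `fake_fetch` is a stand-in used only to make the example runnable. Both names are hypothetical, and treating a null has_more as "no more pages" is an assumption.

```python
def iter_models(fetch_page, page_size=10):
    """Yield every model by requesting pages until has_more is false or null."""
    page_number = 1
    while True:
        page = fetch_page(page_number=page_number, page_size=page_size)
        yield from page.get("items", [])
        if not page.get("has_more"):
            break
        page_number += 1


def fake_fetch(page_number, page_size):
    """Stand-in for the real GET request: serves 25 models in pages."""
    ids = [f"model-{i}" for i in range(25)]
    start = (page_number - 1) * page_size
    chunk = ids[start:start + page_size]
    return {
        "total": len(ids),
        "items": [{"_id": model_id} for model_id in chunk],
        "has_more": start + page_size < len(ids),
    }


models = list(iter_models(fake_fetch, page_size=10))
print(len(models))
```

In real use, `fetch_page` would issue the GET request with page_number and page_size as query parameters and return the parsed JSON.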

Get Model Details

Request URL

GET https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{id}

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

Path Parameters

| Parameter Name | Field Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Model ID |

Response Parameters

| Parameter Name | Field Type | Description |
| --- | --- | --- |
| _id | string | Model ID |
| type | string | Model type (svc/tts) |
| title | string | Model title |
| state | string | Model status (created/training/trained/failed) |
| tags | string[] | Tag array |
| created_at | string | Creation time |
| updated_at | string | Update time |
| visibility | string | Visibility (public/unlist/private) |
| like_count | integer | Number of likes |
| mark_count | integer | Number of bookmarks |
| shared_count | integer | Number of shares |
| task_count | integer | Number of tasks |
| author | AuthorEntity | Author information |
| author._id | string | Author ID |
| author.nickname | string | Author's nickname |
| author.avatar | string | Author avatar URL |
| description | string | Model description |
| cover_image | string | Cover image URL |
| train_mode | string | Training mode (fast/full) |
| samples | object[] | Sample array |
| samples.title | string | Sample title |
| samples.text | string | Sample text content |
| samples.task_id | string | Task ID |
| samples.audio | string | Audio URL |
| languages | string[] | Language array |
| lock_visibility | boolean | Whether visibility is locked |
| dmca_taken_down | boolean | May be null |
| default_text | string | Default text |
| quality | object | May be null |
| quality.audios | object[] | Audio quality assessment array |
| quality.audios.filename | string | File name |
| quality.audios.duration_ms | number | Duration (milliseconds) |
| quality.audios.language | string | Language |
| quality.audios.quality | object | Quality rating object |
| quality.audios.quality.{key} | number | - |
| quality.audios.quality_passed | boolean | Whether the audio passed the quality inspection |
| quality.audios.quality_reason | string | Reason for the quality assessment |
| quality.created_at | string | Quality assessment creation time |
| quality.updated_at | string | Quality assessment update time |
| unliked | boolean | Whether the model is unliked |
| liked | boolean | Whether the model is liked |
| marked | boolean | Whether the model is bookmarked |

Request Example

curl --location 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{model_id}' \
--header 'Authorization: Bearer {YOUR_AK}'

Response Example

{
  "_id": "model-abc123",
  "type": "tts",
  "title": "我的声音模型",
  "description": "这是一个测试模型",
  "cover_image": "https://example.com/cover.jpg",
  "train_mode": "fast",
  "state": "trained",
  "tags": ["中文", "女声"],
  "samples": [
    {
      "title": "样本1",
      "text": "你好世界",
      "task_id": "task-001",
      "audio": "https://example.com/sample1.mp3"
    }
  ],
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-02T00:00:00Z",
  "languages": ["zh"],
  "visibility": "public",
  "lock_visibility": false,
  "dmca_taken_down": false,
  "default_text": "",
  "quality": {
    "audios": [
      {
        "filename": "sample1.wav",
        "duration_ms": 5000,
        "language": "zh",
        "quality": {"score": 0.95},
        "quality_passed": true,
        "quality_reason": ""
      }
    ],
    "created_at": "2024-01-01T00:00:00Z",
    "updated_at": "2024-01-01T00:00:00Z"
  },
  "like_count": 10,
  "mark_count": 5,
  "shared_count": 3,
  "task_count": 100,
  "unliked": false,
  "liked": true,
  "marked": false,
  "author": {
    "_id": "user-xyz789",
    "nickname": "用户名",
    "avatar": "https://example.com/avatar.jpg"
  }
}
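
The quality object in the detail response lists one assessment per uploaded audio file and may be null. The sketch below picks out the files that did not pass, along with the stated reason. `failed_quality_audios` is a hypothetical helper, and the sample data is invented for illustration.

```python
def failed_quality_audios(model):
    """Return (filename, reason) pairs for audios that did not pass the quality inspection."""
    quality = model.get("quality") or {}  # quality may be null in the response
    return [
        (audio["filename"], audio.get("quality_reason", ""))
        for audio in quality.get("audios", [])
        if not audio.get("quality_passed")
    ]


detail = {
    "_id": "model-abc123",
    "state": "trained",
    "quality": {
        "audios": [
            {"filename": "sample1.wav", "quality_passed": True, "quality_reason": ""},
            {"filename": "sample2.wav", "quality_passed": False, "quality_reason": "too noisy"},
        ]
    },
}
print(failed_quality_audios(detail))
```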

Update Model

Request URL

PATCH https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{id}

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

  • Content-Type: supports multipart/form-data

Path Parameters

| Parameter Name | Field Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Model ID |

Body

| Parameter Name | Field Type | Required | Default Value | Description |
| --- | --- | --- | --- | --- |
| title | string | No | null | Model title. |
| description | string | No | null | Model description. |
| cover_image | file | No | null | Model cover image. |
| visibility | string | No | null | Visibility: public/unlist/private. |
| tags | string[] | No | null | Model tags, as an array or a single string. |

Response Parameters

Successful Response (200): No Return Content

Request Example

curl --location --request PATCH 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/model-abc123' \
--header 'Authorization: Bearer {YOUR_AK}' \
--form 'cover_image=@"/path/to/new-cover.jpg"' \
--form 'visibility="private"'

Response Example

Successful Response (200): No Return Content

Delete Model

Request URL

DELETE https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/{id}

Request Parameters

Header

  • Authorization: Bearer {YOUR_AK}

Path Parameters

| Parameter Name | Field Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Model ID |

Response Parameters

Successful Response (200): No Return Content

Request Example

curl --location --request DELETE 'https://genaiapi.cloudsway.net/v1/ai/${ENDPOINT_PATH}/fishaudio/model/model-abc123' \
--header 'Authorization: Bearer {YOUR_AK}'

Response Example

Successful Response (200): No Return Content
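
The model endpoints share one URL pattern and differ only in HTTP method and the optional model ID. The sketch below builds the corresponding request objects with the standard library without sending them. `model_request` is a hypothetical helper, and `my-endpoint` and `YOUR_AK` are placeholders for your own endpoint path and access key.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://genaiapi.cloudsway.net/v1/ai"


def model_request(method, endpoint_path, access_key, model_id=None):
    """Build (but do not send) a request for the model endpoints."""
    url = f"{BASE_URL}/{endpoint_path}/fishaudio/model"
    if model_id is not None:
        # Percent-encode the ID so it is safe to place in the path.
        url += "/" + quote(model_id, safe="")
    headers = {"Authorization": f"Bearer {access_key}"}
    return Request(url, method=method, headers=headers)


request = model_request("DELETE", "my-endpoint", "YOUR_AK", "model-abc123")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; per the sections above, a successful update or delete returns status 200 with no content.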