Text-to-Speech

Introduction

Text-to-Speech (TTS) converts written text into spoken audio, enabling computers to read text aloud. It is useful in scenarios such as navigation systems, virtual assistants, audiobooks, accessibility tools, and customer service systems. Modern TTS models bring generated speech closer to natural human speech by varying intonation, speed, and tone, which improves the user experience.

Best Practices

Using MaaS-nar as an example

curl --request POST \
  --url 'https://genaiapijp.cloudsway.net/v1/ai/kXfKrPc/tts-n/text-to-speech/mp3?voice=Beatrice&voice-speed=fast&voice-volume=standard' \
  --header 'Accept: application/octet-stream' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: text/plain' \
  --data hello
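
The request above writes the audio bytes to standard output. As a minimal sketch, it can be wrapped in a small shell function that saves the audio to a file and exits with an error when the server returns an HTTP error. The function names tts_nar_url and tts_nar, the TTS_API_KEY environment variable, and the output filename are hypothetical and not part of the API. The sketch also assumes the voice, voice-speed, and voice-volume values need no URL encoding.

```shell
#!/bin/sh
# Endpoint copied from the example above.
TTS_NAR_ENDPOINT='https://genaiapijp.cloudsway.net/v1/ai/kXfKrPc/tts-n/text-to-speech/mp3'

# Hypothetical helper: build the request URL from voice ($1), speed ($2), volume ($3).
# Assumes the values are URL-safe; no encoding is applied.
tts_nar_url() {
  printf '%s?voice=%s&voice-speed=%s&voice-volume=%s' \
    "$TTS_NAR_ENDPOINT" "$1" "$2" "$3"
}

# Hypothetical helper: send the text ($1) and save the audio to a file ($2).
# --fail makes curl exit non-zero on HTTP errors (status 400 and above).
# TTS_API_KEY is an assumed environment variable holding your API key.
tts_nar() {
  curl --silent --show-error --fail \
    --request POST \
    --url "$(tts_nar_url Beatrice fast standard)" \
    --header 'Accept: application/octet-stream' \
    --header "Authorization: Bearer ${TTS_API_KEY}" \
    --header 'Content-Type: text/plain' \
    --data "$1" \
    --output "$2"
}
```

With the key exported as TTS_API_KEY, calling tts_nar 'hello' hello.mp3 would send the same request as above and write the result to hello.mp3.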

Using MaaS-Ele as an example

curl --request POST \
  --url https://genaiapihk.cloudsway.net/v1/ai/ZMfNI/tts-e/text-to-speech/pqHfZKP75CvOlQylNhV4 \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "你好",
    "voice_settings": {
        "stability": 0,
        "similarity_boost": 1.0,
        "use_speaker_boost": false
    }
}'

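The response body is the generated audio in MP3 format. To save it to a file, add curl's --output option, for example --output speech.mp3.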
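
The MaaS-Ele request can be scripted in the same way. The sketch below builds the JSON body with the voice_settings values from the example, pipes it to curl on standard input, and saves the MP3 response. The function names tts_ele_payload and tts_ele and the TTS_API_KEY environment variable are hypothetical. No JSON escaping is performed, so the sketch assumes the text contains no double quotes, backslashes, or control characters.

```shell
#!/bin/sh
# URL copied from the MaaS-Ele example above.
TTS_ELE_URL='https://genaiapihk.cloudsway.net/v1/ai/ZMfNI/tts-e/text-to-speech/pqHfZKP75CvOlQylNhV4'

# Hypothetical helper: print the JSON request body for the text in $1.
# Assumes the text needs no JSON escaping (no quotes, backslashes, or control characters).
tts_ele_payload() {
  printf '{"text": "%s", "voice_settings": {"stability": 0, "similarity_boost": 1.0, "use_speaker_boost": false}}' "$1"
}

# Hypothetical helper: send the text ($1) and save the MP3 response to a file ($2).
# --data-binary @- makes curl read the request body from standard input unchanged.
# TTS_API_KEY is an assumed environment variable holding your API key.
tts_ele() {
  tts_ele_payload "$1" | curl --silent --show-error --fail \
    --request POST \
    --url "$TTS_ELE_URL" \
    --header "Authorization: Bearer ${TTS_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data-binary @- \
    --output "$2"
}
```

For example, tts_ele '你好' hello.mp3 would send the request shown above and write the audio to hello.mp3. For arbitrary input text, a JSON-aware tool is a safer way to build the body than printf.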