
ModelType

TTS

A TTS (Text-to-Speech) model converts written text into speech, enabling computers to "read" text aloud. It works by parsing the input text into speech units and then generating natural-sounding speech from them. TTS models are commonly used in applications such as navigation systems, virtual assistants, and audiobooks.

The following models are supported:

  • MaaS-Ele
  • MaaS-nar
  • MaaS ASpeech
  • MaaS OSpeech

MaaS-Ele

MaaS-Ele is an AI-based model that provides text-to-speech and speech cloning services with various functionalities.

  1. High-Quality Speech Generation: MaaS-Ele's AI speech generator can render human intonations and emotions with exceptional fidelity, adjusting speech delivery based on context.
  2. Multilingual Support: It supports 32 languages and over 100 voices, making it suitable for speech generation in various content types such as games, videos, podcasts, and more.
  3. Speech Cloning: MaaS-Ele offers the capability to clone voices, allowing users to create unique voices and customize settings.
  4. Diverse Applications: It is applicable to various scenarios, including text-to-speech, speech-to-speech, dubbing, and sound effect generation.
  5. Advanced Features: Compared to other text-to-speech services, MaaS-Ele provides richer functionality, including telephone format support and multilingual generation.
  6. Project Support: For users who require generating longer content, the project feature is recommended to handle extremely lengthy text.
  7. Generation Limit: Each session can process up to 5,000 characters at a time.
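
The 5,000-character limit above means longer text has to be split before it is submitted. The following is a minimal sketch of one way to do that, not part of the MaaS-Ele service itself: `split_text` is a hypothetical helper, and it assumes the limit is counted in Python string characters, which may differ from how the service counts.

```python
def split_text(text: str, limit: int = 5000) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut after a sentence end, then at a space,
    and falls back to a hard cut when no boundary exists.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        # Last sentence-ending punctuation followed by a space, if any.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark with the chunk
        else:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit  # no usable boundary: hard cut at the limit
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be sent as its own request. For very long documents, the project feature mentioned above may be the better fit.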

MaaS-Ele's text-to-speech service supports the following audio output formats:

  • MP3
  • WAV

By default, audio generated on the website is in MP3 format, but other formats, such as PCM and μ-law, are also available as options.

MaaS-nar

MaaS-nar is a model that uses AI technology to convert text into natural speech. It is suitable for creating various content such as videos, e-learning modules, and audiobooks, catering to users who require fast generation of high-quality audio content.

  1. Multilingual Support: MaaS-nar offers over 700 voices in 100 languages, making it suitable for producing various types of audio and video content.
  2. Diverse Voice Selection: Users can choose voices of different ages, genders, and tones, making it suitable for training materials, storytelling, and audiobooks.
  3. Easy to Use: Simply input the text and select the voice options to quickly generate professional audio or video.
  4. Efficient Production: MaaS-nar can convert Word documents into natural speech in MP3, M4A, or WAV formats, saving time on recording and editing.

The text-to-speech service provided by MaaS-nar supports the following audio output formats:

  • MP3
  • M4A
  • WAV

These formats are suitable for various purposes, including the creation of videos, e-learning modules, and audiobooks.

MaaS ASpeech

MaaS ASpeech generates high-quality and natural speech output using voice generation techniques. The model leverages state-of-the-art machine learning and deep learning technologies to achieve high-quality speech synthesis. It is trained on large datasets of speech and text to meet high standards in speech naturalness, clarity, and emotional expression.

  1. High Naturalness of Speech Output: Using deep learning technology, the generated speech is closer to natural human speech, with smooth intonation and emotional expression.

  2. Multi-Language and Multi-Dialect Support: Supports multiple languages and dialects, providing localized speech experiences for users worldwide.

  3. Fast Response and Low Latency: By using optimized algorithms and high-performance cloud computing resources, it offers a quick response in speech generation, maintaining low latency even in large-scale usage scenarios.

  4. High Availability and Scalability: Based on a cloud platform, it offers high reliability and scalability, suitable for various application scenarios, from small applications to large enterprise-level solutions.

MaaS OSpeech

MaaS OSpeech processes input text using deep learning and neural network technologies, then generates high-quality and natural speech output. The model, trained on a large amount of speech data, can understand and synthesize speech with various tones and emotions.

  1. High Naturalness of Speech Output: By using the most advanced deep learning technology, the generated speech is more natural and fluent compared to traditional TTS systems, with emotional and tonal variations similar to human speech.

  2. Multi-Language and Multi-Dialect Support: Supports multiple languages and accents, catering to the needs of users from different regional cultures, enabling localized speech synthesis.

  3. Real-Time Response: Leveraging the powerful computational capabilities of the cloud platform, the MaaS OSpeech model can quickly process and generate speech, meeting the demands of real-time interactive applications.

MaaS-C

MaaS-C is a powerful natural language processing model. Its strength lies in its robust language understanding and generation capabilities. It can comprehend complex semantic relationships and contextual information, and generate high-quality, fluent, and natural text.

The following models are supported:

  • MaaS-3.5 Sonnet
  • MaaS-3 Sonnet
  • MaaS-3 Haiku

MaaS-Ge

The MaaS-Ge model is a high-performance, multitasking artificial intelligence system, distinguished by its exceptional accuracy and efficiency. It proficiently handles a variety of tasks, demonstrating remarkable adaptability and flexibility. Moreover, the design of the MaaS-Ge model emphasizes scalability, enabling seamless deployment and optimization across diverse application scenarios to meet various business needs.

The following models are supported:

  • MaaS-1.0 Pro
  • MaaS-1.5 Pro
  • MaaS-1.5 Flash

MaaS-GP

MaaS-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. MaaS-4o matches MaaS-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.

Version: MaaS-4o mini (2024-07-18)
Description:
  1. Fast, inexpensive, capable model ideal for replacing MaaS-3.5 Turbo series models.
  2. Text, image processing.
  3. JSON Mode.
  4. Parallel function calling.
  5. Does not support enhancements.
Max Request/tokens: Input: 128,000; Output: 16,384
Status: Supported

Version: MaaS-4o (2024-05-13), MaaS-4o (Omni)
Description: Latest GA model
  1. Text, image processing.
  2. JSON Mode.
  3. Parallel function calling.
  4. Enhanced accuracy and responsiveness.
  5. Parity with English text and coding tasks compared to MaaS-4 Turbo with Vision.
  6. Superior performance in non-English languages and in vision tasks.
  7. Does not support enhancements.
Max Request/tokens: Input: 128,000; Output: 4,096
Status: Supported
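
The input and output limits listed above can be checked on the client side before a request is sent. The sketch below is illustrative only: `TokenLimits`, `LIMITS`, and `check_request` are hypothetical names, the dictionary keys are labels chosen here rather than real model identifiers, and token counts are taken as plain integers because counting depends on the tokenizer. It also assumes the input and output limits apply independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLimits:
    max_input: int
    max_output: int


# Values copied from the version list above; keys are illustrative labels.
LIMITS = {
    "MaaS-4o mini (2024-07-18)": TokenLimits(128_000, 16_384),
    "MaaS-4o (2024-05-13)": TokenLimits(128_000, 4_096),
}


def check_request(model: str, input_tokens: int, requested_output: int) -> int:
    """Reject oversized inputs and cap the requested output length.

    Returns the output token budget that fits the model's limit.
    """
    limits = LIMITS[model]
    if input_tokens > limits.max_input:
        raise ValueError(
            f"{model}: input of {input_tokens} tokens exceeds {limits.max_input}"
        )
    return min(requested_output, limits.max_output)
```

For example, asking for 8,000 output tokens would be capped to 4,096 for MaaS-4o (2024-05-13) but left unchanged for MaaS-4o mini.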

MaaS-4 Turbo

MaaS-4 Turbo is a large multimodal model that accepts text or image inputs and generates text. Like MaaS-3.5 Turbo and older MaaS-4 models, MaaS-4 Turbo is optimized for chat and also works well for traditional completions tasks.

The latest GA release of MaaS-4 Turbo is: MaaS-4 Version: turbo-2024-04-09

Version: MaaS-4 turbo-2024-04-09, MaaS-4 Turbo with Vision
Description: New GA model
  1. Replacement for all previous MaaS-4 preview models (vision-preview, 1106-Preview, 0125-Preview).
  2. Feature availability currently differs depending on the method of input and the deployment type.
  3. Does not support enhancements.
Max Request/tokens: Input: 128,000; Output: 4,096
Status: Supported

MaaS-4 turbo-2024-04-09 is the replacement for the following preview models:

  • MaaS-4 Version: 1106-Preview
  • MaaS-4 Version: 0125-Preview
  • MaaS-4 Version: vision-preview

MaaS-4

MaaS-4 is the predecessor to MaaS-4 Turbo. Both the MaaS-4 and MaaS-4 Turbo models have a base model name of MaaS-4. You can distinguish between the MaaS-4 and Turbo models by examining the model version.

Version: MaaS-4 (0125-Preview), MaaS-4 Turbo Preview
Description: Preview model
  1. Replaces 1106-Preview.
  2. Better code generation performance.
  3. Reduces cases where the model doesn't complete a task.
  4. JSON Mode.
  5. Parallel function calling.
  6. Reproducible output (preview).
Max Request/tokens: Input: 128,000; Output: 4,096
Status: Supported

Version: MaaS-4 (vision-preview), MaaS-4 Turbo with Vision Preview
Description: Preview model
  1. Accepts text and image input.
  2. Supports enhancements.
  3. JSON Mode.
  4. Parallel function calling.
  5. Reproducible output (preview).
Max Request/tokens: Input: 128,000; Output: 4,096
Status: Supported

Version: MaaS-4 (1106-Preview), MaaS-4 Turbo Preview
Description: Preview model
  1. JSON Mode.
  2. Parallel function calling.
  3. Reproducible output (preview).
Max Request/tokens: Input: 128,000; Output: 4,096
Status: Supported

Version: MaaS-4-32k (0613)
Description: Older GA model
  1. Basic function calling with tools.
Max Request/tokens: 32,768
Status: On-demand

Version: MaaS-4 (0613)
Description: Older GA model
  1. Basic function calling with tools.
Max Request/tokens: 8,192
Status: On-demand

Version: MaaS-4-32k (0314)
Description: Older GA model
Max Request/tokens: 32,768
Status: On-demand

Version: MaaS-4 (0314)
Description: Older GA model
Max Request/tokens: 8,192
Status: On-demand

  • MaaS-4 version 0125-preview is an updated version of the MaaS-4 Turbo preview previously released as version 1106-preview.

  • MaaS-4 version 0125-preview completes tasks such as code generation more completely compared to MaaS-4-1106-preview. Because of this, depending on the task, customers may find that MaaS-4-0125-preview generates more output compared to the MaaS-4-1106-preview. We recommend customers compare the outputs of the new model. MaaS-4-0125-preview also addresses bugs in MaaS-4-1106-preview with UTF-8 handling for non-English languages.

  • MaaS-4 version turbo-2024-04-09 is the latest GA release and replaces 0125-Preview, 1106-preview, and vision-preview.
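
Because MaaS-4 and MaaS-4 Turbo share a base model name, code that selects a model has to look at the version string. The following is a minimal sketch of that lookup based only on the versions listed in this section. `REPLACED_BY`, `TURBO_VERSIONS`, `current_version`, and `is_turbo` are hypothetical names, and the case-insensitive matching is an assumption made because this page writes the versions with mixed capitalization.

```python
# Preview versions and the GA release that replaces them, per this section.
REPLACED_BY = {
    "1106-preview": "turbo-2024-04-09",
    "0125-preview": "turbo-2024-04-09",
    "vision-preview": "turbo-2024-04-09",
}

# Versions this section describes as MaaS-4 Turbo (GA or preview).
TURBO_VERSIONS = {"turbo-2024-04-09", *REPLACED_BY}


def current_version(version: str) -> str:
    """Return the GA replacement for a preview version, else the input."""
    return REPLACED_BY.get(version.strip().lower(), version)


def is_turbo(version: str) -> bool:
    """Tell MaaS-4 Turbo apart from MaaS-4 by the model version."""
    return version.strip().lower() in TURBO_VERSIONS
```

Since 0125-preview may generate more output than 1106-preview for the same task, it is worth comparing outputs before switching versions automatically with a mapping like this.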

MaaS-3.5

MaaS-3.5 models can understand and generate natural language or code. The most capable and cost-effective model in the MaaS-3.5 family is MaaS-3.5 Turbo, which has been optimized for chat and also works well for traditional completions tasks. MaaS-3.5 Turbo is available for use with the Chat Completions API. MaaS-3.5 Turbo Instruct has similar capabilities to text-davinci-003 and uses the Completions API instead of the Chat Completions API.

Version: MaaS-3.5-turbo-0125
Description: Latest GA model
  1. JSON Mode.
  2. Parallel function calling.
  3. Reproducible output (preview).
  4. Higher accuracy at responding in requested formats.
  5. Fix for a bug which caused a text encoding issue for non-English language function calls.
Max Request/tokens: Input: 16,385; Output: 4,096
Status: Supported

Version: MaaS-35-turbo (1106)
Description: Older GA model
  1. JSON Mode.
  2. Parallel function calling.
  3. Reproducible output (preview).
Max Request/tokens: Input: 16,385; Output: 4,096
Status: On-demand

Version: MaaS-35-turbo-instruct (0914)
Description: Completions endpoint only
Max Request/tokens: 4,097
Status: Supported

Version: MaaS-35-turbo-16k (0613)
Description: Older GA model
  1. Basic function calling with tools.
Max Request/tokens: 16,384
Status: On-demand

Version: MaaS-35-turbo (0613)
Description: Older GA model
  1. Basic function calling with tools.
Max Request/tokens: 4,096
Status: On-demand

Version: MaaS-35-turbo (0301)
Description: Older GA model
Max Request/tokens: 4,096
Status: On-demand

Embeddings

MaaS-embedding-3-large is the latest and most capable embedding model. Existing embeddings cannot be upgraded from one embedding model to another: to move from MaaS-embedding-ada-002 to MaaS-embedding-3-large, you need to generate new embeddings for your data.

The following models are supported:

  • MaaS-embedding-3-large
  • MaaS-embedding-3-small
  • MaaS-embedding-ada-002
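
Since embeddings cannot be carried over between models, a store of vectors should record which model produced each one and be regenerated when the model changes. The sketch below illustrates that bookkeeping under stated assumptions: `reembed` and `cosine` are hypothetical helpers, the `embed` argument stands in for whatever call actually returns a vector for a text, and `toy_old` and `toy_new` are fake embedders used only to make the example runnable.

```python
import math
from typing import Callable, Sequence

Embedder = Callable[[str], list[float]]


def reembed(texts: Sequence[str], embed: Embedder, model: str) -> list[dict]:
    """Generate fresh embeddings and tag each record with its model name."""
    return [{"text": t, "model": model, "vector": embed(t)} for t in texts]


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity, refusing to mix vectors from different models."""
    if a["model"] != b["model"]:
        raise ValueError("vectors from different embedding models are not comparable")
    va, vb = a["vector"], b["vector"]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Fake embedders for illustration only; real vectors come from the service.
def toy_old(text: str) -> list[float]:
    return [float(len(text)), 1.0]


def toy_new(text: str) -> list[float]:
    return [1.0, float(len(text)), 0.5]
```

Moving a corpus from MaaS-embedding-ada-002 to MaaS-embedding-3-large would then mean calling `reembed` over every stored text with the new embedder and replacing the old records, rather than converting the old vectors.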

DALL·E

The MaaS-DALL-E models generate images from text prompts that the user provides. MaaS-DALL-E 3 is generally available for use with the REST APIs.

The following models are supported:

  • MaaS-DALL-E 3
  • MaaS-DALL-E 2