ModelType
TTS
A TTS (Text-to-Speech) model converts written text into speech, enabling computers to "read" text aloud. It works by parsing the input text into speech units and then generating natural-sounding speech from them. TTS models are commonly used in applications such as navigation systems, virtual assistants, and audiobooks.
The following models are supported:
- MaaS-Ele
- MaaS-nar
- MaaS ASpeech
- MaaS OSpeech
MaaS-Ele
MaaS-Ele is an AI-based model that provides text-to-speech and speech cloning services with various functionalities.
- High-Quality Speech Generation: MaaS-Ele's AI speech generator can render human intonations and emotions with exceptional fidelity, adjusting speech delivery based on context.
- Multilingual Support: It supports 32 languages and over 100 voices, making it suitable for speech generation in various content types such as games, videos, podcasts, and more.
- Speech Cloning: MaaS-Ele offers the capability to clone voices, allowing users to create unique voices and customize settings.
- Diverse Applications: It is applicable to various scenarios, including text-to-speech, speech-to-speech, dubbing, and sound effect generation.
- Advanced Features: Compared to other text-to-speech services, MaaS-Ele provides richer functionality, including telephone format support and multilingual generation.
- Project Support: For longer content, the project feature is recommended, as it is intended for very long text.
- Generation Limit: Each session can process up to 5,000 characters at a time.
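Text longer than the 5,000-character limit has to be split before it is submitted. The following is a minimal sketch of one way to do that, assuming the limit counts plain characters. The function name `split_for_tts` and the break-at-sentence heuristic are illustrative and are not part of MaaS-Ele.

```python
MAX_CHARS = 5000  # per-session limit stated in the list above


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break after sentence-ending punctuation, then at a space,
    and falls back to a hard cut when no natural break exists.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Last sentence end inside the window, if any.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation mark in the current chunk
        else:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit  # no natural break: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as its own request. For very long documents, the project feature mentioned above may be the better fit.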
MaaS-Ele's text-to-speech service supports the following audio output formats:
- MP3
- WAV
By default, audio generated on the website is in MP3 format, but other formats, such as PCM and μ-law, are also available as options.
MaaS-nar
MaaS-nar is a model that uses AI technology to convert text into natural speech. It is suitable for creating various content such as videos, e-learning modules, and audiobooks, catering to users who require fast generation of high-quality audio content.
- Multilingual Support: MaaS-nar offers over 700 voices in 100 languages, making it suitable for producing various types of audio and video content.
- Diverse Voice Selection: Users can choose voices of different ages, genders, and tones, making it suitable for training materials, storytelling, and audiobooks.
- Easy to Use: Simply input the text and select the voice options to quickly generate professional audio or video.
- Efficient Production: MaaS-nar can convert Word documents into natural speech in MP3, M4A, or WAV formats, saving time on recording and editing.
The text-to-speech service provided by MaaS-nar supports the following audio output formats:
- MP3
- M4A
- WAV
These formats are suitable for various purposes, including the creation of videos, e-learning modules, and audiobooks.
MaaS ASpeech
MaaS ASpeech generates high-quality and natural speech output using voice generation techniques. The model leverages state-of-the-art machine learning and deep learning technologies to achieve high-quality speech synthesis. It is trained on large datasets of speech and text to meet high standards in speech naturalness, clarity, and emotional expression.
- High Naturalness of Speech Output: Utilizing deep learning technology, the generated speech is closer to natural human speech, with smooth intonation and emotional expression.
- Multi-Language and Multi-Dialect Support: Supports multiple languages and dialects, providing localized speech experiences for users worldwide.
- Fast Response and Low Latency: By using optimized algorithms and high-performance cloud computing resources, it offers a quick response in speech generation, maintaining low latency even in large-scale usage scenarios.
- High Availability and Scalability: Based on a cloud platform, it offers high reliability and scalability, suitable for various application scenarios, from small applications to large enterprise-level solutions.
MaaS OSpeech
MaaS OSpeech processes input text using deep learning and neural network technologies, then generates high-quality and natural speech output. The model, trained on a large amount of speech data, can understand and synthesize speech with various tones and emotions.
- High Naturalness of Speech Output: By using the most advanced deep learning technology, the generated speech is more natural and fluent compared to traditional TTS systems, with emotional and tonal variations similar to human speech.
- Multi-Language and Multi-Dialect Support: Supports multiple languages and accents, catering to the needs of users from different regional cultures, enabling localized speech synthesis.
- Real-Time Response: Leveraging the powerful computational capabilities of the cloud platform, the MaaS OSpeech model can quickly process and generate speech, meeting the demands of real-time interactive applications.
MaaS-C
MaaS-C is a powerful natural language processing model. Its strength lies in its robust language understanding and generation capabilities. It can comprehend complex semantic relationships and contextual information, and generate high-quality, fluent, and natural text.
The following models are supported:
- MaaS-3.5 Sonnet
- MaaS-3 Sonnet
- MaaS-3 Haiku
MaaS-Ge
The MaaS-Ge model is a high-performance, multitasking artificial intelligence system, distinguished by its exceptional accuracy and efficiency. It proficiently handles a variety of tasks, demonstrating remarkable adaptability and flexibility. Moreover, the design of the MaaS-Ge model emphasizes scalability, enabling seamless deployment and optimization across diverse application scenarios to meet various business needs.
The following models are supported:
- MaaS-1.0 Pro
- MaaS-1.5 Pro
- MaaS-1.5 Flash
MaaS-GP
MaaS-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. MaaS-4o matches MaaS-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.
Version | Description | Max Request/tokens | Status |
---|---|---|---|
MaaS-4o mini (2024-07-18) | 1. Fast, inexpensive, capable model ideal for replacing MaaS-3.5 Turbo series models. 2. Text, image processing. 3. JSON Mode. 4. Parallel function calling. 5. Does not support enhancements. | Input: 128,000 Output: 16,384 | Supported |
MaaS-4o (2024-05-13) MaaS-4o (Omni) | Latest GA model 1. Text, image processing. 2. JSON Mode. 3. Parallel function calling. 4. Enhanced accuracy and responsiveness. 5. Parity with English text and coding tasks compared to MaaS-4 Turbo with Vision. 6. Superior performance in non-English languages and in vision tasks. 7. Does not support enhancements. | Input: 128,000 Output: 4,096 | Supported |
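The input and output limits in the table above can be checked before a request is sent. The sketch below assumes the caller already knows the token counts, since no tokenizer is specified here. The names `TokenLimits`, `LIMITS`, and `check_request` are hypothetical, and the dictionary keys are illustrative labels rather than official model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLimits:
    max_input: int
    max_output: int


# Values copied from the table above; the keys are illustrative labels.
LIMITS = {
    "MaaS-4o mini (2024-07-18)": TokenLimits(128_000, 16_384),
    "MaaS-4o (2024-05-13)": TokenLimits(128_000, 4_096),
}


def check_request(version: str, input_tokens: int, requested_output: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    limits = LIMITS[version]
    problems = []
    if input_tokens > limits.max_input:
        problems.append(f"input {input_tokens} exceeds {limits.max_input}")
    if requested_output > limits.max_output:
        problems.append(f"output {requested_output} exceeds {limits.max_output}")
    return problems
```

For example, a request for 8,000 output tokens fits MaaS-4o mini but not MaaS-4o, because the two versions have different output limits.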
MaaS-4 Turbo
MaaS-4 Turbo is a large multimodal model (accepting text or image inputs and generating text). Like MaaS-3.5 Turbo and older MaaS-4 models, MaaS-4 Turbo is optimized for chat and works well for traditional completions tasks.
The latest GA release of MaaS-4 Turbo is: MaaS-4 Version: turbo-2024-04-09
Version | Description | Max Request/tokens | Status |
---|---|---|---|
MaaS-4 turbo-2024-04-09 MaaS-4 Turbo with Vision | New GA model 1. Replacement for all previous MaaS-4 preview models (vision-preview, 1106-Preview, 0125-Preview). 2. Feature availability currently differs depending on the method of input and the deployment type. 3. Does not support enhancements. | Input: 128,000 Output: 4,096 | Supported |
MaaS-4 turbo-2024-04-09 is the replacement for the following preview models:
- MaaS-4 Version:1106-Preview
- MaaS-4 Version:0125-Preview
- MaaS-4 Version:vision-preview
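Configuration that still references one of these preview versions can be pointed at the GA release with a small lookup. This is a sketch only: the helper name `resolve_version` is hypothetical, and matching version strings case-insensitively is an assumption.

```python
GA_VERSION = "turbo-2024-04-09"

# Preview versions that the GA release replaces, as listed above.
REPLACED_PREVIEWS = {"1106-preview", "0125-preview", "vision-preview"}


def resolve_version(version: str) -> str:
    """Return the GA version for a replaced preview, else the input unchanged."""
    if version.strip().lower() in REPLACED_PREVIEWS:
        return GA_VERSION
    return version
```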
MaaS-4
MaaS-4 is the predecessor to MaaS-4 Turbo. Both the MaaS-4 and MaaS-4 Turbo models have a base model name of MaaS-4. You can distinguish between the MaaS-4 and Turbo models by examining the model version.
Version | Description | Max Request/tokens | Status |
---|---|---|---|
MaaS-4 (0125-Preview) MaaS-4 Turbo Preview | Preview model 1. Replaces 1106-Preview. 2. Better code generation performance. 3. Reduces cases where the model doesn't complete a task. 4. JSON Mode. 5. Parallel function calling. 6. Reproducible output (preview). | Input: 128,000 Output: 4,096 | Supported |
MaaS-4 (vision-preview) MaaS-4 Turbo with Vision Preview | Preview model 1. Accepts text and image input. 2. Supports enhancements. 3. JSON Mode. 4. Parallel function calling. 5. Reproducible output (preview). | Input: 128,000 Output: 4,096 | Supported |
MaaS-4 (1106-Preview) MaaS-4 Turbo Preview | Preview model 1. JSON Mode. 2. Parallel function calling. 3. Reproducible output (preview). | Input: 128,000 Output: 4,096 | Supported |
MaaS-4-32k (0613) | Older GA model 1. Basic function calling with tools. | 32,768 | On-demand |
MaaS-4 (0613) | Older GA model 1. Basic function calling with tools. | 8,192 | On-demand |
MaaS-4-32k (0314) | Older GA model | 32,768 | On-demand |
MaaS-4 (0314) | Older GA model | 8,192 | On-demand |
- MaaS-4 version 0125-preview is an updated version of the MaaS-4 Turbo preview previously released as version 1106-preview.
- MaaS-4 version 0125-preview completes tasks such as code generation more thoroughly than MaaS-4-1106-preview. As a result, depending on the task, customers may find that MaaS-4-0125-preview generates more output than MaaS-4-1106-preview. We recommend that customers compare the outputs of the new model. MaaS-4-0125-preview also addresses bugs in MaaS-4-1106-preview with UTF-8 handling for non-English languages.
- MaaS-4 version turbo-2024-04-09 is the latest GA release and replaces 0125-Preview, 1106-preview, and vision-preview.
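Because MaaS-4 and MaaS-4 Turbo share the base model name MaaS-4, code that needs to tell them apart has to look at the version string. A minimal sketch follows, using only the versions listed in this section. The function name `classify` and the `"unknown"` fallback are assumptions made for illustration.

```python
# Versions taken from the tables in this document, lowercased for comparison.
TURBO_VERSIONS = {"turbo-2024-04-09", "0125-preview", "1106-preview", "vision-preview"}
BASE_VERSIONS = {"0613", "0314"}


def classify(version: str) -> str:
    """Map a MaaS-4 model version string to its model family."""
    v = version.strip().lower()
    if v in TURBO_VERSIONS:
        return "MaaS-4 Turbo"
    if v in BASE_VERSIONS:
        return "MaaS-4"
    return "unknown"  # version not listed in this document
```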
MaaS-3.5
MaaS-3.5 models can understand and generate natural language or code. The most capable and cost-effective model in the MaaS-3.5 family is MaaS-3.5 Turbo, which has been optimized for chat and also works well for traditional completions tasks. MaaS-3.5 Turbo is available for use with the Chat Completions API. MaaS-3.5 Turbo Instruct has similar capabilities to text-davinci-003 and uses the Completions API instead of the Chat Completions API.
Version | Description | Max Request/tokens | Status |
---|---|---|---|
MaaS-3.5-turbo-0125 | Latest GA model 1. JSON Mode. 2. Parallel function calling. 3. Reproducible output (preview). 4. Higher accuracy at responding in requested formats. 5. Fix for a bug which caused a text encoding issue for non-English language function calls. | Input: 16,385 Output: 4,096 | Supported |
MaaS-35-turbo (1106) | Older GA model 1. JSON Mode. 2. Parallel function calling. 3. Reproducible output (preview). | Input: 16,385 Output: 4,096 | On-demand |
MaaS-35-turbo-instruct (0914) | Completions endpoint only | 4,097 | Supported |
MaaS-35-turbo-16k (0613) | Older GA model 1. Basic function calling with tools. | 16,384 | On-demand |
MaaS-35-turbo (0613) | Older GA model 1. Basic function calling with tools. | 4,096 | On-demand |
MaaS-35-turbo (0301) | Older GA model | 4,096 | On-demand |
Embeddings
MaaS-embedding-3-large is the latest and most capable embedding model. Existing embeddings cannot be upgraded from one embedding model to another. To move from MaaS-embedding-ada-002 to MaaS-embedding-3-large, you need to generate new embeddings from the original text.
The following models are supported:
- MaaS-embedding-3-large
- MaaS-embedding-3-small
- MaaS-embedding-ada-002
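Since embeddings cannot be carried over between models, it helps to store the model name and the source text next to each vector so the data can be regenerated later. The sketch below illustrates that idea. `StoredEmbedding`, `cosine_similarity`, and `migrate` are hypothetical names, and `embed` stands in for whatever client call produces a vector; it is supplied by the caller and is not a real MaaS API.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StoredEmbedding:
    model: str                 # embedding model that produced the vector
    text: str                  # source text, kept so it can be re-embedded
    vector: tuple[float, ...]


def cosine_similarity(a: StoredEmbedding, b: StoredEmbedding) -> float:
    """Compare two embeddings, refusing to mix vectors from different models."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model} with {b.model}; re-embed first")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    return dot / (math.hypot(*a.vector) * math.hypot(*b.vector))


def migrate(
    items: Sequence[StoredEmbedding],
    new_model: str,
    embed: Callable[[str, str], Sequence[float]],  # hypothetical: (model, text) -> vector
) -> list[StoredEmbedding]:
    """Regenerate every embedding from its source text with the new model."""
    return [
        StoredEmbedding(new_model, item.text, tuple(embed(new_model, item.text)))
        for item in items
    ]
```

Keeping the original text is the design choice that makes migration possible: without it, there is nothing to feed to the new model.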
DALL·E
The MaaS-DALL-E models generate images from text prompts that the user provides. MaaS-DALL-E 3 is generally available for use with the REST APIs.
The following models are supported:
- MaaS-DALL-E 3
- MaaS-DALL-E 2