ModelType

TTS

TTS (Text-to-Speech) model is a technology that converts written text into speech, enabling computers to "read" the text aloud. It works by parsing the input text into speech units and then generating natural-sounding speech. TTS models are commonly used in applications such as navigation systems, virtual assistants, and audiobooks.

The following models are supported:

MaaS-Ele
MaaS-nar
MaaS ASpeech
MaaS OSpeech

MaaS-Ele

MaaS-Ele is an AI-based model that provides text-to-speech and speech cloning services with various functionalities.

High-Quality Speech Generation: MaaS-Ele's AI speech generator can render human intonations and emotions with exceptional fidelity, adjusting speech delivery based on context.
Multilingual Support: It supports 32 languages and over 100 voices, making it suitable for speech generation in various content types such as games, videos, podcasts, and more.
Speech Cloning: MaaS-Ele offers the capability to clone voices, allowing users to create unique voices and customize settings.
Diverse Applications: It is applicable to various scenarios, including text-to-speech, speech-to-speech, dubbing, and sound effect generation.
Advanced Features: Compared to other text-to-speech services, MaaS-Ele provides richer functionality, including telephone format support and multilingual generation.
Project Support: For users who require generating longer content, the project feature is recommended to handle extremely lengthy text.
Generation Limit: Each session can process up to 5,000 characters at a time.

MaaS-Ele's text-to-speech service supports the following audio output formats:

MP3
WAV

By default, audio generated on the website is in MP3 format, but other formats, such as PCM and μ-law, are also available as options.

MaaS-nar

MaaS-nar is a model that uses AI technology to convert text into natural speech. It is suitable for creating various content such as videos, e-learning modules, and audiobooks, catering to users who require fast generation of high-quality audio content.

Multilingual Support: MaaS-nar offers over 700 voices in 100 languages, making it suitable for producing various types of audio and video content. Diverse Voice Selection: Users can choose voices of different ages, genders, and tones, making it suitable for training materials, storytelling, and audiobooks. Easy to Use: Simply input the text and select the voice options to quickly generate professional audio or video. Efficient Production: MaaS-nar can convert Word documents into natural speech in MP3, M4A, or WAV formats, saving time on recording and editing.

The text-to-speech service provided by MaaS-nar supports the following audio output formats:

MP3
M4A
WAV

These formats are suitable for various purposes, including the creation of videos, e-learning modules, and audiobooks.

MaaS ASpeech

MaaS ASpeech generates high-quality and natural speech output using voice generation techniques. The model leverages state-of-the-art machine learning and deep learning technologies to achieve high-quality speech synthesis. It is trained on large datasets of speech and text to meet high standards in speech naturalness, clarity, and emotional expression.

High Naturalness of Speech Output: Utilizing deep learning technology, the generated speech is closer to human natural speaking, with smooth intonation and emotional expression.
Multi-Language and Multi-Dialect Support: Supports multiple languages and dialects, providing localized speech experiences for users worldwide.
Fast Response and Low Latency: By using optimized algorithms and high-performance cloud computing resources, it offers a quick response in speech generation, maintaining low latency even in large-scale usage scenarios.
High Availability and Scalability: Based on a cloud platform, it offers high reliability and scalability, suitable for various application scenarios, from small applications to large enterprise-level solutions.

MaaS OSpeech

MaaS OSpeech processes input text using deep learning and neural network technologies, then generates high-quality and natural speech output. The model, trained on a large amount of speech data, can understand and synthesize speech with various tones and emotions.

High Naturalness of Speech Output: By using the most advanced deep learning technology, the generated speech is more natural and fluent compared to traditional TTS systems, with emotional and tonal variations similar to human speech.
Multi-Language and Multi-Dialect Support: Supports multiple languages and accents, catering to the needs of users from different regional cultures, enabling localized speech synthesis.
Real-Time Response: Leveraging the powerful computational capabilities of the cloud platform, the MaaS OSpeech model can quickly process and generate speech, meeting the demands of real-time interactive applications.

MaaS-C

MaaS-C is a powerful natural language processing model. Its strength lies in its robust language understanding and generation capabilities. It can comprehend complex semantic relationships and contextual information, and generate high-quality, fluent, and natural text.

The following models are supported:

MaaS-3.5 Sonnet
MaaS-3 Sonnet
MaaS-3 Haiku

MaaS-Ge

The MaaS-Ge model is a high-performance, multitasking artificial intelligence system, distinguished by its exceptional accuracy and efficiency. It proficiently handles a variety of tasks, demonstrating remarkable adaptability and flexibility. Moreover, the design of the MaaS-Ge model emphasizes scalability, enabling seamless deployment and optimization across diverse application scenarios to meet various business needs.

The following models are supported:

MaaS-1.0 Pro
MaaS-1.5 Pro
MaaS-1.5 Flash

MaaS-GP

MaaS-4o integrates text and images in a single model, enabling it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. MaaS-4o matches MaaS-4 Turbo in English text and coding tasks while offering superior performance in non-English languages and vision tasks, setting new benchmarks for AI capabilities.

Version	Description	Max Request/tokens	Status
MaaS-4o mini(2024-07-18)	1.Fast, inexpensive, capable model ideal for replacing MaaS 3.5 Turbo series models. 2. Text, image processing. 3.JSON Mode. 4.parallel function calling. 5.Does not support enhancements.	Input: 128,000 Output: 16,384	Supported
MaaS-4o(2024-05-13) MaaS-4o (Omni)	Latest GA model 1.Text, image processing. 2.JSON Mode. 3. parallel function calling. 4.Enhanced accuracy and responsiveness. 5.Parity with English text and coding tasks compared to MaaS-4 Turbo with Vision. 6.Superior performance in non-English languages and in vision tasks. 7.Does not support enhancements	Input: 128,000 Output: 4,096	Supported

MaaS-4 Turbo

MaaS-4 Turbo is a large multimodal model (accepting text or image inputs and generating text). Like MaaS-3.5 Turbo, and older MaaS-4 models MaaS-4 Turbo is optimized for chat and works well for traditional completions tasks.

The latest GA release of MaaS-4 Turbo is: MaaS-4 Version: turbo-2024-04-09

Version	Description	Max Request/tokens	Status
MaaS-4 turbo-2024-04-09 MaaS-4 Turbo with Vision	New GA model 1.Replacement for all previous MaaS-4 preview models (vision-preview, 1106-Preview, 0125-Preview). 2.Feature availability is currently different depending on method of input, and deployment type. 3.Does not support enhancements.	Input: 128,000 Output: 4,096	Supported

MaaS-4 turbo-2024-04-09 is the replacement for the following preview models:

MaaS-4 Version：1106-Preview
MaaS-4 Version：0125-Preview
MaaS-4 Version：vision-preview

MaaS-4

MaaS-4 is the predecessor to MaaS-4 Turbo. Both the MaaS-4 and MaaS-4 Turbo models have a base model name of MaaS-4. You can distinguish between the MaaS-4 and Turbo models by examining the model version.

Version	Description	Max Request/tokens	Status
MaaS-4 (0125-Preview) MaaS-4 Turbo Preview	Preview Model 1. Replaces 1106-Preview 2. Better code generation performance 3.Reduces cases where the model doesn't complete a task 4. JSON Mode 5.parallel function calling 6.reproducible output (preview)	Input: 128,000 Output: 4,096	Supported
MaaS-4 (vision-preview) MaaS-4 Turbo with Vision Preview	Preview model 1. Accepts text and image input 2. Supports enhancements 3. JSON Mode 4.parallel function calling 5.reproducible output (preview)	Input: 128,000 Output: 4,096	Supported
MaaS-4 (1106-Preview) MaaS-4 Turbo Preview	Preview Model 1. JSON Mode 2.parallel function calling 3.reproducible output (preview)	Input: 128,000 Output: 4,096	Supported
MaaS-4-32k (0613)	Older GA model 1.Basic function calling with tools	32,768	On-demand
MaaS-4 (0613)	Older GA model 1. Basic function calling with tools	8,192	On-demand
MaaS-4-32k(0314)	Older GA model	32,768	On-demand
MaaS-4(0314)	Older GA model	8,192	On-demand

MaaS-4 version 0125-preview is an updated version of the MaaS-4 Turbo preview previously released as version 1106-preview.
MaaS-4 version 0125-preview completes tasks such as code generation more completely compared to MaaS-4-1106-preview. Because of this, depending on the task, customers may find that MaaS-4-0125-preview generates more output compared to the MaaS-4-1106-preview. We recommend customers compare the outputs of the new model. MaaS-4-0125-preview also addresses bugs in MaaS-4-1106-preview with UTF-8 handling for non-English languages.
MaaS-4 version turbo-2024-04-09 is the latest GA release and replaces 0125-Preview, 1106-preview, and vision-preview.

MaaS-3.5

MaaS-3.5 models can understand and generate natural language or code. The most capable and cost effective model in the MaaS-3.5 family is MaaS-3.5 Turbo, which has been optimized for chat and works well for traditional completions tasks as well. MaaS-3.5 Turbo is available for use with the Chat Completions API. MaaS-3.5 Turbo Instruct has similar capabilities to text-davinci-003 using the Completions API instead of the Chat Completions API

Version	Description	Max Request/tokens	Status
MaaS-3.5-turbo-0125	Latest GA Model 1. JSON Mode 2.parallel function calling 3.reproducible output (preview) 4.Higher accuracy at responding in requested formats 5.Fix for a bug which caused a text encoding issue for non-English language function calls.	Input: 16,385 Output: 4,096	Supported
MaaS-35-turbo (1106)	Older GA Model 1. JSON Mode 2.parallel function calling 3.reproducible output (preview)	Input: 16,385 Output: 4,096	On-demand
MaaS-35-turbo-instruct (0914)	Completions endpoint only	4,097	Supported
MaaS-35-turbo-16k (0613)	Older GA Model 1. Basic function calling with tools	16,384	On-demand
MaaS-35-turbo (0613)	Older GA Model 1. Basic function calling with tools	4,096	On-demand
MaaS-35-turbo (0301)	Older GA Model	4,096	On-demand

Embeddings

MaaS-embedding-3-large is the latest and most capable embedding model. Upgrading between embeddings models is not possible. In order to move from using MaaS-embedding-ada-002 to MaaS-embedding-3-large you would need to generate new embeddings.

The following models are supported:

MaaS-embedding-3-large
MaaS-embedding-3-small
MaaS-embedding-ada-002

DALL·E

The MaaS-DALL-E models generate images from text prompts that the user provides. MaaS-DALL-E 3 is generally available for use with the REST APIs.

The following models are supported:

MaaS-DALL-E 3
MaaS-DALL-E 2