1 Functions
The Text-To-Speech (MMC‑TTS) AIM receives an input text and produces a synthetic speech version of it. The MMC‑TTS AIM may also receive a Speech Model and the Personal Status to be expressed in the synthetic speech:
| Action | Data | Description |
|---|---|---|
| Receives | Text Object | Input Text. |
| | Personal Status | To be contained in the Synthesised Speech Object. |
| | Speech Model | Used by the AIM depending on the Profile. |
| Feeds | Text Object and Personal Status | To the Speech Model. |
| Produces | Synthesised Speech Object | Output of the AIM. |
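The data flow in the table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MPAI interface: the class names, the `emotion` field standing in for Personal Status, and the choice to represent the Speech Model as a plain callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TextObject:
    text: str  # Input Text


@dataclass
class PersonalStatus:
    # Hypothetical simplification: only an emotion label for the Speech Modality.
    emotion: str = "neutral"


@dataclass
class SynthesisedSpeechObject:
    samples: Sequence[float]  # mono waveform
    sample_rate: int          # samples per second


# Assumption: a Speech Model is represented as a callable that takes the Text Object
# and an optional Personal Status and returns synthesised speech.
SpeechModel = Callable[[TextObject, Optional[PersonalStatus]], SynthesisedSpeechObject]


def text_to_speech(
    text: TextObject,
    speech_model: SpeechModel,
    personal_status: Optional[PersonalStatus] = None,
) -> SynthesisedSpeechObject:
    """Feed the Text Object and (optional) Personal Status to the Speech Model."""
    if not text.text.strip():
        raise ValueError("Text Object contains no text to synthesise")
    return speech_model(text, personal_status)


if __name__ == "__main__":
    # Hypothetical stand-in model: one second of silence per word, ignoring Personal Status.
    def silent_model(t: TextObject, ps: Optional[PersonalStatus]) -> SynthesisedSpeechObject:
        rate = 16000
        return SynthesisedSpeechObject([0.0] * rate * len(t.text.split()), rate)

    out = text_to_speech(TextObject("hello world"), silent_model, PersonalStatus("happy"))
    print(len(out.samples) / out.sample_rate, "seconds")
```

Keeping the Speech Model as an injected argument mirrors the table, where the model is itself an input whose use depends on the Profile.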
2 Reference Model
Figure 1 depicts the Reference Model of the Text-To-Speech (MMC‑TTS) AIM.

Figure 1 – The Text-To-Speech (MMC‑TTS) AIM
3 I/O Data
Table 1 specifies the Input and Output Data of the Text-To-Speech (MMC‑TTS) AIM.
| Input | Description |
|---|---|
| Text Object | Input Text. |
| Personal Status | Input Personal Status of the Speech Modality. |
| Speech Model | NN Model used to produce Speech from Text and Personal Status. |
| Output | Description |
| Synthesised Speech Object | Synthesised Speech output of the Text-To-Speech AIM. |
4 SubAIMs
No SubAIMs.
5 JSON Metadata
https://schemas.mpai.community/MMC/V2.5/AIMs/TextToSpeech.json
6 Profiles
The Text-To-Speech Profiles are specified.
7 Reference Software
7.1 Disclaimers
- This MMC‑TTS Reference Software Implementation is released with the BSD-3-Clause licence.
- The purpose of this MMC‑TTS Reference Software is to provide a working Implementation of MMC‑TTS, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to the MMC‑TTS code
This AI Module is intended for developers who are familiar with Python and with downloading models from HuggingFace.
The code is a wrapper for the SpeechT5 NN Module that:
- Manages input files and parameters (Text Object).
- Executes the SpeechT5 Module to perform Text-To-Speech synthesis on each input Text Object.
- Outputs the resulting Speech Object.
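The three wrapper steps above can be sketched as follows. This is not the MPAI code: it is a minimal sketch assuming the public HuggingFace checkpoints `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan`, and a speaker x-vector taken from the `Matthijs/cmu-arctic-xvectors` dataset at index 7306, as in the HuggingFace SpeechT5 example. The function names `load_speecht5`, `run_tts` and `save_wav` are hypothetical. Like the description above, it only handles the Text Object input.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List

# SpeechT5 with the HiFi-GAN vocoder produces 16 kHz audio.
SAMPLE_RATE = 16000


@dataclass
class SpeechObject:
    text: str       # the Text Object that was synthesised
    waveform: Any   # 1-D array-like of float samples
    sample_rate: int = SAMPLE_RATE


def load_speecht5() -> Callable[[str], Any]:
    """Return a text -> waveform function built from public SpeechT5 checkpoints.

    Heavy libraries are imported here so the rest of the module works without them.
    """
    import torch
    from datasets import load_dataset
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    # Assumption: a fixed speaker x-vector, as in the HuggingFace example.
    xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

    def synthesise(text: str) -> Any:
        inputs = processor(text=text, return_tensors="pt")
        with torch.no_grad():
            speech = model.generate_speech(inputs["input_ids"], speaker, vocoder=vocoder)
        return speech.numpy()

    return synthesise


def run_tts(texts: Iterable[str], synthesise: Callable[[str], Any]) -> List[SpeechObject]:
    """Synthesise each non-blank input Text Object and wrap it as a Speech Object."""
    results = []
    for line in texts:
        text = line.strip()
        if text:
            results.append(SpeechObject(text, synthesise(text)))
    return results


def save_wav(obj: SpeechObject, path: str) -> None:
    """Write a Speech Object to a WAV file with soundfile."""
    import soundfile as sf

    sf.write(path, obj.waveform, obj.sample_rate)
```

With the dependencies installed, `run_tts(open("input.txt"), load_speecht5())` followed by `save_wav(...)` on each result would synthesise every line of a hypothetical `input.txt`. Passing a stub function in place of `load_speecht5()` lets the wrapper logic be tested without downloading any model.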
The MMC‑TTS Reference Software is found at the MPAI gitlab site. It contains the Python code implementing the AIM. The required libraries are: PyTorch, transformers (HuggingFace), datasets (HuggingFace), and soundfile.
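Before running the Reference Software, a quick check that the listed libraries are importable can save a failed run. The sketch below uses only the Python standard library; the import names (`torch` for PyTorch, and so on) are the usual ones for these packages, and the function name `check_dependencies` is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable

# Import names of the required libraries (PyTorch is imported as "torch").
REQUIRED_MODULES = ("torch", "transformers", "datasets", "soundfile")


def check_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return, for each top-level module name, whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    missing = [name for name, ok in check_dependencies().items() if not ok]
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates the module without importing it, so the check is fast even for large packages such as PyTorch.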
7.3 Acknowledgements
This version of the MMC‑TTS Reference Software has been developed by the MPAI Neural Network Watermarking Development Committee (NNW‑DC).
8 Conformance Testing
Table 2 provides the Conformance Testing Method for the Text-To-Speech (MMC‑TTS) AIM.
If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
| Action | Data | Conformance Testing Method |
|---|---|---|
| Receives | Text Object | Shall validate against Text Object schema. Text Data shall conform with Text Qualifier. |
| | Personal Status | Shall validate against Personal Status schema. |
| | Speech Model | Shall validate against Machine Learning Model schema. Machine Learning Model Data shall conform with Machine Learning Model Qualifier. |
| Produces | Synthesised Speech Object | Shall validate against Speech Object schema. Speech Data shall conform with Speech Qualifier. |
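The rule that data referencing a secondary schema must also validate against that schema can be illustrated with a deliberately tiny validator. Everything here is hypothetical: the schemas, the registry keys and the field names (`speech`, `sampleRate`) are not the MPAI schemas, only `type`, `required`, `properties` and `$ref` are supported, and Qualifier conformance is not modelled. A real conformance test would use a complete JSON Schema validator.

```python
from typing import Any, Dict, List

# Hypothetical, heavily simplified schemas: NOT the MPAI schemas.
# Only "type", "required", "properties" and "$ref" (a registry key) are supported.
REGISTRY: Dict[str, Dict[str, Any]] = {
    "SpeechObject": {
        "type": "object",
        "required": ["sampleRate"],
        "properties": {"sampleRate": {"type": "integer"}},
    },
    "TTSOutput": {
        "type": "object",
        "required": ["speech"],
        "properties": {"speech": {"$ref": "SpeechObject"}},  # secondary schema
    },
}

_PY_TYPES = {"object": dict, "array": list, "string": str,
             "integer": int, "number": (int, float)}


def validate(data: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return error messages; an empty list means the data conforms to the schema."""
    if "$ref" in schema:
        # Data that references a secondary schema must also validate against it.
        return validate(data, REGISTRY[schema["$ref"]], path)
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(data, _PY_TYPES[expected])
        if isinstance(data, bool) and expected in ("integer", "number"):
            ok = False  # JSON booleans are not numbers
        if not ok:
            return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing required '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in data:
                errors.extend(validate(data[key], subschema, f"{path}.{key}"))
    return errors
```

Validating an output against the primary schema `TTSOutput` therefore fails if the nested `speech` data violates the referenced `SpeechObject` schema, which is the behaviour the conformance rule requires.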
9 Performance Assessment
Not part of this specification.