1 Functions
Text-To-Speech (MMC-TTS):
Receives | Text Object | |
| Personal Status | to be contained in the Synthesised Speech Object. |
| Speech Model | used by AIM depending on Profile. |
Feeds | Text Object and Personal Status | to Speech Model. |
Produces | Synthesised Speech Object | output of AIM. |
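The data flow in the table above can be sketched in Python as follows. This is an illustrative sketch only: the class names `TextObject`, `PersonalStatus`, `SpeechObject` and `TextToSpeechAIM`, their fields, and the placeholder `silent_model` are hypothetical and are not taken from the MPAI data type specifications. The Speech Model is represented here by any callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TextObject:
    """Hypothetical stand-in for the MPAI Text Object."""
    text: str


@dataclass
class PersonalStatus:
    """Hypothetical stand-in for Personal Status of the Speech Modality."""
    emotion: str = "neutral"


@dataclass
class SpeechObject:
    """Hypothetical stand-in for the Synthesised Speech Object."""
    samples: List[float]
    sample_rate: int
    personal_status: PersonalStatus


# Assumption: a Speech Model is any callable mapping
# (text, personal status) to (samples, sample rate).
SpeechModel = Callable[[str, PersonalStatus], Tuple[List[float], int]]


class TextToSpeechAIM:
    """Receives the inputs, feeds them to the Speech Model, produces speech."""

    def __init__(self, speech_model: SpeechModel) -> None:
        self.speech_model = speech_model

    def run(self, text: TextObject, status: PersonalStatus) -> SpeechObject:
        if not text.text.strip():
            raise ValueError("Text Object is empty")
        # Feed Text Object and Personal Status to the Speech Model.
        samples, rate = self.speech_model(text.text, status)
        # The Personal Status is carried in the produced Speech Object.
        return SpeechObject(samples=samples, sample_rate=rate, personal_status=status)


def silent_model(text: str, status: PersonalStatus) -> Tuple[List[float], int]:
    """Placeholder Speech Model: 10 ms of silence per character at 16 kHz."""
    rate = 16000
    return [0.0] * (len(text) * rate // 100), rate
```

Keeping the Speech Model as an injected callable mirrors the table, where the model is itself an input of the AIM and may differ by Profile.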
2 Reference Model
Figure 1 specifies the Reference Model of the Text-To-Speech (MMC-TTS) AIM.
Figure 1 – The Text-To-Speech AIM Reference Model
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Text-To-Speech (MMC-TTS) AIM.
Table 1 – I/O Data of the Text-To-Speech AIM
Input | Description |
Text Object | Input Text. |
Personal Status | Input Personal Status of the Speech Modality. |
Speech Model | NN Model used to produce Speech from Text and Personal Status. |
Output | Description |
Speech Object | Output of the Text-To-Speech AIM. |
4 SubAIMs
No SubAIMs.
5 JSON Metadata
https://schemas.mpai.community/MMC/V2.2/AIMs/TextToSpeech.json
6 Profiles
The Text-To-Speech Profiles are specified.
7 Reference Software
7.1 Disclaimers
- The purpose of this MMC-TTS Reference Software is to provide a working Implementation of MMC-TTS, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to the MMC-TTS code
This AI Module is intended for developers who are familiar with Python and with downloading models from HuggingFace.
The software is a wrapper for the SpeechT5 NN Module that:
- Manages input files and parameters: Text Object.
- Executes the SpeechT5 Module to perform Speech Synthesis on the input Text Object.
- Outputs the resulting Speech Object.
The MMC-TTS Reference Software is found at the MPAI-NNW GitLab site. It contains:
- The Python code implementing the AIM.
- The required libraries: PyTorch, transformers (HuggingFace), datasets (HuggingFace), and soundfile.
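As an illustration of what such a wrapper might look like, the sketch below assumes that the NN Module is the SpeechT5 model published on HuggingFace as `microsoft/speecht5_tts`, used with the `microsoft/speecht5_hifigan` vocoder and speaker embeddings from the `Matthijs/cmu-arctic-xvectors` dataset. These checkpoint names, the speaker index, the 16 kHz output rate, and the function names `load_speecht5` and `text_to_speech` are assumptions not stated in this specification, and the actual Reference Software may differ. The sketch does not model Personal Status.

```python
def load_speecht5(speaker_index: int = 7306):
    """Load the assumed SpeechT5 components and return a synthesis callable.

    Heavy imports stay inside the function so that this module can be
    imported without PyTorch or transformers installed.
    """
    import torch
    from datasets import load_dataset
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumed checkpoints; downloaded from HuggingFace on first use.
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    # Assumed source of speaker x-vectors; loading it may need adjusting
    # depending on the installed version of the datasets library.
    xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker = torch.tensor(xvectors[speaker_index]["xvector"]).unsqueeze(0)

    def synthesise(text: str):
        inputs = processor(text=text, return_tensors="pt")
        speech = model.generate_speech(
            inputs["input_ids"], speaker, vocoder=vocoder
        )
        return speech.numpy(), 16000  # assumed 16 kHz output rate

    return synthesise


def text_to_speech(text: str, out_path: str, synthesise, write=None) -> str:
    """Synthesise `text` and write the result to `out_path` as a .wav file.

    `synthesise` is any callable returning (samples, sample_rate);
    `write` defaults to soundfile.write.
    """
    if not text.strip():
        raise ValueError("input text is empty")
    samples, rate = synthesise(text)
    if write is None:
        import soundfile as sf
        write = sf.write
    write(out_path, samples, rate)
    return out_path
```

Under these assumptions, a typical call would be `text_to_speech("Hello", "speech.wav", load_speecht5())`. Passing the synthesis callable as an argument keeps the file handling testable without downloading any model.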
7.3 Acknowledgements
This version of the MMC-TTS Reference Software has been developed by the MPAI Neural Network Watermarking Development Committee (NNW-DC).
8 Conformance Testing
Input Data | Data Type | Input Conformance Testing Data |
Machine Text | Unicode | All input Text files to be drawn from Text files. |
Machine Emotion | JSON | All input JSON Emotion files to be drawn from Emotion JSON files. |
Output Data | Data Type | Output Conformance Testing Criteria |
Machine Speech | .wav | All Speech files produced shall conform with Speech. |
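A basic check of the output criterion could be automated along the following lines. The sketch uses only the Python standard library `wave` module and assumes that conformance of a Speech file can be approximated by checking that it is a readable PCM .wav file containing at least one audio frame. The function name `is_valid_wav` is hypothetical, and the Speech data type specification may impose further constraints that are not checked here.

```python
import wave


def is_valid_wav(path: str) -> bool:
    """Return True if `path` is a readable PCM .wav file with audio frames."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnframes() > 0
                and wav.getnchannels() >= 1
                and wav.getframerate() > 0
            )
    except (wave.Error, EOFError, OSError):
        # Not a RIFF/WAVE file, truncated, or missing.
        return False
```

Such a check only covers the container format; it says nothing about the intelligibility or quality of the synthesised speech.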