1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Text-To-Speech (MMC-TTS):

Receives:
  Text Object.
  Personal Status to be contained in the Synthesised Speech Object.
  Speech Model.
Feeds Text Object and Personal Status to Speech Model.
Produces Synthesised Speech Object.
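The data flow above can be illustrated with a minimal Python sketch. The class names mirror the MMC-TTS data names, but their fields, the `SpeechModel` protocol with its `synthesise` method and the `SilenceModel` stand-in are hypothetical and do not reproduce the MPAI JSON formats; a real Speech Model would be a trained neural network.

```python
from dataclasses import dataclass
from typing import List, Protocol

# Hypothetical data types; the normative formats are given by the MPAI JSON schemas.


@dataclass
class TextObject:
    text: str  # Unicode text to be spoken


@dataclass
class PersonalStatus:
    emotion: str = "neutral"  # simplified: only the Speech Modality's emotion


@dataclass
class SpeechObject:
    samples: List[float]  # mono waveform, values in [-1.0, 1.0]
    sample_rate: int = 16000


class SpeechModel(Protocol):
    """Any NN Speech Model that turns text plus an emotion into a waveform."""

    def synthesise(self, text: str, emotion: str) -> List[float]: ...


def text_to_speech(text_obj: TextObject, status: PersonalStatus,
                   model: SpeechModel, sample_rate: int = 16000) -> SpeechObject:
    """Feed Text Object and Personal Status to the Speech Model; return the Speech Object."""
    if not text_obj.text.strip():
        raise ValueError("Text Object is empty")
    samples = model.synthesise(text_obj.text, status.emotion)
    return SpeechObject(samples=list(samples), sample_rate=sample_rate)


class SilenceModel:
    """Stand-in for a trained model: 800 silent samples (0.05 s at 16 kHz) per character."""

    def synthesise(self, text: str, emotion: str) -> List[float]:
        return [0.0] * (800 * len(text))
```

For example, `text_to_speech(TextObject("Hello"), PersonalStatus("happy"), SilenceModel())` returns a Speech Object holding 4000 samples, i.e. 0.25 s at 16 kHz. Keeping the Speech Model behind a protocol reflects the fact that the model is itself an input of the AIM and can be swapped.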

2     Reference Model

Figure 1 depicts the Reference Model of the Text-To-Speech AIM.

 

Figure 1 – The Text-To-Speech AIM Reference Model

3    Input/Output Data

Table 1 specifies the Input and Output Data of the Text-To-Speech AIM.

Table 1 – I/O Data of the Text-To-Speech AIM

| Input           | Description                                                                 |
|-----------------|-----------------------------------------------------------------------------|
| Text Object     | Input Text.                                                                 |
| Personal Status | Input Personal Status of the Speech Modality.                               |
| Speech Model    | Neural Network Model used to produce Speech from Text and Personal Status. |
| Output          | Description                                                                 |
| Speech Object   | Output of the Text-To-Speech AIM.                                           |

4     SubAIMs

No SubAIMs.

5     JSON Metadata

https://schemas.mpai.community/MMC/V2.2/AIMs/TextToSpeech.json

6     Profiles

The Text-To-Speech Profiles are specified.

7     Reference Software

7.1 Disclaimers

  1. The purpose of this Reference Software is to show a working Implementation of MMC-TTS, not to provide a ready-to-use product.
  2. MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
  3. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2    Guide to the MMC-TTS code

A wrapper for the SpeechT5 NN Module that:

  1. Manages input files and parameters: Text Object.
  2. Executes the SpeechT5 Module to synthesise Speech from the Text Object.
  3. Outputs the Speech Object.
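The three steps of the wrapper can be sketched with the Python standard library. This is not the MMC-TTS Reference Software: the `Synthesiser` callable type, the helper names, the UTF-8 text encoding and the 16 kHz, 16-bit mono .wav output are assumptions, and the standard `wave` module is used here in place of soundfile. The optional `make_speecht5_synthesiser` factory assumes the wrapped NN Module is HuggingFace SpeechT5 (consistent with the libraries listed for this software), loads the public `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan` checkpoints, and uses a placeholder speaker embedding.

```python
import struct
import wave
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical interface for the wrapped NN Module: text in, mono float samples out.
Synthesiser = Callable[[str], Sequence[float]]


def read_text_object(path: Path) -> str:
    # Step 1: manage the input file (Text Object assumed to be UTF-8 encoded).
    return path.read_text(encoding="utf-8").strip()


def write_speech_object(samples: Sequence[float], path: Path, sample_rate: int = 16000) -> None:
    # Step 3: store the Speech Object as a 16-bit mono .wav file.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def run_tts(text_path: Path, wav_path: Path, synthesise: Synthesiser,
            sample_rate: int = 16000) -> int:
    """Run the wrapper end to end and return the number of samples written."""
    text = read_text_object(text_path)
    samples = synthesise(text)  # Step 2: execute the NN Module
    write_speech_object(samples, wav_path, sample_rate)
    return len(samples)


def make_speecht5_synthesiser() -> Synthesiser:
    """Assumed SpeechT5 back end; needs torch and transformers and downloads checkpoints."""
    import torch
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    # Placeholder 512-dimensional speaker x-vector; a real setup would load one
    # (e.g. with the `datasets` library) to fix the speaker's voice.
    speaker = torch.zeros((1, 512))

    def synthesise(text: str) -> Sequence[float]:
        inputs = processor(text=text, return_tensors="pt")
        with torch.no_grad():
            speech = model.generate_speech(inputs["input_ids"], speaker, vocoder=vocoder)
        return speech.tolist()  # assumed 1-D 16 kHz waveform for a single input

    return synthesise
```

Passing the synthesiser as a callable keeps file handling separate from the NN Module, so the same wrapper can be exercised with a dummy function, e.g. `run_tts(Path("in.txt"), Path("out.wav"), lambda t: [0.0] * 1600)`.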

The MMC-TTS Reference Software is found at the MPAI-NNW GitLab site. It contains the Python code implementing the AIM. The required libraries are PyTorch, transformers (HuggingFace), datasets (HuggingFace), and soundfile.

7.3 Acknowledgements

This version of the MMC-TTS Reference Software has been developed by the MPAI Neural Network Watermarking Development Committee (NNW-DC).

8     Conformance Testing

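| Input Data      | Data Type | Input Conformance Testing Data                                     |
|-----------------|-----------|--------------------------------------------------------------------|
| Machine Text    | Unicode   | All input Text files to be drawn from Text files.                  |
| Machine Emotion | JSON      | All input JSON Emotion files to be drawn from Emotion JSON Files.  |

| Output Data    | Data Type | Output Conformance Testing Criteria                   |
|----------------|-----------|-------------------------------------------------------|
| Machine Speech | .wav      | All Speech files produced shall conform with Speech. |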
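The format-level part of these criteria can be partially automated. The following sketch, using only the Python standard library, checks that a Text file decodes as Unicode (UTF-8 is assumed), that an Emotion file is well-formed JSON, and that a produced Speech file is a readable, non-empty .wav file. The function names are hypothetical, and the checks are necessary rather than sufficient: validation of Emotion files against their JSON schema and assessment of the Speech content are not covered.

```python
import json
import wave
from pathlib import Path


def check_text_input(path: Path) -> bool:
    """Text input passes if the file decodes as Unicode (UTF-8 assumed)."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def check_emotion_input(path: Path) -> bool:
    """Emotion input passes if it is well-formed JSON (no schema validation here)."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


def check_speech_output(path: Path) -> bool:
    """Speech output passes if it is a readable .wav file with at least one frame."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getnframes() > 0 and wav.getnchannels() >= 1
    except (wave.Error, EOFError):
        return False
```

Each check returns a boolean rather than raising, so a test harness can run all of them over the conformance data set and report every failing file.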

9     Performance Assessment