MPAI-MMC V2.3 AIMs Text-To-Speech

1 Function	2 Reference Model	3 Input/Output Data
4 SubAIMs	5 JSON Metadata	6 Profiles
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

Text-To-Speech (MMC-TTS):

Receives	Text Object
	Personal Status	to be contained in the Synthesised Speech Object.
	Speech Model	used by AIM depending on Profile.
Feeds	Text Object and Personal Status	to Speech Model.
Produces	Synthesised Speech Object	output of AIM.
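
The data flow above can be sketched in Python as follows. This is only an illustration: the class names, their fields, and the `tone_model` stand-in are hypothetical and are not taken from the MPAI specification or the Reference Software. The Speech Model is treated as any callable that maps a Text Object and a Personal Status to audio samples.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TextObject:
    text: str


@dataclass
class PersonalStatus:
    # Hypothetical simplification: a single emotion label for the Speech Modality.
    emotion: str = "neutral"


@dataclass
class SpeechObject:
    samples: List[float]
    sample_rate: int


# A Speech Model is assumed to be any callable with this signature.
SpeechModel = Callable[[TextObject, Optional[PersonalStatus]], Tuple[List[float], int]]


def text_to_speech(text: TextObject,
                   status: Optional[PersonalStatus],
                   speech_model: SpeechModel) -> SpeechObject:
    """Feed Text Object and Personal Status to the Speech Model and wrap its output."""
    samples, rate = speech_model(text, status)
    return SpeechObject(samples=samples, sample_rate=rate)


def tone_model(text: TextObject, status: Optional[PersonalStatus]):
    """Stand-in Speech Model: a 220 Hz tone lasting 50 ms per input character."""
    rate = 16000
    samples_per_char = rate // 20
    n = samples_per_char * len(text.text)
    return [math.sin(2 * math.pi * 220 * i / rate) for i in range(n)], rate


speech = text_to_speech(TextObject("hello"), PersonalStatus("happy"), tone_model)
```

A real Speech Model would replace `tone_model`; the wrapper itself does not depend on how the audio is generated.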

2 Reference Model

Figure 1 specifies the Reference Model of the Text-To-Speech (MMC-TTS) AIM.

Figure 1 – The Text-To-Speech AIM Reference Model

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Text-To-Speech AIM.

Table 1 – I/O Data of the Text-To-Speech AIM

Input	Description
Text Object	Input Text.
Personal Status	Input Personal Status of the Speech Modality.
Speech Model	NN Model used to produce Speech from Text and Personal Status.
Output	Description
Speech Object	Output of the Text-To-Speech AIM.

4 SubAIMs

No SubAIMs.

5 JSON Metadata

https://schemas.mpai.community/MMC/V2.3/AIMs/TextToSpeech.json

6 Profiles

The Text-To-Speech Profiles are specified.

7 Reference Software

7.1 Disclaimers

The purpose of this MMC-TTS Reference Software is to provide a working Implementation of MMC-TTS, not to provide a ready-to-use product.
MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2 Guide to the MMC-TTS code

This AI Module is intended for developers who are familiar with Python and with downloading models from HuggingFace.

The Reference Software is a wrapper for the speech5 NN Module that:

Manages input files and parameters: Text Object.
Executes the speech5 Module to perform Speech Synthesis on each input Text Object.
Outputs the Synthesised Speech Object.

The MMC-TTS Reference Software is found at the MPAI-NNW gitlab site. It contains:

The Python code implementing the AIM.
Required libraries are: PyTorch, transformers (HuggingFace), datasets (HuggingFace), and soundfile.
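
As a rough illustration of how such a wrapper might use these libraries, the sketch below assumes that the "speech5" module refers to the SpeechT5 model available through HuggingFace transformers. That reading, the checkpoint names `microsoft/speecht5_tts` and `microsoft/speecht5_hifigan`, and the function name `synthesise_speech` are assumptions, not details taken from the Reference Software. The speaker embedding (an x-vector tensor of shape (1, 512)) is supplied by the caller; it could, for instance, be loaded from a dataset with the datasets library.

```python
def synthesise_speech(text, speaker_embeddings, wav_path, sample_rate=16000):
    """Synthesise `text` into a WAV file using SpeechT5 (assumed model)."""
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")

    # Heavy dependencies are imported lazily so that this function can be
    # defined, and its input check exercised, without them installed.
    import soundfile as sf
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumed checkpoints; they are downloaded from HuggingFace on first use.
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    # Tokenise the input text into model input IDs.
    inputs = processor(text=text, return_tensors="pt")

    # Generate a waveform; the vocoder converts the spectrogram to audio.
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )

    # SpeechT5 produces 16 kHz audio, hence the default sample rate.
    sf.write(wav_path, speech.numpy(), samplerate=sample_rate)
    return wav_path
```

Calling `synthesise_speech("Hello", embeddings, "speech.wav")` with a suitable embedding tensor would write the output file; the actual Reference Software may organise inputs and parameters differently.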

7.3 Acknowledgements

This version of the MMC-TTS Reference Software has been developed by the MPAI Neural Network Watermarking Development Committee (NNW-DC).

8 Conformance Testing

Table 2 provides the Conformance Testing Method for MMC-TTS AIM.

If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and shall conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for MMC-TTS AIM

Input	Text Object	Shall validate against Text Object schema. Text Data shall conform with Text Qualifier.
	Personal Status	Shall validate against Personal Status schema.
	Speech Model	Shall validate against Machine Learning Model schema. Machine Learning Model Data shall conform with Machine Learning Model Qualifier.
Output	Synthesised Speech Object	Shall validate against Speech Object schema. Speech Data shall conform with Speech Qualifier.

Table 3 provides an example of MMC-TTS AIM conformance testing.

Table 3 – An example of MMC-TTS AIM conformance testing

Input Data	Data Type	Input Conformance Testing Data
Machine Text	Unicode	All input Text files shall be drawn from Text files.
Machine Emotion	JSON	All input JSON Emotion files shall be drawn from Emotion JSON files.
Output Data	Data Type	Output Conformance Testing Criteria
Machine Speech	.wav	All Speech files produced shall conform with Speech.
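
A first sanity check on the .wav output in Table 3 could look like the sketch below. It only confirms that a file is a readable, non-empty PCM WAV file and reports its basic properties, using Python's standard `wave` module. It does not test conformance with the Speech Qualifier, whose constraints are not reproduced here, and the function name `inspect_wav` is hypothetical.

```python
import wave


def inspect_wav(path):
    """Return basic properties of a PCM WAV file.

    Raises wave.Error if the file cannot be parsed as WAV and
    ValueError if it contains no audio frames.
    """
    with wave.open(path, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }
    if info["frames"] == 0:
        raise ValueError("WAV file contains no audio frames")
    return info
```

The returned dictionary can then be compared against whatever sampling rate, channel count, and sample width the applicable Speech Qualifier requires.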

9 Performance Assessment