MPAI-MMC V2.2 AIWs Text and Speech Translation


1 Functions	2 Reference Model	3 I/O Data
4 Functions of AI Modules	5 I/O Data of AI Modules	6 AIW, AIMs, and JSON Metadata
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

The goal of the Text and Speech Translation (MMC-TST) Use Case is to translate speech or text expressed in a source language into a target language, producing translated speech, the textual version of the translation, or both. If the desired output is speech, the user can specify whether their speech features (voice colour, emotional charge, etc.) should be preserved in the translated speech.

The flow of control is from Input Speech or Input Text to Translated Text, and then to Output Speech and Output Text. Depending on the values of the Media, Language, and Feature Selectors:

Input Speech or Input Text in Language A is translated into Translated Text in Language B and pronounced as Speech in Language B.
The Speech features (voice colour, emotional charge, etc.) of the Input Speech in Language A are preserved in the Output Speech in Language B.
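The flow of control above can be sketched in Python. This is a minimal illustration under stated assumptions, not MPAI reference software: the four AIMs are passed in as hypothetical callables (asr, ttt, esd, tts) whose signatures are invented for the example, and the two boolean parameters stand in for the Selectors.

```python
from typing import Callable, Optional


def text_and_speech_translation(
    input_speech: Optional[bytes],
    input_text: Optional[str],
    use_speech: bool,         # stands in for the selector choosing Speech vs Text input
    preserve_features: bool,  # stands in for the selector on the speaker's features
    lang_a: str,
    lang_b: str,
    asr: Callable[[bytes, str], str],                   # hypothetical ASR AIM
    ttt: Callable[[str, str, str], str],                # hypothetical Text-to-Text Translation AIM
    esd: Callable[[bytes], dict],                       # hypothetical Entity Speech Description AIM
    tts: Callable[[str, str, Optional[dict]], bytes],   # hypothetical TTS With Descriptors AIM
) -> tuple[str, bytes]:
    # Step 1: obtain the source text, either recognised from speech or given directly.
    if use_speech:
        if input_speech is None:
            raise ValueError("speech input selected but no Input Speech given")
        source_text = asr(input_speech, lang_a)
    else:
        if input_text is None:
            raise ValueError("text input selected but no Input Text given")
        source_text = input_text

    # Step 2: translate from Language A into Language B.
    translated_text = ttt(source_text, lang_a, lang_b)

    # Step 3: extract the speaker's descriptors only if requested and speech is available.
    descriptors = esd(input_speech) if (use_speech and preserve_features) else None

    # Step 4: synthesise speech in Language B, adding the descriptors when present.
    translated_speech = tts(translated_text, lang_b, descriptors)
    return translated_text, translated_speech
```

Passing the AIMs as parameters keeps the workflow logic separate from any particular ASR, translation, or synthesis implementation, which mirrors how the AIW composes independent AIMs.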

2 Reference Model

Figure 1 depicts the input/output data, the AIMs, and the data exchanged between AIMs of the Text and Speech Translation AIW.

Figure 1 – Reference Model of Text and Speech Translation (MMC-TST) AIW

In previous MPAI-MMC versions, this AIW was called Unidirectional Speech Translation (MMC-UST). Those versions also included two variants of it, Bidirectional Speech Translation (MMC-BST) and One-to-Many Speech Translation (MMC-MST). They are shown here to illustrate that both are based on the same MMC-TST AI Workflow.


Figure 2 – Bidirectional Speech Translation (MMC-BST) V2.1
Figure 3 – One-to-Many Speech Translation (MMC-MST) V2.1

3 I/O Data

The input and output data of the Text and Speech Translation AI Workflow are:

Table 1 – I/O Data of Text and Speech Translation

Input	Descriptions
Media Selector	Determines whether the input is Text or Speech.
Language Selector	Determines the input language (Language A) and the output language (Language B).
Feature Selector	Determines whether the Speaker's vocal features in the Input Speech are preserved in the synthetic Output Speech.
Input Speech	Speech produced in Language A by a human desiring translation into Language B.
Input Text	Alternative textual source information to be translated into and pronounced in Language B, depending on the value of Media Selector.
Output	Descriptions
Translated Speech	Input Speech or Input Text translated into Language B and pronounced, preserving the Input Speech features if requested by Feature Selector.
Translated Text	Text of Input Speech or Input Text translated into Language B, depending on the value of Media Selector.
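A hypothetical container for the inputs in Table 1 shows how the Selectors constrain which inputs must be present. The field names, the "speech"/"text" values, and the rule that the speaker's features can only be carried over from speech input are assumptions made for this sketch, not requirements stated by the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TstInput:
    """Hypothetical container for the Table 1 inputs; names are illustrative."""
    media: str                      # Media Selector: "speech" or "text"
    lang_a: str                     # Language Selector: input language
    lang_b: str                     # Language Selector: output language
    preserve_features: bool         # Feature Selector
    speech: Optional[bytes] = None  # Input Speech
    text: Optional[str] = None      # Input Text

    def __post_init__(self) -> None:
        # Reject selector values and input combinations that cannot be processed.
        if self.media not in ("speech", "text"):
            raise ValueError(f"unknown Media Selector value: {self.media!r}")
        if self.media == "speech" and self.speech is None:
            raise ValueError("Media Selector is 'speech' but Input Speech is missing")
        if self.media == "text" and self.text is None:
            raise ValueError("Media Selector is 'text' but Input Text is missing")

    @property
    def uses_speaker_features(self) -> bool:
        # Assumption: vocal features exist only when the input is speech,
        # so the Feature Selector has no effect on text input.
        return self.preserve_features and self.media == "speech"
```

For example, TstInput("text", "en", "it", True, text="hello") is accepted, but its uses_speaker_features is False because there is no speech from which to take the features. The language codes here are placeholders.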

4 Functions of AI Modules

Table 2 gives the functions of Text and Speech Translation AIMs.

Table 2 – Functions of Text and Speech Translation AI Modules

AIM	Functions
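Automatic Speech Recognition	Converts Input Speech into Recognised Text.
Text-to-Text Translation	Translates Recognised Text or Input Text into Language B.
Entity Speech Description	Extracts the Speaker's Speech Descriptors from Input Speech.
Text-to-Speech With Descriptors	Synthesises Translated Text, adding the Speaker's Speech Descriptors when requested.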

5 I/O Data of AI Modules of Text and Speech Translation

The I/O Data of the AI Modules of Text and Speech Translation are given in Table 3.

Table 3 – I/O Data of Text and Speech Translation AI Modules

AIM	Receives	Produces
Automatic Speech Recognition	Input Speech	Recognised Text
Text-to-Text Translation	1. Input Text 2. Recognised Text	Translated Text
Entity Speech Description	Input Speech	Speaker’s Speech Descriptors
Text-to-Speech With Descriptors	1. Translated Text 2. Speaker's Speech Descriptors	Output Speech
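Because each AIM in Table 3 consumes data that is either supplied to the AIW or produced by another AIM, an execution order can be derived mechanically. The sketch below transcribes Table 3 (with the descriptor names normalised so that producer and consumer match) and uses Python's standard graphlib.TopologicalSorter; it is an illustration only, not a description of how MPAI implementations schedule AIMs.

```python
from graphlib import TopologicalSorter

# Table 3 transcribed as: AIM -> (data it receives, data it produces).
AIMS: dict[str, tuple[set[str], set[str]]] = {
    "Automatic Speech Recognition": ({"Input Speech"}, {"Recognised Text"}),
    "Text-to-Text Translation": ({"Input Text", "Recognised Text"}, {"Translated Text"}),
    "Entity Speech Description": ({"Input Speech"}, {"Speaker's Speech Descriptors"}),
    "Text-to-Speech With Descriptors": (
        {"Translated Text", "Speaker's Speech Descriptors"},
        {"Output Speech"},
    ),
}


def execution_order(aims: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """Order AIMs so that each runs after every AIM producing one of its inputs."""
    # Map each data item to the AIM that produces it.
    producer = {data: name for name, (_, outs) in aims.items() for data in outs}
    # Predecessors of an AIM are the producers of its inputs;
    # AIW-level inputs (Input Speech, Input Text) have no producer.
    graph = {
        name: {producer[d] for d in ins if d in producer}
        for name, (ins, _) in aims.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Automatic Speech Recognition and Entity Speech Description have no predecessors and could run in parallel; Text-to-Speech With Descriptors always comes last because it needs both the Translated Text and the descriptors.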

6 AIW, AIMs, and JSON Metadata

Table 4 – AIMs and JSON Metadata

AIW	AIMs	Name	JSON
MMC-TST		Text and Speech Translation	X
	MMC-ASR	Automatic Speech Recognition	X
	MMC-TTT	Text-to-Text Translation	X
	MMC-ESD	Entity Speech Description	X
	MMC-DTS	Text-to-Speech With Descriptors	X

7 Reference Software

8 Conformance Testing

Important note: This Conformance Testing Specification does not provide methods and datasets to test the conformance of the individual Speech Feature Extraction and Text-to-Speech Basic AIMs, only of the Speech Translation Composite AIMs that contain them.

Input Data	Data Type	Input Conformance Testing Data
Input Selector	Selector	All Input Selectors shall conform with Selector.
Requested Language	Selector	All Language Selectors shall be drawn from Language Codes.
Input Text	Unicode	All input Text files shall be drawn from Text files.
Input Speech	.wav	All input Speech files shall be drawn from Speech files.
Output Data	Data Type	Conformance Test
Machine Text	Unicode	All Text files produced shall conform with Text files.
Machine Speech	.wav	All Speech files produced shall conform with Speech files.
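The data type checks in the table above can be approximated with the Python standard library. This sketch assumes that ".wav" means an uncompressed PCM WAV file (the only kind the wave module reads) and that "Unicode" text is UTF-8 encoded; it only checks the file format and does not test conformance against any MPAI dataset.

```python
import io
import wave


def is_conformant_wav(data: bytes) -> bool:
    """Return True if `data` parses as a PCM WAV file with at least one audio frame."""
    try:
        with wave.open(io.BytesIO(data), "rb") as w:
            return w.getnframes() > 0
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, unsupported (non-PCM) format, or truncated header.
        return False


def is_unicode_text(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 (assumed Unicode encoding form)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A real conformance test would also compare the produced Machine Text and Machine Speech against the reference files; that comparison depends on the datasets and is not sketched here.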

9 Performance Assessment