
1     Functions 2     Reference Model 3     I/O Data
4     Functions of AI Modules 5     I/O Data of AI Modules 6     AIW, AIMs, and JSON Metadata
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1      Functions

The Text and Speech Translation (MMC-TST) Use Case translates speech segments expressed in a source language into a target language, or produces the textual version of the translated speech. If the desired output is speech, users can specify whether their speech features (voice colour, emotional charge, etc.) should be preserved in the translated speech.

The flow of control is from Input Speech or Input Text to Translated Text, and then to Output Speech and Output Text. Depending on the value of Input Selector:

  1. Input Text in Language A is translated into Translated Text in Language B and pronounced as Speech in Language B.
  2. The Speech features (voice colour, emotional charge, etc.) in Language A are preserved in Language B.
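The flow of control above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the MPAI reference software: the function `translate`, the selector values `"text"` and `"speech"`, the boolean `preserve_features` flag, and the four callables standing in for the AIMs are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationResult:
    translated_text: str   # Translated Text in Language B
    output_speech: bytes   # Translated Text pronounced in Language B


def translate(
    media_selector: str,                 # assumed values: "text" or "speech"
    preserve_features: bool,             # assumed boolean form of the feature choice
    input_text: Optional[str],
    input_speech: Optional[bytes],
    recognise: Callable[[bytes], str],            # stands in for speech recognition
    translate_text: Callable[[str], str],         # stands in for text translation
    describe_speech: Callable[[bytes], dict],     # stands in for feature extraction
    synthesise: Callable[[str, Optional[dict]], bytes],  # stands in for synthesis
) -> TranslationResult:
    # Step 1: obtain source text, either directly or by recognising speech.
    if media_selector == "speech":
        if input_speech is None:
            raise ValueError("speech selected but no Input Speech given")
        source_text = recognise(input_speech)
    elif media_selector == "text":
        if input_text is None:
            raise ValueError("text selected but no Input Text given")
        source_text = input_text
    else:
        raise ValueError(f"unknown selector value: {media_selector!r}")

    # Step 2: translate the text from Language A into Language B.
    translated = translate_text(source_text)

    # Step 3: extract speech features only if requested and speech is available.
    descriptors = None
    if preserve_features and input_speech is not None:
        descriptors = describe_speech(input_speech)

    # Step 4: pronounce the Translated Text, with or without the features.
    return TranslationResult(translated, synthesise(translated, descriptors))
```

Passing the AIMs in as callables keeps the control flow separate from any particular recognition, translation, or synthesis engine, so the same routing can be exercised with simple stubs.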

2      Reference Model

Figure 1 depicts the input/output data, the AIMs, and the data exchanged between AIMs of the Text and Speech Translation AIW.

Figure 1 – Reference Model of Text and Speech Translation (MMC-TST) AIW

In previous MPAI-MMC versions, this AIW was called Unidirectional Speech Translation (MMC-UST). Those versions also included two variations, Bidirectional and One-to-Many. They are reported here to show that both are based on the same MMC-TST AI Workflow.

Figure 2 – Bidirectional Speech Translation (MMC-BST) V2.1

Figure 3 – One-to-Many Speech Translation (MMC-MST) V2.1

3      I/O Data

The input and output data of the Text and Speech Translation AI Workflow are:

Table 1 – I/O Data of Text and Speech Translation

| Input | Description |
|---|---|
| Media Selector | Determines whether the input will be in Text or Speech. |
| Language Selector | Determines the input and output languages. |
| Feature Selector | Determines whether the Speaker's vocal features should be added to the synthetic speech, i.e. whether the Input Speech features are preserved in the Output Speech. |
| Input Speech | Speech produced in Language A by a human desiring translation into Language B. |
| Input Text | Alternative textual source information to be translated into and pronounced in Language B, depending on the value of Input Selector. |

| Output | Description |
|---|---|
| Translated Speech | Input Speech translated into Language B, preserving the Input Speech features in the Output Speech, depending on the value of Input Selector. |
| Translated Text | Text of Input Speech or Input Text translated into Language B, depending on the value of Input Selector. |

4      Functions of AI Modules

Table 2 gives the functions of Text and Speech Translation AIMs.

Table 2 – Functions of Text and Speech Translation AI Modules

| AIM | Function |
|---|---|
| Automatic Speech Recognition | Recognises Speech |
| Text-to-Text Translation | Translates Recognised Text |
| Entity Speech Description | Extracts Speech Features |
| Text-to-Speech With Descriptors | Synthesises Translated Text adding Speech Features |

5      I/O Data of AI Modules of Text and Speech Translation

The data received and produced by the AI Modules of Text and Speech Translation are given in Table 3.

Table 3 – AI Modules of Text and Speech Translation

| AIM | Receives | Produces |
|---|---|---|
| Automatic Speech Recognition | Input Speech | Recognised Text |
| Text-to-Text Translation | 1. Input Text; 2. Recognised Text | Translated Text |
| Entity Speech Description | Input Speech | Speaker’s Speech Descriptors |
| Text-to-Speech With Descriptors | 1. Translated Text; 2. Speech Descriptors | Output Speech |
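The receives/produces relations in Table 3 define a small dataflow graph, and one way to sanity-check such a table is to confirm that every item an AIM receives is either a workflow input or is produced by another AIM. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later). The names `AIMS`, `WORKFLOW_INPUTS`, and `execution_order` are hypothetical, and it assumes that "Speaker's Speech Descriptors" and "Speech Descriptors" denote the same data item.

```python
from graphlib import TopologicalSorter

# Data entering the workflow from outside (taken from Table 3).
WORKFLOW_INPUTS = {"Input Text", "Input Speech"}

# AIM name -> (data received, data produced), transcribed from Table 3.
# Assumption: the two descriptor names in the table are one data item.
AIMS = {
    "Automatic Speech Recognition": ({"Input Speech"}, {"Recognised Text"}),
    "Text-to-Text Translation": ({"Input Text", "Recognised Text"}, {"Translated Text"}),
    "Entity Speech Description": ({"Input Speech"}, {"Speech Descriptors"}),
    "Text-to-Speech With Descriptors": (
        {"Translated Text", "Speech Descriptors"},
        {"Output Speech"},
    ),
}


def execution_order(aims, workflow_inputs):
    """Return the AIM names in an order that respects the data dependencies."""
    # Map each data item to the AIM that produces it.
    producers = {}
    for name, (_, produced) in aims.items():
        for item in produced:
            producers[item] = name

    # Build AIM -> set of upstream AIMs it depends on.
    graph = {}
    for name, (received, _) in aims.items():
        upstream = set()
        for item in received:
            if item in workflow_inputs:
                continue
            if item not in producers:
                raise ValueError(f"{name} receives {item!r}, which nothing produces")
            upstream.add(producers[item])
        graph[name] = upstream

    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())
```

Calling `execution_order(AIMS, WORKFLOW_INPUTS)` yields an order in which Automatic Speech Recognition precedes Text-to-Text Translation and both feeding AIMs precede Text-to-Speech With Descriptors; the relative order of independent AIMs is not fixed.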

6      AIW, AIMs, and JSON Metadata

Table 4 – AIMs and JSON Metadata

| AIW | AIMs | Name | JSON |
|---|---|---|---|
| MMC-TST | | Text and Speech Translation | X |
| | MMC-ASR | Automatic Speech Recognition | X |
| | MMC-TTT | Text-to-Text Translation | X |
| | MMC-ESD | Entity Speech Description | X |
| | MMC-DTS | Text-to-Speech With Descriptors | X |

7     Reference Software

8     Conformance Testing

Important note. This Conformance Testing Specification does not provide methods and datasets to Test the Conformance of the individual Speech Feature Extraction and Text-To-Speech Basic AIMs; it covers only the Speech Translation Composite AIMs built from them.

| Input Data | Data Type | Input Conformance Testing Data |
|---|---|---|
| Input Selector | Selector | All Input Selectors to conform with Selector. |
| Requested Language | Selector | All Language Selectors to be drawn from Language Codes. |
| Input Text | Unicode | All input Text files shall be drawn from Text files. |
| Input Speech | .wav | All input Speech files shall be drawn from Speech files. |

| Output Data | Data Type | Conformance Test |
|---|---|---|
| Machine Text | Unicode | All Text files produced shall conform with Text files. |
| Machine Speech | .wav | All Speech files produced shall conform with Speech files. |
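As a rough illustration of the data-type checks listed above, the following Python sketch tests whether a byte string decodes as Unicode text and whether it parses as a .wav file. It is not the MPAI conformance procedure: the helper names are hypothetical, UTF-8 is an assumed encoding (the specification only says "Unicode"), and the standard `wave` module accepts only PCM-style WAV data, which may be narrower than what the specification allows.

```python
import io
import wave


def is_unicode_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if the bytes decode cleanly in the assumed encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def is_wav(data: bytes) -> bool:
    """Return True if the bytes parse as a WAV file readable by `wave`."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            # Reading the header is enough; any frame count is acceptable.
            return wav.getnframes() >= 0
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file or unsupported format.
        # EOFError: data too short to contain a header.
        return False


def check_outputs(machine_text: bytes, machine_speech: bytes) -> dict:
    """Apply the two output checks and report each result by name."""
    return {
        "Machine Text": is_unicode_text(machine_text),
        "Machine Speech": is_wav(machine_speech),
    }
```

A check of this kind only confirms the container format of each file; it says nothing about whether the translation itself is correct.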

9     Performance Assessment
