1 Functions | 2 Reference Model | 3 I/O Data |
4 Functions of AI Modules | 5 I/O Data of AI Modules | 6 AIW, AIMs, and JSON Metadata |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
The goal of the Text and Speech Translation (MMC-TST) Use Case is to translate speech segments expressed in a source language into a target language, or to produce the textual version of the translated speech. If the desired output is speech, the user can specify whether their speech features (voice colour, emotional charge, etc.) should be preserved in the translated speech.
The flow of control is from Input Speech or Input Text to Translated Text, and then to Output Speech and Output Text. Depending on the value of Input Selector:
- Input Text in Language A is translated into Translated Text in Language B and pronounced as Speech in Language B.
- The Speech features (voice colour, emotional charge, etc.) in Language A are preserved in Language B.
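The flow of control described above can be sketched in Python. This is a minimal illustration, not part of the specification: the function `run_translation`, its parameters, and the four callables standing in for the AIMs are hypothetical names, and the two boolean flags are an assumed reading of what the Input Selector conveys. The sketch also assumes that speech features can only be preserved when the input is speech.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationResult:
    translated_text: str   # Translated Text in Language B
    output_speech: bytes   # Speech in Language B


def run_translation(
    *,
    input_text: Optional[str],
    input_speech: Optional[bytes],
    use_speech_input: bool,      # assumed part of Input Selector
    preserve_features: bool,     # assumed part of Input Selector
    recognise: Callable[[bytes], str],
    translate_text: Callable[[str], str],
    describe_speech: Callable[[bytes], dict],
    synthesise: Callable[[str, Optional[dict]], bytes],
) -> TranslationResult:
    """Input Speech or Input Text -> Translated Text -> Output Speech."""
    if use_speech_input:
        if input_speech is None:
            raise ValueError("speech input selected but no Input Speech given")
        source_text = recognise(input_speech)
    else:
        if input_text is None:
            raise ValueError("text input selected but no Input Text given")
        source_text = input_text

    translated = translate_text(source_text)

    # Speech features exist only when there is Input Speech to describe.
    descriptors = None
    if preserve_features and use_speech_input:
        descriptors = describe_speech(input_speech)

    return TranslationResult(translated, synthesise(translated, descriptors))
```

Passing the AIMs in as callables keeps the sketch independent of any particular recognition, translation, or synthesis engine.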
2 Reference Model
Figure 1 depicts the input/output data, the AIMs, and the data exchanged between AIMs of the Text and Speech Translation AIW.
Figure 1 – Reference Model of Text and Speech Translation (MMC-TST) AIW
In previous MPAI-MMC versions, this AIW was called Unidirectional Speech Translation (MMC-UST). Those versions also included two variations of Text and Speech Translation (MMC-TST): Bidirectional (MMC-BST) and One-to-Many (MMC-MST). They are shown here because both are based on the same MMC-TST AI Workflow.
Figure 2 – Bidirectional Speech Translation (MMC-BST) V2.1 | Figure 3 – One-to-Many Speech Translation (MMC-MST) V2.1 |
3 I/O Data
The input and output data of the Text and Speech Translation AI Workflow are:
Table 1 – I/O Data of Text and Speech Translation
Input | Descriptions |
Media Selector | Determines whether the input will be in Text or Speech. |
Language Selector | Determines the input and output languages. |
Feature Selector | Determines whether the Speaker's vocal features should be added to the synthetic speech. |
Input Speech | Speech produced in Language A by a human desiring translation into Language B. |
Input Text | Alternative textual source information to be translated into and pronounced in Language B, depending on the value of Input Selector. |
Media Selector | Determines whether the Input Speech features are preserved in the Output Speech. |
Output | Descriptions |
Translated Speech | Input Speech translated into Language B, preserving the Input Speech features in the Output Speech depending on the value of Input Selector. |
Translated Text | Text of Input Speech or Input Text translated into Language B, depending on the value of Input Selector. |
4 Functions of AI Modules
Table 2 gives the functions of Text and Speech Translation AIMs.
Table 2 – Functions of Text and Speech Translation AI Modules
AIM | Functions |
Automatic Speech Recognition | Recognises Speech |
Text-to-Text Translation | Translates Recognised Text |
Entity Speech Description | Extracts Speech Features |
Text-to-Speech With Descriptors | Synthesises Translated Text adding Speech Features |
5 I/O Data of AI Modules of Text and Speech Translation
The I/O Data of the AI Modules of Text and Speech Translation are given in Table 3.
Table 3 – I/O Data of AI Modules of Text and Speech Translation
AIM | Receives | Produces |
Automatic Speech Recognition | Input Speech | Recognised Text |
Text-to-Text Translation | 1. Input Text 2. Recognised Text | Translated Text |
Entity Speech Description | Input Speech | Speaker’s Speech Descriptors |
Text-to-Speech With Descriptors | 1. Translated Text 2. Speech Descriptors | Output Speech |
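The wiring implied by Table 3 can be made explicit with a small dependency graph. The following Python sketch is illustrative only: the dictionary layout and the helper names are hypothetical, and it assumes that the "Speaker’s Speech Descriptors" produced by Entity Speech Description are the same data as the "Speech Descriptors" received by Text-to-Speech With Descriptors. It uses `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# AIM name -> (data it receives, data it produces), transcribed from Table 3.
AIMS = {
    "Automatic Speech Recognition": ({"Input Speech"}, {"Recognised Text"}),
    "Text-to-Text Translation": ({"Input Text", "Recognised Text"}, {"Translated Text"}),
    "Entity Speech Description": ({"Input Speech"}, {"Speech Descriptors"}),
    "Text-to-Speech With Descriptors": (
        {"Translated Text", "Speech Descriptors"},
        {"Output Speech"},
    ),
}


def producers(aims):
    """Map each data item to the AIM that produces it."""
    return {data: name for name, (_, produced) in aims.items() for data in produced}


def dependency_graph(aims):
    """Map each AIM to the set of AIMs whose output it consumes."""
    made_by = producers(aims)
    return {
        name: {made_by[data] for data in received if data in made_by}
        for name, (received, _) in aims.items()
    }


def execution_order(aims):
    """Return one valid order in which the AIMs can run."""
    return list(TopologicalSorter(dependency_graph(aims)).static_order())


def external_inputs(aims):
    """Data that no AIM produces, i.e. data that must enter the AIW from outside."""
    made_by = producers(aims)
    return {d for received, _ in aims.values() for d in received if d not in made_by}
```

Under these assumptions, `external_inputs(AIMS)` yields Input Speech and Input Text, and any order returned by `execution_order(AIMS)` places Text-to-Speech With Descriptors after both Text-to-Text Translation and Entity Speech Description.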
6 AIW, AIMs, and JSON Metadata
Table 4 – AIMs and JSON Metadata
AIW | AIMs | Name | JSON |
MMC-TST | | Text and Speech Translation | X |
| MMC-ASR | Automatic Speech Recognition | X |
| MMC-TTT | Text-to-Text Translation | X |
| MMC-ESD | Entity Speech Description | X |
| MMC-DTS | Text-to-Speech With Descriptors | X |
7 Reference Software
8 Conformance Testing
Important note. This Conformance Testing Specification does not provide methods and datasets to Test the Conformance of the individual Speech Feature Extraction and Text-To-Speech Basic AIMs, only of their Descriptors Speech Translation Composite AIMs.
Input Data | Data Type | Input Conformance Testing Data |
Input Selector | Selector | All Input Selectors to conform with Selector. |
Requested Language | Selector | All Language Selectors to be drawn from Language Codes. |
Input Text | Unicode | All input Text files shall be drawn from Text files. |
Input Speech | .wav | All input Speech files shall be drawn from Speech files. |
Output Data | Data Type | Conformance Test |
Machine Text | Unicode | All Text files produced shall conform with Text files. |
Machine Speech | .wav | All Speech files produced shall conform with Speech files. |
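A basic format check on the data types listed above might look like the following Python sketch. It is not the conformance procedure itself, which relies on the referenced Text files and Speech files. The function names are hypothetical, UTF-8 is assumed as the Unicode encoding, and the standard-library `wave` module reads uncompressed PCM only, so other WAV encodings would be rejected here.

```python
import io
import wave


def is_unicode_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (assumed Unicode encoding)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_wav_speech(data: bytes) -> bool:
    """Return True if the bytes parse as a non-empty PCM .wav stream."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError):
        # wave.Error: not a RIFF/WAVE file or unsupported encoding.
        # EOFError: the data ends before the header is complete.
        return False
```

Such a check only establishes that a file is well-formed; it says nothing about whether the recognised, translated, or synthesised content is correct.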