1 Functions of Unidirectional Speech Translation
2 Reference Architecture of Unidirectional Speech Translation
3 I/O Data of Unidirectional Speech Translation
4 Functions of AI Modules of Unidirectional Speech Translation
5 I/O Data of AI Modules of Unidirectional Speech Translation
6 Specification of Unidirectional Speech Translation AIW, AIMs, and JSON Metadata
1 Functions of Unidirectional Speech Translation
The goal of the Unidirectional Speech Translation (MMC-UST) Use Case is to translate speech segments expressed in a source language into a target language or to produce the textual version of the translated speech. If the desired output is speech, the user can specify whether their speech features (voice colour, emotional charge, etc.) should be preserved in the translated speech.
Control flows from Input Speech or Input Text to Translated Text, and from there to Output Text and Output Speech. Depending on the value of Input Selector:
- Input Speech in Language A is recognised as text, translated into Translated Text in Language B, and pronounced as Speech in Language B.
- Input Text in Language A is translated into Translated Text in Language B and pronounced as Speech in Language B.
- The Speech features (voice colour, emotional charge, etc.) of the Input Speech in Language A are preserved in the Speech in Language B.
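The dependence on Input Selector can be sketched as a small routing function. This is an illustrative sketch only: the names `InputMode` and `active_aims` are hypothetical, and the assumption that Input Speech Description runs only when the input is speech and feature preservation is requested is an inference from the tables in this document, not a statement of the specification.

```python
from enum import Enum


class InputMode(Enum):
    """Hypothetical encoding of the Text/Speech choice in Input Selector."""
    TEXT = "text"
    SPEECH = "speech"


def active_aims(mode: InputMode, preserve_features: bool) -> list[str]:
    """Return the AIMs assumed to run for a given Input Selector value."""
    aims = []
    if mode is InputMode.SPEECH:
        # Speech input must first be recognised as text.
        aims.append("Automatic Speech Recognition")
        if preserve_features:
            # Assumption: features are extracted only when they will be reused.
            aims.append("Input Speech Description")
    # Both input modes are translated and then pronounced in Language B.
    aims.append("Text-to-Text Translation")
    aims.append("Text-to-Speech (Features)")
    return aims
```

For example, `active_aims(InputMode.TEXT, False)` lists only the translation and synthesis AIMs, because there is no Input Speech to recognise or describe.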
2 Reference Architecture of Unidirectional Speech Translation
Figure 1 describes the input/output data, the AIMs and the data exchanged between AIMs.
Figure 1 – Reference Model of Unidirectional Speech Translation (UST)
3 I/O Data of Unidirectional Speech Translation
The input and output data of the Unidirectional Speech Translation Use Case are:
Table 1 – I/O Data of Unidirectional Speech Translation
Input | Descriptions |
Input Selector | Determines whether: 1. The input will be in Text or Speech 2. The Input Speech features are preserved in the Output Speech. |
Language Preferences | User-specified input Language (A) and output Language (B). |
Input Speech | Speech produced in Language A by a human desiring translation into Language B. |
Input Text | Alternative textual source information to be translated into, and pronounced in, Language B, depending on the value of Input Selector. |
Output | Descriptions |
Translated Speech | Input Speech translated into Language B, with the Input Speech features preserved in the Output Speech if Input Selector requests it. |
Translated Text | Text of Input Speech or Input Text translated into Language B, depending on the value of Input Selector. |
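The inputs of Table 1 could be carried in a single record that checks itself against Input Selector. The sketch below is an assumption-laden illustration: the class and field names (`InputSelector`, `USTInput`, `use_speech`, and so on) are hypothetical, and the specification's actual data formats are defined by its JSON metadata, not by this code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputSelector:
    use_speech: bool         # True: Input Speech is the source; False: Input Text
    preserve_features: bool  # keep Input Speech features in the Output Speech


@dataclass
class USTInput:
    selector: InputSelector
    language_a: str                       # Language Preferences: input language
    language_b: str                       # Language Preferences: output language
    input_speech: Optional[bytes] = None  # hypothetical raw audio payload
    input_text: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the input chosen by Input Selector is missing."""
        if self.selector.use_speech and self.input_speech is None:
            raise ValueError("Input Selector chooses Speech, but Input Speech is missing")
        if not self.selector.use_speech and self.input_text is None:
            raise ValueError("Input Selector chooses Text, but Input Text is missing")
```

A record such as `USTInput(InputSelector(False, False), "it", "en", input_text="ciao")` passes `validate()`, while the same record without `input_text` raises `ValueError`.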
4 Functions of AI Modules of Unidirectional Speech Translation
Table 2 gives the functions of Unidirectional Speech Translation AIMs.
Table 2 – Functions of Unidirectional Speech Translation AI Modules
AIM | Functions |
Automatic Speech Recognition | Recognises Speech |
Text-to-Text Translation | Translates Recognised Text |
Input Speech Description | Extracts Speech Features |
Text-to-Speech (Features) | Synthesises Translated Text adding Speech Features |
5 I/O Data of AI Modules of Unidirectional Speech Translation
The input and output data of the AI Modules of Unidirectional Speech Translation are given in Table 3.
Table 3 – I/O Data of AI Modules of Unidirectional Speech Translation
AIM | Receives | Produces |
Automatic Speech Recognition | Input Speech | Recognised Text |
Text-to-Text Translation | 1. Input Text 2. Recognised Text (Based on Input Selector) | Translated Text |
Input Speech Description | Input Speech | Speaker-specific Speech Descriptors |
Text-to-Speech (Features) | 1. Translated Text 2. Speech Descriptors (Based on Input Selector) | Output Speech |
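The data exchanged between the AIMs in the table above can be sketched as a function that wires four callables together. This is a minimal sketch under stated assumptions: `run_ust` and its parameter names are hypothetical, each AIM is modelled as a plain Python callable, language selection is omitted, and real AIMs would exchange the data types defined by the specification instead of `bytes`, `str`, and `dict`.

```python
from typing import Callable, Optional, Tuple


def run_ust(
    use_speech: bool,
    preserve_features: bool,
    input_speech: Optional[bytes],
    input_text: Optional[str],
    asr: Callable[[bytes], str],                  # Automatic Speech Recognition
    ttt: Callable[[str], str],                    # Text-to-Text Translation
    isd: Callable[[bytes], dict],                 # Input Speech Description
    tts: Callable[[str, Optional[dict]], bytes],  # Text-to-Speech (Features)
) -> Tuple[str, bytes]:
    """Return (Translated Text, Output Speech) for one request."""
    if use_speech:
        if input_speech is None:
            raise ValueError("Input Speech is required when speech is selected")
        source_text = asr(input_speech)  # Input Speech -> Recognised Text
    else:
        if input_text is None:
            raise ValueError("Input Text is required when text is selected")
        source_text = input_text

    translated_text = ttt(source_text)   # -> Translated Text

    # Assumption: descriptors are extracted only for speech input
    # when feature preservation is requested.
    descriptors = isd(input_speech) if (use_speech and preserve_features) else None

    output_speech = tts(translated_text, descriptors)  # -> Output Speech
    return translated_text, output_speech
```

With toy stand-ins for the four AIMs (for instance an `asr` that decodes bytes and a `ttt` that upper-cases text), the function can be exercised end to end without any speech models, which is useful for checking the routing logic in isolation.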
6 Specification of Unidirectional Speech Translation AIW, AIMs, and JSON Metadata
Table 4 – AIMs and JSON Metadata
AIW | AIMs | Name | JSON |
MMC-UST | | Unidirectional Speech Translation | X |
– | MMC-ASR | Automatic Speech Recognition | X |
– | MMC-TTT | Text-to-Text Translation | X |
– | MMC-ISD | Input Speech Description | X |
– | MMC-TTS | Text-to-Speech | X |