1     Functions of Unidirectional Speech Translation

2     Reference Architecture of Unidirectional Speech Translation

3     I/O Data of Unidirectional Speech Translation

4     Functions of AI Modules of Unidirectional Speech Translation

5     I/O Data of AI Modules of Unidirectional Speech Translation

6     Specification of Unidirectional Speech Translation AIW, AIMs, and JSON Metadata

1      Functions of Unidirectional Speech Translation

The goal of the Unidirectional Speech Translation (MMC-UST) Use Case is to translate speech segments expressed in a source language into a target language or to produce the textual version of the translated speech. If the desired output is speech, the user can specify whether their speech features (voice colour, emotional charge, etc.) should be preserved in the translated speech.

The flow of control is from Input Speech or Input Text to Translated Text, and then to Output Speech and Output Text. Depending on the value of Input Selector:

  1. Input Text in Language A is translated into Translated Text in Language B and pronounced as Speech in Language B.
  2. The Speech features (voice colour, emotional charge, etc.) of the Speech in Language A are preserved in the Speech in Language B.

2      Reference Architecture of Unidirectional Speech Translation

Figure 1 describes the input/output data, the AIMs and the data exchanged between AIMs.

Figure 1 – Reference Model of Unidirectional Speech Translation (UST)

3      I/O Data of Unidirectional Speech Translation

The input and output data of the Unidirectional Speech Translation Use Case are:

Table 1 – I/O Data of Unidirectional Speech Translation

| Input | Descriptions |
|---|---|
| Input Selector | Determines whether: 1. The input will be in Text or Speech; 2. The Input Speech features are preserved in the Output Speech. |
| Language Preferences | User-specified input Language (A) and output Language (B). |
| Input Speech | Speech produced in Language A by a human desiring translation into Language B. |
| Input Text | Alternative textual source information to be translated into and pronounced in Language B, depending on the value of Input Selector. |

| Output | Descriptions |
|---|---|
| Translated Speech | Input Speech translated into Language B, preserving the Input Speech features in the Output Speech, depending on the value of Input Selector. |
| Translated Text | Text of Input Speech or Input Text translated into Language B, depending on the value of Input Selector. |
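
The entries of Table 1 can be pictured as plain data records. The following is a minimal sketch for illustration only: the class and field names (`InputKind`, `InputSelector`, `USTInput`, `USTOutput`), the use of `bytes` for speech, and the short language tags are assumptions and are not defined by this specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InputKind(Enum):
    TEXT = "text"
    SPEECH = "speech"


@dataclass(frozen=True)
class InputSelector:
    """Hypothetical encoding of the two choices carried by Input Selector."""
    input_kind: InputKind        # whether the input is Text or Speech
    preserve_features: bool      # whether Input Speech features are preserved


@dataclass(frozen=True)
class USTInput:
    selector: InputSelector
    language_a: str                       # input language (assumed tag, e.g. "it")
    language_b: str                       # output language (assumed tag, e.g. "en")
    input_speech: Optional[bytes] = None  # assumed raw audio representation
    input_text: Optional[str] = None

    def validate(self) -> None:
        """Check that the data named by the selector is actually present."""
        kind = self.selector.input_kind
        if kind is InputKind.SPEECH and self.input_speech is None:
            raise ValueError("Input Selector says Speech, but Input Speech is missing")
        if kind is InputKind.TEXT and self.input_text is None:
            raise ValueError("Input Selector says Text, but Input Text is missing")


@dataclass(frozen=True)
class USTOutput:
    translated_text: str
    translated_speech: Optional[bytes] = None
```

Calling `validate()` before translation is one way to catch a selector that names an input which was never supplied; the specification itself does not prescribe such a check.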

4      Functions of AI Modules of Unidirectional Speech Translation

Table 2 gives the functions of Unidirectional Speech Translation AIMs.

Table 2 – Functions of Unidirectional Speech Translation AI Modules

| AIM | Functions |
|---|---|
| Automatic Speech Recognition | Recognises Speech |
| Text-to-Text Translation | Translates Recognised Text |
| Input Speech Description | Extracts Speech Features |
| Text-to-Speech (Features) | Synthesises Translated Text adding Speech Features |

5      I/O Data of AI Modules of Unidirectional Speech Translation

The I/O data of the AI Modules of Unidirectional Speech Translation are given in Table 3.

Table 3 – I/O Data of AI Modules of Unidirectional Speech Translation

| AIM | Receives | Produces |
|---|---|---|
| Automatic Speech Recognition | Input Speech | Recognised Text |
| Text-to-Text Translation | 1. Input Text; 2. Recognised Text (based on Input Selector) | Translated Text |
| Input Speech Description | Input Speech | Speaker-specific Speech Descriptors |
| Text-to-Speech (Features) | 1. Translated Text; 2. Speech Descriptors (based on Input Selector) | Output Speech |
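
The way the four AIMs of Table 3 feed one another can be sketched as a chain of function calls. This is a hedged illustration, not the normative workflow: the function `translate`, its parameter names, and the stub AIMs are hypothetical, and the sketch assumes that Speech Descriptors are extracted only when Input Speech is present and feature preservation is requested.

```python
from typing import Callable, Optional, Tuple


def translate(
    *,
    input_is_speech: bool,
    preserve_features: bool,
    input_speech: Optional[bytes],
    input_text: Optional[str],
    asr: Callable[[bytes], str],
    ttt: Callable[[str], str],
    isd: Callable[[bytes], dict],
    tts: Callable[[str, Optional[dict]], bytes],
) -> Tuple[str, bytes]:
    """Chain the four AIMs and return (Translated Text, Output Speech)."""
    if input_is_speech:
        if input_speech is None:
            raise ValueError("Input Speech is required when the selector says Speech")
        source_text = asr(input_speech)      # Input Speech -> Recognised Text
    else:
        if input_text is None:
            raise ValueError("Input Text is required when the selector says Text")
        source_text = input_text             # Input Text is used directly

    translated_text = ttt(source_text)       # -> Translated Text

    descriptors = None
    if preserve_features and input_speech is not None:
        descriptors = isd(input_speech)      # -> Speech Descriptors (assumed dict)

    output_speech = tts(translated_text, descriptors)  # -> Output Speech
    return translated_text, output_speech


# Stub AIMs for illustration only; real AIMs would wrap actual models.
def stub_asr(audio: bytes) -> str:
    return "ciao"


def stub_ttt(text: str) -> str:
    return {"ciao": "hello"}.get(text, text)


def stub_isd(audio: bytes) -> dict:
    return {"voice_colour": "placeholder"}


def stub_tts(text: str, descriptors: Optional[dict]) -> bytes:
    return text.encode("utf-8")


text, speech = translate(
    input_is_speech=True,
    preserve_features=True,
    input_speech=b"raw-audio",
    input_text=None,
    asr=stub_asr,
    ttt=stub_ttt,
    isd=stub_isd,
    tts=stub_tts,
)
```

Passing the AIMs in as callables keeps the routing logic independent of any particular recognition, translation, or synthesis engine.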

6      Specification of Unidirectional Speech Translation AIW, AIMs, and JSON Metadata

Table 4 – AIMs and JSON Metadata

| AIW and AIMs | Name | JSON |
|---|---|---|
| MMC-UST | Unidirectional Speech Translation | X |
| MMC-ASR | Automatic Speech Recognition | X |
| MMC-TTT | Text-to-Text Translation | X |
| MMC-ISD | Input Speech Description | X |
| MMC-TTS | Text-to-Speech | X |