The Text and Speech Translation Composite AIM is specified in the following six sections.

1     Functions of Text and Speech Translation

2     Reference Architecture of Text and Speech Translation

3     I/O Data of Text and Speech Translation

4     Functions of AI Modules of Text and Speech Translation

5     I/O Data of AI Modules of Text and Speech Translation

6     Specification of Text and Speech Translation AIMs and JSON Metadata

1      Functions of Text and Speech Translation

Text and Speech Translation (MMC-TST):

  1. Receives:
    • Input Selector determining whether the input is provided as text or speech and, if the desired output is speech, whether the speaker's speech features (voice colour, emotional charge, etc.) should be preserved in the translated speech.
    • Target language.
    • Input Text.
    • Input Speech.
  2. Produces Translated Text or Translated Speech Objects in the target language.

2      Reference Architecture of Text and Speech Translation

Figure 1 depicts the Reference Architecture of the Text-and-Speech Translation Composite AIM.

Figure 1 – The Text-and-Speech Translation Composite AIM

3      I/O Data of Text and Speech Translation

Table 1 provides the list of the I/O Data of the Text and Speech Translation Composite AIM.

Table 1 – I/O Data of Text and Speech Translation

| Input | Semantics |
| --- | --- |
| Input Selector | Determines whether: (1) the input will be in Text or Speech; (2) the Input Speech features are preserved in the Output Speech. |
| Language Preferences | User-specified input language (A) and output language (B). |
| Input Speech | Speech produced in language A by a human desiring translation into language B. |
| Input Text | Alternative textual source information to be translated into and pronounced in language B, depending on the value of Input Selector. |

| Output | Description |
| --- | --- |
| Translated Speech | Input Speech in language A translated into language B, preserving the Input Speech features in the Output Speech depending on the value of Input Selector. |
| Translated Text | Text of Input Speech or Input Text translated into language B, depending on the value of Input Selector. |
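
The inputs listed in Table 1 can be illustrated with a small data model. The following is a minimal sketch only, not part of the specification: the names `InputMode`, `InputSelector`, and `TranslationRequest`, the use of plain strings for languages, and the use of raw bytes for speech are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InputMode(Enum):
    """Hypothetical encoding of the first Input Selector choice."""
    TEXT = "text"
    SPEECH = "speech"


@dataclass(frozen=True)
class InputSelector:
    """Hypothetical container for the two Input Selector choices."""
    mode: InputMode
    preserve_speech_features: bool = False


@dataclass(frozen=True)
class TranslationRequest:
    """Hypothetical bundle of the Table 1 inputs."""
    selector: InputSelector
    source_language: str                  # language A (assumed to be a string tag)
    target_language: str                  # language B (assumed to be a string tag)
    input_text: Optional[str] = None
    input_speech: Optional[bytes] = None  # assumed raw audio bytes

    def __post_init__(self) -> None:
        # The selected input must actually be present.
        if self.selector.mode is InputMode.TEXT and self.input_text is None:
            raise ValueError("Input Selector is TEXT but Input Text is missing")
        if self.selector.mode is InputMode.SPEECH and self.input_speech is None:
            raise ValueError("Input Selector is SPEECH but Input Speech is missing")
        # Speech features can only be preserved if there is speech to describe.
        if self.selector.preserve_speech_features and self.input_speech is None:
            raise ValueError("Preserving speech features requires Input Speech")
```

Validating at construction time is a design choice of this sketch: it rejects a request whose Input Selector does not match the data supplied before any AI Module is invoked.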

4      Functions of AI Modules of Text and Speech Translation

Table 2 gives the functions of Text-and-Speech Translation AIMs.

Table 2 – Functions of Text-and-Speech Translation AI Modules

| AIM | Functions |
| --- | --- |
| Automatic Speech Recognition | Recognises Input Speech. |
| Text-to-Text Translation | Translates Recognised Text into Translated Text. |
| Input Speech Description | Extracts Speech Descriptors (a.k.a. Features) from Input Speech. |
| Text-to-Speech (Features) | Synthesises Translated Text adding Speech Features. |

5      I/O Data of AI Modules of Text and Speech Translation

The I/O Data of the AI Modules of Text-and-Speech Translation are given in Table 3.

Table 3 – I/O Data of AI Modules of Text-and-Speech Translation

| AIM | Receives | Produces |
| --- | --- | --- |
| Automatic Speech Recognition | Input Speech | Recognised Text |
| Text-to-Text Translation | 1. Input Text; 2. Recognised Text (based on Input Selector) | Translated Text |
| Input Speech Description | Input Speech | Speaker-specific Speech Descriptors |
| Text-to-Speech (Features) | 1. Translated Text; 2. Speech Descriptors (based on Input Selector) | Output Speech |
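
The data flow implied by Table 3 can be sketched as a composition of four callables. This is an illustrative sketch under stated assumptions, not the normative interface: the class name `TextAndSpeechTranslation`, the `run` method, the callable signatures, the representation of speech as bytes and of Speech Descriptors as a dictionary, and the choice to return both outputs at once are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Speech = bytes        # assumed representation of a speech signal
Descriptors = dict    # assumed representation of Speech Descriptors


@dataclass
class TextAndSpeechTranslation:
    """Hypothetical wiring of the four AI Modules listed in Table 3."""
    asr: Callable[[Speech], str]                         # Automatic Speech Recognition
    ttt: Callable[[str, str], str]                       # Text-to-Text Translation (text, target language)
    isd: Callable[[Speech], Descriptors]                 # Input Speech Description
    tts: Callable[[str, Optional[Descriptors]], Speech]  # Text-to-Speech (Features)

    def run(
        self,
        *,
        input_is_speech: bool,
        preserve_features: bool,
        target_language: str,
        input_text: Optional[str] = None,
        input_speech: Optional[Speech] = None,
    ) -> Tuple[str, Speech]:
        # Input Selector, choice 1: take the text from ASR or from Input Text.
        if input_is_speech:
            if input_speech is None:
                raise ValueError("Input Speech is required when speech is selected")
            source_text = self.asr(input_speech)
        else:
            if input_text is None:
                raise ValueError("Input Text is required when text is selected")
            source_text = input_text

        translated_text = self.ttt(source_text, target_language)

        # Input Selector, choice 2: extract descriptors only when features are kept.
        descriptors = None
        if preserve_features and input_speech is not None:
            descriptors = self.isd(input_speech)

        # Simplification of this sketch: always synthesise and return both outputs.
        output_speech = self.tts(translated_text, descriptors)
        return translated_text, output_speech
```

Passing the modules in as plain callables keeps the sketch independent of any particular recogniser, translator, or synthesiser; any implementation with a matching signature can be substituted.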

6      Specification of Text and Speech Translation AIMs and JSON Metadata

Table 4 – AIMs and JSON Metadata

| AIM | Name | JSON |
| --- | --- | --- |
| MMC-TST | Text and Speech Translation | X |
| – MMC-ASR | Automatic Speech Recognition | X |
| – MMC-TTT | Text-to-Text Translation | X |
| – MMC-ISD | Input Speech Description | X |
| – MMC-TTS | Text-to-Speech | X |