1     Version

V2.1

2    Functions

Bidirectional Speech Translation (MMC-BST) enables two humans, each speaking a different language, to hold a conversation in which each speaks and hears their own language. MMC-BST:

  1. Receives
    1. Input Selector – indicates whether
      1. Input is Text or Speech
      2. Speech Features of Input Speech should be preserved in Translated Speech.
    2. Requested Language – Languages of the Input Speech and of the Target Speech.
    3. Input Text1 – Text to be translated
    4. Input Speech1 – Speech to be translated
    5. Input Text2 – Text to be translated
    6. Input Speech2 – Speech to be translated
  2. Produces Translated Text1 or Speech1 and Translated Text2 or Speech2.
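
The inputs listed above can be pictured as a small data model. The following is a minimal Python sketch, not part of the specification: the names `InputKind`, `InputSelector`, `SpeakerInput`, and `select_input` are hypothetical, speech is represented as raw bytes purely for illustration, and a single selector is assumed to apply to both speakers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class InputKind(Enum):
    """Hypothetical encoding of the Text-or-Speech choice in the Input Selector."""
    TEXT = "text"
    SPEECH = "speech"


@dataclass
class InputSelector:
    # Assumption: one selector governs both speakers' inputs.
    kind: InputKind
    # Whether Speech Features of the Input Speech should be preserved.
    preserve_speech_features: bool = False


@dataclass
class SpeakerInput:
    text: Optional[str] = None      # Input Text1 or Input Text2
    speech: Optional[bytes] = None  # Input Speech1 or Input Speech2 (illustrative bytes)


def select_input(selector: InputSelector, speaker: SpeakerInput) -> Union[str, bytes]:
    """Return the input the selector points at, or raise if it is missing."""
    if selector.kind is InputKind.TEXT:
        if speaker.text is None:
            raise ValueError("Input Selector requests Text, but no Input Text was given")
        return speaker.text
    if speaker.speech is None:
        raise ValueError("Input Selector requests Speech, but no Input Speech was given")
    return speaker.speech
```

For example, `select_input(InputSelector(InputKind.TEXT), SpeakerInput(text="hello"))` returns the text, while the same selector applied to a speaker with only speech raises `ValueError`.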

3      Reference Model

Figure 1 depicts the AIMs and the data exchanged between AIMs.

Figure 1 – Reference Model of Bidirectional Speech Translation (BST)

4      I/O Data

The input and output data of the Bidirectional Speech Translation Use Case are given by Table 1:

Table 1 – I/O Data of Bidirectional Speech Translation

| Input | Descriptions |
|---|---|
| Input Selector | Determines whether the input will be Text or Speech. |
| Language Preferences | User-specified input language and output languages. |
| Input Speech1 | Speech by human1 desiring spoken translation in the specified language. |
| Input Text1 | Alternative Input Text to be translated to the specified language. |
| Input Speech2 | Speech by human2 desiring spoken translation in the specified language. |
| Input Text2 | Alternative Input Text to be translated to the specified language. |

| Output | Descriptions |
|---|---|
| Output Speech1 | Translated Speech of Speaker 1. |
| Output Text1 | Text of the translated Speech of Speaker 1. |
| Output Speech2 | Translated Speech of Speaker 2. |
| Output Text2 | Text of the translated Speech of Speaker 2. |

5     JSON Metadata

https://schemas.mpai.community/MMC/V2.1/AIWs/BidirectionalSpeechTranslation.json

| AIMs | Name | JSON |
|---|---|---|
| MMC-ASR | Automatic Speech Recognition | X |
| MMC-TTT | Text-to-Text Translation | X |
| MMC-ISD | Input Speech Description | X |
| MMC-TTS | Text-to-Speech | X |
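
Read as a processing chain, the AIMs listed above suggest speech recognition, then text translation, then speech synthesis, with the speech description feeding the synthesis step when Speech Features are to be preserved. The sketch below illustrates that reading only. The ordering is an assumption (the reference model figure is not reproduced here), and every function is a hypothetical toy stub rather than an MPAI API.

```python
from typing import Optional, Tuple


def recognize_speech(speech: bytes) -> str:
    """Toy stand-in for MMC-ASR: pretends the audio bytes are their own transcript."""
    return speech.decode("utf-8")


def describe_speech(speech: bytes) -> dict:
    """Toy stand-in for MMC-ISD: returns a placeholder descriptor."""
    return {"length": len(speech)}


def translate_text(text: str, target_language: str) -> str:
    """Toy stand-in for MMC-TTT: tags the text with the target language."""
    return f"[{target_language}] {text}"


def synthesize_speech(text: str, features: Optional[dict]) -> bytes:
    """Toy stand-in for MMC-TTS: encodes the text; the features are ignored here."""
    return text.encode("utf-8")


def translate_turn(
    speech: bytes, target_language: str, preserve_features: bool = False
) -> Tuple[str, bytes]:
    """Run one speaker's utterance through the assumed ASR -> TTT -> TTS chain."""
    text = recognize_speech(speech)
    # Assumption: the speech description is only needed when features are preserved.
    features = describe_speech(speech) if preserve_features else None
    translated_text = translate_text(text, target_language)
    translated_speech = synthesize_speech(translated_text, features)
    return translated_text, translated_speech
```

A bidirectional exchange would call `translate_turn` once per speaker, each time with the other speaker's language as `target_language`.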