1     Version

V2.1

2    Functions

One-to-Many Speech Translation (MMC-MST) enables one human speaking their own language to broadcast text or speech to two or more audience members, each reading or listening, and responding, in a different language. If the desired output is speech, the human can specify whether their Speech Features should be preserved in the Translated Speech. An illustrative sketch of the input data is given after the list below.

One-to-Many Speech Translation:

  1. Receives
    1. Input Selector – indicates whether
      1. Input is Text or Speech.
      2. Speech Features of the Input Speech should be preserved in the Translated Speech.
    2. Requested Languages – Language of the Input Speech or Text and Languages of the Translated Speech or Text.
    3. Input Text – Text to be translated.
    4. Input Speech – Speech to be translated.
  2. Produces
    1. Translated Text1 or Speech1
    2. Translated Text2 or Speech2
    3. etc.
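
For illustration only, a hypothetical input instance could be structured as in the following Python sketch. The field names are assumptions chosen to mirror the items above; the normative format is defined by the JSON metadata referenced in section 5.

    # Hypothetical input instance of One-to-Many Speech Translation.
    # Field names are illustrative, not the normative JSON syntax.
    mst_input = {
        "InputSelector": {
            "InputIsText": False,           # False: input is Speech; True: input is Text
            "PreserveSpeechFeatures": True  # keep the speaker's Speech Features in the Translated Speech
        },
        "RequestedLanguages": {
            "InputLanguage": "it",                     # Language of the Input Speech or Text
            "TranslatedLanguages": ["en", "de", "ja"]  # one output per Requested Language
        },
        "InputText": None,                             # used only when the input is Text
        "InputSpeech": "<speech payload or reference>"
    }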

3      Reference Model

Figure 1 depicts the AIMs of the AIW and the data exchanged between them.

Figure 1 – One-to-Many Speech Translation (MMC-MST) AIW
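
The flow through the AIW can also be read from the AIMs listed in section 5. The following Python sketch is a plausible, non-normative rendering of that flow, assuming Automatic Speech Recognition, Text-to-Text Translation, Text-to-Speech, and Input Speech Description are chained as their names suggest; the functions asr, isd, ttt, and tts are hypothetical stand-ins, not the normative AIM interfaces.

    # Hypothetical stand-ins for the AIMs listed in section 5.
    def asr(speech): return "<recognised text>"                        # MMC-ASR
    def isd(speech): return "<speech features>"                        # MMC-ISD
    def ttt(text, src, dst): return f"<{dst} translation of {text}>"   # MMC-TTT
    def tts(text, lang, features=None): return "<synthesised speech>"  # MMC-TTS

    def one_to_many_speech_translation(input_selector, requested_languages,
                                       input_speech=None, input_text=None):
        # 1. Source text: recognise the Input Speech or take the Input Text as-is.
        if input_selector["InputIsText"]:
            source_text = input_text
        else:
            source_text = asr(input_speech)

        # 2. Optionally describe the Input Speech so its features can be preserved.
        speech_features = None
        if input_selector["PreserveSpeechFeatures"] and input_speech is not None:
            speech_features = isd(input_speech)

        # 3. Translate and synthesise one output per Requested Language.
        outputs = {}
        for language in requested_languages["TranslatedLanguages"]:
            text = ttt(source_text, requested_languages["InputLanguage"], language)
            outputs[language] = {
                "TranslatedText": text,
                "TranslatedSpeech": tts(text, language, speech_features),
            }
        return outputs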

4      I/O Data

The input and output data of the One-to-Many Speech Translation Use Case are given in Table 1, followed by an illustrative sketch of the output structure:

Table 1 – I/O Data of One-to-Many Speech Translation

Input                 Descriptions
Input Selector        Determines whether:
                      1. The input will be Text or Speech.
                      2. The Input Speech Features are preserved in the Translated Speech.
Language Preferences  User-specified input Language and Languages of the translations.
Input Speech          Speech produced by the human requesting translation and interpretation into a specified set of Languages.
Input Text            Alternative textual source information.
Output                Descriptions
Translated Speech     Speech translated into the Requested Languages.
Translated Text       Text translated into the Requested Languages.
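
As a non-normative companion to Table 1, the output data could be organised per Requested Language as in the following Python sketch; the key names and placeholder values are illustrative assumptions.

    # Hypothetical shape of the Output data of Table 1: one Translated Text and,
    # when Speech output is requested, one Translated Speech per Requested Language.
    mst_output = {
        "TranslatedText":   {"en": "<text>", "de": "<text>", "ja": "<text>"},
        "TranslatedSpeech": {"en": "<speech payload>", "de": "<speech payload>", "ja": "<speech payload>"},
    }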

5     JSON Metadata

https://schemas.mpai.community/MMC/V2.1/AIWs/OneToManySpeechTranslation.json

AIMs      Name                          JSON
MMC-ASR   Automatic Speech Recognition  X
MMC-TTT   Text-to-Text Translation      X
MMC-ISD   Input Speech Description      X
MMC-TTS   Text-to-Speech                X
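
The JSON metadata can also be retrieved programmatically. The following is a minimal Python sketch, assuming network access and the third-party requests package; it only downloads the published file and lists its top-level keys, without assuming anything about their names.

    import json
    import requests  # third-party: pip install requests

    # URL of the AIW JSON metadata given above.
    METADATA_URL = ("https://schemas.mpai.community/MMC/V2.1/AIWs/"
                    "OneToManySpeechTranslation.json")

    # Fetch the metadata and show its top-level structure.
    metadata = requests.get(METADATA_URL, timeout=10).json()
    print(json.dumps(sorted(metadata.keys()), indent=2))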