1     Version


2     Functions

The MPAI-HMC AI Workflow:

  1. Receives a sequence of either:
    • Audio-Visual Scenes that include the communicating Entity, interpreted as Audio-Visual Scene Geometries with associated Audio-Visual Objects.
    • Communication Items containing an Avatar representing the Machine communicating with the Entity and Context information as supported by the Portable Avatar Format.
  2. Understands the information emitted by the Entity considering its Context.
  3. Produces and emits multimodal responses to the communicating Entity, either by generating a Communication Item or by generating an Audio-Visual Scene, either of which may include a representation of the Machine itself.
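
The three steps above can be sketched as a minimal pipeline. This is an illustrative, non-normative Python sketch: the class and field names (`CommunicationItem`, `HMCWorkflow`, etc.) are hypothetical stand-ins for the data types the MPAI specifications define, and the understanding step is stubbed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for HMC data types; the real Portable Avatar
# Format and Audio-Visual Scene types are defined by the MPAI specifications.
@dataclass
class CommunicationItem:
    avatar: str                       # placeholder for the Avatar representation
    context: dict = field(default_factory=dict)  # Context information
    text: Optional[str] = None

class HMCWorkflow:
    def receive(self, item):
        """Step 1: accept an Audio-Visual Scene or a Communication Item."""
        self.input = item
        return self

    def understand(self):
        """Step 2: interpret the Entity's information in its Context (stubbed)."""
        self.meaning = {"intent": "greeting"}  # stand-in for real understanding
        return self

    def respond(self) -> CommunicationItem:
        """Step 3: emit a multimodal response as a Communication Item."""
        return CommunicationItem(avatar="machine-avatar", text="Hello!")

reply = (
    HMCWorkflow()
    .receive(CommunicationItem(avatar="user-avatar",
                               context={"language": "en"}, text="Hi"))
    .understand()
    .respond()
)
```

The chained receive/understand/respond calls mirror the order of the numbered steps; a real AIW would replace each stub with the corresponding AIM processing.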

3      Reference Model

Figure 1 depicts the MPAI-HMC Reference Architecture.

Figure 1 – The Human and Machine Communication AIW

4      I/O Data

The Input and Output Data of the Human and Machine Communication AIW are specified in Table 1.

Table 1 – Input/Output Data of MPAI-HMC

Input            Description
Portable Avatar  A Communication Item emitted by the communicating Entity.
Input Selector   Selector containing data that determines:
                 1. Whether the communicating Entity uses Speech or Text as input.
                 2. Which language is used as input.
                 3. The target Language of the translation.
Input Text       Text Object generated by the communicating Entity as information additional to, or in lieu of, the Speech Object.
Input Audio      The audio scene captured by the Machine.
Input Visual     The visual scene captured by the Machine.

Output           Description
Portable Avatar  The Communication Item produced by the Machine.
Output Audio     The rendered Audio corresponding to the Audio in the Communication Item.
Output Visual    The rendered Visual corresponding to the Visual in the Communication Item.
Output Text      The Text in the Communication Item.
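
For illustration only, an Input Selector instance could carry the three pieces of data listed in Table 1 roughly as follows. The field names (`InputKind`, `InputLanguage`, `TargetLanguage`) are hypothetical; the normative syntax is given by the MPAI-HMC JSON Metadata, not reproduced here.

```python
import json

# Hypothetical field names illustrating the three data items of the
# Input Selector row in Table 1.
input_selector = {
    "InputKind": "Speech",    # whether the Entity uses Speech or Text as input
    "InputLanguage": "it",    # language used as input
    "TargetLanguage": "en",   # target Language of the translation
}

print(json.dumps(input_selector, indent=2))
```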

5      SubAIMs

AV Scene Integration and Description
Audio-Visual Scene Description
Entity Context Understanding
Entity Dialogue Processing
Personal Status Display
Audio-Visual Scene Rendering
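
One possible data flow through the listed sub-AIMs can be sketched as a chain of functions, each standing in for one AIM. This is a hypothetical composition for illustration; the normative topology and the data types exchanged between AIMs are given by the Reference Model and the JSON Metadata.

```python
# Each function is a stub for the sub-AIM of the same name; the returned
# dicts are placeholders for the real MPAI data types.
def audio_visual_scene_description(audio, visual):
    return {"scene": (audio, visual)}

def entity_context_understanding(scene):
    return {"context": "informal", **scene}

def entity_dialogue_processing(understood):
    return {"reply_text": "Hello!", **understood}

def personal_status_display(dialogue):
    return {"avatar": "machine-avatar", **dialogue}

def audio_visual_scene_rendering(avatar_state):
    return {"audio": b"", "visual": b"", **avatar_state}

# Chain the stubs in the order the sub-AIMs are listed.
out = audio_visual_scene_rendering(
    personal_status_display(
        entity_dialogue_processing(
            entity_context_understanding(
                audio_visual_scene_description("mic", "camera")))))
```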

6    JSON Metadata