The MPAI-HMC AI Workflow:
- Receives a sequence of either:
  - Audio-Visual Scenes that include the communicating Entity, interpreted as Audio-Visual Scene Geometries and associated Audio-Visual Objects.
  - Communication Items containing an Avatar representing the Machine communicating with the Entity and Context information, as supported by the Portable Avatar Format.
- Understands the information emitted by the Entity, considering its Context.
- Produces and emits multimodal responses to the communicating Entity, either by generating a Communication Item or an Audio-Visual Scene, both of which may include a representation of itself.
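The receive → understand → respond cycle above can be sketched as a minimal loop. All class, field, and method names below are hypothetical illustrations; the actual data formats are specified by the Portable Avatar Format and the rest of MPAI-HMC.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a Communication Item; field names are
# illustrative, not normative.
@dataclass
class CommunicationItem:
    avatar_id: str                  # Avatar representing the sender
    text: Optional[str] = None      # Text carried by the item
    speech: Optional[bytes] = None  # Speech carried by the item
    context: Optional[dict] = None  # Context information

class HMCWorkflow:
    """Sketch of the receive -> understand -> respond cycle."""

    def understand(self, item: CommunicationItem) -> str:
        # Interpret the information emitted by the Entity in its Context
        # (placeholder for the actual understanding step).
        return item.text or "<non-textual input>"

    def respond(self, item: CommunicationItem) -> CommunicationItem:
        meaning = self.understand(item)
        # Emit a multimodal response as a new Communication Item that
        # includes a representation of the Machine itself.
        return CommunicationItem(avatar_id="machine-avatar",
                                 text=f"Understood: {meaning}")

reply = HMCWorkflow().respond(CommunicationItem(avatar_id="user", text="Hello"))
```

The sketch deliberately collapses scene interpretation and response generation into single methods; in the specification these are carried out by distinct AI Modules.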
3 Reference Model
Figure 1 depicts the MPAI-HMC Reference Model.
Figure 1 – The Human and Machine Communication AIW
4 I/O Data
The Input and Output Data of the Human and Machine Communication AIW are specified in Table 1.
Table 1 – Input/Output Data of MPAI-HMC
| Input Data | Description |
|---|---|
| Communication Item | A Communication Item emitted by the communicating Entity. |
| Selector | Selector containing data that determines: 1) whether the communicating Entity uses Speech or Text as input; 2) which language is used as input; 3) the target Language in translation. |
| Text Object | Text Object generated by the communicating Entity as information additional to, or in lieu of, the Speech Object. |
| Audio | The audio scene captured by the Machine. |
| Visual | The visual scene captured by the Machine. |

| Output Data | Description |
|---|---|
| Communication Item | The Communication Item produced by the Machine. |
| Audio | The rendered Audio corresponding to the Audio in the Communication Item. |
| Visual | The rendered Visual corresponding to the Visual in the Communication Item. |
| Text | The Text in the Communication Item. |
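The Selector described in Table 1 carries three pieces of information: the input modality, the input language, and the target language for translation. A minimal sketch of how such a Selector might be encoded and consumed follows; the JSON field names are assumptions for illustration, not part of the specification.

```python
import json

# Hypothetical JSON encoding of the Selector; field names are
# illustrative, not normative.
selector = {
    "input_modality": "Speech",  # 1) Speech or Text used as input
    "input_language": "en",      # 2) language used as input
    "target_language": "it",     # 3) target Language in translation
}

def translation_requested(sel: dict) -> bool:
    """Translation is needed when input and target languages differ."""
    return sel["input_language"] != sel["target_language"]

# Round-trip through JSON and check whether translation is required.
decoded = json.loads(json.dumps(selector))
needs_translation = translation_requested(decoded)
```

In this sketch a Machine would consult `input_modality` to route the input to speech or text processing, and `translation_requested` to decide whether a translation step is needed.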
The AIW includes the following AI Modules (AIMs):
- AV Scene Integration and Description
- Audio-Visual Scene Description
- Entity Context Understanding
- Entity Dialogue Processing
- Personal Status Display
- Audio-Visual Scene Rendering
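One possible reading of how these AIMs connect is sketched below as a simple processing chain; the function names and the data passed between them are placeholders, and the actual interconnection is the one given by the Reference Model figure.

```python
# Illustrative wiring of the listed AIMs as a processing chain;
# all names and payloads are hypothetical.
def audio_visual_scene_description(audio, visual):
    # Interpret captured audio and visual as a described scene.
    return {"objects": [audio, visual]}

def entity_context_understanding(scene):
    # Understand the Entity's information considering its Context.
    return {"meaning": scene["objects"]}

def entity_dialogue_processing(understanding):
    # Produce the Machine's dialogue response.
    return {"reply_text": "ack", "personal_status": "neutral"}

def personal_status_display(dialogue):
    # Attach the Machine's Avatar and personal status to the response.
    return {"avatar": "machine-avatar", **dialogue}

def audio_visual_scene_rendering(portable_avatar):
    # Render the response as audio, visual, and text.
    return ("rendered-audio", "rendered-visual", portable_avatar["reply_text"])

scene = audio_visual_scene_description("mic-capture", "camera-capture")
out = audio_visual_scene_rendering(
    personal_status_display(
        entity_dialogue_processing(
            entity_context_understanding(scene))))
```

The chain mirrors the order of the AIM list above: scene description feeds understanding, understanding feeds dialogue processing, and the result is displayed and rendered.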