1 Version
V1.0
2 Functions
Entity Context Understanding (HMC-ECU)
- Receives Audio-Visual Scene Descriptors.
- Performs:
  - Demultiplexing of the input Audio-Visual Scene Descriptors.
  - Recognition of Entity’s Speech.
  - Recognition of Audio Object and Visual Object.
  - Understanding of Entity’s Natural Language expressed as Text in the Context of the Audio and Visual Instance.
  - Extraction of Entity’s Personal Status.
  - Translation of Entity’s Text.
- Produces:
  - Audio-Visual Scene Geometry
  - Entity ID
  - Audio Instance ID
  - Visual Instance ID
  - Personal Status
  - Translated Text
  - Refined Text
  - Meaning.
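The functions listed above can be read as a processing chain. The following is a minimal sketch of that reading, not the normative behaviour of the AIM. The `SceneDescriptors` container, the `understand_entity_context` function, the step names in `STUB_STEPS`, and the order and arguments of each step are all assumptions made for illustration; the stubs return placeholder values and perform no real recognition or translation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class SceneDescriptors:
    """Hypothetical stand-in for the input Audio-Visual Scene Descriptors."""
    scene_geometry: Dict[str, Any]
    entity_id: str
    speech: Optional[bytes] = None
    audio_object: Optional[bytes] = None
    visual_object: Optional[bytes] = None


def understand_entity_context(
    descriptors: SceneDescriptors,
    steps: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Run the listed functions in one plausible order (an assumption)."""
    # Demultiplexing: in this sketch, simply reading fields of the container.
    geometry = descriptors.scene_geometry
    entity_id = descriptors.entity_id

    # Recognition of Speech, Audio Object, and Visual Object.
    recognised_text = steps["recognise_speech"](descriptors.speech)
    audio_instance_id = steps["recognise_audio_object"](descriptors.audio_object)
    visual_instance_id = steps["recognise_visual_object"](descriptors.visual_object)

    # Natural language understanding in the context of the recognised instances.
    refined_text, meaning = steps["understand_language"](
        recognised_text, audio_instance_id, visual_instance_id
    )

    # Personal Status extraction and Text translation (inputs are assumed).
    personal_status = steps["extract_personal_status"](descriptors, refined_text)
    translated_text = steps["translate_text"](refined_text)

    return {
        "Audio-Visual Scene Geometry": geometry,
        "Entity ID": entity_id,
        "Audio Instance ID": audio_instance_id,
        "Visual Instance ID": visual_instance_id,
        "Personal Status": personal_status,
        "Translated Text": translated_text,
        "Refined Text": refined_text,
        "Meaning": meaning,
    }


# Placeholder steps: every return value here is invented for the example.
STUB_STEPS: Dict[str, Callable[..., Any]] = {
    "recognise_speech": lambda speech: "hello" if speech else "",
    "recognise_audio_object": lambda obj: "audio-0",
    "recognise_visual_object": lambda obj: "visual-0",
    "understand_language": lambda text, a_id, v_id: (text, {"tokens": text.split()}),
    "extract_personal_status": lambda descriptors, text: {"emotion": "neutral"},
    "translate_text": lambda text: text.upper(),  # not a real translation
}
```

Passing the steps as a dictionary of callables keeps the sketch independent of any particular recognition or translation engine; each stub can be replaced without touching the chain itself.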
3 Reference Architecture
Figure 19 depicts the Reference Architecture of the Entity Context Understanding Composite AIM.
Figure 19 – Entity Context Understanding
Note that Output Data shown in italics are passed through directly from the Input Data of the same name.
4 I/O Data
Table 14 specifies the Input and Output Data of the Entity Context Understanding Composite AIM.
Table 14 – I/O Data of the Entity Context Understanding Composite AIM
Input | Description |
Audio-Visual Scene Descriptors | The digital representation of the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of the Scene. |
Output | Description |
Personal Status | Personal Status of Entity having the Entity ID. |
Translated Text | Translated Text of Text Object or of Text conveyed by Speech Object. |
Refined Text | Refined Text of Speech Object. |
Meaning | Another name for Refined Text Descriptors. |
Visual Scene Geometry | As demultiplexed from Input Audio-Visual Scene Descriptors. |
Visual Instance Identifier | The Identifier of the specific Visual Object belonging to a level in the taxonomy. |
Audio Scene Geometry | As demultiplexed from Input Audio-Visual Scene Descriptors. |
Audio Instance Identifier | The Identifier of the specific Audio Object belonging to a level in the taxonomy. |
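As an illustration of how the rows of Table 14 might be carried in code, the sketch below mirrors the Input and Output Data as two Python dataclasses. The class names, field names, field types, and the choice to make every output optional are assumptions; the table itself only gives names and descriptions.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional


@dataclass
class EntityContextUnderstandingInput:
    """Input row of Table 14 (payload type is an assumption)."""
    audio_visual_scene_descriptors: Any


@dataclass
class EntityContextUnderstandingOutput:
    """Output rows of Table 14, in table order (types are assumptions)."""
    personal_status: Optional[Any] = None
    translated_text: Optional[str] = None
    refined_text: Optional[str] = None
    meaning: Optional[Any] = None  # another name for Refined Text Descriptors
    visual_scene_geometry: Optional[Any] = None
    visual_instance_identifier: Optional[str] = None
    audio_scene_geometry: Optional[Any] = None
    audio_instance_identifier: Optional[str] = None


def missing_outputs(output: EntityContextUnderstandingOutput) -> List[str]:
    """Return the names of output fields that have not been filled in."""
    return [f.name for f in fields(output) if getattr(output, f.name) is None]
```

A helper such as `missing_outputs` can be used to check which outputs a partial implementation has produced so far.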
5 SubAIMs
6 JSON Metadata
https://schemas.mpai.community/HMC/V1.0/AIMs/EntityContextUnderstanding.json
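The metadata can be retrieved programmatically. The sketch below assumes the URL above serves a JSON document over HTTPS; the function name `load_aim_metadata` and its `opener` parameter are hypothetical, and nothing is assumed here about the fields the document contains.

```python
import json
import urllib.request
from typing import Any, Callable

METADATA_URL = (
    "https://schemas.mpai.community/HMC/V1.0/AIMs/EntityContextUnderstanding.json"
)


def load_aim_metadata(
    url: str = METADATA_URL,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 10.0,
) -> Any:
    """Fetch the URL and parse the response body as JSON.

    `opener` defaults to urllib.request.urlopen; it can be replaced
    with a local stand-in for offline use.
    """
    with opener(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `load_aim_metadata()` with no arguments performs a network request, so callers may want to cache the result or supply their own `opener`.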