1     Version

V1.0

2     Functions

Entity Context Understanding (HMC-ECU)

  1. Receives Audio-Visual Scene Descriptors.
  2. Performs:
    • Demultiplexing of the input Audio-Visual Scene Descriptors.
    • Recognition of Entity’s Speech.
    • Recognition of Audio Object and Visual Object.
    • Understanding of Entity’s Natural Language expressed as Text in the Context of the Audio and Visual Instance.
    • Extraction of Entity’s Personal Status.
    • Translation of Entity’s Text.
  3. Produces:
    • Audio-Visual Scene Geometry
    • Entity ID
    • Audio Instance ID
    • Visual Instance ID
    • Personal Status
    • Translated Text
    • Refined Text
    • Meaning.
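
The functions listed above can be read as a processing pipeline. The following Python sketch is one possible illustration of that reading, not the normative implementation. The function name `understand_entity_context`, the `steps` dictionary and its keys, the keys of the demultiplexed parts, and the wiring between steps are all assumptions made for illustration. Only the operations and the produced items are taken from the list above.

```python
from typing import Any, Callable, Dict

# Hypothetical type alias: every processing step is supplied by the caller.
Steps = Dict[str, Callable[..., Any]]


def understand_entity_context(descriptors: Any, steps: Steps) -> Dict[str, Any]:
    """Illustrative composition of the HMC-ECU functions.

    The step names and the data passed between them are assumptions;
    the specification text only lists the operations and the outputs.
    """
    # Demultiplex the input Audio-Visual Scene Descriptors.
    # Assumed to return a dict with the keys used below.
    parts = steps["demultiplex"](descriptors)

    # Recognize the Entity's Speech as Text.
    recognized_text = steps["recognize_speech"](parts["speech"])

    # Identify the Audio Object and the Visual Object.
    audio_id = steps["identify_audio"](parts["audio"])
    visual_id = steps["identify_visual"](parts["visual"])

    # Understand the Text in the context of the Audio and Visual Instances.
    # Assumed to return a (Refined Text, Meaning) pair.
    refined_text, meaning = steps["understand_language"](
        recognized_text, audio_id, visual_id
    )

    # Extract the Entity's Personal Status (the inputs are an assumption).
    personal_status = steps["extract_personal_status"](parts, meaning)

    # Translate the Entity's Text.
    translated_text = steps["translate"](refined_text)

    # Keys mirror the "Produces" list.
    return {
        "Audio-Visual Scene Geometry": parts["geometry"],
        "Entity ID": parts["entity_id"],
        "Audio Instance ID": audio_id,
        "Visual Instance ID": visual_id,
        "Personal Status": personal_status,
        "Translated Text": translated_text,
        "Refined Text": refined_text,
        "Meaning": meaning,
    }
```

Passing the steps in as callables keeps the sketch independent of any particular speech recognizer, object identifier, or translator; stub functions can stand in for them during testing.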

3      Reference Architecture

Figure 19 depicts the Reference Architecture of the Entity Context Understanding Composite AIM.

Figure 19 – Entity Context Understanding

Note that Output Data shown in italics are passed through directly from the Input Data of the same name.

4      I/O Data

Table 14 specifies the Input and Output Data of the Entity Context Understanding Composite AIM.

Table 14 – I/O Data of the Entity Context Understanding Composite AIM

Input                           Description
Audio-Visual Scene Descriptors  The digital representation of the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of the Scene.

Output                          Description
Personal Status                 Personal Status of Entity having the Entity ID.
Translated Text                 Translated Text of Text Object or of Text conveyed by Speech Object.
Refined Text                    Refined Text of Speech Object.
Meaning                         Other name for Refined Text Descriptors.
Visual Scene Geometry           As demultiplexed from Input Audio-Visual Scene Descriptors.
Visual Instance Identifier      The Identifier of the specific Visual Object belonging to a level in the taxonomy.
Audio Scene Geometry            As demultiplexed from Input Audio-Visual Scene Descriptors.
Audio Instance Identifier       The Identifier of the specific Audio Object belonging to a level in the taxonomy.
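
The rows of Table 14 could be mirrored as typed records in an implementation. The sketch below assumes Python dataclasses. The class names, the snake_case field names, and the `str`/`Any` payload types are illustrative assumptions, since the table gives only names and descriptions and the normative formats are defined elsewhere.

```python
from dataclasses import dataclass, fields
from typing import Any, List


@dataclass(frozen=True)
class EcuInput:
    """Input row of Table 14 (hypothetical class and field names)."""
    audio_visual_scene_descriptors: Any


@dataclass(frozen=True)
class EcuOutput:
    """Output rows of Table 14, in table order (payload types are assumed)."""
    personal_status: Any
    translated_text: str
    refined_text: str
    meaning: Any
    visual_scene_geometry: Any
    visual_instance_identifier: str
    audio_scene_geometry: Any
    audio_instance_identifier: str


def field_names(record_type: type) -> List[str]:
    """Return the field names of a dataclass in declaration order."""
    return [f.name for f in fields(record_type)]
```

Frozen dataclasses are used here so that a produced record cannot be modified after the AIM has emitted it; that is a design choice of the sketch, not a requirement stated in the table.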

5      SubAIMs

Entity Context Understanding
Audio-Visual Scene Demultiplexing
Automatic Speech Recognition
Visual Object Identification
Audio Object Identification
Natural Language Understanding
Personal Status Extraction
Text-to-Text Translation

6     JSON Metadata

https://schemas.mpai.community/HMC/V1.0/AIMs/EntityContextUnderstanding.json