1 Functions
The functions of Entity and Context Understanding (HMC-ECU) allow a Machine to understand the information conveyed by an Entity and its Context, so that the Entity Dialogue Processing AIM can produce a pertinent communication.
Therefore, Entity and Context Understanding (HMC-ECU):
Receives | Audio-Visual Scene Descriptors. |
Separates | The components of the Audio-Visual Scene Descriptors. |
Performs | Recognition of Speaker ID. |
 | Recognition of Face ID. |
 | Recognition of Entity’s Speech. |
 | Recognition of Audio Object and Visual Object. |
 | Understanding of Entity’s Natural Language expressed as Text in the Context of Audio and/or Visual Instance. |
 | Extraction of the Entity’s Personal Status. |
 | Translation of the Entity’s Text. |
Produces | Entity ID |
 | Personal Status |
 | Translated and Refined Text |
 | Meaning |
 | Audio Instance ID |
 | Visual Instance ID |
 | Audio-Visual Scene Descriptors (same as input) |
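The receive, separate, perform, produce sequence above can be sketched as a small Python function. This is only an illustrative sketch, not the normative behaviour of the AIM: the dictionary keys, the `separate` and `understand` names, and the idea of passing each step as a callable are all assumptions made for the example. Real steps depend on each other (for instance, language understanding needs the recognised text), which this simplified version does not model.

```python
from typing import Any, Callable, Dict

# Hypothetical type: each step takes the separated components and returns one output.
Step = Callable[[Dict[str, Any]], Any]


def separate(descriptors: Dict[str, Any]) -> Dict[str, Any]:
    """Split the Audio-Visual Scene Descriptors into components.

    The component keys used here are assumptions, not names from the specification.
    """
    keys = ("speech", "face", "audio", "visual", "text")
    return {key: descriptors.get(key) for key in keys}


def understand(descriptors: Dict[str, Any], steps: Dict[str, Step]) -> Dict[str, Any]:
    """Run every step on the separated components and collect the outputs."""
    parts = separate(descriptors)
    outputs = {name: step(parts) for name, step in steps.items()}
    # The input descriptors are also produced as an output, unchanged.
    outputs["audio_visual_scene_descriptors"] = descriptors
    return outputs
```

A caller would supply one callable per output, for example `understand(scene, {"entity_id": recognise_speaker})`, where `recognise_speaker` is a hypothetical stand-in for the actual recognition step.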
2 Reference Model
Figure 1 depicts the Reference Architecture of the Entity and Context Understanding AIM.
Figure 1 – The Entity and Context Understanding Composite AIM
3 I/O Data
Table 1 specifies the Input and Output Data of the Entity and Context Understanding AIM.
Table 1 – I/O Data of the Entity and Context Understanding Composite AIM
Input | Description |
Audio-Visual Scene Descriptors | The digital representation of the Audio, Visual, and Audio-Visual Objects of the Scene and their spatial arrangement. |
Output | Description |
Entity ID | |
Personal Status | Personal Status of Entity having the Entity ID. |
Translated Text | Translated Text of Text Object or of Text conveyed by Speech Object. |
Refined Text | Refined Text of Speech Object. |
Meaning | Other name for Refined Text Descriptors. |
Visual Instance ID | The Identifier of the specific Visual Object belonging to a level in the taxonomy. |
Audio-Visual Scene Descriptors | As in Input |
Audio Instance ID | The Identifier of the specific Audio Object belonging to a level in the taxonomy. |
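As a reading aid, the outputs listed in Table 1 could be held in a single record. The sketch below assumes Python dataclasses; the class name `ECUOutput`, the field names, and the field types are hypothetical choices for illustration, since Table 1 names the data items but does not define their encoding here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ECUOutput:
    """Hypothetical container for the Output Data of Table 1 (types are assumptions)."""

    entity_id: Optional[str] = None
    personal_status: Any = None           # Personal Status of the Entity with entity_id
    translated_text: Optional[str] = None
    refined_text: Optional[str] = None
    meaning: Any = None
    visual_instance_id: Optional[str] = None
    audio_instance_id: Optional[str] = None
    audio_visual_scene_descriptors: Any = None  # same as the input
```

Every field defaults to `None` so that a partial result, such as one carrying only an Entity ID, can still be represented.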
4 SubAIMs
HMC-ECU is a Composite AIM having the Reference Model depicted in Figure 2.
Figure 2 – The Entity and Context Understanding Composite AIM
Table 2 provides the list of AIMs – both Basic and Composite – included in the Entity and Context Understanding Composite AIM.
Table 2 – AIW, AIMs, and JSON Metadata
AIMs | Name |
HMC-ECU | Entity And Context Understanding |
OSD-SDX | Audio-Visual Scene Demultiplexing |
MMC-SIR | Speaker Identity Recognition |
PAF-FIR | Face Identity Recognition |
MMC-ASR | Automatic Speech Recognition |
OSD-VOI | Visual Object Identification |
CAE-AOI | Audio Object Identification |
MMC-NLU | Natural Language Understanding |
MMC-PSE | Personal Status Extraction |
MMC-TTT | Text-to-Text Translation |
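The identifiers in Table 2 share a `PREFIX-SUFFIX` shape. The following sketch groups them by prefix, which gives a quick view of how many distinct prefixes the Composite AIM draws on. It assumes, without confirmation from this page, that the part before the hyphen identifies the originating standard; `AIM_IDS` and `group_by_prefix` are names invented for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Identifiers copied from Table 2, in table order.
AIM_IDS = [
    "HMC-ECU", "OSD-SDX", "MMC-SIR", "PAF-FIR", "MMC-ASR",
    "OSD-VOI", "CAE-AOI", "MMC-NLU", "MMC-PSE", "MMC-TTT",
]


def group_by_prefix(aim_ids: List[str]) -> Dict[str, List[str]]:
    """Group AIM identifiers by the part before the first hyphen."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for aim_id in aim_ids:
        prefix, _, suffix = aim_id.partition("-")
        groups[prefix].append(suffix)
    return dict(groups)
```

Calling `group_by_prefix(AIM_IDS)` returns a dictionary keyed by prefix, with the suffixes kept in table order.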
5 JSON Metadata
https://schemas.mpai.community/HMC/V1.1/AIMs/EntityAndContextUnderstanding.json
6 Profiles
The Profiles of the Entity and Context Understanding AIM are specified.