1 Functions
The functions of Entity and Context Understanding (HMC-ECU) enable a Machine to understand the information conveyed by an Entity and its Context, so that the Entity Dialogue Processing AIM can produce a pertinent communication.
Therefore, Entity and Context Understanding (HMC-ECU):
| Receives | Audio-Visual Scene Descriptors. |
| Separates | The components of the Audio-Visual Scene Descriptors. |
| Performs | Recognition of Speaker ID. |
| | Recognition of Face ID. |
| | Recognition of Entity’s Speech. |
| | Recognition of Audio Object and Visual Object. |
| | Understanding of Entity’s Natural Language expressed as Text in the Context of Audio and/or Visual Instance. |
| | Extraction of the Entity’s Personal Status. |
| | Translation of the Entity’s Text. |
| Produces | Entity ID |
| | Personal Status |
| | Translated and Refined Text |
| | Meaning |
| | Audio Instance ID |
| | Visual Instance ID |
| | Audio-Visual Scene Descriptors (same as input) |
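The receive, separate, perform, produce sequence above can be read as a small data flow. The following is a minimal Python sketch of that flow, not the normative interface. The names `ECUResult` and `run_ecu`, the keys of `steps`, the rule used to derive the Entity ID, and the wiring between steps are all assumptions made for illustration. The stub steps return fixed placeholder values.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ECUResult:
    """Outputs listed under 'Produces' (field names are hypothetical)."""
    entity_id: Any
    personal_status: Any
    translated_text: Any
    refined_text: Any
    meaning: Any
    audio_instance_id: Any
    visual_instance_id: Any
    scene_descriptors: Dict[str, Any]  # passed through unchanged


def run_ecu(scene: Dict[str, Any], steps: Dict[str, Callable[..., Any]]) -> ECUResult:
    # Separate the components of the Audio-Visual Scene Descriptors.
    parts = steps["demultiplex"](scene)

    # Recognition steps.
    speaker_id = steps["speaker_id"](parts["speech"])
    face_id = steps["face_id"](parts["face"])
    text = steps["speech_recognition"](parts["speech"])
    audio_instance_id = steps["audio_object_id"](parts["audio"])
    visual_instance_id = steps["visual_object_id"](parts["visual"])

    # Assumption: use the Speaker ID when available, otherwise the Face ID.
    entity_id = speaker_id if speaker_id is not None else face_id

    # Understand the text in the context of the identified instances.
    refined_text, meaning = steps["language_understanding"](
        text, audio_instance_id, visual_instance_id
    )
    # Assumption: Personal Status is extracted from the parts plus Meaning.
    personal_status = steps["personal_status"](parts, meaning)
    translated_text = steps["translation"](refined_text)

    return ECUResult(
        entity_id=entity_id,
        personal_status=personal_status,
        translated_text=translated_text,
        refined_text=refined_text,
        meaning=meaning,
        audio_instance_id=audio_instance_id,
        visual_instance_id=visual_instance_id,
        scene_descriptors=scene,
    )


# Placeholder steps, only to show the call shape (not real recognisers).
stub_steps = {
    "demultiplex": lambda scene: scene["objects"],
    "speaker_id": lambda speech: None,
    "face_id": lambda face: "entity-1",
    "speech_recognition": lambda speech: "helo",
    "audio_object_id": lambda audio: "audio-7",
    "visual_object_id": lambda visual: "visual-3",
    "language_understanding": lambda text, a, v: (
        text.replace("helo", "hello"),
        {"context": (a, v)},
    ),
    "personal_status": lambda parts, meaning: {"emotion": "neutral"},
    "translation": lambda text: text.upper(),  # stand-in for a translator
}

scene = {"objects": {"speech": b"", "face": b"", "audio": b"", "visual": b""}}
result = run_ecu(scene, stub_steps)
```

Passing the steps in as callables keeps the sketch independent of any particular recogniser; each entry could be replaced by a real component without changing `run_ecu`.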
2 Reference Model
Figure 1 depicts the Reference Architecture of the Entity and Context Understanding AIM.

Figure 1 – The Entity and Context Understanding Composite AIM
3 I/O Data
Table 1 specifies the Input and Output Data of the Entity and Context Understanding AIM.
Table 1 – I/O Data of the Entity and Context Understanding Composite AIM
| Input | Description |
| Audio-Visual Scene Descriptors | The digital representation of the Audio, Visual, and Audio-Visual Objects of the Scene and their spatial arrangement. |
| Output | Description |
| Entity ID | |
| Personal Status | Personal Status of Entity having the Entity ID. |
| Translated Text | Translated Text of Text Object or of Text conveyed by Speech Object. |
| Refined Text | Refined Text of Speech Object. |
| Meaning | Another name for Refined Text Descriptors. |
| Visual Instance ID | The Identifier of the specific Visual Object belonging to a level in the taxonomy. |
| Audio-Visual Scene Descriptors | As in Input |
| Audio Instance ID | The Identifier of the specific Audio Object belonging to a level in the taxonomy. |
4 SubAIMs
HMC-ECU is a Composite AIM having the Reference Model depicted in Figure 2.

Figure 2 – The Entity and Context Understanding Composite AIM
Table 2 provides the list of AIMs – both Basic and Composite – included in the Entity and Context Understanding Composite AIM.
Table 2 – AIW, AIMs, and JSON Metadata
| AIMs | Name |
| HMC-ECU | Entity And Context Understanding |
| OSD-SDX | Audio-Visual Scene Demultiplexing |
| MMC-SIR | Speaker Identity Recognition |
| PAF-FIR | Face Identity Recognition |
| MMC-ASR | Automatic Speech Recognition |
| OSD-VOI | Visual Object Identification |
| CAE-AOI | Audio Object Identification |
| MMC-NLU | Natural Language Understanding |
| MMC-PSE | Personal Status Extraction |
| MMC-TTT | Text-to-Text Translation |
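The AIM codes in Table 2 follow a `PREFIX-ACRONYM` pattern. Assuming the three-letter prefix identifies the MPAI standard in which an AIM is specified (an interpretation not stated in this section), the short sketch below groups the codes by prefix to show where the sub-AIMs come from. `group_by_prefix` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# AIM codes copied from Table 2.
AIM_CODES = [
    "HMC-ECU", "OSD-SDX", "MMC-SIR", "PAF-FIR", "MMC-ASR",
    "OSD-VOI", "CAE-AOI", "MMC-NLU", "MMC-PSE", "MMC-TTT",
]


def group_by_prefix(codes: List[str]) -> Dict[str, List[str]]:
    """Group AIM acronyms by the prefix before the hyphen."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for code in codes:
        prefix, acronym = code.split("-", 1)
        groups[prefix].append(acronym)
    return dict(groups)


groups = group_by_prefix(AIM_CODES)
for prefix, acronyms in groups.items():
    print(prefix, acronyms)
```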
5 JSON Metadata
https://schemas.mpai.community/HMC/V1.1/AIMs/EntityAndContextUnderstanding.json
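As a small illustration, the schema URL above can be split into its path segments with the Python standard library. The meaning assigned to each segment (standard, version, kind, name) is inferred from this single URL and is an assumption, and `describe_schema_url` is a hypothetical helper. The sketch does not fetch the schema.

```python
import re
from urllib.parse import urlparse

URL = "https://schemas.mpai.community/HMC/V1.1/AIMs/EntityAndContextUnderstanding.json"


def describe_schema_url(url: str) -> dict:
    """Split a schema URL of the shape shown above into labelled parts."""
    # Assumed path layout: /<standard>/<version>/<kind>/<Name>.json
    segments = urlparse(url).path.strip("/").split("/")
    standard, version, kind, filename = segments
    stem = filename.rsplit(".", 1)[0]
    # Insert a space before every capital letter except the first one.
    name = re.sub(r"(?<!^)(?=[A-Z])", " ", stem)
    return {"standard": standard, "version": version, "kind": kind, "name": name}


info = describe_schema_url(URL)
print(info)
```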
6 Profiles
The Profiles of the Entity and Context Understanding AIM are specified.