1 Function 2 Reference Model 3 Input/Output Data
4 SubAIMs 5 JSON Metadata 6 Profiles
7 Reference Software 8 Conformance Testing 9 Performance Assessment

1      Function

Entity and Context Understanding (HMC-ECU) enables a Machine to understand the information conveyed by an Entity and its Context so that the Entity Dialogue Processing AIM can produce a pertinent response (Communication Item).

Therefore, Entity and Context Understanding (HMC-ECU):

  1. Receives the Audio-Visual Scene Descriptors.
  2. Separates the components of the Audio-Visual Scene Descriptors.
  3. Performs
    • Recognition of Entity’s Speech.
    • Recognition of Audio Object and Visual Object.
    • Understanding of Entity’s Natural Language expressed as Text in the Context of the Audio and Visual Instance.
    • Extraction of the Entity’s Personal Status.
    • Translation of the Entity’s Text.
  4. Produces:
    • Audio-Visual Scene Geometry
    • Entity ID
    • Audio Instance ID
    • Visual Instance ID
    • Personal Status
    • Translated and Refined Text
    • Meaning.
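
The four steps above can be read as a small pipeline. The sketch below is illustrative only: the class and function names, the dictionary keys, and the stub processing steps are assumptions made for this example and are not defined by the specification. Each stub stands in for the corresponding recognition, understanding, extraction, or translation step.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ECUOutput:
    """Hypothetical container for the data listed under step 4."""
    scene_geometry: Any
    entity_id: str
    audio_instance_id: Optional[str]
    visual_instance_id: Optional[str]
    personal_status: Dict[str, str]
    translated_text: str
    refined_text: str
    meaning: Dict[str, Any]


# Stub processing steps (assumptions); real implementations would be models.
def recognise_speech(speech: Optional[str]) -> str:
    return speech or ""  # pretend the speech is already a transcript


def identify_object(obj: Optional[dict]) -> Optional[str]:
    return obj.get("label") if obj else None


def understand_language(text: str, audio_id, visual_id):
    refined = text.strip()
    context = [i for i in (audio_id, visual_id) if i]
    return refined, {"text": refined, "context": context}


def extract_personal_status(text: str) -> Dict[str, str]:
    return {"emotion": "neutral"}  # placeholder value


def translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"  # placeholder, not a real translation


def entity_context_understanding(descriptors: dict, target_language: str = "en") -> ECUOutput:
    # Steps 1-2: receive the descriptors and separate their components.
    speech = descriptors.get("speech_object")
    text = descriptors.get("text_object")
    audio_obj = descriptors.get("audio_object")
    visual_obj = descriptors.get("visual_object")

    # Step 3: recognition, understanding, status extraction, translation.
    recognised = text if text is not None else recognise_speech(speech)
    audio_id = identify_object(audio_obj)
    visual_id = identify_object(visual_obj)
    refined, meaning = understand_language(recognised, audio_id, visual_id)
    status = extract_personal_status(refined)
    translated = translate(refined, target_language)

    # Step 4: produce the outputs; the scene geometry is passed through as is.
    return ECUOutput(
        scene_geometry=descriptors.get("scene_geometry"),
        entity_id=descriptors.get("entity_id", "unknown"),
        audio_instance_id=audio_id,
        visual_instance_id=visual_id,
        personal_status=status,
        translated_text=translated,
        refined_text=refined,
        meaning=meaning,
    )


scene = {
    "entity_id": "entity-1",
    "scene_geometry": {"objects": 1},
    "speech_object": "  What is that? ",
    "visual_object": {"label": "mug"},
}
result = entity_context_understanding(scene, target_language="it")
```

Keeping each step behind its own function mirrors the way the specification splits the work among separate processing stages, so a stub can be swapped for a real implementation without touching the rest of the pipeline.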

2      Reference Model

Entity and Context Understanding is an AIM whose Reference Model is depicted in Figure 1.

Figure 1 – The Entity and Context Understanding Composite AIM

3      Input and Output Data

Table 1 specifies the Input and Output Data of the Entity and Context Understanding AIM.

Table 1 – I/O Data of the Entity and Context Understanding Composite AIM

Input                        Description
Body Descriptors             The Descriptors of the Body Objects of Entities in the Visual Scene.
Face Descriptors             The Descriptors of the Face Objects of Entities in the Visual Scene.
Speech Object                The digital representation of the speech emitted by the Entity.
Audio-Visual Scene Geometry  The digital representation of the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of the Scene.
Visual Objects               The Visual Objects of the Scene.
Audio Object                 The Audio Objects of the Scene.
Text Object                  Text of Entity with Entity ID.

Output                       Description
Personal Status              Personal Status of Entity having the Entity ID.
Translated Text              Translated Text of Text Object or of Text conveyed by Speech Object.
Refined Text                 Refined Text of Speech Object.
Meaning                      Other name for Refined Text Descriptors.
Visual Instance ID           The Identifier of the specific Visual Object belonging to a level in the taxonomy.
Audio-Visual Scene Geometry  As in Input.
Audio Instance ID            The Identifier of the specific Audio Object belonging to a level in the taxonomy.
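
As an illustration of Table 1, the sketch below records the input and output names as plain Python sets and reports which names are absent from a message. The set names and the `missing_fields` helper are hypothetical. The specification does not state that every input must be present in every message, so the check is only an example of how the table might be used.

```python
# Names copied from Table 1; the constant names themselves are assumptions.
INPUTS = {
    "Body Descriptors", "Face Descriptors", "Speech Object",
    "Audio-Visual Scene Geometry", "Visual Objects", "Audio Object",
    "Text Object",
}
OUTPUTS = {
    "Personal Status", "Translated Text", "Refined Text", "Meaning",
    "Visual Instance ID", "Audio-Visual Scene Geometry", "Audio Instance ID",
}

# Names that appear on both sides of Table 1.
PASS_THROUGH = INPUTS & OUTPUTS


def missing_fields(message: dict, expected: set) -> list:
    """Return the expected names absent from `message`, sorted for stable output."""
    return sorted(expected - set(message))


partial = {"Speech Object": b"", "Text Object": "hi"}
absent = missing_fields(partial, INPUTS)
```

Here `PASS_THROUGH` contains only "Audio-Visual Scene Geometry", which matches the "As in Input" entry in the output half of the table.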

4      SubAIMs

Entity and Context Understanding is a Composite AIM whose Reference Model is depicted in Figure 2.

Figure 2 – The Entity and Context Understanding Composite AIM

Note that Output Data in italics are passed directly from the Input Data of the same name.

Table 2 – AIMs and JSON Metadata

AIM/1 AIM/2  AIM Name                           JSON
HMC-ECU      Entity and Context Understanding   X
OSD-SDX      Audio-Visual Scene Demultiplexing  X
MMC-ASR      Automatic Speech Recognition       X
OSD-VOI      Visual Object Identification       X
CAE-AOI      Audio Object Identification        X
MMC-NLU      Natural Language Understanding     X
MMC-PSE      Personal Status Extraction         X
MMC-ETD      Entity Text Description            X
MMC-ESD      Entity Speech Description          X
PAF-EFD      Entity Face Description            X
PAF-EBD      Entity Body Description            X
MMC-PTI      PS-Text Interpretation             X
MMC-PSI      PS-Speech Interpretation           X
PAF-PFI      PS-Face Interpretation             X
PAF-PGI      PS-Gesture Interpretation          X
MMC-PMX      Personal Status Multiplexing       X
MMC-TTT      Text-to-Text Translation           X

5      JSON Metadata

https://schemas.mpai.community/HMC/V2.0/AIMs/EntityAndContextUnderstanding.json

6      Profiles

Entity and Context Understanding Profiles are defined.

7      Reference Software

8      Conformance Testing

Table 3 provides the Conformance Testing Method for the HMC-ECU AIM.

If a schema contains references to other schemas, conformance of data with the primary schema implies that any data referencing a secondary schema shall also validate against that schema, if present, and conform with the Qualifier, if present.

Table 3 – Conformance Testing Method for the HMC-ECU AIM

Receives  Body Descriptors             Shall validate against Body Descriptors XML Schema.
          Face Descriptors             Shall validate against Face Descriptors Schema.
          Speech Object                Shall validate against Speech Object Schema.
                                       Speech Data shall conform with Speech Qualifier.
          Audio-Visual Scene Geometry  Shall validate against Audio-Visual Scene Geometry Schema.
          Visual Objects               Shall validate against Visual Object Schema.
                                       Visual Data shall conform with Visual Qualifier.
          Audio Object                 Shall validate against Audio Object Schema.
                                       Audio Data shall conform with Audio Qualifier.
          Text Object                  Shall validate against Text Object Schema.
                                       Text Data shall conform with Text Qualifier.
Produces  Personal Status              Shall validate against Personal Status Schema.
          Translated Text              Shall validate against Text Object Schema.
                                       Text Data shall conform with Text Qualifier.
          Refined Text                 Shall validate against Text Object Schema.
                                       Text Data shall conform with Text Qualifier.
          Meaning                      Shall validate against Meaning Schema.
          Visual Instance ID           Shall validate against Instance ID Schema.
          Audio-Visual Scene Geometry  Shall validate against Audio-Visual Scene Geometry Schema.
          Audio Instance ID            Shall validate against Instance ID Schema.
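
The two kinds of requirement in the table, validation against a schema and conformance with a Qualifier, can be sketched as two separate checks. The example below is a minimal sketch, not the MPAI conformance procedure. The `validates` function implements only a tiny subset of JSON Schema (`type`, `required`, `properties`) as a stand-in for a real validator, and the schema contents, the `Data`/`Qualifier` layout, and the `sampling_rate` field are invented for illustration.

```python
# Mapping from schema type names to Python types (subset only).
_TYPES = {"object": dict, "string": str, "number": (int, float), "array": list}


def validates(data, schema: dict) -> bool:
    """Check `data` against a tiny subset of JSON Schema (stand-in only)."""
    expected = schema.get("type")
    if expected and not isinstance(data, _TYPES[expected]):
        return False
    if isinstance(data, dict):
        if any(key not in data for key in schema.get("required", [])):
            return False
        # Nested data must also validate against its own (secondary) schema.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data and not validates(data[key], sub_schema):
                return False
    return True


def conforms_with_qualifier(data: dict, qualifier: dict) -> bool:
    """Hypothetical rule: every attribute stated by the Qualifier must match the data."""
    return all(data.get(name) == value for name, value in qualifier.items())


def check_speech_object(speech_object: dict, schema: dict) -> bool:
    if not validates(speech_object, schema):
        return False
    qualifier = speech_object.get("Qualifier")
    # The Qualifier check applies only if a Qualifier is present.
    return qualifier is None or conforms_with_qualifier(speech_object["Data"], qualifier)


# Invented schema and sample data, for illustration only.
SPEECH_SCHEMA = {
    "type": "object",
    "required": ["Data"],
    "properties": {
        "Data": {"type": "object", "required": ["sampling_rate"]},
        "Qualifier": {"type": "object"},
    },
}

good = {"Data": {"sampling_rate": 16000}, "Qualifier": {"sampling_rate": 16000}}
bad = {"Data": {"sampling_rate": 8000}, "Qualifier": {"sampling_rate": 16000}}
```

With these definitions, `check_speech_object(good, SPEECH_SCHEMA)` passes both checks, while `bad` validates against the schema but fails the Qualifier check. In practice a full JSON Schema validator would replace `validates`.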

9      Performance Assessment
