
The Entity and Context Understanding Composite AIM is specified in the following six sections.

1     Functions

2     Reference Model

3     Input/Output Data of AIW

4     Functions of AIMs

5     Input/Output Data of AIMs

6     AIW, AIMs and JSON Metadata

1      Functions

The functions of Entity and Context Understanding (HMC-ECU) allow a Machine to understand the information conveyed by an Entity and its Context, so that the Entity Dialogue Processing AIM can produce a pertinent communication.

Therefore, Entity and Context Understanding (HMC-ECU):

  1. Receives the Audio-Visual Scene Descriptors.
  2. Separates the components of the Audio-Visual Scene Descriptors.
  3. Performs:
    • Recognition of Entity’s Speech.
    • Recognition of Audio Object and Visual Object.
    • Understanding of Entity’s Natural Language expressed as Text in the Context of the Audio and Visual Instance.
    • Extraction of the Entity’s Personal Status.
    • Translation of the Entity’s Text.
  4. Produces:
    • Audio-Visual Scene Geometry
    • Entity ID
    • Audio Instance ID
    • Visual Instance ID
    • Personal Status
    • Translated and Refined Text
    • Meaning.

2      Reference Model of Entity Context Understanding

Figure 1 depicts the Reference Architecture of the Entity and Context Understanding Composite AIM.

Figure 1 – The Entity and Context Understanding Composite AIM

Note that Output Data in italic are passed directly from the homonymous Input Data.

3      I/O Data of Entity Context Understanding

Table 1 specifies the Input and Output Data of the Entity Context Understanding Composite AIM.

Table 1 – I/O Data of the Entity Context Understanding Composite AIM

| Input | Description |
|---|---|
| Body Descriptors | The Descriptors of the Body Objects of Entities in the Visual Scene. |
| Face Descriptors | The Descriptors of the Face Objects of Entities in the Visual Scene. |
| Speech Object | The digital representation of the speech emitted by the Entity. |
| Audio-Visual Scene Geometry | The digital representation of the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of the Scene. |
| Visual Objects | The Visual Objects of the Scene. |
| Audio Objects | The Audio Objects of the Scene. |
| Text Object | Text of Entity with Entity ID. |

| Output | Description |
|---|---|
| Personal Status | Personal Status of Entity having the Entity ID. |
| Translated Text | Translated Text of Text Object or of Text conveyed by Speech Object. |
| Refined Text | Refined Text of Speech Object. |
| Meaning | Other name for Refined Text Descriptors. |
| Visual Instance ID | The Identifier of the specific Visual Object belonging to a level in the taxonomy. |
| Audio-Visual Scene Geometry | As in Input. |
| Audio Instance ID | The Identifier of the specific Audio Object belonging to a level in the taxonomy. |
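
As an illustration only, the Input and Output Data of Table 1 can be pictured as two typed containers. The following is a minimal sketch: the class names `ECUInput` and `ECUOutput`, the field names, and the helper `build_output` are hypothetical, and every payload is typed as `Any` because the actual data formats are defined elsewhere, not in this section.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ECUInput:
    """Hypothetical container: one field per Input row of Table 1."""
    body_descriptors: Any
    face_descriptors: Any
    speech_object: Any
    audio_visual_scene_geometry: Any
    visual_objects: Any
    audio_objects: Any
    text_object: Any


@dataclass
class ECUOutput:
    """Hypothetical container: one field per Output row of Table 1."""
    personal_status: Any
    translated_text: Any
    refined_text: Any
    meaning: Any
    visual_instance_id: Any
    audio_visual_scene_geometry: Any  # "As in Input": copied, not recomputed
    audio_instance_id: Any


def build_output(inp: ECUInput, **produced: Any) -> ECUOutput:
    """Assemble the output, passing the scene geometry through unchanged."""
    return ECUOutput(
        audio_visual_scene_geometry=inp.audio_visual_scene_geometry,
        **produced,
    )
```

The only behaviour encoded here is the pass-through of the Audio-Visual Scene Geometry; all other output fields must be supplied by the caller.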

4      Functions of AIMs

Table 2 gives the functions of the AI Modules of the Entity and Context Understanding Composite AIM.

Table 2 – AI Modules of the Entity and Context Understanding Composite AIM

| AIM | Functions |
|---|---|
| Audio-Visual Scene Demultiplexing | Makes available Body and Face Descriptors, Speech and Text Objects, Audio and Visual Scene Geometry, and Audio and Visual Objects to the AIMs of the Composite AIM. |
| Automatic Speech Recognition | Produces Recognised Text. |
| Visual Object Identification | Identifies the Visual Object. |
| Audio Object Identification | Identifies the Audio Object. |
| Natural Language Understanding | Understands the text and speech information of the Entity. |
| Personal Status Extraction | Extracts the Personal Status of the Entity. |
| Text-to-Text Translation | Translates the text to another language. |

5      Input/Output Data of AIMs

Table 3 gives the Input/Output Data of the AI Modules of the Entity and Context Understanding Composite AIM.

Table 3 – I/O Data of the AI Modules of the Entity and Context Understanding Composite AIM

| AIM | Input | Output |
|---|---|---|
| Audio-Visual Scene Demultiplexing | Audio-Visual Scene Descriptors | Body Descriptors, Face Descriptors, Speech Object, Text Object, Audio Scene Geometry, Visual Scene Geometry, Audio Objects, Visual Objects |
| Automatic Speech Recognition | Speech Object | Recognised Text |
| Visual Object Identification | Body Descriptors, Visual Scene Geometry, Visual Objects | Visual Instance Identifier |
| Audio Object Identification | Audio Scene Geometry, Audio Objects | Audio Instance Identifier |
| Natural Language Understanding | Recognised Text, Visual Scene Geometry, Visual Instance Identifier, Audio Instance Identifier, Audio Scene Geometry | Meaning, Refined Text, Visual Scene Geometry, Visual Instance Identifier, Audio Instance Identifier, Audio Scene Geometry |
| Personal Status Extraction | Body Descriptors, Face Descriptors, Speech Object, Text Object, Meaning | Personal Status |
| Text-to-Text Translation | Refined Text | Translated Text |
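
The data dependencies in Table 3 imply an order in which the AIMs can run. The sketch below derives one such order with a topological sort. It rests on two assumptions: the input/output split of each row follows the reading of Table 3 in which the last newly produced items are outputs, and the data that Natural Language Understanding merely passes through (geometries and instance identifiers) are left out of its outputs so that every data item has a single producer. The names `AIMS` and `execution_order` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# AIM name -> (input data, newly produced output data), after Table 3.
# Assumption: pass-through outputs of Natural Language Understanding are omitted.
AIMS = {
    "Audio-Visual Scene Demultiplexing": (
        {"Audio-Visual Scene Descriptors"},
        {"Body Descriptors", "Face Descriptors", "Speech Object", "Text Object",
         "Audio Scene Geometry", "Visual Scene Geometry",
         "Audio Objects", "Visual Objects"},
    ),
    "Automatic Speech Recognition": ({"Speech Object"}, {"Recognised Text"}),
    "Visual Object Identification": (
        {"Body Descriptors", "Visual Scene Geometry", "Visual Objects"},
        {"Visual Instance Identifier"},
    ),
    "Audio Object Identification": (
        {"Audio Scene Geometry", "Audio Objects"},
        {"Audio Instance Identifier"},
    ),
    "Natural Language Understanding": (
        {"Recognised Text", "Visual Scene Geometry", "Visual Instance Identifier",
         "Audio Instance Identifier", "Audio Scene Geometry"},
        {"Meaning", "Refined Text"},
    ),
    "Personal Status Extraction": (
        {"Body Descriptors", "Face Descriptors", "Speech Object",
         "Text Object", "Meaning"},
        {"Personal Status"},
    ),
    "Text-to-Text Translation": ({"Refined Text"}, {"Translated Text"}),
}


def execution_order(aims):
    """Return the AIM names in an order that respects data dependencies."""
    # Map every data item to the AIM that produces it.
    producer = {item: name for name, (_, outs) in aims.items() for item in outs}
    # An AIM depends on the producers of its inputs; inputs with no producer
    # (here, Audio-Visual Scene Descriptors) come from outside the Composite AIM.
    graph = {
        name: {producer[item] for item in ins if item in producer}
        for name, (ins, _) in aims.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(AIMS), start=1):
        print(step, name)
```

Several valid orders exist, since the two identification AIMs and Automatic Speech Recognition do not depend on one another; the sort only guarantees that every AIM runs after the AIMs producing its inputs.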

6      AIW, AIMs and JSON Metadata

Table 4 – AIW, AIMs, and JSON Metadata

| AIW/AIMs | Name | JSON |
|---|---|---|
| HMC-ECU | Entity Context Understanding | X |
| OSD-SDX | Audio-Visual Scene Demultiplexing | X |
| MMC-ASR | Automatic Speech Recognition | X |
| OSD-VOI | Visual Object Identification | X |
| CAE-AOI | Audio Object Identification | X |
| MMC-NLU | Natural Language Understanding | X |
| MMC-PSE | Personal Status Extraction | X |
| MMC-TTT | Text-to-Text Translation | X |
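
The identifiers in Table 4 share a two-part pattern, a prefix and a module code joined by a hyphen. As a small illustration, the sketch below groups the entries by prefix. Reading the prefix as the standard in which the module is specified is an assumption drawn from the naming pattern, not something this section states; the names `TABLE_4` and `group_by_prefix` are hypothetical.

```python
from collections import defaultdict

# Identifier -> name, transcribed from Table 4.
TABLE_4 = {
    "HMC-ECU": "Entity Context Understanding",
    "OSD-SDX": "Audio-Visual Scene Demultiplexing",
    "MMC-ASR": "Automatic Speech Recognition",
    "OSD-VOI": "Visual Object Identification",
    "CAE-AOI": "Audio Object Identification",
    "MMC-NLU": "Natural Language Understanding",
    "MMC-PSE": "Personal Status Extraction",
    "MMC-TTT": "Text-to-Text Translation",
}


def group_by_prefix(table):
    """Group module codes by the prefix that precedes the hyphen."""
    groups = defaultdict(list)
    for identifier in table:
        prefix, _, module = identifier.partition("-")
        groups[prefix].append(module)
    return dict(groups)


if __name__ == "__main__":
    for prefix, modules in group_by_prefix(TABLE_4).items():
        print(prefix, modules)
```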

7     Profiles

The Profiles of Entity and Context Understanding Composite AIM are specified
