Communicating Entities in Context (HMC-CEC) AI Workflow (AIW) specifies the Functions, Reference Model, and Input and Output Data of the AIW, and the Functions and the Input and Output Data of its AI Modules (AIM). Each Input and Output Data item of the HMC-CEC AIW and its AIMs is linked to its online specification.
1 Functions
2 Reference Model
3 Input/Output Data
4 Functions of AI Modules
5 Input/Output Data of AI Modules
6 AIW, AIMs, and JSON Metadata
1 Functions
The Communicating Entities in Context (HMC-CEC) AI Workflow enables communication of Machines with Entities in different Contexts where:
- The term Entity refers to one of:
  - A human in a real audio-visual scene.
  - A human in a real scene represented as a Digitised Human in an Audio-Visual Scene.
  - A Machine represented as a Virtual Human in an Audio-Visual Scene.
  - A Digital Human in an Audio-Visual Scene rendered as a real audio-visual scene.
- Context is information describing attributes of an Entity, such as language and culture.
- Digital Human is either a Digitised or a Virtual Human.
An Entity communicates with another Entity in one of the following ways:
- A human:
  - Uses their body, speech, Context, and the audio-visual scene they are immersed in.
  - Uses an HMC-CEC-enabled Machine emitting Communication Items represented as Portable Avatars.
- A Machine:
  - Renders itself as a speaking humanoid in an audio-visual scene.
  - Emits Communication Items.
HMC-CEC assumes that:
- Input Audio is an Audio Object.
- Input Visual is a Visual Object.
- Output Audio and Output Visual convey audio and visual information rendered by the Audio-Visual Scene Rendering AIM.
- The real space is digitally represented as an Audio-Visual Scene that includes the communicating human and may include other humans and generic objects.
- The Virtual Space contains a Digital Human and/or its Audio components and may include other Digital Humans and generic Objects in an Audio-Visual Scene.
- The Machine can:
  - Understand the semantics of the communicated information at different layers of depth.
  - Produce a multimodal response expected to be congruent with the received information.
  - Render itself and the produced information as a speaking humanoid in an Audio-Visual Scene.
  - Convert the semantics of the information produced by an Entity in a Context to a form that is compatible with the Context of another Entity.
An AI Module is specified only by its Functions and Interfaces. Implementers are free to use their preferred technologies to achieve the Functions and provide the features, as long as they respect the constraints of the Interfaces.
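The separation between Interfaces and implementation can be illustrated with a minimal Python sketch. It is not part of the specification: the `AIModule` protocol, its `receives`/`produces` attributes, the `process` method, the `run` helper, and the `StubSceneDescription` class are all hypothetical names chosen for illustration. Only the data names (Input Audio, Input Visual, Audio-Visual Scene Descriptors) come from this document.

```python
from typing import Any, Mapping, Protocol


class AIModule(Protocol):
    """Hypothetical interface: an AIM is known only by the data it receives and produces."""

    receives: tuple[str, ...]
    produces: tuple[str, ...]

    def process(self, inputs: Mapping[str, Any]) -> dict[str, Any]:
        ...


class StubSceneDescription:
    """Placeholder implementation; its internals are free as long as the interface holds."""

    receives = ("Input Audio", "Input Visual")
    produces = ("Audio-Visual Scene Descriptors",)

    def process(self, inputs: Mapping[str, Any]) -> dict[str, Any]:
        # A real implementation would analyse the scene; this stub only records its sources.
        return {"Audio-Visual Scene Descriptors": {"sources": sorted(inputs)}}


def run(aim: AIModule, inputs: Mapping[str, Any]) -> dict[str, Any]:
    """Call an AIM and check that the exchanged data matches its declared interface."""
    missing = [name for name in aim.receives if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    outputs = aim.process(inputs)
    unexpected = set(outputs) - set(aim.produces)
    if unexpected:
        raise ValueError(f"undeclared outputs: {sorted(unexpected)}")
    return outputs
```

Under these assumptions, any class exposing the same `receives`, `produces`, and `process` members could replace the stub without changes to the caller, which is the property the paragraph above describes.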
Usage Scenarios offers a collection of example applications enabled by HMC-CEC.
2 Reference Model
Figure 1 depicts the Reference Model of the Communicating Entities in Context (HMC-CEC) Use Case implemented as an AI Workflow (AIW) that includes AI Modules (AIM) per Technical Specification: AI Framework (MPAI-AIF). Three out of six AIMs in Figure 1 (Audio-Visual Scene Description, Entity Context Understanding, and Personal Status Display) are Composite AIMs, i.e., they include interconnected AIMs. An introduction to MPAI-AIF is provided here.
Figure 1 – Human-Machine Communication AIW
Note that:
- Words beginning with a capital letter are defined in Definitions; words beginning with a small letter have the commonly understood meaning.
- The Input Selector enables the Entity to inform the Machine, through the Entity Context Understanding AIM, about the use of Text vs. Speech in the communication, Language Preferences, and the Selected Language in translation.
- The Machine captures the information emitted by the Entity and its Context through Input Text, Input Speech, Input Audio and Input Visual.
- The Input Portable Avatar is the Communication Item emitted by a communicating Machine.
- The Audio-Visual Scene Descriptors are digital representations of a real audio-visual scene or a Virtual Audio-Visual Scene produced either by the Audio-Visual Scene Description AIM or the Audio-Visual Scene Integration and Description AIM.
- To facilitate identification, each AIM is labelled with three letters identifying the Technical Specification that specifies it, followed by a hyphen “-” and three letters uniquely identifying the AIM within that Technical Specification. For instance, Portable Avatar Demultiplexing is indicated as PAF-PDX, where PAF refers to Technical Specification: Portable Avatar Format (MPAI-PAF) and PDX refers to the Portable Avatar Demultiplexing AIM specified by MPAI-PAF.
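The labelling convention in the last note can be checked mechanically. The following Python sketch is illustrative only: the `AIMLabel` type and `parse_aim_label` function are hypothetical names, and restricting labels to upper-case ASCII letters is an assumption based on the PAF-PDX example rather than a stated rule.

```python
import re
from typing import NamedTuple

# Assumption: both parts are exactly three upper-case ASCII letters, as in "PAF-PDX".
LABEL_PATTERN = re.compile(r"([A-Z]{3})-([A-Z]{3})")


class AIMLabel(NamedTuple):
    specification: str  # e.g. "PAF" for MPAI-PAF
    module: str         # e.g. "PDX" for Portable Avatar Demultiplexing


def parse_aim_label(label: str) -> AIMLabel:
    """Split a label such as 'PAF-PDX' into its specification and AIM parts."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid AIM label: {label!r}")
    return AIMLabel(match.group(1), match.group(2))
```

For example, `parse_aim_label("PAF-PDX")` yields a pair whose `specification` is `"PAF"` and whose `module` is `"PDX"`, while a malformed label such as `"PAF_PDX"` raises `ValueError`.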
3 Input/Output Data
Table 1 gives the Input/Output Data of the MPAI-HMC AIW.
Table 1 – Input/Output Data of the MPAI-HMC AIW
Input | Description |
Portable Avatar | A Communication Item emitted by the Entity communicating with the ego Entity. |
Input Selector | Selector containing data specifying the media and the language used in the communication. |
Input Text | Text Object generated by the communicating Entity as information additional to or in lieu of Speech Object. |
Input Audio | The audio scene captured by the Machine. |
Input Visual | The visual scene captured by the Machine. |
Output | Description |
Portable Avatar | The Communication Item produced by the Machine. |
Output Audio | The rendered audio corresponding to the Audio in the Communication Item. |
Output Visual | The rendered visual corresponding to the visual in the Communication Item. |
Output Text | The Text contained in a Communication Item or associated with Output Audio and Output Visual. |
4 Functions of AI Modules
Table 2 gives the functions of HMC-CEC AIMs.
Table 2 – Functions of AI Modules
AIM | Functions |
Audio-Visual Scene Integration and Description | Adds Avatar to Audio-Visual Scene in Portable Avatar providing Audio-Visual Scene Descriptors. |
Audio-Visual Scene Description | Provides Audio-Visual Scene Descriptors. |
Entity Context Understanding | Understands the information emitted by the Entity and its Context. |
Entity Dialogue Processing | Produces Text and Personal Status of Machine in response to inputs. |
Personal Status Display | Produces Portable Avatar. |
Audio-Visual Scene Rendering | Renders the content of the Portable Avatar. |
5 Input/Output Data of AI Modules
Table 3 gives the I/O Data of the AIMs of HMC-CEC. Note that an ID can either be specified as an Instance Identifier or refer to a generic identifier.
Table 3 – Input/Output Data of AI Modules
AIM | Receives | Produces |
Audio-Visual Scene Integration and Description | Input Portable Avatar | Audio-Visual Scene Descriptors |
Audio-Visual Scene Description | Input Audio Input Visual | Audio-Visual Scene Descriptors |
Entity Context Understanding | Input Visual | Audio-Visual Scene Geometry Personal Status Entity ID Text Meaning Instance Identifier |
Entity Dialogue Processing | Audio-Visual Scene Geometry Personal Status Entity ID Text Meaning Instance Identifier | Machine Personal Status Machine Avatar ID Machine Text |
Personal Status Display | Machine Personal Status Machine Avatar ID Machine Text | Output Portable Avatar |
Audio-Visual Scene Rendering | Output Portable Avatar | Output Text Output Audio Output Visual |
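The data flow implied by Table 3 can be sanity-checked with a short Python sketch. This is an illustration under stated assumptions, not a normative description of the AIW: the function names are hypothetical, the split of the long table cells into individual data names (in particular Text and Meaning as separate items) is an assumption, the set of workflow inputs is limited to those that Table 3 shows being consumed, and the rendered visual output is taken to be Output Visual, as in Table 1.

```python
from graphlib import TopologicalSorter

# Assumption: the long table cell is split into these six data names.
UNDERSTANDING = {
    "Audio-Visual Scene Geometry", "Personal Status", "Entity ID",
    "Text", "Meaning", "Instance Identifier",
}
MACHINE_RESPONSE = {"Machine Personal Status", "Machine Avatar ID", "Machine Text"}

# AIM name -> (data received, data produced), transcribed from Table 3.
AIMS = {
    "Audio-Visual Scene Integration and Description": (
        {"Input Portable Avatar"}, {"Audio-Visual Scene Descriptors"}),
    "Audio-Visual Scene Description": (
        {"Input Audio", "Input Visual"}, {"Audio-Visual Scene Descriptors"}),
    "Entity Context Understanding": ({"Input Visual"}, UNDERSTANDING),
    "Entity Dialogue Processing": (UNDERSTANDING, MACHINE_RESPONSE),
    "Personal Status Display": (MACHINE_RESPONSE, {"Output Portable Avatar"}),
    # Rendered visual output taken to be Output Visual, as in Table 1.
    "Audio-Visual Scene Rendering": (
        {"Output Portable Avatar"}, {"Output Text", "Output Audio", "Output Visual"}),
}
# Workflow-level inputs that Table 3 shows being consumed by some AIM.
WORKFLOW_INPUTS = {"Input Portable Avatar", "Input Audio", "Input Visual"}


def unresolved_inputs(aims, workflow_inputs):
    """Return, per AIM, the received data that neither the workflow nor any AIM supplies."""
    available = set(workflow_inputs)
    for _, produced in aims.values():
        available |= produced
    return {name: received - available
            for name, (received, _) in aims.items()
            if received - available}


def execution_order(aims):
    """Order AIMs so that every producer comes before its consumers."""
    producers = {}
    for name, (_, produced) in aims.items():
        for datum in produced:
            producers.setdefault(datum, set()).add(name)
    graph = {
        name: {p for datum in received for p in producers.get(datum, ())}
        for name, (received, _) in aims.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

With the table as transcribed, `unresolved_inputs(AIMS, WORKFLOW_INPUTS)` is empty, and `execution_order(AIMS)` places Entity Context Understanding before Entity Dialogue Processing, Personal Status Display, and Audio-Visual Scene Rendering, in that order. The relative position of the two scene description AIMs is not constrained by the table.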
6 AIW, AIMs, and JSON Metadata
Table 4 – AIW, AIMs, and JSON Metadata