

1      Functions of Conversation with Personal Status

When humans converse with other humans, they use speech and, in constrained cases, text. Their interlocutors perceive the speech and/or text together with visual information about the speaker’s face and gestures. Text, speech, face, and gesture may all convey information about the internal state of the speaker, which MPAI calls Personal Status. Handling Personal Status information is therefore a key feature of a machine in human-machine conversation and, in the future, even machine-machine conversation: recognising the speaker’s Personal Status can improve the machine’s understanding of what an utterance means and help it produce better replies.

Conversation with Personal Status (MMC-CPS) is a general Use Case in which an Entity, a real human or a Digital Human, converses with and asks questions of a machine. The machine captures and understands Text and Speech, extracts Personal Status from the Text, Speech, Face, and Gesture Factors, and fuses the Factors’ Personal Statuses into an estimated Personal Status of the Entity, which gives it a better understanding of the context in which the Entity utters Speech.
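
The fusion of per-Factor Personal Statuses can be illustrated with a small sketch. This is not the method defined by MPAI, since the text above does not say how fusion is performed: it simply assumes that each Factor yields a score per candidate label and combines the scores by a weighted sum. The function name `fuse_personal_status`, the labels, the scores, and the weights are all hypothetical.

```python
from collections import defaultdict
from typing import Dict


def fuse_personal_status(
    factor_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> str:
    """Return the label with the highest weighted score across all Factors.

    Assumption: weighted-sum fusion is used for illustration only; the
    specification text does not prescribe a fusion rule.
    """
    if not factor_scores:
        raise ValueError("at least one Factor is required")
    totals: Dict[str, float] = defaultdict(float)
    for factor, scores in factor_scores.items():
        weight = weights.get(factor, 1.0)  # Factors without a weight count once
        for label, score in scores.items():
            totals[label] += weight * score
    return max(totals, key=totals.get)


# Hypothetical per-Factor scores for two made-up labels.
scores = {
    "Text": {"happy": 0.2, "angry": 0.8},
    "Speech": {"happy": 0.7, "angry": 0.3},
    "Face": {"happy": 0.9, "angry": 0.1},
    "Gesture": {"happy": 0.5, "angry": 0.5},
}

equal = fuse_personal_status(scores, {})                 # all Factors weigh 1.0
text_heavy = fuse_personal_status(scores, {"Text": 5.0})  # Text dominates
```

With equal weights the three non-Text Factors outvote Text; raising the Text weight reverses the outcome, which shows why the choice of weights would matter in any scheme of this kind.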

2      Reference Architecture of Conversation with Personal Status

Figure 1 gives the Conversation with Personal Status Reference Model including the input/output data, the AIMs, and the data exchanged between and among the AIMs.

The operation of the Conversation with Personal Status Use Case develops as follows:

  1. Input Selector is used to inform the machine whether the human employs Text or Speech in conversation with the machine.
  2. Visual Scene Description extracts the Scene Geometry, the Visual Objects and the Face and Body Descriptors of humans in the Scene.
  3. Audio Scene Description extracts the Scene Geometry and the Speech Objects in the Scene.
  4. Visual Object Identification assigns an Identifier to each Visual Object indicated by a human.
  5. Audio-Visual Alignment uses the Audio Scene Description and Visual Scene Description to assign unique Identifiers to Audio, Visual, and Audio-Visual Objects.
  6. Automatic Speech Recognition recognises Speech utterances.
  7. Natural Language Understanding refines Text and extracts Meaning.
  8. Personal Status Extraction extracts a human’s Personal Status.
  9. Entity Dialogue Processing produces the machine’s response and its Personal Status.
  10. Personal Status Display produces a speaking Avatar expressing Personal Status.
  11. Audio-Visual Scene Rendering produces Machine Text, Speech, and Visual.

Figure 1 – Reference Model of Conversation with Personal Status

The operation of the Conversation with Personal Status Use Case develops as follows:

  1. Selector is used to inform the machine whether the human employs Text or Speech in conversation with the machine.
  2. Audio-Visual Scene Description extracts the Audio Scene Geometry, Visual Scene Geometry, Audio Objects, Visual Objects, Face Descriptors, and Body Descriptors of humans in the Scene.
  3. Visual Object Identification assigns an Identifier to each Visual Object indicated by a human.
  4. Audio-Visual Alignment uses the Audio Scene Descriptors and Visual Scene Descriptors to assign unique Identifiers to Audio, Visual, and Audio-Visual Objects.
  5. Automatic Speech Recognition recognises Speech utterances.
  6. Natural Language Understanding refines Text and extracts Meaning.
  7. Personal Status Extraction extracts the human’s Personal Status.
  8. Entity Dialogue Processing produces the machine’s response as Text and Personal Status.
  9. Personal Status Display produces a speaking Portable Avatar expressing Personal Status.
  10. Audio-Visual Rendering produces Audio, Visual, and Text.
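
The language-related steps above (Selector, Automatic Speech Recognition, Natural Language Understanding, Personal Status Extraction, Entity Dialogue Processing) can be sketched as a chain of function calls. The sketch below only shows the order of the data flow; every function body is a placeholder stub, the visual and rendering steps are omitted, and the `ConversationInput` class and the `"text"`/`"speech"` selector values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ConversationInput:
    selector: str                   # "text" or "speech" (hypothetical encoding)
    text: Optional[str] = None
    speech: Optional[bytes] = None


def automatic_speech_recognition(speech: bytes) -> str:
    # Stub: pretends the audio payload already holds UTF-8 text.
    return speech.decode("utf-8")


def natural_language_understanding(text: str) -> Tuple[str, Dict[str, List[str]]]:
    refined = " ".join(text.split())             # stand-in for text refinement
    meaning = {"tokens": refined.lower().split()}  # stand-in for Meaning
    return refined, meaning


def personal_status_extraction(meaning: Dict[str, List[str]]) -> Dict[str, str]:
    # Stub: always reports a neutral status, whatever the Meaning is.
    return {"status": "neutral"}


def entity_dialogue_processing(
    refined_text: str, personal_status: Dict[str, str]
) -> Tuple[str, Dict[str, str]]:
    # Stub: echoes the input and mirrors the human's status as the machine's.
    return f"You said: {refined_text}", {"status": personal_status["status"]}


def converse(inp: ConversationInput) -> Tuple[str, Dict[str, str]]:
    """Run the stubbed steps in the order given in the list above."""
    if inp.selector == "speech":
        if inp.speech is None:
            raise ValueError("selector is 'speech' but no speech was given")
        recognised = automatic_speech_recognition(inp.speech)
    elif inp.selector == "text":
        if inp.text is None:
            raise ValueError("selector is 'text' but no text was given")
        recognised = inp.text
    else:
        raise ValueError(f"unknown selector: {inp.selector!r}")
    refined, meaning = natural_language_understanding(recognised)
    status = personal_status_extraction(meaning)
    return entity_dialogue_processing(refined, status)


text_reply = converse(ConversationInput("text", text="Hello   there"))
speech_reply = converse(ConversationInput("speech", speech=b"hi"))
```

The Selector decides only which branch feeds Natural Language Understanding; everything downstream is the same for typed and spoken input.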

3      I/O Data of Conversation with Personal Status

Table 1 gives the input and output data of the Conversation with Personal Status Use Case:

Table 1 – I/O Data of Conversation with Personal Status

| Input | Description |
| --- | --- |
| Input Text | Text typed by the human as an additional information stream or as a replacement for the Speech. |
| Input Speech | Speech of the human having a conversation with the machine. |
| Input Visual | Visual information of the Face and Body of the human having a conversation with the machine. |
| Media Selector | Data determining the use of Speech vs Text. |

| Output | Description |
| --- | --- |
| Output Text | Machine’s Text |
| Output Speech | Machine’s Audio (Speech) |
| Output Visual | Machine’s Visual |

4      Functions of AI Modules of Conversation with Personal Status

Table 2 provides the functions of the AI Modules of the Conversation with Personal Status Use Case.

Table 2 – Functions of AI Modules of Conversation with Personal Status

| AIM | Function |
| --- | --- |
| Visual Scene Description | 1. Receives Input Visual. 2. Provides Visual Objects and Visual Scene Geometry. |
| Speech Scene Description | 1. Receives Input Speech. 2. Provides Speech Object. |
| Visual Object Identification | 1. Receives Visual Scene Geometry, Body Descriptors, and Visual Objects. 2. Provides Visual Object Instance IDs. |
| Automatic Speech Recognition | 1. Receives Speech Object. 2. Extracts Recognised Text. |
| Natural Language Understanding | 1. Receives Recognised Text, Visual Object ID, and Visual Scene Geometry. 2. Refines Text and extracts Meaning. |
| Personal Status Extraction | 1. Receives Meaning, Refined Text, Body Descriptors, and Face Descriptors. 2. Extracts Personal Status. |
| Entity Dialogue Processing | 1. Receives Refined Text, Personal Status, Visual Object ID, and Visual Scene Geometry. 2. Produces Machine’s Text and Personal Status. |
| Personal Status Display | 1. Receives Machine Text and Personal Status. 2. Multiplexes Machine Text and Personal Status into Machine Portable Avatar. |
| Audio-Visual Scene Rendering | 1. Receives Portable Avatar. 2. Produces Machine Text, Machine Speech, and Machine Visual. |

5      I/O Data of AI Modules of Conversation with Personal Status

Table 3 provides the I/O Data of the AI Modules of the Conversation with Personal Status Use Case.

Table 3 – I/O Data of AI Modules of Conversation with Personal Status

| AIM | Receives | Produces |
| --- | --- | --- |
| Visual Scene Description | 1. Input Visual | 1. Face Descriptors 2. Body Descriptors 3. Audio-Visual Scene Descriptors 4. Visual Objects |
| Speech Scene Description | 1. Input Speech | 1. Speech Object |
| Visual Object Identification | 1. Body Descriptors 2. Visual Scene Geometry 3. Visual Objects | 1. Visual Object ID |
| Automatic Speech Recognition | 1. Input Speech | 1. Recognised Text |
| Natural Language Understanding | 1. Visual Object ID 2. Input Speech 3. Recognised Text 4. Input Selector | 1. Meaning 2. Refined Text |
| Personal Status Extraction | 1. Face Descriptors 2. Body Descriptors 3. Meaning 4. Speech | 1. Input Personal Status |
| Entity Dialogue Processing | 1. Input Speech 2. Refined Speech 3. Input Personal Status 4. Input Selector | 1. Machine Personal Status 2. Machine Speech |
| Personal Status Display | 1. Machine Speech 2. Machine Personal Status | 1. Machine Portable Avatar |
| Audio-Visual Scene Rendering | 1. Machine Portable Avatar | 1. Machine Text 2. Machine Speech 3. Machine Visual |
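
A table of this kind can be transcribed into a data structure and checked mechanically. The sketch below encodes the Receives and Produces columns of Table 3 as written and lists, for each AIM, the received labels that no AIM produces and that are not workflow inputs. It relies on exact string matching, so a reported gap may be nothing more than a naming variation between rows. The names `TABLE_3`, `WORKFLOW_INPUTS`, and `unresolved_inputs` are hypothetical, and the sketch assumes that the "Media Selector" of Table 1 and the "Input Selector" of Table 3 are the same data item.

```python
from typing import Dict, List, Set, Tuple

AimIO = Tuple[List[str], List[str]]  # (receives, produces)

# Transcription of Table 3; labels are copied as they appear in the table.
TABLE_3: Dict[str, AimIO] = {
    "Visual Scene Description": (
        ["Input Visual"],
        ["Face Descriptors", "Body Descriptors",
         "Audio-Visual Scene Descriptors", "Visual Objects"],
    ),
    "Speech Scene Description": (["Input Speech"], ["Speech Object"]),
    "Visual Object Identification": (
        ["Body Descriptors", "Visual Scene Geometry", "Visual Objects"],
        ["Visual Object ID"],
    ),
    "Automatic Speech Recognition": (["Input Speech"], ["Recognised Text"]),
    "Natural Language Understanding": (
        ["Visual Object ID", "Input Speech", "Recognised Text", "Input Selector"],
        ["Meaning", "Refined Text"],
    ),
    "Personal Status Extraction": (
        ["Face Descriptors", "Body Descriptors", "Meaning", "Speech"],
        ["Input Personal Status"],
    ),
    "Entity Dialogue Processing": (
        ["Input Speech", "Refined Speech", "Input Personal Status", "Input Selector"],
        ["Machine Personal Status", "Machine Speech"],
    ),
    "Personal Status Display": (
        ["Machine Speech", "Machine Personal Status"],
        ["Machine Portable Avatar"],
    ),
    "Audio-Visual Scene Rendering": (
        ["Machine Portable Avatar"],
        ["Machine Text", "Machine Speech", "Machine Visual"],
    ),
}

# Assumption: the selector of Table 1 is the "Input Selector" of Table 3.
WORKFLOW_INPUTS: Set[str] = {
    "Input Text", "Input Speech", "Input Visual", "Input Selector",
}


def unresolved_inputs(
    table: Dict[str, AimIO], workflow_inputs: Set[str]
) -> Dict[str, List[str]]:
    """Map each AIM to the received labels that nothing supplies."""
    available = set(workflow_inputs)
    for _, produces in table.values():
        available.update(produces)
    gaps: Dict[str, List[str]] = {}
    for aim, (receives, _) in table.items():
        missing = [label for label in receives if label not in available]
        if missing:
            gaps[aim] = missing
    return gaps


gaps = unresolved_inputs(TABLE_3, WORKFLOW_INPUTS)
for aim, labels in gaps.items():
    print(f"{aim}: {', '.join(labels)}")
```

A check like this does not say which label is the intended one; it only points at rows worth comparing against Figure 1 and Table 2.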

6      JSON Metadata of Conversation with Personal Status

Table 4 provides the links to the AIW and AIM specifications and to the JSON syntaxes. AIMs/1 indicates that the column contains Composite AIMs and AIMs/2 indicates that the column contains their Basic AIMs.

Table 4 – Acronyms and URLs of JSON Metadata

AIW AIMs/1 AIMs/2 Name JSON
MMC-CPS Conversation With Personal Status X
OSD-AVS Audio-Visual Scene Description X
MMC-SSD Speech Scene Description X
OSD-VSD Visual Scene Description X
OSD-VOI Visual Object Identification X
OSD-VDI Visual Direction Identification X
OSD-VOE Visual Object Extraction X
OSD-VII Visual Instance Identification X
MMC-ASR Automatic Speech Recognition X
MMC-NLU Natural Language Understanding X
MMC-PSE Personal Status Extraction X
MMC-ETD Entity Text Description X
MMC-ESD Entity Speech Description X
PAF-EFD Entity Face Description X
PAF-EBD Entity Body Description X
MMC-PTI PS-Text Interpretation X
MMC-PSI PS-Speech Interpretation X
PAF-PFI PS-Face Interpretation X
PAF-PGI PS-Gesture Interpretation X
MMC-PMX Personal Status Multiplexing X
MMC-EDP Entity Dialogue Processing X
PAF-PSD Personal Status Display X
MMC-TTS Text-to-Speech X
PAF-EFD Entity Face Description X
PAF-EBD Entity Body Description X
PAF-PMX Portable Avatar Multiplexing X
OSD-AVR Audio-Visual Scene Rendering X

 
