MPAI-MMC V2.2 AIWs Conversation with Personal Status


1 Functions	2 Reference Model	3 I/O Data
4 Functions of AI Modules	5 I/O Data of AI Modules	6 AIW, AIMs, and JSON Metadata
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions of Conversation with Personal Status

When humans converse with other humans, they use speech and, in constrained cases, text. Their interlocutors perceive the speech and/or text, supplemented by visual information about the speaker’s face and gestures. Text, speech, face, and gesture may convey information about the internal state of the speaker, which MPAI calls Personal Status. Handling Personal Status information is therefore a key feature of a machine taking part in a human-machine conversation and, in the future, even a machine-machine conversation: recognising Personal Status can improve the machine’s understanding of the speaker’s utterances and help it produce better replies.

Conversation with Personal Status (MMC-CPS) is a general Use Case of an Entity – a real human or a Digital Human – conversing with a machine and asking it questions. The machine captures and understands Text and Speech, extracts Personal Status from the Text, Speech, Face, and Gesture Factors, and fuses the Factors’ Personal Statuses into an estimated Personal Status of the Entity, achieving a better understanding of the context in which the Entity utters Speech.
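The fusion of per-Factor Personal Statuses described above can be illustrated with a small sketch. The text does not say how the fusion is performed, so the weighted-average rule, the function name `fuse_personal_status`, the score format (label to confidence), and the example labels are all assumptions made purely for illustration.

```python
from collections import defaultdict


def fuse_personal_status(factor_scores, weights=None):
    """Fuse per-Factor estimates into one estimate (hypothetical rule).

    factor_scores maps a Factor name ("Text", "Speech", "Face", "Gesture")
    to a dict of {label: confidence}. The weighted average used here is an
    assumption of this sketch, not a method taken from the specification.
    """
    weights = weights or {}
    totals = defaultdict(float)
    weight_sum = 0.0
    for factor, scores in factor_scores.items():
        w = weights.get(factor, 1.0)  # Factors default to equal weight
        weight_sum += w
        for label, confidence in scores.items():
            totals[label] += w * confidence
    if weight_sum == 0:
        raise ValueError("no Factor contributed to the fusion")
    fused = {label: total / weight_sum for label, total in totals.items()}
    best = max(fused, key=fused.get)  # label with the highest fused score
    return best, fused


if __name__ == "__main__":
    estimates = {
        "Text": {"happy": 0.6, "neutral": 0.4},
        "Speech": {"happy": 0.2, "neutral": 0.8},
        "Face": {"happy": 0.9, "neutral": 0.1},
    }
    print(fuse_personal_status(estimates))
```

A Factor that is unavailable (for example, no Gesture information) is simply left out of `factor_scores`; the remaining Factors are then averaged among themselves.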

2 Reference Model of Conversation with Personal Status

Figure 1 gives the Conversation with Personal Status Reference Model including the input/output data, the AIMs, and the data exchanged between and among the AIMs.

The operation of the Conversation with Personal Status Use Case develops as follows:

Input Selector is used to inform the machine whether the human employs Text or Speech in conversation with the machine.
Visual Scene Description extracts the Scene Geometry, the Visual Objects and the Face and Body Descriptors of humans in the Scene.
Audio Scene Description extracts the Scene Geometry and the Speech Objects in the Scene.
Visual Object Identification assigns an Identifier to each Visual Object indicated by a human.
Audio-Visual Alignment uses the Audio Scene Description and Visual Scene Description to assign unique Identifiers to Audio, Visual, and Audio-Visual Objects.
Automatic Speech Recognition recognises Speech utterances.
Natural Language Understanding refines Text and extracts Meaning.
Personal Status Extraction extracts a human’s Personal Status.
Entity Dialogue Processing produces the machine’s response and its Personal Status.
Personal Status Display produces a speaking Avatar expressing Personal Status.
Audio-Visual Scene Rendering produces Machine Text, Speech, and Visual.

Figure 1 – Reference Model of Conversation with Personal Status

The operation of the Conversation with Personal Status Use Case develops as follows:

Selector is used to inform the machine whether the human employs Text or Speech in conversation with the machine.
Audio-Visual Scene Description extracts Audio Scene Geometry, Visual Scene Geometry, Audio Objects, Visual Objects, Face Descriptors, and Body Descriptors of humans in the Scene.
Visual Object Identification assigns an Identifier to each Visual Object indicated by a human.
Audio-Visual Alignment uses the Audio Scene Descriptors and Visual Scene Descriptors to assign unique Identifiers to Audio, Visual, and Audio-Visual Objects.
Automatic Speech Recognition recognises Speech utterances.
Natural Language Understanding refines Text and extracts Meaning.
Personal Status Extraction extracts the human’s Personal Status.
Entity Dialogue Processing produces the machine’s response as Text and Personal Status.
Personal Status Display produces a speaking Portable Avatar expressing Personal Status.
Audio-Visual Rendering produces Audio, Visual, and Text.
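The role of the Selector in the steps above can be sketched as a simple routing decision: with Speech selected, the text passed on for understanding comes from Automatic Speech Recognition; with Text selected, it is the typed text. This is a minimal illustration under stated assumptions: the `Selector` enum, the `text_for_understanding` function, and the `recognise` callable standing in for the recognition step are hypothetical names, not part of the specification.

```python
from enum import Enum
from typing import Callable, Optional


class Selector(Enum):
    """Hypothetical encoding of the Text-versus-Speech choice."""
    TEXT = "text"
    SPEECH = "speech"


def text_for_understanding(
    selector: Selector,
    input_text: Optional[str],
    input_speech: Optional[bytes],
    recognise: Callable[[bytes], str],
) -> str:
    """Return the text that would be handed to Natural Language Understanding.

    `recognise` is a stand-in for Automatic Speech Recognition; any
    callable mapping speech bytes to text can be plugged in.
    """
    if selector is Selector.SPEECH:
        if input_speech is None:
            raise ValueError("Speech selected but no Input Speech provided")
        return recognise(input_speech)
    if input_text is None:
        raise ValueError("Text selected but no Input Text provided")
    return input_text


if __name__ == "__main__":
    def fake_asr(speech: bytes) -> str:
        # Placeholder recogniser used only for this demonstration.
        return "what is this object"

    print(text_for_understanding(Selector.SPEECH, None, b"\x00\x01", fake_asr))
    print(text_for_understanding(Selector.TEXT, "hello", None, fake_asr))
```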

3 I/O Data of Conversation with Personal Status

Table 1 gives the input and output data of the Conversation with Personal Status Use Case:

Table 1 – I/O Data of Conversation with Personal Status

Input	Descriptions
Input Text	Text typed by the human as additional information stream or as a replacement of the Speech.
Input Speech	Speech of the human having a conversation with the machine.
Input Visual	Visual information of the Face and Body of the human having a conversation with the machine.
Media Selector	Data determining the use of Speech vs Text.
Output	Descriptions
Output Text	Machine’s Text
Output Speech	Machine’s Audio (Speech)
Output Visual	Machine’s Visual

4 Functions of AI Modules of Conversation with Personal Status

Table 2 provides the functions of the AI Modules of the Conversation with Personal Status Use Case.

Table 2 – Functions of AI Modules of Conversation with Personal Status

AIM	Function
Visual Scene Description	1. Receives Input Visual. 2. Provides Visual Objects and Visual Scene Geometry.
Speech Scene Description	1. Receives Input Speech. 2. Provides Speech Object.
Visual Object Identification	1. Receives Visual Scene Geometry, Body Descriptors, and Visual Objects. 2. Provides Visual Object Instance IDs.
Automatic Speech Recognition	1. Receives Speech Object. 2. Extracts Recognised Text.
Natural Language Understanding	1. Receives Recognised Text, Visual Object ID, and Visual Scene Geometry. 2. Refines Text and extracts Meaning.
Personal Status Extraction	1. Receives Meaning, Refined Text, Body Descriptors, and Face Descriptors. 2. Extracts Personal Status.
Entity Dialogue Processing	1. Receives Refined Text, Personal Status, Visual Object ID, and Visual Scene Geometry. 2. Produces Machine’s Text and Personal Status.
Personal Status Display	1. Receives Machine Text and Personal Status. 2. Multiplexes Machine Text and Personal Status into Machine Portable Avatar.
Audio-Visual Scene Rendering	1. Receives Portable Avatar. 2. Produces Machine Text, Machine Speech, and Machine Visual.

5 I/O Data of AI Modules of Conversation with Personal Status

Table 3 provides the I/O Data of the AI Modules of the Conversation with Personal Status Use Case.

Table 3 – I/O Data of AI Modules of Conversation with Personal Status

AIM	Receives	Produces
Visual Scene Description	1. Input Visual	1. Face Descriptors 2. Body Descriptors 3. Audio-Visual Scene Descriptors 4. Visual Objects
Speech Scene Description	1. Input Speech	1. Speech Object
Visual Object Identification	1. Body Descriptors 2. Visual Scene Geometry 3. Visual Objects	1. Visual Object ID
Automatic Speech Recognition	1. Input Speech	1. Recognised Text
Natural Language Understanding	1. Visual Object ID 2. Input Speech 3. Recognised Text 4. Input Selector	1. Meaning 2. Refined Text
Personal Status Extraction	1. Face Descriptors 2. Body Descriptors 3. Meaning 4. Speech	1. Input Personal Status
Entity Dialogue Processing	1. Input Speech 2. Refined Text 3. Input Personal Status 4. Input Selector	1. Machine Personal Status 2. Machine Speech
Personal Status Display	1. Machine Speech 2. Machine Personal Status	1. Machine Portable Avatar
Audio-Visual Scene Rendering	1. Machine Portable Avatar	1. Machine Text 2. Machine Speech 3. Machine Visual
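The receives/produces relationships in Tables 2 and 3 imply an order in which the AIMs can run. The sketch below derives one such order with a topological sort. It is an illustration only: the `AIMS` mapping is a simplified reading that combines the two tables, and data names are normalised by assumption (for example, "Input Personal Status" versus "Machine Personal Status", and "Output Text/Speech/Visual" for the rendered outputs) so that every data item has a single producer. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# AIM name -> (data received, data produced).
# Simplified, normalised reading of Tables 2 and 3 (an assumption of this sketch).
AIMS = {
    "Visual Scene Description": (
        {"Input Visual"},
        {"Face Descriptors", "Body Descriptors",
         "Visual Scene Geometry", "Visual Objects"},
    ),
    "Speech Scene Description": ({"Input Speech"}, {"Speech Object"}),
    "Visual Object Identification": (
        {"Body Descriptors", "Visual Scene Geometry", "Visual Objects"},
        {"Visual Object ID"},
    ),
    "Automatic Speech Recognition": ({"Speech Object"}, {"Recognised Text"}),
    "Natural Language Understanding": (
        {"Recognised Text", "Visual Object ID", "Input Selector"},
        {"Meaning", "Refined Text"},
    ),
    "Personal Status Extraction": (
        {"Face Descriptors", "Body Descriptors", "Meaning", "Speech Object"},
        {"Input Personal Status"},
    ),
    "Entity Dialogue Processing": (
        {"Refined Text", "Input Personal Status", "Input Selector"},
        {"Machine Text", "Machine Personal Status"},
    ),
    "Personal Status Display": (
        {"Machine Text", "Machine Personal Status"},
        {"Machine Portable Avatar"},
    ),
    "Audio-Visual Scene Rendering": (
        {"Machine Portable Avatar"},
        {"Output Text", "Output Speech", "Output Visual"},
    ),
}


def dependency_graph(aims):
    """Map each AIM to the set of AIMs that produce data it receives."""
    producer = {}
    for name, (_, produced) in aims.items():
        for item in produced:
            producer[item] = name
    return {
        name: {producer[item] for item in received if item in producer}
        for name, (received, _) in aims.items()
    }


def external_inputs(aims):
    """Data items received by some AIM but produced by none."""
    produced = set().union(*(p for _, p in aims.values()))
    received = set().union(*(r for r, _ in aims.values()))
    return received - produced


def execution_order(aims):
    """One valid order in which the AIMs can be executed."""
    return list(TopologicalSorter(dependency_graph(aims)).static_order())


if __name__ == "__main__":
    print(sorted(external_inputs(AIMS)))
    for step, name in enumerate(execution_order(AIMS), start=1):
        print(step, name)
```

AIMs with no dependency on each other, such as Visual Scene Description and Speech Scene Description, may appear in either order; only the relative order of dependent AIMs is meaningful.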

6 JSON Metadata of Conversation with Personal Status

Table 4 provides the links to the AIW and AIM specifications and to the JSON syntaxes. AIMs/1 indicates that the column contains Composite AIMs and AIMs/2 indicates that the column contains their Basic AIMs.

Table 4 – Acronyms and URLs of JSON Metadata

AIW	AIMs/1	AIMs/2	Name	JSON
MMC-CPS			Conversation With Personal Status	X
	OSD-AVS		Audio-Visual Scene Description	X
	MMC-SSD		Speech Scene Description	X
	OSD-VSD		Visual Scene Description	X
	OSD-VOI		Visual Object Identification	X
		OSD-VDI	Visual Direction Identification	X
		OSD-VOE	Visual Object Extraction	X
		OSD-VII	Visual Instance Identification	X
	MMC-ASR		Automatic Speech Recognition	X
	MMC-NLU		Natural Language Understanding	X
	MMC-PSE		Personal Status Extraction	X
		MMC-ETD	Entity Text Description	X
		MMC-ESD	Entity Speech Description	X
		PAF-EFD	Entity Face Description	X
		PAF-EBD	Entity Body Description	X
		MMC-PTI	PS-Text Interpretation	X
		MMC-PSI	PS-Speech Interpretation	X
		PAF-PFI	PS-Face Interpretation	X
		PAF-PGI	PS-Gesture Interpretation	X
		MMC-PMX	Personal Status Multiplexing	X
	MMC-EDP		Entity Dialogue Processing	X
	PAF-PSD		Personal Status Display	X
		MMC-TTS	Text-to-Speech	X
		PAF-EFD	Entity Face Description	X
		PAF-EBD	Entity Body Description	X
		PAF-PMX	Portable Avatar Multiplexing	X
	OSD-AVR		Audio-Visual Scene Rendering	X
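The two-level structure of Table 4 can be captured in a small lookup, which makes it easy to see which Basic AIMs are shared between Composite AIMs. The acronyms are copied from the table; the `AIW_AIMS` mapping and the helper names are hypothetical and only illustrate the hierarchy.

```python
# AIMs/1 entry -> its AIMs/2 entries, transcribed from Table 4.
# An empty list means the table lists no Basic AIMs under that entry.
AIW_AIMS = {
    "OSD-AVS": [],
    "MMC-SSD": [],
    "OSD-VSD": [],
    "OSD-VOI": ["OSD-VDI", "OSD-VOE", "OSD-VII"],
    "MMC-ASR": [],
    "MMC-NLU": [],
    "MMC-PSE": [
        "MMC-ETD", "MMC-ESD", "PAF-EFD", "PAF-EBD",
        "MMC-PTI", "MMC-PSI", "PAF-PFI", "PAF-PGI", "MMC-PMX",
    ],
    "MMC-EDP": [],
    "PAF-PSD": ["MMC-TTS", "PAF-EFD", "PAF-EBD", "PAF-PMX"],
    "OSD-AVR": [],
}


def all_basic_aims(aims=AIW_AIMS):
    """Sorted list of distinct AIMs/2 acronyms across the workflow."""
    return sorted({basic for basics in aims.values() for basic in basics})


def composites_using(basic, aims=AIW_AIMS):
    """AIMs/1 entries that list the given AIMs/2 acronym, in table order."""
    return [name for name, basics in aims.items() if basic in basics]


if __name__ == "__main__":
    print(len(all_basic_aims()))
    print(composites_using("PAF-EFD"))
```

In this transcription, Entity Face Description (PAF-EFD) and Entity Body Description (PAF-EBD) appear under both Personal Status Extraction and Personal Status Display.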