
1     Functions     2     Reference Model     3     I/O Data
4     Functions of AI Modules     5     I/O Data of AI Modules     6     AIW, AIMs, and JSON Metadata
7     Reference Software     8     Conformance Testing     9     Performance Assessment

1      Functions

In the Conversation with Emotion (MMC-CWE) Use Case, a machine responds to a human’s textual and/or vocal utterance in a manner consistent with the human’s utterance and emotional state, as detected from the human’s text, speech, or face. The machine responds using text, synthetic speech, and a face whose lip movements are synchronised with the synthetic speech and the synthetic machine emotion.

2      Reference Model

Figure 1 gives the Reference Model of Conversation With Emotion including the input/output data, the AIMs, the AIM topology, and the data exchanged between and among the AIMs.

Figure 1 – Reference Model of Conversation With Emotion

The operation of Conversation with Emotion develops as follows:

  1. Automatic Speech Recognition produces Recognised Text.
  2. Input Speech Description and PS-Speech Interpretation produce Emotion (Speech).
  3. Input Face Description and PS-Face Interpretation produce Emotion (Face).
  4. Natural Language Understanding refines Recognised Text and produces Meaning.
  5. PS-Text Interpretation produces Emotion (Text) from the Text Descriptors produced by Natural Language Understanding.
  6. The Multimodal Emotion Fusion AIM fuses all Emotions into the Fused Emotion.
  7. The Entity Dialogue Processing AIM produces a reply based on the Fused Emotion and Meaning.
  8. The Text-To-Speech (Emotion) AIM produces Output Speech from Text with Emotion.
  9. The Video Lip Animation AIM animates the lips of a Face drawn from the Video of Faces KB consistently with the Output Speech and the Output Emotion.
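
The data flow described above can also be summarised, informatively, as a pipeline. The following Python sketch shows only the wiring of Figure 1: the aims object and every function name on it are hypothetical placeholders for the corresponding AIMs, not interfaces defined by this specification.

# Informative data-flow sketch of the MMC-CWE workflow (wiring only).
# `aims` is a hypothetical object exposing one callable per AIM; the AIM
# implementations themselves are outside the scope of this sketch.

def conversation_with_emotion(aims, input_selector, text_object, speech_object, face_object):
    # 1. Automatic Speech Recognition produces Recognised Text.
    recognised_text = aims.automatic_speech_recognition(speech_object)

    # 2. Speech Description and PS-Speech Interpretation produce Emotion (Speech).
    emotion_speech = aims.ps_speech_interpretation(
        aims.entity_speech_description(speech_object))

    # 3. Face Description and PS-Face Interpretation produce Emotion (Face).
    emotion_face = aims.ps_face_interpretation(
        aims.entity_face_description(face_object))

    # 4. Natural Language Understanding produces Meaning (Text Descriptors) and Refined Text.
    meaning, refined_text = aims.natural_language_understanding(
        input_selector, text_object, recognised_text)

    # 5. PS-Text Interpretation produces Emotion (Text) from the Text Descriptors.
    emotion_text = aims.ps_text_interpretation(meaning)

    # 6. Multimodal Emotion Fusion fuses the three Emotions into the Fused Emotion.
    fused_emotion = aims.multimodal_emotion_fusion(
        emotion_text, emotion_speech, emotion_face)

    # 7. Entity Dialogue Processing produces Machine Text and Machine Emotion
    #    (handling of Input Text vs Refined Text via the Input Selector is omitted).
    machine_text, machine_emotion = aims.entity_dialogue_processing(
        refined_text, meaning, fused_emotion)

    # 8. Text-to-Speech (Emotion) produces Output Speech.
    output_speech = aims.text_to_speech(machine_text, machine_emotion)

    # 9. Video Lip Animation produces the output Face Object synchronised with the Output Speech.
    output_face = aims.video_lip_animation(output_speech, machine_emotion)

    return machine_text, output_speech, output_face

In a deployed AIW the AIMs exchange these data through the AI Framework rather than through direct function calls; the sketch only makes the topology of Figure 1 explicit.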

3      I/O Data

The input and output data of the Conversation with Emotion Use Case are:

Table 1 – I/O Data of Conversation with Emotion 

Input | Descriptions
Input Selector | Data determining the use of Speech vs Text.
Text Object | Text typed by the human as an additional information stream or as a replacement of the speech, depending on the value of Input Selector.
Speech Object | Speech of the human having a conversation with the machine.
Face Object | Visual information of the face of the human having a conversation with the machine.
Output | Descriptions
Text Object | Text of the Speech produced by the Machine.
Speech Object | Synthetic Speech produced by the Machine.
Face Object | Video of a Face whose lip movements are synchronised with the Output Speech and the synthetic machine emotion.
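
Purely as an illustration of Table 1, the sketch below models the input side of the Use Case in Python; the field names are assumptions, and the Input Selector is reduced to a boolean choosing Text or Speech.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CWEInput:
    # Input Selector: determines the use of Speech vs Text (binary data, see Table 5).
    use_text_instead_of_speech: bool
    # Text typed by the human; supplements or replaces Speech depending on the Selector.
    text_object: Optional[str] = None
    # Speech of the human (e.g. a .wav payload).
    speech_object: Optional[bytes] = None
    # Visual information of the human's face (e.g. an AVC-coded video payload).
    face_object: Optional[bytes] = None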

4      Functions of AI Modules

Table 2 provides the functions of the Conversation with Emotion AIMs.

Table 2 – Functions of AI Modules of Conversation with Emotion

AIM | Function
Automatic Speech Recognition | 1. Receives Speech Object. 2. Produces Recognised Text.
Entity Speech Description | 1. Receives Speech Object. 2. Produces Speech Descriptors.
Entity Face Description | 1. Receives Face Object. 2. Extracts Face Descriptors.
Natural Language Understanding | 1. Receives Input Selector, Text Object, Recognised Text. 2. Produces Meaning (i.e., Text Descriptors) and Refined Text.
PS-Text Interpretation | 1. Receives Text Descriptors. 2. Provides the Emotion of the Text.
PS-Speech Interpretation | 1. Receives Speech Descriptors. 2. Provides the Emotion of the Speech.
PS-Face Interpretation | 1. Receives Face Descriptors. 2. Provides the Emotion of the Face.
Multimodal Emotion Fusion | 1. Receives Emotion (Text), Emotion (Speech), Emotion (Face). 2. Provides the human's Input Emotion by fusing Emotion (Text), Emotion (Speech), and Emotion (Face).
Entity Dialogue Processing | 1. Receives Refined Text, Meaning, Input Emotion. 2. Analyses Meaning and Input Text or Refined Text, depending on the value of Input Selector. 3. Produces Machine Emotion and Machine Text.
Text-to-Speech | 1. Receives Machine Text and Machine Emotion. 2. Produces Output Speech.
Video Lip Animation | 1. Receives Machine Speech and Machine Emotion. 2. Animates the lips of a video obtained by querying the Video Faces KB, using the Output Emotion. 3. Produces Face Object with synchronised Speech Object (Machine Object).
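
Table 2 specifies what Multimodal Emotion Fusion receives and provides but not how the fusion is performed. The sketch below shows one possible, non-normative policy, assuming each Emotion is represented by a single label; the majority-vote rule with a speech-face-text tie-break is an illustrative assumption.

from collections import Counter
from typing import Optional

def multimodal_emotion_fusion(emotion_text: Optional[str],
                              emotion_speech: Optional[str],
                              emotion_face: Optional[str]) -> Optional[str]:
    """Fuse the per-modality Emotions into a single Fused Emotion.

    Illustrative policy only: majority vote over the available modality
    labels, breaking ties by modality priority (speech, face, text).
    """
    available = [e for e in (emotion_speech, emotion_face, emotion_text) if e is not None]
    if not available:
        return None
    counts = Counter(available)
    top_count = max(counts.values())
    # Return the highest-priority modality label among those with the top count.
    for emotion in available:
        if counts[emotion] == top_count:
            return emotion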

5      I/O Data of AI Modules

The AI Modules of Conversation with Emotion receive and produce the data specified in Table 3.

Table 3 – AI Modules of Conversation with Emotion

AIM | Receives | Produces
Automatic Speech Recognition | Speech Object | Recognised Text
Entity Speech Description | Speech Object | Speech Descriptors
Entity Face Description | Face Object | Face Descriptors
Natural Language Understanding | Recognised Text | Refined Text, Text Descriptors
PS-Text Interpretation | Text Descriptors | Emotion (Text)
PS-Speech Interpretation | Speech Descriptors | Emotion (Speech)
PS-Face Interpretation | Face Descriptors | Emotion (Face)
Multimodal Emotion Fusion | Emotion (Text), Emotion (Speech), Emotion (Face) | Fused Emotion
Entity Dialogue Processing | Text Descriptors; Refined Text or Input Text (depending on Input Selector); Input Emotion | Machine Text, Machine Emotion
Text-to-Speech | Machine Text, Machine Emotion | Output Speech
Video Lip Animation | Machine Emotion, Machine Speech | Output Visual

6     AIW, AIMs, and JSON Metadata

Table 4 – AIMs and JSON Metadata

AIW | AIMs | Name | JSON
MMC-CWE | | Conversation With Emotion | X
| MMC-ASR | Automatic Speech Recognition | X
| MMC-EDP | Entity Dialogue Processing | X
| MMC-ESD | Entity Speech Description | X
| MMC-MEF | Multimodal Emotion Fusion | X
| MMC-NLU | Natural Language Understanding | X
| MMC-PFI | PS-Face Interpretation | X
| MMC-PTI | PS-Text Interpretation | X
| MMC-SPE | Speech Personal Status Extraction | X
| MMC-TTS | Text-to-Speech | X
| MMC-VLA | Video Lip Animation | X
| PAF-EFD | Entity Face Description | X
| PAF-FPE | Face Personal Status Extraction | X
| PAF-PSI | PS-Speech Interpretation | X
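
Table 4 indicates which AIMs carry JSON Metadata without showing its content. The snippet below illustrates, in Python, how such a record might be assembled and serialised; the field names are hypothetical and do not reproduce the normative MPAI metadata schema.

import json

# Hypothetical AIM metadata record; the field names are illustrative only
# and are not taken from the normative MPAI AIM metadata schema.
aim_metadata = {
    "Identifier": {"Name": "MMC-MEF", "Version": "1.0"},
    "Title": "Multimodal Emotion Fusion",
    "Ports": {
        "Input": ["Emotion (Text)", "Emotion (Speech)", "Emotion (Face)"],
        "Output": ["Fused Emotion"],
    },
}

print(json.dumps(aim_metadata, indent=2))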

7     Reference Software

8     Conformance Testing

Important note. MMC-CWE Conformance Testing Specification V1.0 does not provide methods and datasets to Test the Conformance of:

  1. Input Speech Description and PS-Speech Interpretation, but only of the Speech Personal Status Extraction (MMC-SPE) Composite AIM for Emotion.
  2. Input Face Description and PS-Face Interpretation, but only of the Face Personal Status Extraction Composite AIM for Emotion.

Table 5 gives the input/output data of the MMC-CWE AI Workflow.

Table 5 – I/O data of MMC-CWE

Input Data | Data Type | Input Conformance Testing Data
Input Selector | Binary data | All Input Selectors shall conform with Selector.
Text Object | Unicode | All input Text files to be drawn from Text files.
Speech Object | .wav | All input Speech files to be drawn from Speech files.
Face Object | AVC | All input Video files to be drawn from Video files.
Output Data | Data Type | Conformance Test
Machine Text | Unicode | All Text files produced shall conform with Text files.
Machine Speech | .wav | All Speech files produced shall conform with Speech files.
Machine Video | AVC | All Video files produced shall conform with Video files.
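
As an informal illustration of the output-side rows of Table 5, the sketch below checks that a produced Speech file is a readable .wav file and that a produced Text file decodes as Unicode (UTF-8 is assumed here); the actual Conformance Testing procedure is defined by the MMC-CWE Conformance Testing Specification, not by this sketch.

import pathlib
import wave

def check_machine_speech(path: str) -> bool:
    """Simplified check that a produced Speech file is a non-empty .wav file."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getnframes() > 0
    except (wave.Error, EOFError, FileNotFoundError):
        return False

def check_machine_text(path: str) -> bool:
    """Simplified check that a produced Text file decodes as Unicode (UTF-8 assumed)."""
    try:
        pathlib.Path(path).read_text(encoding="utf-8")
        return True
    except (UnicodeDecodeError, FileNotFoundError):
        return False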

9     Performance Assessment

 
