1 Functions | 2 Reference Model | 3 I/O data of AI Workflow |
4 Functions of AI Modules | 5 I/O Data of AI Modules | 6 AIW, AIMs, and JSON Metadata |
1 Functions
Speech carries information not only about its lexical content, but also about several other aspects including age, gender, identity, and emotional state of the speaker. Speech synthesis is evolving towards support of these aspects. In many use cases, emotional force can usefully be added to speech which by default would be neutral or emotionless, possibly with grades of a particular emotion. For instance, in a human-machine dialogue, messages conveyed by the machine can be more effective if they carry emotions appropriately related to the emotions detected in the human speaker.
Emotion-Enhanced Speech (EES):
- Enables a user to indicate a model utterance or an Emotion to obtain an emotionally charged version of a given utterance.
- Converts an individual emotionless speech segment into a segment that carries a specified emotion. Both the input and the output speech segments are contained in files. The desired emotion is either expressed as a tag belonging to a standard list of emotions or derived by extracting features from a model utterance.
CAE-EES implementations can be used to create virtual agents communicating as naturally as possible, and thus improve the quality of human-machine interaction by bringing it closer to human-human interchange.
2 Reference Model
The Emotion-Enhanced Speech Reference Model depicted in Figure 1 supports two Modes, or pathways, for adding emotional charge to an emotionless or neutral input utterance (Emotionless Speech).
Figure 1 – Emotion-Enhanced Speech Reference Model
3 I/O data of AI Workflow
Table 1 gives the input and output data of Emotion-Enhanced Speech.
Table 1 – I/O data of Emotion-Enhanced Speech
Input data | Comments |
Mode Selector | Selects pathway 1 operation. |
Model Utterance | An Audio Segment used as a model or demonstration of the Emotion to be added to Emotionless Speech in order to produce Speech with Emotion. |
Emotion List | A set of Emotion values. |
Emotionless Speech | A File containing Speech without music and other sounds, and in which little or no identifiable emotion is perceptible by native listeners. |
Mode Selector | Selects pathway 2 operation. |
Emotion List | A set of Emotion values. |
Language Selector | Selects the language of the Enhanced Speech. |
Output data | Comments |
Speech with Emotion | A File containing Speech with emotional features. |
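To illustrate how the inputs in Table 1 fit together, the following Python sketch groups them into a single record and checks that the selected pathway has the data it needs. The class and field names, the byte-string representation of the files, and the rule for which inputs each pathway requires (inferred from the AIM inputs listed in Table 3) are assumptions made for this example, not part of the specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    """Hypothetical encoding of the Mode Selector."""
    MODEL_UTTERANCE = 1  # pathway 1: emotion derived from a Model Utterance
    EMOTION_LIST = 2     # pathway 2: emotion given as an Emotion List


@dataclass
class EESInput:
    """Illustrative container for the Table 1 inputs (names are assumptions)."""
    mode: Mode
    emotionless_speech: bytes                 # content of the Emotionless Speech File
    model_utterance: Optional[bytes] = None   # used by pathway 1
    emotion_list: List[str] = field(default_factory=list)  # used by pathway 2
    language: Optional[str] = None            # Language Selector, used by pathway 2

    def validate(self) -> None:
        """Raise ValueError if the selected pathway lacks an input it needs."""
        if not self.emotionless_speech:
            raise ValueError("Emotionless Speech is required in both pathways")
        if self.mode is Mode.MODEL_UTTERANCE:
            if self.model_utterance is None:
                raise ValueError("pathway 1 needs a Model Utterance")
        elif self.mode is Mode.EMOTION_LIST:
            if not self.emotion_list:
                raise ValueError("pathway 2 needs a non-empty Emotion List")
            if self.language is None:
                raise ValueError("pathway 2 needs a Language Selector")
```

Calling `validate()` before the workflow starts lets a missing Model Utterance or Emotion List surface as an error instead of reaching an AIM.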
4 Functions of AI Modules
The AI Modules perform the functions described in Table 2.
Table 2 – AI Modules of Emotion-Enhanced Speech
AIM | Function |
Speech Feature Analysis 1 | Extracts Prosodic Speech Features of a model emotional utterance and transfers them to the Prosodic Emotion Insertion AIM. |
Speech Feature Analysis 2 | Extracts Emotionless Speech Features of an emotionless input utterance, passing these to the Emotion Feature Production AIM. |
Emotion Feature Production | Receives the Emotionless Speech Features produced by Speech Feature Analysis 2 plus a list of Emotions to be added, and produces the Neural Speech Features passed to the Neural Emotion Insertion AIM. (If the Degree of an Emotion is not specified, the Medium value is used.) |
Prosodic Emotion Insertion | Integrates the (emotional) Prosodic Speech Features with those of the Emotionless Speech input, yielding and delivering an emotionally modified utterance. |
Neural Emotion Insertion | Integrates the (emotional) Neural Speech Features with those of the Emotionless Speech input, yielding and delivering an emotionally modified utterance. |
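The default-Degree rule stated for Emotion Feature Production can be sketched as follows. This is a minimal illustration only: the `Emotion` tuple, the function name, and the sample emotion names and the "High" value are hypothetical, and only the "Medium" default comes from the table above.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple

DEFAULT_DEGREE = "Medium"  # value used when no Degree is specified


class Emotion(NamedTuple):
    """Hypothetical representation of one Emotion List entry."""
    name: str
    degree: str


def with_default_degree(
    entries: Iterable[Tuple[str, Optional[str]]]
) -> List[Emotion]:
    """Return the entries with any missing Degree replaced by the default."""
    return [Emotion(name, degree or DEFAULT_DEGREE) for name, degree in entries]


# Example with hypothetical emotion names: the second entry has no Degree.
emotions = with_default_degree([("joy", "High"), ("surprise", None)])
```

Normalizing the list once, before feature production, means later stages never have to handle an unspecified Degree.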
5 I/O Data of AI Modules
Table 3 – CAE-EES AIMs and their data
AIM | Input Data | Output Data |
Speech Feature Analysis 1 | Mode Selector, Model Utterance | Prosodic Speech Features |
Speech Feature Analysis 2 | Mode Selector, Emotionless Speech | Emotionless Speech Features |
Emotion Feature Production | Emotionless Speech Features, Language Selector, Emotion List | Neural Speech Features |
Prosodic Emotion Insertion | Emotionless Speech, Prosodic Speech Features | Speech with Emotion |
Neural Emotion Insertion | Emotionless Speech, Neural Speech Features | Speech with Emotion |
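The data flow implied by Table 3 can be sketched as a small dispatcher that chains the AIMs of the selected pathway. The function name, the dictionary keys, and the stub AIMs below are placeholders invented for this example; real AIMs would operate on audio and feature data rather than on tagged tuples, and the Mode Selector is modeled here simply as the integer 1 or 2.

```python
from typing import Any, Callable, Dict

# Stub AIMs that only tag their inputs, so the wiring can be inspected.
STUB_AIMS: Dict[str, Callable[..., Any]] = {
    "SpeechFeatureAnalysis1": lambda model_utt: ("prosodic", model_utt),
    "SpeechFeatureAnalysis2": lambda speech: ("emotionless_features", speech),
    "EmotionFeatureProduction": lambda feats, lang, emotions: (
        "neural", feats, lang, tuple(emotions)
    ),
    "ProsodicEmotionInsertion": lambda speech, feats: (
        "speech_with_emotion", speech, feats
    ),
    "NeuralEmotionInsertion": lambda speech, feats: (
        "speech_with_emotion", speech, feats
    ),
}


def run_ees(mode: int, inputs: Dict[str, Any],
            aims: Dict[str, Callable[..., Any]]) -> Any:
    """Chain the AIMs of the selected pathway, following the Table 3 I/O."""
    speech = inputs["emotionless_speech"]
    if mode == 1:
        # Pathway 1: features come from the Model Utterance.
        prosodic = aims["SpeechFeatureAnalysis1"](inputs["model_utterance"])
        return aims["ProsodicEmotionInsertion"](speech, prosodic)
    if mode == 2:
        # Pathway 2: features are produced from the Emotion List.
        plain = aims["SpeechFeatureAnalysis2"](speech)
        neural = aims["EmotionFeatureProduction"](
            plain, inputs["language"], inputs["emotion_list"]
        )
        return aims["NeuralEmotionInsertion"](speech, neural)
    raise ValueError(f"unknown Mode Selector value: {mode}")
```

Passing the AIMs in as a dictionary keeps the wiring separate from any particular implementation, so the stubs can be replaced one at a time.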
6 AIW, AIMs, and JSON Metadata
Table 4 – AIW, AIMs, and JSON Metadata
AIW | AIMs | Name | JSON |
CAE-EES | | Emotion-Enhanced Speech | File |
CAE-EES | CAE-SF1 | Speech Feature Analysis 1 | File |
CAE-EES | CAE-SF2 | Speech Feature Analysis 2 | File |
CAE-EES | CAE-EFP | Emotion Feature Production | File |
CAE-EES | CAE-PEI | Prosodic Emotion Insertion | File |
CAE-EES | CAE-NEI | Neural Emotion Insertion | File |