CAE-USC V2.4 AIMs Prosodic Emotion Insertion

1. Functions	2. Reference Model	3. Input/Output Data
4. JSON Metadata	5. SubAIMs	6. Profiles
7. Reference Software	8. Conformance Testing	9. Performance Assessment

1 Functions

Prosodic Emotion Insertion (CAE-PEI)

Receives	Prosodic Speech Features
	Emotionless Speech
Integrates	(Emotional) Prosodic Speech Features with those of the Emotionless Speech input.
Produces	Speech with Emotion, i.e., the emotionally modified utterance.

2 Reference Model

Figure 1 depicts the Reference Model of the Prosodic Emotion Insertion (CAE-PEI) AIM.

Figure 1 – Reference Model of Prosodic Emotion Insertion (CAE-PEI) AIM

3 Input/Output Data

Table 1 provides the Input/Output Data of the Prosodic Emotion Insertion (CAE-PEI) AIM.

Table 1 – Input/Output Data of the Prosodic Emotion Insertion (CAE-PEI) AIM

Input data	Semantics
Prosodic Speech Features	Speech Features from Speech Feature Analyser 1.
Emotionless Speech	The speech without emotion to which Emotion is added.
Output data	Semantics
Speech with Emotion	The Emotionless Speech to which emotion has been added.
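
As an illustration of the input/output shape in Table 1, the following minimal sketch models the AIM as a callable taking the two inputs and returning one Speech Object. All names (`ProsodicFeature`, `SpeechObject`, `ProsodicEmotionInsertion`, `passthrough_pei`) and the field layout are hypothetical and are not taken from the JSON Metadata; the pitch, intensity and duration fields only mirror the properties named in the Performance Assessment procedure.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class ProsodicFeature:
    """One entry of Prosodic Speech Features (field names are assumptions)."""
    pitch_hz: float
    intensity_db: float
    duration_ms: float


@dataclass(frozen=True)
class SpeechObject:
    """Hypothetical stand-in for a Speech Object: samples plus sampling rate."""
    samples: tuple[float, ...]
    sample_rate_hz: int


class ProsodicEmotionInsertion(Protocol):
    """Interface shape of the AIM: two inputs in, Speech with Emotion out."""

    def __call__(
        self, features: Sequence[ProsodicFeature], emotionless: SpeechObject
    ) -> SpeechObject: ...


def passthrough_pei(
    features: Sequence[ProsodicFeature], emotionless: SpeechObject
) -> SpeechObject:
    """Trivial placeholder: returns the input unchanged, adding no emotion."""
    return emotionless
```

A real implementation would replace `passthrough_pei` with a function that modifies the speech according to the features; the placeholder only shows where such a function plugs in.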

4 JSON Metadata

https://schemas.mpai.community/CAE1/V2.4/AIMs/ProsodicEmotionInsertion.json

5 SubAIMs

No SubAIMs.

6 Profiles

No Profiles.

7 Reference Software

Reference Software not available.

8 Conformance Testing

Receives	Prosodic Speech Features	Shall validate against the Speech Features schema.
	Emotionless Speech	Shall validate against the Speech Object schema. The Qualifier shall validate against the Speech Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Speech Object Qualifier schema.
Produces	Speech with Emotion	Shall validate against the Speech Object schema.

9 Performance Assessment

Table 9 gives the Emotion Enhanced Speech (EES) Prosodic Emotion Insertion Means and how they are used.

Table 9 – AIM Means and use of Emotion Enhanced Speech (EES) Prosodic Emotion Insertion (AIM2 in Figure 3)

Means	Actions
Conformance Testing Dataset	DS1: a dataset of at least x > M Emotionless Speeches. DS2: a dataset of x Speech Features 1, each corresponding to a specific Emotionless Speech.
Procedure	For each of the x input pairs of DS1 and DS2:
	1. Feed the Emotion Inserter 1 under test with an Emotionless Speech and its corresponding array of Speech Features 1.
	2. Feed the reference Speech Feature Analyser 1 (ID: S) with the Speech with Emotion output by the Emotion Inserter 1 under test.
	3. Verify that the number of features in the Speech Features 1 array output by the reference Speech Feature Analyser 1 equals the corresponding number in DS2.
	4. For each feature of the output Speech Features 1 array, compute the delta (absolute difference) between: the pitch property and the corresponding DS2 data, in Hz; the intensity property and the corresponding DS2 data, in dB; the duration property and the corresponding DS2 data, in ms.
	5. Compute the Average of: the deltas of the pitch property; the deltas of the intensity property; the deltas of the duration property.
	Then, compute the Average for each of the three properties among the n Model Utterances.
Evaluation	1. Condition 3 shall be respected.
	2. Given the three Averages computed at the end of the Procedure and denoting them with , where p represents one among the three properties (pitch, intensity and duration), if: the Neural Emotion Insertion module under test has passed the Conformance Test.
	3. Otherwise, the submitter of Emotion Inserter 1 is given the opportunity to submit an implementation of Speech Feature Analyser 1.
	4. The MPAI Store will test the combination of the two submitted AIMs.
	5. If the quality of the output of the submitted combination of AIM1 and AIM2 is above threshold, Emotion Inserter 1 passes the Conformance Test as long as the corresponding Speech Feature Analyser 1 is made available to the MPAI Store.
	6. Else, Emotion Inserter 1 does not pass the Conformance Test.
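
The delta and averaging steps of the Procedure can be sketched as follows. This is an illustrative sketch, not reference software: the `Feature` class, the field names and the function names are hypothetical. The pass condition is not recoverable from the Evaluation text above, so `passes` assumes that each Average must be at or below a per-property threshold; that comparison and the threshold values are assumptions.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Sequence

PROPERTIES = ("pitch_hz", "intensity_db", "duration_ms")


@dataclass(frozen=True)
class Feature:
    """One Speech Features 1 entry (field names are assumptions)."""
    pitch_hz: float
    intensity_db: float
    duration_ms: float


def utterance_averages(
    measured: Sequence[Feature], reference: Sequence[Feature]
) -> dict[str, float]:
    """Check the feature count, then average the absolute deltas per property."""
    if len(measured) != len(reference):
        raise ValueError("feature count differs from the DS2 entry")
    if not measured:
        raise ValueError("no features to compare")
    return {
        p: fmean(abs(getattr(m, p) - getattr(r, p))
                 for m, r in zip(measured, reference))
        for p in PROPERTIES
    }


def overall_averages(per_utterance: Sequence[dict[str, float]]) -> dict[str, float]:
    """Average each property over all tested utterances."""
    return {p: fmean(row[p] for row in per_utterance) for p in PROPERTIES}


def passes(overall: dict[str, float], thresholds: dict[str, float]) -> bool:
    """ASSUMED rule: every Average is at or below its threshold."""
    return all(overall[p] <= thresholds[p] for p in PROPERTIES)
```

Here `measured` stands for the output of the reference Speech Feature Analyser 1 and `reference` for the matching DS2 entry; a count mismatch raises an error rather than producing averages, reflecting the requirement that the feature counts be equal.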

Figure 4 – EES Prosodic Emotion Insertion

After the Tests, the Conformance Tester shall fill out Table 10.

Table 10 – Conformance Testing form of Emotion Enhanced Speech (EES) Prosodic Emotion Insertion

Conformance Tester ID	Unique Conformance Tester Identifier assigned by MPAI.
Standard, Use Case ID and Version	Standard ID and Use Case ID, Version and Profile of the standard in the form “CAE:EES:V:P”.
Name of AIM	Prosodic Emotion Insertion
Implementer ID	Unique Implementer Identifier assigned by MPAI Store.
AIM Implementation Version	Unique Implementation Identifier assigned by Implementer.
Neural Network Version*	Unique Neural Network Identifier assigned by Implementer.
Identifier of Conformance Testing Dataset	Unique Dataset Identifier assigned by MPAI Store.
Test ID	Unique Test Identifier assigned by Conformance Tester.
Actual output	Actual output provided as a matrix of n+1 rows containing all computed Average values:
	#	Pitch	Intensity	Duration
	1
	…	…	…	…
	n
	Averages
	Result:
	Threshold: m
	Final evaluation: Passed / Not passed
Execution time*	Duration of test execution.
Test comment*
Test Date	yyyy/mm/dd.

* Optional field
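
The "Actual output" matrix of n+1 rows in Table 10 can be assembled from the per-utterance Averages as in the sketch below. The function names and the tab-separated rendering are hypothetical choices made for illustration; only the column names and the final "Averages" row come from the form.

```python
from statistics import fmean
from typing import Sequence

Row = tuple[str, float, float, float]


def actual_output_matrix(
    per_utterance: Sequence[tuple[float, float, float]]
) -> list[Row]:
    """Rows 1..n hold per-utterance Averages; the last row holds column Averages."""
    if not per_utterance:
        raise ValueError("at least one utterance is required")
    rows: list[Row] = [
        (str(idx), pitch, intensity, duration)
        for idx, (pitch, intensity, duration) in enumerate(per_utterance, start=1)
    ]
    columns = list(zip(*per_utterance))  # transpose into three columns
    rows.append(("Averages", fmean(columns[0]), fmean(columns[1]), fmean(columns[2])))
    return rows


def render(rows: Sequence[Row]) -> str:
    """Render the matrix as tab-separated text with the header used in the form."""
    lines = ["#\tPitch\tIntensity\tDuration"]
    for label, *values in rows:
        lines.append("\t".join([label] + [f"{v:.2f}" for v in values]))
    return "\n".join(lines)
```

For n utterances the returned list has n+1 entries, matching the row count stated in the form.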
