CAE-USC V2.4 AIMs Neural Emotion Insertion

1. Functions	2. Reference Model	3. Input/Output Data
4. JSON Metadata	5. SubAIMs	6. Profiles
7. Reference Software	8. Conformance Testing	9. Performance Assessment

1 Functions

Neural Emotion Insertion (CAE-NEI)

Receives	Neural Speech Features from Emotion Feature Producer.
	Emotionless Speech.
Integrates	(Emotional) Neural Speech Features with those of the Emotionless Speech input.
Produces	Emotionally modified utterance (Speech with Emotion).

2 Reference Model

Figure 1 depicts the Reference Model of Neural Emotion Insertion (CAE-NEI).

Figure 1 – Reference Model of Neural Emotion Insertion (CAE-NEI)

3 Input/Output Data

Table 1 provides the Input/Output Data of Neural Emotion Insertion (CAE-NEI).

Table 1 – Input/Output Data of Neural Emotion Insertion (CAE-NEI)

Input data	Semantics
Neural Speech Features	Speech Features produced by the Emotion Feature Producer.
Emotionless Speech	The speech without emotion to which emotion is added.
Output data	Semantics
Speech with Emotion	The Emotionless Speech to which emotion has been added.

4 JSON Metadata

https://schemas.mpai.community/CAE1/V2.4/AIMs/NeuralEmotionInsertion.json

5 SubAIMs

No SubAIMs.

6 Profiles

No Profiles.

7 Reference Software

No Reference Software.

8 Conformance Testing

Receives	Neural Speech Features
	Emotionless Speech	Shall validate against the Speech Object schema. The Qualifier shall validate against the Speech Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Speech Object Qualifier schema.
Produces	Speech with Emotion	Shall validate against the Speech Object schema. The Qualifier shall validate against the Speech Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Speech Object Qualifier schema.

9 Performance Assessment

Table 18 gives the Emotion Enhanced Speech (EES) Neural Emotion Insertion Means and how they are used.

Table 18 – AIM Means and use of Emotion Enhanced Speech (EES) Neural Emotion Insertion

Means	Actions
Conformance Testing Dataset	DS1: a dataset of at least y > N Emotionless Speech Segments. DS2: a dataset of y Emotion Lists. DS3: a dataset of one element, specifying the Language in question. DS4: a dataset of y Speech with Emotion Segments, where each is associated with specific elements of DS1, DS2, and DS3 used as input, and thus represents one correct output, given this input.
Procedure	Given a reference Speech Feature Analyser 2 (ID: sfa2), a reference Emotion Feature Producer (ID: efp), and an Emotion Inserter 2 module under test, the quality of Emotion Inserter 2 is measured in relation to the reference modules as follows: 1. Connect the three modules. 2. Repeat many times: a. Select an input set comprising one element of DS1 (an Emotionless Speech segment), one element of DS2 (an Emotion List), and DS3 (a Language). b. Feed that set to the system composed of the connected modules. c. Measure the quality of the Speech with Emotion output generated by the system by comparing it with the corresponding “correct” result in DS4, as measured by PESQ [6]. 3. The quality of Emotion Inserter 2 is then the average of the quality measurements of step 2c.
Evaluation	If the average value of the quality measurements exceeds the threshold, set above 2.0 on the PESQ scale, Emotion Inserter 2 has passed the Conformance Test. If the quality is below the threshold, the submitter of Emotion Inserter 2 is given the opportunity to submit implementations of Speech Feature Analyser 2 and Emotion Feature Producer. The MPAI Store will test the combination of the three submitted AIMs. If the quality of the output of the submitted combination is above the threshold, Emotion Inserter 2 passes the Conformance Test, provided that the corresponding Speech Feature Analyser 2 and Emotion Feature Producer are made available to the MPAI Store. Otherwise, Emotion Inserter 2 does not pass the Conformance Test.
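
The Procedure and Evaluation rows of Table 18 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the specification: `run_pipeline` stands in for the three connected modules, `score` stands in for a PESQ implementation, and all function names, the `(i, j)` keying of DS4, the number of trials, and the use of exactly 2.0 as the threshold are hypothetical choices made for the example.

```python
import random
from statistics import mean
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumption: the text only says the threshold lies "above 2.0" on the PESQ scale.
PESQ_THRESHOLD = 2.0


def average_quality(
    ds1: Sequence[object],                # Emotionless Speech segments
    ds2: Sequence[object],                # Emotion Lists
    language: str,                        # the single element of DS3
    ds4: Dict[Tuple[int, int], object],   # reference outputs, keyed by (i, j)
    run_pipeline: Callable[[object, object, str], object],  # hypothetical: connected modules
    score: Callable[[object, object], float],  # hypothetical PESQ stand-in: (reference, output)
    n_trials: int = 10,
    seed: int = 0,
) -> float:
    """Average the quality scores over randomly selected input sets."""
    rng = random.Random(seed)
    keys = sorted(ds4)  # only (i, j) pairs that have a reference output
    scores = []
    for _ in range(n_trials):
        i, j = rng.choice(keys)                          # select an input set
        output = run_pipeline(ds1[i], ds2[j], language)  # feed it to the system
        scores.append(score(ds4[(i, j)], output))        # compare with the DS4 reference
    return mean(scores)


def evaluate(
    avg_with_reference: float,
    avg_with_submitted: Optional[float] = None,
    threshold: float = PESQ_THRESHOLD,
) -> str:
    """Apply the pass / resubmit / fail logic of the Evaluation row."""
    if avg_with_reference > threshold:
        return "pass"
    if avg_with_submitted is None:
        # The submitter may provide its own Speech Feature Analyser 2
        # and Emotion Feature Producer for a second test.
        return "request submitter modules"
    if avg_with_submitted > threshold:
        return "pass with submitted modules"
    return "fail"
```

Keeping the scoring function as a parameter lets the same loop be exercised with a toy metric before a real PESQ implementation is plugged in.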

Figure 8 – Neural Emotion Inserter.

After the Tests, Conformance Tester shall fill out Table 19.

Table 19 – Conformance Testing form of Emotion Enhanced Speech (EES) Neural Emotion Insertion

Conformance Tester ID

Unique Conformance Tester Identifier assigned by MPAI

Standard, Use Case ID and Version

Standard ID and Use Case ID, Version and Profile of the standard in the form “CAE-EES-V2.4”.

Name of AIM

Neural Emotion Insertion

Implementer ID

Unique Implementer Identifier assigned by MPAI Store.

AIM Implementation Version

Unique Implementation Identifier assigned by Implementer.

Neural Network Version*

Unique Neural Network Identifier assigned by Implementer.

Identifier of Conformance Testing Dataset

Unique Dataset Identifier assigned by MPAI Store.

Test ID

Unique Test Identifier assigned by Conformance Tester.

Actual output

The Conformance Tester will provide the following matrix related to the modules utilized for the tests. Denoting with i and j the record numbers in DS1 and DS2 respectively, the matrix reflects the results obtained with a limited number of random inputs and the corresponding outputs.

Example:

DS1	DS2	DS4	Emotion Inserter 2 output value
DS1[i]	DS2[j]	DS4[i, j]	SpeechWithEmotion[i, j]

Language: DS3
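
As an illustration of how the rows of this matrix might be collected, the sketch below records one row per tested (i, j) pair. It is a hypothetical helper, not part of the specification: `ResultRow`, `build_result_rows`, `format_row`, and the `run_pipeline` callable are names assumed for the example.

```python
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple


class ResultRow(NamedTuple):
    """One row of the matrix: inputs, reference output, and actual output."""
    i: int
    j: int
    ds1_item: object   # DS1[i], an Emotionless Speech segment
    ds2_item: object   # DS2[j], an Emotion List
    ds4_item: object   # DS4[i, j], the reference Speech with Emotion
    output: object     # SpeechWithEmotion[i, j], produced by the module under test


def build_result_rows(
    ds1: Sequence[object],
    ds2: Sequence[object],
    language: str,                        # the single element of DS3
    ds4: Dict[Tuple[int, int], object],
    pairs: Sequence[Tuple[int, int]],     # the (i, j) pairs selected for the test
    run_pipeline: Callable[[object, object, str], object],  # hypothetical
) -> List[ResultRow]:
    rows = []
    for i, j in pairs:
        output = run_pipeline(ds1[i], ds2[j], language)
        rows.append(ResultRow(i, j, ds1[i], ds2[j], ds4[(i, j)], output))
    return rows


def format_row(row: ResultRow) -> str:
    """Render a row in the four-column, tab-separated layout shown above."""
    return "\t".join(str(x) for x in (row.ds1_item, row.ds2_item, row.ds4_item, row.output))
```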

Execution time*

Duration of test execution.

Test comment*

If step 1 of Conformance Testing fails, the Conformance Tester shall request the implementer to provide Speech Feature Analyser 2 and Emotion Feature Producer AIMs.

If step 4 or 5 of Conformance Testing also fails, the Conformance Tester shall inform the implementer that Emotion Inserter 2 did not pass the Conformance Test.

Test Date

yyyy/mm/dd.

* Optional field
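
The fields of Table 19 could be captured in a simple record type, with the fields marked optional defaulting to `None`. The sketch below is only an illustration: the class name, attribute names, and the date check are assumptions and are not defined by the specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ConformanceTestingForm:
    """Hypothetical record mirroring the fields of Table 19."""
    conformance_tester_id: str           # assigned by MPAI
    standard_use_case_version: str       # e.g. "CAE-EES-V2.4"
    aim_name: str                        # "Neural Emotion Insertion"
    implementer_id: str                  # assigned by MPAI Store
    aim_implementation_version: str      # assigned by Implementer
    dataset_id: str                      # assigned by MPAI Store
    test_id: str                         # assigned by Conformance Tester
    actual_output: List[object]          # rows of the result matrix
    test_date: str                       # yyyy/mm/dd
    neural_network_version: Optional[str] = None  # optional field
    execution_time: Optional[float] = None        # optional; seconds assumed here
    test_comment: Optional[str] = None            # optional field

    def __post_init__(self) -> None:
        # Raises ValueError if the date is not a valid yyyy/mm/dd date.
        datetime.strptime(self.test_date, "%Y/%m/%d")
```

For example, constructing the form with `test_date="2024/13/01"` raises `ValueError`, because month 13 does not exist.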
