Moving Picture, Audio and Data Coding
by Artificial Intelligence

1. Function 2. Reference Model 3. Input/Output Data
4. JSON Metadata 5. SubAIMs 6. Profiles
7. Reference Software 8. Conformance Testing 9. Performance Assessment

1. Function

Speech Feature Analysis 2 (CAE-SF2):

Receives Emotionless Speech.
Extracts Emotionless Speech Features from Emotionless Speech.
Produces Prosodic Speech Features.

2. Reference Model

Figure 1 depicts the Speech Feature Analysis 2 (CAE-SF2) AIM:

Figure 1 – Speech Feature Analysis 2 (CAE-SF2) AIM

3. Input/Output Data

Table 1 gives the Input/Output Data of the Speech Feature Analysis 2 (CAE-SF2) AIM.

Table 1 – Input/Output Data of the Speech Feature Analysis 2 (CAE-SF2) AIM

Input data                     Semantics
Emotionless Speech             Speech utterance without emotion, provided as a model.

Output data                    Semantics
Emotionless Speech Features    Descriptors of the Speech without Emotion.

4. JSON Metadata

https://schemas.mpai.community/CAE1/V2.4/AIMs/SpeechFeatureAnalysis2.json

5. SubAIMs

No SubAIMs.

6. Profiles

No Profiles.

7. Reference Software

Reference Software not available.

8. Conformance Testing

Receives: the input Emotionless Speech shall validate against the Audio Object schema.
The Qualifier shall validate against the Audio Qualifier schema.
The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Types, Formats, and Attributes of the Audio Object Qualifier schema.
Produces: the output Emotionless Speech Features shall validate against the Speech Features schema.
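Full validation of the input and output against the Audio Object and Speech Features JSON schemas would normally be delegated to a JSON Schema validator. The Qualifier correspondence rule alone can be illustrated with the minimal, standard-library-only Python sketch below. The keys `SubType`, `Format`, `Attributes` and every allowed value in `ALLOWED_QUALIFIER_VALUES` are hypothetical placeholders, not taken from the MPAI Audio Object Qualifier schema.

```python
from typing import Any, Dict, List, Set

# Hypothetical placeholder value sets; the real ones are enumerated in the
# MPAI Audio Object Qualifier schema, not here.
ALLOWED_QUALIFIER_VALUES: Dict[str, Set[str]] = {
    "SubType": {"Speech"},
    "Format": {"WAV", "FLAC"},
    "Attributes": {"Mono", "Stereo"},
}


def qualifier_mismatches(qualifier: Dict[str, Any],
                         allowed: Dict[str, Set[str]]) -> List[str]:
    """Return one message per Qualifier value that is not listed in `allowed`."""
    problems: List[str] = []
    for key, allowed_values in allowed.items():
        value = qualifier.get(key)
        if value is None:
            continue  # presence rules are left to the JSON schema itself
        # A field such as Attributes may carry several values; treat all as a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            if v not in allowed_values:
                problems.append(f"{key}={v!r} not in Audio Object Qualifier schema")
    return problems
```

For example, a Qualifier with `Format` set to `"MP3"` yields one mismatch message, while a Qualifier whose values all appear in the (placeholder) sets yields an empty list.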

9. Performance Assessment

Table 12 gives the Means (verification procedures) of the Emotion Enhanced Speech (EES) Speech Feature Analysis 2 AIM and how they are used.

Table 12 – Means and use of Emotion Enhanced Speech (EES) Speech Feature Analysis 2 AIM

Means Actions
Conformance Testing Dataset DS1: a dataset of at least y > N Emotionless Speech Segments.

DS2: a dataset of y Emotion Lists.

DS3: a dataset of one element, specifying the Language in question.

DS4: a dataset of y Speech with Emotion Segments, where each is associated with specific elements of DS1, DS2, and DS3 used as input, and thus represents one correct output, given this input.

Procedure Given a reference Emotion Feature Producer (ID: efp), a reference Emotion Inserter 2 (ID: ei2), and a Speech Feature Analysis 2 module under test, we measure the quality of Speech Feature Analysis 2 relative to the reference modules as follows:

  1. Connect the three modules.
  2. Repeat many times:
    a. Select an input set consisting of an element of DS1 (an Emotionless Speech Segment), an element of DS2 (an Emotion List), and DS3 (the Language).
    b. Feed that set to the system composed of the connected modules.
    c. Measure the quality of the Speech with Emotion output generated by the system by comparing it, using PESQ [6], with the corresponding “correct” result in DS4.
  3. The quality of Speech Feature Analysis 2 is the average of the quality measurements of step 2c.
Evaluation
  1. If the average of the quality measurements is above a threshold, which shall be greater than 2.0 on the PESQ scale, Speech Feature Analysis 2 has passed the Conformance Test.
  2. If the quality is below the threshold, the submitter of Speech Feature Analysis 2 is given the opportunity to submit an implementation of Emotion Feature Producer and Emotion Inserter 2.
  3. The MPAI Store tests the combination of the three submitted AIMs.
  4. If the quality of the output of the submitted combination is above the threshold, Speech Feature Analysis 2 passes the Conformance Test, provided that the corresponding Emotion Feature Producer and Emotion Inserter 2 are made available to the MPAI Store.
  5. Otherwise, Speech Feature Analysis 2 does not pass the Conformance Test.
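The Procedure and Evaluation can be sketched in Python as below. This is an illustration under stated assumptions, not the MPAI test harness: the module call signatures, the wiring SFA2 → EFP → EI2 assumed for the connected system, the `score` callable standing in for a PESQ implementation, and the toy data are all hypothetical.

```python
import random
from statistics import mean
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# Hypothetical interfaces: the real AIM APIs are defined by MPAI-CAE, not here.
Module = Callable[..., Any]
Scorer = Callable[[Any, Any], float]  # stand-in for PESQ: (output, reference) -> score


def run_path2(sfa2: Module, efp: Module, ei2: Module,
              speech: Any, emotion_list: Any, language: Any) -> Any:
    """Step 1 (assumed wiring): Speech Feature Analysis 2 -> EFP -> Emotion Inserter 2."""
    speech_features = sfa2(speech)
    emotion_features = efp(speech_features, emotion_list, language)
    return ei2(speech, emotion_features)  # Speech with Emotion


def average_quality(sfa2: Module, efp: Module, ei2: Module,
                    ds1: Sequence[Any], ds2: Sequence[Any], language: Any,
                    ds4: Dict[Tuple[int, int], Any], score: Scorer,
                    trials: int, rng: random.Random) -> float:
    """Steps 2 and 3: average score over randomly selected input sets."""
    keys = sorted(ds4)  # only (i, j) pairs with a reference in DS4 can be scored
    qualities = []
    for _ in range(trials):
        i, j = rng.choice(keys)                                        # step 2a
        output = run_path2(sfa2, efp, ei2, ds1[i], ds2[j], language)   # step 2b
        qualities.append(score(output, ds4[(i, j)]))                   # step 2c
    return mean(qualities)                                              # step 3


def passes(quality: float, threshold: float,
           combination_quality: Optional[float] = None,
           combination_made_available: bool = False) -> bool:
    """Evaluation steps 1-5; a score equal to the threshold is treated as a fail."""
    if threshold <= 2.0:
        raise ValueError("the threshold must be greater than 2.0 on the PESQ scale")
    if quality > threshold:                                  # step 1
        return True
    if combination_quality is None:                          # nothing was submitted
        return False
    # Steps 3-5: the submitted three-AIM combination is tested by the MPAI Store.
    return combination_quality > threshold and combination_made_available


# Toy run: "speech" is a string and the stand-in scorer rewards exact matches.
ds1, ds2 = ["hello", "bye"], ["happy", "sad"]
ds4 = {(i, j): f"{s}+{e}" for i, s in enumerate(ds1) for j, e in enumerate(ds2)}
q = average_quality(lambda s: s, lambda feats, emo, lang: emo, lambda s, emo: f"{s}+{emo}",
                    ds1, ds2, "en", ds4, lambda out, ref: 4.5 if out == ref else 1.0,
                    trials=10, rng=random.Random(0))
print(q, passes(q, threshold=3.0))  # 4.5 True
```

Injecting the modules and the scorer as callables keeps the sketch independent of any particular AIM implementation or PESQ library.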

Figure 5 – EES path 2

Figure 6 – EES Speech Feature Analysis 2

After the tests, the Conformance Tester shall fill out Table 13.

Table 13 – Conformance Testing form of Emotion Enhanced Speech (EES) Speech Feature Analysis 2 AIM

Conformance Tester ID: Unique Conformance Tester Identifier assigned by MPAI.
Standard, Use Case ID and Version: Standard ID, Use Case ID, Version, and Profile of the standard in the form “CAE:EES:1:0”.
Name of AIM: Speech Feature Analysis 2.
Implementer ID: Unique Implementer Identifier assigned by MPAI Store.
AIM Implementation Version: Unique Implementation Identifier assigned by Implementer.
Neural Network Version*: Unique Neural Network Identifier assigned by Implementer.
Identifier of Conformance Testing Dataset: Unique Dataset Identifier assigned by MPAI Store.
Test ID: Unique Test Identifier assigned by Conformance Tester.
Actual output: The Conformance Tester will provide the following matrix for the modules used in the tests. Denoting with i and j (0≤i<x and 0≤j<y) the record numbers in DS1 and DS2, respectively, the matrix reports the outputs obtained for a limited number of randomly selected inputs, together with the corresponding reference outputs.

Example:

DS1      DS2      DS4         Emotion Inserter 2 output value
DS1[i]   DS2[j]   DS4[i, j]   SpeechWithEmotion[i, j]

Language: DS3

Execution time*: Duration of test execution.
Test comment*: If step 1 of the Evaluation fails, the Conformance Tester shall request the implementer to provide an Emotion Feature Producer AIM (AIM2) and an Emotion Inserter 2 AIM.

If step 4 or 5 of the Evaluation also fails, the Conformance Tester shall inform the implementer that Speech Feature Analysis 2 (AIM1) did not pass the Conformance Test.

Test Date: yyyy/mm/dd.

* Optional field
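Filling out Table 13 can be sketched as a small record type plus a sampler for the Actual output matrix. The sketch assumes the tester picks distinct (i, j) pairs uniformly at random; the class and field names, the seconds unit for execution time, and the placeholder identifiers in the example are hypothetical and not defined by MPAI.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, List, Optional


@dataclass
class OutputRecord:
    """One row of the Actual output matrix (hypothetical representation)."""
    i: int          # record number in DS1
    j: int          # record number in DS2
    reference: Any  # DS4[i, j]
    output: Any     # Emotion Inserter 2 output, SpeechWithEmotion[i, j]


def sample_records(x: int, y: int, n: int,
                   reference: Callable[[int, int], Any],
                   produce: Callable[[int, int], Any],
                   rng: random.Random) -> List[OutputRecord]:
    """Pick n distinct (i, j) pairs with 0 <= i < x and 0 <= j < y at random."""
    records = []
    for k in rng.sample(range(x * y), n):  # raises ValueError if n > x * y
        i, j = divmod(k, y)
        records.append(OutputRecord(i, j, reference(i, j), produce(i, j)))
    return records


@dataclass
class ConformanceForm:
    """Fields of Table 13; the optional fields (*) default to None."""
    tester_id: str
    standard_use_case_version: str  # e.g. "CAE:EES:1:0"
    aim_name: str
    implementer_id: str
    implementation_version: str
    dataset_id: str
    test_id: str
    test_date: str                  # yyyy/mm/dd
    actual_output: List[OutputRecord] = field(default_factory=list)
    language: Optional[str] = None  # DS3
    neural_network_version: Optional[str] = None
    execution_time_s: Optional[float] = None  # unit (seconds) is an assumption
    test_comment: Optional[str] = None

    def __post_init__(self) -> None:
        # Raises ValueError if the date does not parse as year/month/day.
        datetime.strptime(self.test_date, "%Y/%m/%d")


# Example with toy callables standing in for DS4 and the system under test.
records = sample_records(3, 2, 4, lambda i, j: f"ref{i}{j}", lambda i, j: f"out{i}{j}",
                         random.Random(1))
form = ConformanceForm("CT-01", "CAE:EES:1:0", "Speech Feature Analysis 2", "IMP-01",
                       "1.0", "DS-01", "T-0001", "2024/01/31",
                       actual_output=records, language="en")
```

Sampling indices from `range(x * y)` and splitting them with `divmod` guarantees that no (i, j) pair is reported twice.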