1     Functions
2     Reference Model
3     Input/Output Data
4     SubAIMs
5     JSON Metadata
6     Profiles
7     Reference Software
8     Conformance Testing
9     Performance Assessment

1     Functions

The Audio Synthesis Transform (CAE-AST) AIM receives an Enhanced Audio Object in the transform (time-frequency) domain, transforms it back to the time domain, and produces an Enhanced Audio Object with associated Microphone Array info. In detail, the AIM:

Receives the Enhanced Audio Object (Transform) with associated Microphone Array info.
Transforms the Enhanced Audio Object (Transform) from the frequency domain to the time domain using the Inverse Fast Fourier Transform (IFFT).
Produces the Enhanced Audio Object with associated Microphone Array info.
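The IFFT step can be illustrated with a short Python/NumPy sketch. It rests on assumptions that the text does not state. The transform-domain Enhanced Audio Object is taken to be a sequence of one-sided short-time spectra of a single channel. The spectra are assumed to come from a periodic square-root Hann window with a hop of half a frame. Synthesis is performed as a per-frame inverse real FFT (`numpy.fft.irfft`) followed by windowed overlap-add. The helper names `sqrt_hann`, `analysis_transform` and `synthesis_transform`, the window, and the hop are all hypothetical. The associated Microphone Array info is not modelled and is assumed to be passed through unchanged.

```python
import numpy as np

# Assumptions (not specified by CAE-AST): single channel, one-sided spectra,
# periodic square-root Hann window, hop = frame_len // 2.

def sqrt_hann(frame_len: int) -> np.ndarray:
    """Periodic square-root Hann window; its square sums to 1 at 50% overlap."""
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))

def analysis_transform(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Forward STFT, used here only to produce test input: window each frame, then FFT."""
    window = sqrt_hann(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([np.fft.rfft(window * x[m * hop:m * hop + frame_len])
                     for m in range(num_frames)])

def synthesis_transform(spectra: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Inverse STFT: IFFT each one-sided spectrum, apply the synthesis window, overlap-add."""
    window = sqrt_hann(frame_len)
    num_frames = spectra.shape[0]
    out = np.zeros((num_frames - 1) * hop + frame_len)
    for m in range(num_frames):
        # Inverse real FFT returns frame_len real time-domain samples.
        frame = np.fft.irfft(spectra[m], n=frame_len)
        out[m * hop:m * hop + frame_len] += window * frame
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_len, hop = 512, 256
    x = rng.standard_normal(9 * hop + frame_len)
    y = synthesis_transform(analysis_transform(x, frame_len, hop), frame_len, hop)
    # Samples covered by two frames are reconstructed up to rounding error.
    print(np.allclose(y[hop:-hop], x[hop:-hop]))
```

With a square-root Hann window at 50% overlap, the analysis and synthesis windows multiply to a Hann window, and shifted copies of that window sum to one. As a result, every sample covered by two frames is reconstructed exactly, up to rounding. The first and last half-frames are covered by only one frame, so they are not reconstructed exactly.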

2     Reference Model

Figure 1 depicts the Reference Architecture of the Audio Synthesis Transform (CAE-AST) AIM.

Figure 1 – The Audio Synthesis Transform (CAE-AST) AIM

3     Input/Output Data

Table 1 specifies the Input and Output Data of the Audio Synthesis Transform (CAE-AST) AIM.

Table 1 – I/O Data of the Audio Synthesis Transform (CAE-AST) AIM

Input | Description
Enhanced Audio Objects (time-frequency) | Audio Objects in the time-frequency domain.
Output | Description
Enhanced Audio Objects (time) | Audio Objects in the time domain.

4     SubAIMs

No SubAIMs.

5     JSON Metadata

https://schemas.mpai.community/CAE1/V2.4/AIMs/AudioSynthesisTransform.json

6     Profiles

No Profiles.

7     Reference Software

The Audio Synthesis Transform Reference Software can be downloaded from the MPAI Git.

8     Conformance Testing

Receives | Enhanced Audio Objects (time-frequency) | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema.
Produces | Enhanced Audio Objects (time) | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema.
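The last requirement above can be illustrated with a standard-library Python sketch. It checks that each Sub-Type, Format, and Attribute value present in a Qualifier is one that the schema allows. The schema here is a hypothetical, heavily simplified stand-in that lists allowed values with the JSON Schema `enum` keyword. The key names `SubType`, `Format` and `Attribute`, all listed values, and the function name `qualifier_conforms` are placeholders and are not taken from the MPAI schemas. Full validation against the Audio Object and Audio Qualifier schemas would normally use a complete JSON Schema validator, for example the Python `jsonschema` package.

```python
# Hypothetical, simplified stand-in for the Audio Object Qualifier schema.
# Key names and allowed values are placeholders, NOT the real MPAI schema content.
QUALIFIER_SCHEMA = {
    "type": "object",
    "properties": {
        "SubType": {"enum": ["SubTypeA", "SubTypeB"]},
        "Format": {"enum": ["FormatA", "FormatB"]},
        "Attribute": {"enum": ["AttributeA", "AttributeB"]},
    },
}

CHECKED_KEYS = ("SubType", "Format", "Attribute")

def qualifier_conforms(qualifier, schema=QUALIFIER_SCHEMA) -> bool:
    """True if every checked key present in the Qualifier has a value listed
    in the corresponding 'enum' of the schema; absent keys are not checked."""
    if not isinstance(qualifier, dict):
        return False
    properties = schema.get("properties", {})
    for key in CHECKED_KEYS:
        if key in qualifier:
            allowed = properties.get(key, {}).get("enum")
            if allowed is None or qualifier[key] not in allowed:
                return False
    return True

if __name__ == "__main__":
    print(qualifier_conforms({"Format": "FormatA"}))  # allowed value
    print(qualifier_conforms({"Format": "FormatC"}))  # value not listed in the enum
```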

9     Performance Assessment

Table 61 lists the Means used to assess the performance of the Enhanced Audioconference Experience (CAE-EAE) Synthesis Transform AIM and how they are used.

Table 61 – AIM Means and use of Enhanced Audioconference Experience (CAE-EAE) Synthesis Transform

Means | Actions
Performance Testing Dataset | DS1: n Test files containing data in Denoised Transform Speech format.
  DS2: n Expected Output files containing data in Denoised Speech format.
Procedure | 1. Feed the AIM under test with the Test files (DS1).
  2. Compare the Denoised Speech produced by the AIM under test with the Expected Output files (DS2).
Evaluation | 1. Check the data format of the Denoised Speech against the format of the Expected Output files.
  2. Calculate the peak-to-peak amplitude (A) of each Audio block in the Expected Output files.
  3. Calculate the Root Mean Square Error (RMSE) of each Audio block by comparing the output of the AIM under test (x) with the corresponding Audio block of the Expected Output files (y).
  4. Accept the AIM under test if, for each Audio block, both of the following conditions are satisfied:
    a. the data format of the Denoised Speech is the same as that of the Expected Output files, and
    b. RMSE < A × 0.1%.

Figure 24 – Synthesis Transform Testing Flow
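The Evaluation steps of Table 61 can be sketched in Python/NumPy, assuming that the AIM output (x) and the Expected Output (y) are available as arrays of samples. For each Audio block, A = max(y) − min(y) and RMSE = sqrt(mean((x − y)²)), and the block passes if RMSE < 0.001 × A. The text does not specify the Audio block length, so `block_len` is a hypothetical parameter. Condition (a) on the data format is approximated here by requiring an identical sample type and shape, which is an assumption. The function name `evaluate_blocks` is illustrative.

```python
import numpy as np

def evaluate_blocks(output: np.ndarray, expected: np.ndarray, block_len: int,
                    tolerance: float = 0.001) -> bool:
    """Accept only if every Audio block satisfies RMSE(x, y) < tolerance * A,
    where A is the peak-to-peak amplitude of the expected block y (0.1% -> 0.001)."""
    # Condition (a), approximated (assumption): same sample type and shape.
    if output.dtype != expected.dtype or output.shape != expected.shape:
        return False
    for start in range(0, len(expected), block_len):
        x = output[start:start + block_len].astype(np.float64)
        y = expected[start:start + block_len].astype(np.float64)
        amplitude = y.max() - y.min()             # peak-to-peak amplitude A
        rmse = np.sqrt(np.mean((x - y) ** 2))     # root mean square error
        # Condition (b): RMSE < A * 0.1%
        if not rmse < tolerance * amplitude:
            return False
    return True

if __name__ == "__main__":
    n = np.arange(4800)
    expected = np.sin(2 * np.pi * n / 48).astype(np.float32)
    print(evaluate_blocks(expected + np.float32(1e-5), expected, block_len=480))
    print(evaluate_blocks(expected + np.float32(0.01), expected, block_len=480))
```

Under a literal reading of condition (b), an all-silent block (A = 0) is rejected even when the output matches it exactly, because RMSE < 0 can never hold. The text does not say how such blocks should be treated.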

After the tests, the Performance Assessor shall fill out Table 62.

Table 62 – Performance Assessment form of Enhanced Audioconference Experience (CAE-EAE) Synthesis Transform

Performance Assessor ID | Unique Performance Assessor Identifier assigned by MPAI.
Standard, Use Case ID and Version | Standard ID, Use Case ID, Version, and Profile of the standard in the form “CAE:EAE:1:0”.
Name of AIM | Synthesis Transform
Implementer ID | Unique Implementer Identifier assigned by MPAI Store.
AIM Implementation Version | Unique Implementation Identifier assigned by the Implementer.
Neural Network Version* | Unique Neural Network Identifier assigned by the Implementer.
Identifier of Performance Testing Dataset | Unique Dataset Identifier assigned by MPAI Store.
Test ID | Unique Test Identifier assigned by the Performance Assessor.
Actual output | The Performance Assessor will provide the following matrix, listing a limited number (n) of input records and the corresponding outputs. If an input record fails, the Performance Assessor will state why the test case failed.

Input Data (DS1) | Expected Output Data (DS2) | Data Format | RMSE
Denoised Transform Speech ID1 | Denoised Speech ID1 | T/F | < A × 0.1%
Denoised Transform Speech ID2 | Denoised Speech ID2 | T/F | < A × 0.1%
Denoised Transform Speech ID3 | Denoised Speech ID3 | T/F | < A × 0.1%
Denoised Transform Speech IDn | Denoised Speech IDn | T/F | < A × 0.1%

Final evaluation: T/F

Denoting by i the record number in DS1 and DS2, the matrix reports the result obtained for each input record i together with the corresponding output i.

DS1 | DS2 | Synthesis Transform output value (obtained through the AIM under test)
DS1[i] | DS2[i] | SynthesisTransform[i]

Execution time* | Duration of test execution.
Test comment* | Comments on test results and any actions needed.
Test Date | yyyy/mm/dd.

* Optional field
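The record-level matrix and final evaluation in Table 62 can be sketched as a small data structure. The sketch assumes that a record passes only when both its Data Format and RMSE entries are T, and that the final evaluation is T only when every record passes. It also treats an empty matrix as a failure, which is an assumption. `RecordResult`, its field names, the helper functions, and the extra failure-reason column are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RecordResult:
    """One row of the Actual output matrix; field names are hypothetical."""
    input_id: str             # DS1[i], e.g. "Denoised Transform Speech ID1"
    expected_id: str          # DS2[i], e.g. "Denoised Speech ID1"
    format_ok: bool           # Data Format column (T/F)
    rmse_ok: bool             # RMSE column: T if RMSE < A x 0.1% for every Audio block
    failure_reason: str = ""  # reason stated by the Performance Assessor on failure

def final_evaluation(results: list[RecordResult]) -> bool:
    """T only if at least one record was tested (assumption) and every record passed."""
    return len(results) > 0 and all(r.format_ok and r.rmse_ok for r in results)

def tf(ok: bool) -> str:
    return "T" if ok else "F"

def render_matrix(results: list[RecordResult]) -> str:
    """Render the matrix as pipe-separated text, followed by the final evaluation."""
    lines = ["Input Data (DS1) | Expected Output Data (DS2) | Data Format | RMSE | Failure reason"]
    for r in results:
        lines.append(f"{r.input_id} | {r.expected_id} | {tf(r.format_ok)} | "
                     f"{tf(r.rmse_ok)} | {r.failure_reason}")
    lines.append(f"Final evaluation: {tf(final_evaluation(results))}")
    return "\n".join(lines)

if __name__ == "__main__":
    rows = [
        RecordResult("Denoised Transform Speech ID1", "Denoised Speech ID1", True, True),
        RecordResult("Denoised Transform Speech ID2", "Denoised Speech ID2", True, False,
                     "RMSE above threshold"),
    ]
    print(render_matrix(rows))
```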