
1 Functions

  1. Receives Microphone Array Geometry, Enhanced Audio Objects, and Audio Scene Geometry.
  2. Produces Multichannel Audio Stream.

2 Reference Model

Figure 1 – Audio Description Packaging

3 Input/Output Data

Input data                    Semantics
Microphone Array Geometry     Information on the Microphone Array
Enhanced Audio Objects        Enhanced-quality Audio Objects
Audio Scene Geometry          Arrangement of the Audio Objects in the Audio Scene

Output data                   Semantics
Multichannel Audio Stream     Output Audio
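The Functions and data above can be illustrated with a minimal Python sketch. Beyond what this page specifies, it assumes that each Enhanced Audio Object is a mono block of float samples, that all objects have the same length, and that packaging means sample interleaving with one channel per object, as suggested by the "Interleaved Multichannel Audio + Audio Scene Geometry" output named in Table 65. The `PackagedStream` container, the `package` function and the dictionary form of the two geometries are hypothetical and do not follow the JSON schema referenced in Section 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class PackagedStream:
    """Hypothetical container for the Multichannel Audio Stream."""
    num_channels: int
    samples: List[float]  # frame-major: ch0, ch1, ..., ch(N-1), ch0, ch1, ...
    scene_geometry: Dict[str, object] = field(default_factory=dict)
    mic_array_geometry: Dict[str, object] = field(default_factory=dict)


def package(audio_objects: Sequence[Sequence[float]],
            scene_geometry: Dict[str, object],
            mic_array_geometry: Dict[str, object]) -> PackagedStream:
    """Interleave one channel per Enhanced Audio Object (assumed mono, equal length)."""
    if not audio_objects:
        raise ValueError("at least one Enhanced Audio Object is required")
    length = len(audio_objects[0])
    if any(len(obj) != length for obj in audio_objects):
        raise ValueError("all Enhanced Audio Objects must have the same length")
    # Sample t of every object is written before sample t + 1 of any object.
    interleaved = [obj[t] for t in range(length) for obj in audio_objects]
    # Assumption: both geometries are passed through unchanged.
    return PackagedStream(len(audio_objects), interleaved,
                          dict(scene_geometry), dict(mic_array_geometry))
```

For two objects `[0.1, 0.2]` and `[0.3, 0.4]`, `package` yields the frame-major sequence `[0.1, 0.3, 0.2, 0.4]`. The geometries are carried through unchanged because this page does not say how, or whether, they are transformed.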

4 SubAIMs

No SubAIMs.

5 JSON Metadata

https://schemas.mpai.community/CAE1/V2.4/AIMs/AudioDescriptionPackaging.json

6 Profiles

No Profiles.

7 Reference Software

The Audio Description Packaging Reference Software can be downloaded from the MPAI Git.

8 Conformance Testing

Receives   Microphone Array Geometry   Shall validate against the Microphone Array schema.
           Enhanced Audio Objects      Shall validate against the Audio Object schema.
                                       The Qualifier shall validate against the Audio Qualifier schema.
                                       The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond to the Sub-Types, Formats, and Attributes of the Audio Object Qualifier schema.
           Audio Scene Geometry        Shall validate against the Audio Scene Geometry schema.
Produces   Multichannel Audio Stream   Shall validate against the Audio Object schema.
                                       The Qualifier shall validate against the Audio Qualifier schema.
                                       The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond to the Sub-Types, Formats, and Attributes of the Audio Object Qualifier schema.
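The Qualifier-value rule above can be sketched in Python. This is not a JSON Schema validator: structural validation against the MPAI schemas would normally be delegated to a JSON Schema library. The sketch only checks that the Sub-Type, Format and Attribute values present in a Qualifier belong to the sets permitted by the Audio Object Qualifier schema. The field names and the `allowed` table are hypothetical placeholders; the real names and permitted values must be taken from the schema itself.

```python
from typing import Iterable, List, Mapping


def check_qualifier(qualifier: Mapping[str, object],
                    allowed: Mapping[str, Iterable[str]]) -> List[str]:
    """Return the violations found; an empty list means the Qualifier values conform.

    `allowed` maps a field name (hypothetical keys such as "SubType", "Format",
    "Attribute") to the values permitted by the Audio Object Qualifier schema.
    """
    errors: List[str] = []
    for name, permitted in allowed.items():
        if name not in qualifier:
            continue  # only fields that are present are checked ("any" Sub-Type, ...)
        values = qualifier[name]
        # A field may carry a single value or a list of values (e.g. several Attributes).
        if isinstance(values, str):
            values = [values]
        permitted_set = set(permitted)
        for value in values:
            if value not in permitted_set:
                errors.append(f"{name}: {value!r} is not permitted by the schema")
    return errors
```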

9 Performance Assessment

Table 64 lists the Means used to assess the performance of the Enhanced Audioconference Experience (CAE-EAE) Audio Description Packaging AIM and how they are used.

Table 64 – AIM Means and use of Enhanced Audioconference Experience (CAE-EAE) Audio Description Packaging (CAE-ADP)

Means                          Actions
Performance Testing Dataset    DS1: n Test files containing data in Denoised Speech format.
                               DS2: n Microphone Array Geometries.
                               DS3: n Audio Scene Geometries associated with the Denoised Speech.
                               DS4: n Expected Output files containing data in Multichannel Audio Stream format.
Procedure                      1. Feed the AIM under test with the Test files (DS1, DS2, DS3).
                               2. Compare the output Multichannel Audio + Audio Scene Geometry with the Expected Output files (DS4).
Evaluation                     1. Check that the data format of the output Multichannel Audio + Audio Scene Geometry matches the format of the Expected Output files.
                               2. Calculate the peak-to-peak amplitude (A) of each audio block in the Expected Output files.
                               3. Calculate the RMSE of each audio block by comparing the output (x) with the corresponding audio block of the Expected Output files (y).
                               4. Accept the AIM under test if, for each audio block, both of the following conditions are satisfied:
                                  a. the data format of the Multichannel Audio + Audio Scene Geometry is the same as that of the Expected Output files, and
                                  b. RMSE < A * 0.1%.
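Evaluation steps 2 to 4 can be expressed as a short Python sketch. It assumes that the output and the Expected Output files (DS4) have already been decoded into lists of float samples split into corresponding audio blocks, and that the data-format check of step 1 is done elsewhere and passed in as a boolean. The function names are hypothetical. RMSE is the square root of the mean squared sample difference between x and y, A is max(y) − min(y), and "0.1%" becomes the factor 0.001.

```python
import math
from typing import Sequence


def peak_to_peak(block: Sequence[float]) -> float:
    """Peak-to-peak amplitude A of an expected-output audio block."""
    return max(block) - min(block)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error between an output block x and an expected block y."""
    if not x or len(x) != len(y):
        raise ValueError("blocks must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def accept(output_blocks: Sequence[Sequence[float]],
           expected_blocks: Sequence[Sequence[float]],
           format_matches: bool,
           tolerance: float = 0.001) -> bool:
    """Accept only if the format matches and every block satisfies RMSE < A * 0.1%."""
    if not format_matches or len(output_blocks) != len(expected_blocks):
        return False
    for x, y in zip(output_blocks, expected_blocks):
        if not rmse(x, y) < tolerance * peak_to_peak(y):
            return False
    return True
```

Because condition b uses a strict inequality, a block whose expected output is constant (A = 0) can never be accepted, even when the output matches it exactly; the text above does not say how such blocks should be treated.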

Figure 25 – Audio Description Packaging Testing Flow

After the assessment, the Performance Assessor shall fill out Table 65.

Table 65 – Performance Assessment of Enhanced Audioconference Experience (CAE-EAE) Audio Description Packaging (CAE-ADP)

Performance Assessor ID                        Unique Performance Assessor Identifier assigned by MPAI.
Standard, Use Case ID and Version              Standard ID, Use Case ID, Version, and Profile of the standard in the form “CAE:EAE:1:0”.
Name of AIM                                    Packager
Implementer ID                                 Unique Implementer Identifier assigned by MPAI Store.
AIM Implementation Version                     Unique Implementation Identifier assigned by the Implementer.
Neural Network Version*                        Unique Neural Network Identifier assigned by the Implementer.
Identifier of Performance Assessment Dataset   Unique Dataset Identifier assigned by MPAI Store.
Test ID                                        Unique Test Identifier assigned by the Performance Assessor.
Actual output                                  The Performance Assessor will provide the following matrix with a limited number of input records (n) and the corresponding outputs. If a test case fails, the Performance Assessor will specify the reason for the failure.

Input data (DS1, DS2, DS3)      Expected Output Data (DS4)                                  Data Format   RMSE
Denoised Speech ID1             Interleaved Multichannel Audio + Audio Scene Geometry ID1   T/F           < A * 0.1%
Microphone Array Geometry ID1
Audio Scene Geometry ID1

Denoised Speech ID2             Interleaved Multichannel Audio + Audio Scene Geometry ID2   T/F           < A * 0.1%
Microphone Array Geometry ID2
Audio Scene Geometry ID2

Denoised Speech ID3             Interleaved Multichannel Audio + Audio Scene Geometry ID3   T/F           < A * 0.1%
Microphone Array Geometry ID3
Audio Scene Geometry ID3

Denoised Speech IDn             Interleaved Multichannel Audio + Audio Scene Geometry IDn   T/F           < A * 0.1%
Microphone Array Geometry IDn
Audio Scene Geometry IDn

Final evaluation: T/F

Denoting by i the record number in DS1, DS2, and DS3, row i of the matrix reports the results obtained with input record i and the corresponding output i.

DS1      DS2      DS3      DS4      Packager output value (obtained through the AIM under test)
DS1[i]   DS2[i]   DS3[i]   DS4[i]   Packager[i]
Execution time*    Duration of test execution.
Test comment*      Comments on the test results and any actions needed.
Test Date          yyyy/mm/dd.

* Optional field
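The matrix and its final evaluation can be assembled as in the following Python sketch. The `RecordResult` type and both functions are hypothetical, and the rule that the final evaluation is T only when at least one record exists and every record passes both checks is an assumption, since Table 65 gives only the T/F outcome. Per-block RMSE and amplitude values are assumed to come from the evaluation procedure of Table 64.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordResult:
    """One row i of the Table 65 matrix (hypothetical representation)."""
    index: int                    # record number i in DS1, DS2, DS3 and DS4
    format_ok: bool               # "Data Format" column (T/F)
    block_rmse: List[float]       # RMSE of each audio block of output i
    block_amplitude: List[float]  # peak-to-peak amplitude A of each expected block
    reason: Optional[str] = None  # reason for failure, if any

    @property
    def rmse_ok(self) -> bool:
        # Condition b, applied to every audio block: RMSE < A * 0.1%.
        if len(self.block_rmse) != len(self.block_amplitude):
            return False
        return all(e < 0.001 * a for e, a in zip(self.block_rmse, self.block_amplitude))

    @property
    def passed(self) -> bool:
        return self.format_ok and self.rmse_ok


def final_evaluation(results: List[RecordResult]) -> bool:
    """Assumed rule: T only if there is at least one record and every record passes."""
    return bool(results) and all(r.passed for r in results)


def render(results: List[RecordResult]) -> str:
    """Plain-text rendering of the matrix rows and the final evaluation."""
    def tf(ok: bool) -> str:
        return "T" if ok else "F"

    lines = ["i  Data Format  RMSE  Reason"]
    for r in results:
        lines.append(f"{r.index}  {tf(r.format_ok)}  {tf(r.rmse_ok)}  {r.reason or ''}".rstrip())
    lines.append(f"Final evaluation: {tf(final_evaluation(results))}")
    return "\n".join(lines)
```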