CAE-USC V2.3 AIMs Audio Separation and Enhancement

1 Function	2 Reference Model	3 Input/Output Data
4 SubAIMs	5 JSON Metadata	6 Profiles
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

Audio Separation and Enhancement (CAE-ASE):

Receives	Audio Object	in the Transform domain.
	Audio Spatial Attitudes	Spatial Attitudes of the input Audio Objects.
Separates	Audio Objects	by using their Spatial Attitudes.
Produces	Enhanced Audio Object	in the Transform domain.
	Audio Scene Geometry	The Geometry of Audio Objects in the Scene.

2 Reference Model

Figure 1 depicts the Reference Architecture of the Audio Separation and Enhancement AIM.

Figure 1 – Audio Separation and Enhancement AIM

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Audio Separation and Enhancement AIM.

Table 1 – I/O Data of Audio Separation and Enhancement

Input	Description
Audio Object	The result of the application of the Fast Fourier Transform to the Multichannel Audio.
Audio Spatial Attitudes	The Spatial Attitudes of Audio Objects.
Output	Description
Enhanced Audio Object	Enhanced Multichannel Audio in the transform domain.
Audio Scene Geometry	The spatial arrangement of the Audio Objects.

4 SubAIMs

No SubAIMs.

5 JSON Metadata

https://schemas.mpai.community/CAE/V2.3/AIMs/AudioSeparationAndEnhancement.json

6 Profiles

No Profiles.

7 Reference Software

8 Conformance Testing

The following steps shall be followed when testing the Conformance of a CAE-ASE AIM instance.

Use the following datasets:
1. DS1: n Test files containing Audio Objects (Transform).
2. DS4: n Test files containing the Spatial Attitudes of Audio Objects.
3. DS2: n Expected Enhanced Audio Objects (Transform) Files.
4. DS3: n Expected Audio Scene Geometries.
Feed the AIM under test with the Test Audio Objects (Transform) and Spatial Attitudes.
Analyse the Audio Scene Geometry and Enhanced Audio Objects (Transform).
1. Check the Audio Scene Geometry against the Expected Audio Scene Geometry:
  1. Count the number of Audio Objects in the Audio Scene Geometry.
  2. Calculate the angle difference (AD) in degrees between the Audio Objects (u) in the Audio Scene Geometry and the Audio Objects (v) in the Expected Audio Scene Geometry.
2. Compare the number of Audio Blocks in the Expected Audio Objects with the number of Audio Blocks in the Audio Objects (Transform).
3. Calculate the Signal to Interference Ratio (SIR), Signal to Distortion Ratio (SDR), and Signal to Artefacts Ratio (SAR) between the Expected Audio Objects (Transform) and the Output Audio Objects (Transform).
4. Accept the CAE-ASE AIM under test if these four conditions are satisfied:
  1. The number of Audio Objects (Transform) in the Audio Scene Geometry is equal to the number of Audio Objects (Transform) in the Expected Audio Scene Geometry.
  2. The number of Audio Blocks in the Audio Objects (Transform) is equal to the number of Audio Blocks in the Expected Audio Objects (Transform).
  3. For each Audio Object (Transform) in the Audio Scene Geometry, compared with the corresponding Audio Object (Transform) in the Expected Audio Scene Geometry:
    1. The AD between the Expected and the Output is less than 5 degrees.
  4. For each Output Audio Object (Transform), compared with the corresponding Expected Audio Object (Transform):
    1. If the room reverberation time (T60) is greater than 0.5 seconds:
      1. Each object’s SIR between the Expected and Output is greater than or equal to 10 dB.
      2. Each object’s SDR between the Expected and Output is greater than or equal to 3 dB.
      3. Each object’s SAR between the Expected and Output is greater than or equal to 3 dB.
    2. If the room reverberation time (T60) is less than 0.5 seconds:
      1. Each object’s SIR between the Expected and Output is greater than or equal to 15 dB.
      2. Each object’s SDR between the Expected and Output is greater than or equal to 6 dB.
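
The acceptance conditions above could be checked along the following lines. This is only a minimal sketch, not part of the specification: it assumes that each Audio Object's direction is available as a Cartesian vector, that Output and Expected objects are already paired in the same order, and that SIR, SDR, and SAR have been computed beforehand. The names `SeparationMetrics`, `angle_difference_deg`, `metrics_pass`, and `accept` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SeparationMetrics:
    """Per-object separation scores in dB (hypothetical container)."""
    sir: float
    sdr: float
    sar: float


def angle_difference_deg(u, v):
    """Angle in degrees between direction vectors u and v (assumed Cartesian)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


def metrics_pass(m, t60_seconds):
    """Apply the T60-dependent thresholds listed in the steps above."""
    if t60_seconds > 0.5:
        return m.sir >= 10.0 and m.sdr >= 3.0 and m.sar >= 3.0
    if t60_seconds < 0.5:
        # The text lists only SIR and SDR thresholds for this case.
        return m.sir >= 15.0 and m.sdr >= 6.0
    # T60 == 0.5 s is not covered by the text, so this sketch does not guess.
    raise ValueError("threshold for T60 == 0.5 s is not specified")


def accept(output_dirs, expected_dirs, output_blocks, expected_blocks,
           metrics, t60_seconds, max_ad_deg=5.0):
    """Return True if all four acceptance conditions hold."""
    # Condition 1: same number of Audio Objects in both Scene Geometries.
    if len(output_dirs) != len(expected_dirs):
        return False
    # Condition 2: same number of Audio Blocks.
    if output_blocks != expected_blocks:
        return False
    # Condition 3: every angle difference is below 5 degrees.
    for u, v in zip(output_dirs, expected_dirs):
        if angle_difference_deg(u, v) >= max_ad_deg:
            return False
    # Condition 4: every object meets the separation thresholds.
    return all(metrics_pass(m, t60_seconds) for m in metrics)
```

For example, `accept([(1, 0, 0)], [(1, 0, 0)], 10, 10, [SeparationMetrics(16.0, 7.0, 7.0)], 0.3)` passes all four checks, while replacing the expected direction with `(0, 1, 0)` fails the angle check.
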
The Conformance Tester will provide the following matrix containing a limited number of input records (n) with the corresponding outputs. If an input record fails, the tester will specify the reason why the test case fails.

Input data (DS1, DS4)	Expected Output Data (DS2, DS3)	Data Format	Audio Scene Geometry	Source Separation Metrics
Spatial Attitude (ID₁) Audio Object (Transform) ID₁	Enhanced Audio Object (Transform) ID₁ Audio Scene Geometry ID₁	T/F	T/F	T/F
Spatial Attitude (ID₂) Audio Object (Transform) ID₂	Enhanced Audio Object (Transform) ID₂ Audio Scene Geometry ID₂	T/F	T/F	T/F
Spatial Attitude (ID₃) Audio Object (Transform) ID₃	Enhanced Audio Object (Transform) ID₃ Audio Scene Geometry ID₃	T/F	T/F	T/F
…	…	…	…	…
Spatial Attitude (IDₙ) Audio Object (Transform) IDₙ	Enhanced Audio Object (Transform) IDₙ Audio Scene Geometry IDₙ	T/F	T/F	T/F

Final evaluation: T/F

Denoting by i the record number in DS1, DS2, and DS3, the matrices reflect the results obtained with input records i and the corresponding outputs i.

DS1	DS2	DS3	Sound Field Description output value (obtained through the AIM under test)
DS1[i]	DS2[i]	DS3[i]	SpeechDetectionandSeparation[i]
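
One way a tester might hold the per-record results and derive the final evaluation is sketched below. The text does not state how the final T/F value is obtained, so this sketch assumes that it is True only when every record passes all three checks. `RecordResult` and `final_evaluation` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordResult:
    """One row of the result matrix (hypothetical structure)."""
    record_id: int
    data_format_ok: bool
    scene_geometry_ok: bool
    separation_metrics_ok: bool
    failure_reason: Optional[str] = None  # filled in when a check fails

    @property
    def passed(self) -> bool:
        # A record passes only if all three T/F columns are True.
        return (self.data_format_ok
                and self.scene_geometry_ok
                and self.separation_metrics_ok)


def final_evaluation(results: List[RecordResult]) -> bool:
    """Assumed rule: True only if there is at least one record and all pass."""
    return bool(results) and all(r.passed for r in results)
```

A failing row would carry its reason, for example `RecordResult(2, True, False, True, "AD above 5 degrees")`, which makes `final_evaluation` return False for any set of results containing it.
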

Table 2 provides the Conformance Testing Method for the formats of the CAE-ASE AIM output.

Note: If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for CAE-ASE AIM

Receives	Audio Object	Shall validate against Audio Object schema. Audio Data shall conform with Audio Qualifier.
	Audio Spatial Attitudes	Shall validate against Spatial Attitude schema.
Produces	Enhanced Audio Object	Shall validate against Audio Object schema. Audio Data shall conform with Audio Qualifier.
	Audio Scene Geometry	Shall validate against Audio Basic Scene Geometry schema.
