1 Functions | 2 Reference Model | 3 Input/Output Data |
4 SubAIMs | 5 JSON Metadata | 6 Profiles |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
Audio Separation and Enhancement (CAE-ASE):
Receives | Audio Object | in the Transform domain. |
| Audio Spatial Attitudes | Spatial Attitudes of the input Audio Objects. |
Separates | Audio Objects | by using their Spatial Attitudes. |
Produces | Enhanced Audio Object | in the Transform domain. |
| Audio Scene Geometry | The Geometry of Audio Objects in the Scene. |
2 Reference Model
Figure 1 depicts the Reference Architecture of the Audio Separation and Enhancement AIM.
Figure 1 – Audio Separation and Enhancement AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Audio Separation and Enhancement AIM.
Table 1 – I/O Data of Audio Separation and Enhancement
Input | Description |
Audio Object | The result of applying the Fast Fourier Transform (FFT) to the Multichannel Audio. |
Audio Spatial Attitudes | The Spatial Attitudes of Audio Objects. |
Output | Description |
Enhanced Audio Object | Enhanced Multichannel Audio in the Transform domain. |
Audio Scene Geometry | The spatial arrangement of the Audio Objects. |
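Table 1 defines the input Audio Object as the FFT of the Multichannel Audio, and Section 8 compares numbers of Audio Blocks. The minimal sketch below shows one way to obtain such a transform-domain representation. It assumes non-overlapping blocks of `block_size` samples, no windowing, and zero-padding of the last partial block; none of these choices is stated in this specification, and the names `dft` and `to_transform_domain` are hypothetical. A direct DFT stands in for the FFT: it yields the same bins, only more slowly.

```python
import cmath
import math
from typing import List


def dft(block: List[float]) -> List[complex]:
    """Direct DFT of one block; same bins as an FFT, computed in O(N^2)."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
        for k in range(n)
    ]


def to_transform_domain(
    channels: List[List[float]], block_size: int
) -> List[List[List[complex]]]:
    """Split every channel into non-overlapping blocks and transform each block.

    Returns result[block][channel][bin]. The last partial block is zero-padded
    (an assumption of this sketch).
    """
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    num_blocks = math.ceil(length / block_size)
    result = []
    for b in range(num_blocks):
        start = b * block_size
        per_channel = []
        for ch in channels:
            block = ch[start:start + block_size]
            block = block + [0.0] * (block_size - len(block))  # zero-pad the tail
            per_channel.append(dft(block))
        result.append(per_channel)
    return result
```

For a two-channel input of five samples and `block_size=4`, the sketch returns two Audio Blocks, the second holding the zero-padded fifth sample of each channel.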
4 SubAIMs
No SubAIMs.
5 JSON Metadata
https://schemas.mpai.community/CAE/V2.3/AIMs/AudioSeparationAndEnhancement.json
6 Profiles
No Profiles.
7 Reference Software
8 Conformance Testing
The following steps shall be followed when testing the Conformance of a CAE-ASE AIM instance.
- Use the following datasets:
  - DS1: n test files containing Audio Objects (Transform).
  - DS4: n test files containing the Spatial Attitudes of the Audio Objects.
  - DS2: n files containing the Expected Enhanced Audio Objects (Transform).
  - DS3: n files containing the Expected Audio Scene Geometries.
- Feed the AIM under test with the Test Audio Objects (Transform) (DS1) and their Spatial Attitudes (DS4).
- Analyse the output Audio Scene Geometry and Enhanced Audio Objects (Transform):
  - Check the Audio Scene Geometry against the Expected Audio Scene Geometry:
    - Count the number of Audio Objects in the Audio Scene Geometry.
    - Calculate the angle difference (AD) in degrees between each Audio Object (u) in the Audio Scene Geometry and the corresponding Audio Object (v) in the Expected Audio Scene Geometry.
  - Compare the number of Audio Blocks in the Expected Enhanced Audio Objects (Transform) with the number of Audio Blocks in the output Enhanced Audio Objects (Transform).
  - Calculate the Signal to Interference Ratio (SIR), Signal to Distortion Ratio (SDR), and Signal to Artefacts Ratio (SAR) between the Expected Enhanced Audio Objects (Transform) and the output Enhanced Audio Objects (Transform).
- Accept the CAE-ASE AIM under test if the following four conditions are satisfied:
  - The number of Audio Objects (Transform) in the Audio Scene Geometry equals the number of Audio Objects (Transform) in the Expected Audio Scene Geometry.
  - The number of Audio Blocks in the output Enhanced Audio Objects (Transform) equals the number of Audio Blocks in the Expected Enhanced Audio Objects (Transform).
  - Compare each Audio Object (Transform) in the Audio Scene Geometry with the corresponding Audio Object (Transform) in the Expected Audio Scene Geometry:
    - The AD between each output Audio Object (Transform) and its expected counterpart is less than 5 degrees.
  - Compare each output Enhanced Audio Object (Transform) with the corresponding Expected Enhanced Audio Object (Transform):
    - If the room reverberation time (T60) is greater than 0.5 seconds:
      - Each object’s SIR between the Expected and Output is greater than or equal to 10 dB.
      - Each object’s SDR between the Expected and Output is greater than or equal to 3 dB.
      - Each object’s SAR between the Expected and Output is greater than or equal to 3 dB.
    - If the room reverberation time (T60) is less than 0.5 seconds:
      - Each object’s SIR between the Expected and Output is greater than or equal to 15 dB.
      - Each object’s SDR between the Expected and Output is greater than or equal to 6 dB.
- The Conformance Tester will provide the following matrix containing a limited number (n) of input records with the corresponding outputs. If an input record fails, the Tester will specify the reason why the test case fails.
Input Data (DS1, DS4) | Expected Output Data (DS2, DS3) | Data Format | Audio Scene Geometry | Source Separation Metrics |
Spatial Attitude (ID1), Audio Object (Transform) ID1 | Enhanced Audio Object (Transform) ID1, Audio Scene Geometry ID1 | T/F | T/F | T/F |
Spatial Attitude (ID2), Audio Object (Transform) ID2 | Enhanced Audio Object (Transform) ID2, Audio Scene Geometry ID2 | T/F | T/F | T/F |
Spatial Attitude (ID3), Audio Object (Transform) ID3 | Enhanced Audio Object (Transform) ID3, Audio Scene Geometry ID3 | T/F | T/F | T/F |
… | … | … | … | … |
Spatial Attitude (IDn), Audio Object (Transform) IDn | Enhanced Audio Object (Transform) IDn, Audio Scene Geometry IDn | T/F | T/F | T/F |
Final evaluation: T/F
Denoting with i the record number in DS1, DS2, and DS3, the matrices reflect the results obtained with input record i and the corresponding output i.
DS1 | DS2 | DS3 | Audio Separation and Enhancement output value (obtained through the AIM under test) |
DS1[i] | DS2[i] | DS3[i] | AudioSeparationAndEnhancement[i] |
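Section 8 asks for the angle difference (AD) between each output Audio Object (u) and the corresponding expected Audio Object (v) but gives no formula. The minimal sketch below assumes that each Audio Object's position in the Audio Scene Geometry is available as a non-zero 3-D direction vector and that output and expected objects are paired by list position; both are assumptions, as are the names `Direction`, `angle_difference_deg`, and `geometry_passes`. It computes AD = arccos(u·v / (|u||v|)) in degrees and applies the object-count and 5-degree conditions.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: one 3-D direction vector (x, y, z) per Audio Object.
Direction = Tuple[float, float, float]


def angle_difference_deg(u: Direction, v: Direction) -> float:
    """Angle in degrees between the directions of two Audio Objects."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp so that rounding error cannot push the cosine outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def geometry_passes(
    output: Sequence[Direction],
    expected: Sequence[Direction],
    max_ad_deg: float = 5.0,
) -> bool:
    """Object counts must match and every paired AD must be below max_ad_deg.

    Objects are paired by their position in the two lists (an assumption).
    """
    if len(output) != len(expected):
        return False
    return all(
        angle_difference_deg(u, v) < max_ad_deg for u, v in zip(output, expected)
    )
```

An output object tilted 3 degrees from its expected direction passes the 5-degree condition, while one tilted 6 degrees fails it. The Audio Block condition is a plain integer comparison and is left out of the sketch.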
Table 2 provides the Conformance Testing Method for the formats of the CAE-ASE AIM input and output data.
Note: If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the relevant Qualifier, if present.
Table 2 – Conformance Testing Method for CAE-ASE AIM
Receives | Audio Object | Shall validate against Audio Object schema. Audio Data shall conform with Audio Qualifier. |
| Audio Spatial Attitudes | Shall validate against Spatial Attitude schema. |
Produces | Enhanced Audio Object | Shall validate against Audio Object schema. Audio Data shall conform with Audio Qualifier. |
| Audio Scene Geometry | Shall validate against Audio Basic Scene Geometry schema. |
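Section 8 sets SIR, SDR, and SAR thresholds for the Source Separation Metrics column of the conformance matrix but does not say how the ratios are computed. One common choice is the BSS Eval decomposition of an estimate into target, interference, and artefact components. The minimal sketch below uses a simplified, gain-only variant: it allows no distortion filter on the target, unlike full BSS Eval. It assumes real-valued time-domain sample sequences (for example, after inverting the transform) and linearly independent reference signals. It then applies the T60-dependent thresholds from Section 8. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _ratio_db(num: float, den: float) -> float:
    """10*log10(num/den), returning +inf for a zero denominator, -inf for a zero numerator."""
    if den == 0.0:
        return math.inf
    if num == 0.0:
        return -math.inf
    return 10.0 * math.log10(num / den)


def _solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference signals are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def separation_metrics(
    estimate: Sequence[float], references: Sequence[Sequence[float]], target: int
) -> Tuple[float, float, float]:
    """Return (SDR, SIR, SAR) in dB for the estimate of references[target].

    Gain-only decomposition: estimate = s_target + e_interf + e_artif.
    """
    s = references[target]
    gain = _dot(estimate, s) / _dot(s, s)
    s_target = [gain * x for x in s]
    # Orthogonal projection of the estimate onto the span of all reference signals.
    gram = [[_dot(a, b) for b in references] for a in references]
    coeffs = _solve(gram, [_dot(a, estimate) for a in references])
    proj = [
        sum(c * ref[t] for c, ref in zip(coeffs, references))
        for t in range(len(estimate))
    ]
    e_interf = [p - st for p, st in zip(proj, s_target)]
    e_artif = [e - p for e, p in zip(estimate, proj)]
    e_total = [i + a for i, a in zip(e_interf, e_artif)]
    signal = [st + i for st, i in zip(s_target, e_interf)]
    energy = _dot(s_target, s_target)
    sdr = _ratio_db(energy, _dot(e_total, e_total))
    sir = _ratio_db(energy, _dot(e_interf, e_interf))
    sar = _ratio_db(_dot(signal, signal), _dot(e_artif, e_artif))
    return sdr, sir, sar


def metrics_pass(sdr: float, sir: float, sar: float, t60_seconds: float) -> bool:
    """Apply the T60-dependent thresholds listed in Section 8."""
    if t60_seconds > 0.5:
        return sir >= 10.0 and sdr >= 3.0 and sar >= 3.0
    if t60_seconds < 0.5:
        # Section 8 states no SAR threshold for this case, so SAR is not checked.
        return sir >= 15.0 and sdr >= 6.0
    raise ValueError("Section 8 gives no thresholds for T60 of exactly 0.5 s")
```

The sketch raises an error for T60 of exactly 0.5 seconds because Section 8 defines thresholds only for T60 above and below that value. A conformance tool would more likely rely on an established BSS Eval implementation than on this reduced form.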