1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Audio Source Localisation (CAE-ASL):

Receives Audio Objects with associated Microphone Array information.
Detects Audio Objects in the Audio Scene.
Determines Spatial Attitudes of Audio Objects.
Produces Spatial Attitudes of input Audio Objects.

2     Reference Model

Figure 1 depicts the Reference Architecture of the Audio Source Localisation (CAE-ASL) AIM.

Figure 1 – Audio Source Localisation (CAE-ASL) AIM

3    Input/Output Data

Table 1 specifies the Input and Output Data of the Audio Source Localisation (CAE-ASL) AIM.

Table 1 – Audio Source Localisation (CAE-ASL) AIM

Input | Description
Audio Object | The result of the application of the Fast Fourier Transform to the Multichannel Audio (with associated Microphone Array info).
Output | Description
Audio Spatial Attitudes | The Orientations and Directions of Audio Objects.
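
As a rough illustration of what the Audio Object input carries, the sketch below applies a discrete Fourier transform to each channel of a multichannel signal. It assumes each channel is a plain list of samples and uses a naive O(N^2) DFT from the Python standard library in place of an FFT (both yield the same coefficients). The function names `dft` and `transform_multichannel` are hypothetical, and the framing, windowing, and Microphone Array information of an actual Audio Object are not modelled.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of one channel (stand-in for an FFT)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def transform_multichannel(channels):
    """Transform every channel independently; returns one spectrum per channel."""
    return [dft(channel) for channel in channels]


# Two hypothetical microphone channels of 8 samples each:
# a cosine with one cycle per frame, and a constant signal.
cosine = [math.cos(2 * math.pi * t / 8) for t in range(8)]
constant = [1.0] * 8
spectra = transform_multichannel([cosine, constant])
```

In this toy input, the energy of the cosine channel is concentrated in frequency bins 1 and 7, while the constant channel only has a non-zero coefficient in bin 0.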

4     SubAIMs

No SubAIMs.

5     JSON Metadata

https://schemas.mpai.community/CAE1/V2.3/AIMs/AudioSourceLocalisation.json

6     Profiles

No Profiles.

7     Reference Software

8     Conformance Testing

The following procedure shall be followed when testing the Conformance of a CAE-ASL AIM instance.

  1. Use the following datasets:
    1. DS1: n Test files containing Audio Objects (Transform).
    2. DS2: n Expected Spatial Attitudes.
  2. Feed the AIM under test with the Test files.
  3. Analyse the Spatial Attitudes produced by the CAE-ASL AIM instance.
  4. Calculate the angle difference (AD) in degrees between the output Spatial Attitudes and the Expected Spatial Attitudes.
Input data (DS1) | Expected Output Data (DS2) | Data Format | RMSE
Audio Object (Microphone Array) ID1 | Spatial Attitude ID1 | T/F | < A*0.1%
Audio Object (Microphone Array) ID2 | Spatial Attitude (Transform) ID2 | T/F | < A*0.1%
Audio Object (Microphone Array) ID3 | Spatial Attitude (Transform) ID3 | T/F | < A*0.1%
Audio Object (Microphone Array) IDn | Spatial Attitude (Transform) IDn | T/F | < A*0.1%
  5. Final evaluation: T/F. Denoting by i the record number in DS1 and DS2, the matrices reflect the results obtained with input records i and the corresponding outputs i.
DS1 | DS2 | Audio Object (Transform) output value (from AIM under test)
DS1[i] | DS2[i] | Audio Object (Transform)[i]
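
The comparison in steps 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative test: each Spatial Attitude is reduced to a flat list of angles in degrees, the function names are hypothetical, and the RMSE limit is passed in by the caller as `threshold_deg` because the quantity A in the `< A*0.1%` bound is not defined in this section.

```python
import math


def angle_difference_deg(produced, expected):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = (produced - expected) % 360.0
    return min(diff, 360.0 - diff)


def rmse_deg(produced_angles, expected_angles):
    """Root-mean-square of the angle differences over paired angles."""
    if len(produced_angles) != len(expected_angles):
        raise ValueError("produced and expected angle lists differ in length")
    if not produced_angles:
        raise ValueError("no angles to compare")
    squared = [
        angle_difference_deg(p, e) ** 2
        for p, e in zip(produced_angles, expected_angles)
    ]
    return math.sqrt(sum(squared) / len(squared))


def record_passes(produced_angles, expected_angles, threshold_deg):
    """True/False verdict for one record i; threshold_deg is an assumed parameter."""
    return rmse_deg(produced_angles, expected_angles) < threshold_deg


# Hypothetical record: angles produced by the AIM under test vs. DS2[i].
produced = [350.0, 90.0]
expected = [10.0, 90.0]
verdict = record_passes(produced, expected, threshold_deg=1.0)
```

The difference is wrapped so that 350 and 10 degrees count as 20 degrees apart rather than 340, which keeps the RMSE meaningful near the 0/360 boundary.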

Table 2 provides the Conformance Testing Method for the formats of the CAE-ASL AIM output.

Note: If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for CAE-ASL AIM

Receives | Audio Objects | Shall validate against Audio Object schema. Audio Data shall conform with Audio Qualifier.
Produces | Spatial Attitudes | Shall validate against Spatial Attitude schema.

9     Performance Assessment