The Audio Scene Description Composite AIM is specified in the following six sections.
1 Functions of Audio Scene Description
2 Reference Model of Audio Scene Description
3 I/O Data of Audio Scene Description
4 Functions of AI Modules of Audio Scene Description
5 I/O Data of AI Modules of Audio Scene Description
6 AIM and JSON Metadata Specification of Audio Scene Description
1 Functions of Audio Scene Description
Audio Scene Description (CAE-ASD):
- Receives the Audio Scene composed of:
  - Microphone Array Geometry.
  - Multichannel Audio, i.e., the output of the Microphone Array.
- Separates Audio Objects in the scene.
- Produces Audio Scene Descriptors.
2 Reference Model of Audio Scene Description
Figure 1 depicts the Reference Model of CAE-ASD.

Figure 1 – Reference Model of Audio Scene Description Composite AIM
3 I/O Data of Audio Scene Description
Table 1 gives the Input/Output data of Audio Scene Description.
Table 1 – I/O data of Audio Scene Description
| Input data | Comments |
| Microphone Array Geometry | The description of the spatial microphone arrangement. |
| Multichannel Audio | The Audio output of the Microphone Array. |
| Output data | Comments |
| Audio Scene Descriptors | The Descriptors of the Audio Scene. |
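The input side of Table 1 can be sketched as plain data types. This is only an illustration and not the normative data format: the class names, the fields (`positions`, `sample_rate_hz`, `channels`), and the rule of one audio channel per microphone are assumptions made here for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MicrophoneArrayGeometry:
    # Hypothetical representation: one (x, y, z) position per microphone.
    positions: List[Tuple[float, float, float]]


@dataclass
class MultichannelAudio:
    # Hypothetical representation: one list of samples per channel.
    sample_rate_hz: int
    channels: List[List[float]]


@dataclass
class AudioSceneInput:
    geometry: MicrophoneArrayGeometry
    audio: MultichannelAudio

    def validate(self) -> None:
        # Assumption: each microphone contributes exactly one channel.
        if len(self.audio.channels) != len(self.geometry.positions):
            raise ValueError("channel count does not match microphone count")
        # All channels are expected to cover the same number of samples.
        if len({len(channel) for channel in self.audio.channels}) > 1:
            raise ValueError("channels have different lengths")
```

Calling `validate()` before handing the input to the first AIM catches a geometry that does not match the audio it is supposed to describe.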
4 Functions of AI Modules of Audio Scene Description
Table 2 gives the list of the AIMs with their functions.
Table 2 – AI Modules of Audio Scene Description
| AIM | Function |
| Audio Analysis Transform | |
| Audio Source Localisation | |
| Audio Separation and Enhancement | |
| Audio Synthesis Transform | |
| Audio Descriptor Multiplexing | |
5 I/O Data of AI Modules of Audio Scene Description
Table 3 – AI Modules of Audio Scene Description and their data
| AIM | Input Data | Output Data |
| Audio Analysis Transform | Multichannel Audio | Transform Multichannel Audio |
| Audio Source Localisation | Transform Multichannel Audio; Microphone Array Geometry | Audio Spatial Attitudes |
| Audio Separation and Enhancement | Audio Spatial Attitudes; Transform Multichannel Audio; Microphone Array Geometry | Transform Enhanced Audio; Audio Scene Geometry |
| Audio Synthesis Transform | Transform Enhanced Audio | Enhanced Audio |
| Audio Descriptor Multiplexing | Enhanced Audio; Audio Scene Geometry; Microphone Array Geometry | Audio Scene Descriptors |
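The data dependencies in Table 3 imply an order in which the AIMs can run. The sketch below transcribes the inputs and outputs of each AIM from Table 3 and derives one valid order by repeatedly scheduling every AIM whose inputs are already available. The function `execution_order` and the constants are hypothetical names introduced for this illustration; they are not part of the specification.

```python
from typing import Dict, List, Set, Tuple

# AIM name -> (input data, output data), transcribed from Table 3.
AIMS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Audio Analysis Transform": (
        {"Multichannel Audio"},
        {"Transform Multichannel Audio"},
    ),
    "Audio Source Localisation": (
        {"Transform Multichannel Audio", "Microphone Array Geometry"},
        {"Audio Spatial Attitudes"},
    ),
    "Audio Separation and Enhancement": (
        {"Audio Spatial Attitudes", "Transform Multichannel Audio",
         "Microphone Array Geometry"},
        {"Transform Enhanced Audio", "Audio Scene Geometry"},
    ),
    "Audio Synthesis Transform": (
        {"Transform Enhanced Audio"},
        {"Enhanced Audio"},
    ),
    "Audio Descriptor Multiplexing": (
        {"Enhanced Audio", "Audio Scene Geometry", "Microphone Array Geometry"},
        {"Audio Scene Descriptors"},
    ),
}

# Inputs of the Composite AIM as listed in Table 1.
COMPOSITE_INPUTS: Set[str] = {"Microphone Array Geometry", "Multichannel Audio"}


def execution_order(
    aims: Dict[str, Tuple[Set[str], Set[str]]], available: Set[str]
) -> List[str]:
    """Return an order in which every AIM finds its inputs already produced."""
    available = set(available)  # copy so the caller's set is not modified
    pending = dict(aims)
    order: List[str] = []
    while pending:
        # An AIM is ready when all of its inputs are available.
        ready = [name for name, (inputs, _) in pending.items()
                 if inputs <= available]
        if not ready:
            raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
        for name in ready:
            order.append(name)
            available |= pending.pop(name)[1]
    return order
```

With the Composite AIM inputs from Table 1, `execution_order(AIMS, COMPOSITE_INPUTS)` yields the AIMs in the same order as the rows of Table 3, which suggests the table is listed in processing order.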
6 AIM and JSON Metadata Specification of Audio Scene Description
Table 4 – AIM and JSON Metadata
| AIW | AIMs | Names | JSON |
| CAE-ASD | | Audio Scene Description | X |
| – | CAE-AAT | Audio Analysis Transform | X |
| – | CAE-ASL | Audio Source Localisation | X |
| – | CAE-ASE | Audio Separation and Enhancement | X |
| – | CAE-AST | Audio Synthesis Transform | X |
| – | CAE-AMX | Audio Descriptor Multiplexing | X |