1 Functions
Audio Scene Description (CAE-ASD):
- Receives the Audio Scene composed of:
  - Microphone Array Geometry.
  - Multichannel Audio, i.e., the output of the Microphone Array.
- Separates Audio Objects in the scene.
- Produces Audio Scene Descriptors containing:
2 Reference Model
Figure 8 depicts the Reference Model of CAE-ASD.
Figure 8 – Reference Model of Audio Scene Description Composite AIM
3 I/O Data of Composite AIM
Table 20 gives the Input/Output data of Audio Scene Description.
Table 20 – I/O data of Audio Scene Description
Input data | Comments |
Microphone Array Geometry | The description of the spatial microphone arrangement. |
Multichannel Audio | The Audio output of the Microphone Array. |
Output data | Comments |
Audio Scene Descriptors | The Descriptors of the Audio Scene. |
4 Functions of AIMs
Table 21 lists the AIMs and their functions. Note that Audio Analysis Transform and Audio Synthesis Transform are the same AIMs used in the Enhanced Audioconference Experience Use Case.
Table 21 – AI Modules of Audio Scene Description
AIM | Function |
Audio Analysis Transform | Transforms the Multichannel Audio into frequency bands via a Fast Fourier Transform (FFT). The subsequent operations are carried out in discrete frequency bands. When this FFT-based configuration is used, a 50% overlap between successive audio blocks must be employed. The output is a data structure comprising complex-valued audio samples in the frequency domain. |
Audio Source Localisation | Detects the Audio Objects in the Audio Scene together with their Spatial Attitudes. It receives Transform Multichannel Audio and Microphone Array Geometry. Its output is the Audio Spatial Attitudes of the Audio Objects. |
Audio Separation and Enhancement | Separates the Audio Objects by using their Spatial Attitudes. It receives Transform Multichannel Audio, Audio Spatial Attitudes, and Microphone Array Geometry. Its outputs are Transform Enhanced Audio and Audio Scene Geometry. |
Audio Synthesis Transform | Transforms the Transform Enhanced Audio into the time domain via an Inverse Fast Fourier Transform (IFFT), i.e., by applying the inverse of the Audio Analysis Transform. It receives Transform Enhanced Audio and outputs Enhanced Audio. |
Audio Description Multiplexing | Receives Enhanced Audio, Microphone Array Geometry, and Audio Scene Geometry. It multiplexes the Enhanced Audio and the Audio Scene Geometry and then produces Audio Scene Descriptors. |
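The relationship between Audio Analysis Transform and Audio Synthesis Transform in Table 21 can be illustrated with a minimal sketch. It is not the normative algorithm. The block size of 8 samples, the square-root Hann window, the zero padding at the block edges, and the function names `analysis_transform` and `synthesis_transform` are all assumptions made for illustration. Only the FFT/IFFT pair and the 50% overlap come from the table. The small radix-2 FFT is included only to keep the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT computed through the conjugation identity."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def sqrt_hann(n):
    """Periodic square-root Hann window (an assumed choice, not from the spec)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]


def analysis_transform(channels, block=8):
    """Multichannel Audio -> per channel, a list of complex-valued blocks."""
    hop = block // 2  # 50% overlap between successive blocks
    win = sqrt_hann(block)
    transformed = []
    for ch in channels:
        # Pad half a block on each side so every sample is covered twice.
        padded = [0.0] * hop + list(ch) + [0.0] * (hop + (-len(ch)) % hop)
        frames = []
        for start in range(0, len(padded) - block + 1, hop):
            frames.append(fft([padded[start + i] * win[i] for i in range(block)]))
        transformed.append(frames)
    return transformed


def synthesis_transform(transformed, length, block=8):
    """Inverse of analysis_transform: IFFT, window again, overlap-add."""
    hop = block // 2
    win = sqrt_hann(block)
    channels = []
    for frames in transformed:
        buf = [0.0] * (hop * (len(frames) + 1))
        for f, spectrum in enumerate(frames):
            time_block = ifft(spectrum)
            for i in range(block):
                buf[f * hop + i] += time_block[i].real * win[i]
        channels.append(buf[hop:hop + length])  # drop the leading padding
    return channels


# Two-channel toy signal standing in for Multichannel Audio.
audio = [[math.sin(0.3 * n) for n in range(20)],
         [math.cos(0.7 * n) for n in range(20)]]
spectra = analysis_transform(audio)
restored = synthesis_transform(spectra, length=20)
```

With the square-root Hann window applied at both analysis and synthesis, the overlapping windows sum to one at a 50% overlap, so `restored` matches `audio` up to floating-point error when the frequency-domain data is left unmodified.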
5 I/O Data of AIMs
Table 22 – AIMs of Audio Scene Description and their data
AIM | Input Data | Output Data |
Audio Analysis Transform | Multichannel Audio | Transform Multichannel Audio |
Audio Source Localisation | Transform Multichannel Audio, Microphone Array Geometry | Audio Spatial Attitudes |
Audio Separation and Enhancement | Audio Spatial Attitudes, Transform Multichannel Audio, Microphone Array Geometry | Transform Enhanced Audio, Audio Scene Geometry |
Audio Synthesis Transform | Transform Enhanced Audio | Enhanced Audio |
Audio Description Multiplexing | Enhanced Audio, Audio Scene Geometry, Microphone Array Geometry | Audio Scene Descriptors |
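The data dependencies in Table 22 can be checked mechanically. The following sketch encodes the table as a Python dictionary and derives an order in which the AIMs can run, given only the inputs of the Composite AIM from Table 20. The names `AIMS`, `COMPOSITE_INPUTS`, and `execution_order` are hypothetical helpers introduced here for illustration and are not part of the specification.

```python
# Table 22 encoded as data: AIM name -> (input data, output data).
AIMS = {
    "Audio Analysis Transform": (
        ["Multichannel Audio"],
        ["Transform Multichannel Audio"]),
    "Audio Source Localisation": (
        ["Transform Multichannel Audio", "Microphone Array Geometry"],
        ["Audio Spatial Attitudes"]),
    "Audio Separation and Enhancement": (
        ["Audio Spatial Attitudes", "Transform Multichannel Audio",
         "Microphone Array Geometry"],
        ["Transform Enhanced Audio", "Audio Scene Geometry"]),
    "Audio Synthesis Transform": (
        ["Transform Enhanced Audio"],
        ["Enhanced Audio"]),
    "Audio Description Multiplexing": (
        ["Enhanced Audio", "Audio Scene Geometry", "Microphone Array Geometry"],
        ["Audio Scene Descriptors"]),
}

# Inputs of the Composite AIM (Table 20).
COMPOSITE_INPUTS = {"Microphone Array Geometry", "Multichannel Audio"}


def execution_order(aims, available):
    """Return the AIMs in an order where every input is produced upstream."""
    available = set(available)
    pending = dict(aims)
    order = []
    while pending:
        # An AIM is ready once all of its inputs are available.
        ready = [name for name, (inputs, _) in pending.items()
                 if set(inputs) <= available]
        if not ready:
            raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
        for name in ready:
            order.append(name)
            available.update(pending.pop(name)[1])
    return order


order = execution_order(AIMS, COMPOSITE_INPUTS)
for step, name in enumerate(order, start=1):
    print(step, name)
```

The resulting order starts with Audio Analysis Transform and ends with Audio Description Multiplexing, which produces the Audio Scene Descriptors. A missing or misspelled data name in the dictionary raises a `ValueError` instead of silently producing a wrong order.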
6 AIMs and JSON Metadata
Table 23 – AIM and JSON Metadata
AIW | AIMs | Names | JSON |
CAE-ASD | | Audio Scene Description | File |
 | CAE-AAT | Audio Analysis Transform | File |
 | CAE-ASL | Audio Source Localisation | File |
 | CAE-ASE | Audio Separation and Enhancement | File |
 | CAE-AST | Audio Synthesis Transform | File |
 | CAE-ADM | Audio Description Multiplexing | File |