1        Functions of AIW

2        Reference Model

3        I/O Data of AIW

4        Functions of AIMs

5        I/O Data of AIMs

6        AIMs and JSON Metadata

1       Functions

Audio Scene Description (CAE-ASD):

  1. Receives the Audio Scene composed of:
    • Microphone Array Geometry.
    • Multichannel Audio, i.e., the output of the Microphone Array.
  2. Separates Audio Objects in the scene.
  3. Produces Audio Scene Descriptors containing:
    • Enhanced Audio of the separated Audio Objects.
    • Audio Scene Geometry.

2       Reference Model

Figure 8 depicts the Reference Model of CAE-ASD.

Figure 8 – Reference Model of Audio Scene Description Composite AIM

3       I/O Data of Composite AIM

Table 20 gives the Input/Output data of Audio Scene Description.

Table 20 – I/O data of Audio Scene Description

| Input data | Comment |
|---|---|
| Microphone Array Geometry | The description of the spatial arrangement of the microphones. |
| Multichannel Audio | The Audio output of the Microphone Array. |
| **Output data** | **Comment** |
| Audio Scene Descriptors | The Descriptors of the Audio Scene. |
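The input side of Table 20 can be pictured as a small data structure. The following is a minimal sketch only: the class names, the field names, the choice of Cartesian (x, y, z) positions, and the rule of one audio channel per microphone are illustrative assumptions, not the data formats defined by the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MicrophoneArrayGeometry:
    # Hypothetical representation: one (x, y, z) position in metres per microphone.
    positions: List[Tuple[float, float, float]]


@dataclass
class AudioScene:
    geometry: MicrophoneArrayGeometry
    # One list of time-domain samples per channel of the Microphone Array.
    multichannel_audio: List[List[float]]

    def validate(self) -> None:
        """Check the assumed consistency rules between geometry and audio."""
        if len(self.multichannel_audio) != len(self.geometry.positions):
            raise ValueError("expected one audio channel per microphone")
        lengths = {len(channel) for channel in self.multichannel_audio}
        if len(lengths) > 1:
            raise ValueError("all channels must have the same number of samples")
```

Calling `validate()` before handing the scene to the first AIM would catch a geometry that does not match the recorded channels.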

4       Functions of AIMs

Table 21 lists the AIMs and their functions. Note that Audio Analysis Transform and Audio Synthesis Transform are the same AIMs used in the Enhanced Audioconference Experience Use Case.

Table 21 – AI Modules of Audio Scene Description

| AIM | Function |
|---|---|
| Audio Analysis Transform | Transforms the Multichannel Audio from the Microphone Array into frequency bands via a Fast Fourier Transform (FFT). All subsequent operations are carried out in discrete frequency bands. With this FFT-based configuration, a 50% overlap between successive audio blocks must be employed. The output is a data structure comprising complex-valued audio samples in the frequency domain. |
| Audio Source Localisation | Detects the Audio Objects in the Audio Scene together with their Spatial Attitudes. It receives Transform Multichannel Audio and Microphone Array Geometry, and outputs the Audio Spatial Attitudes of the Audio Objects. |
| Audio Separation and Enhancement | Separates the Audio Objects by using their Spatial Attitudes. It receives Transform Multichannel Audio, Audio Spatial Attitudes, and Microphone Array Geometry. Its outputs are Transform Enhanced Audio and Audio Scene Geometry. |
| Audio Synthesis Transform | Transforms the Transform Enhanced Audio into the time domain via an Inverse Fast Fourier Transform (IFFT). It receives Transform Enhanced Audio and outputs Enhanced Audio by applying the inverse of the Audio Analysis Transform. |
| Audio Description Multiplexing | Receives Enhanced Audio, Microphone Array Geometry, and Audio Scene Geometry. It multiplexes the Enhanced Audio and the Audio Scene Geometry and then produces Audio Scene Descriptors. |
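The relationship between the Audio Analysis Transform and the Audio Synthesis Transform can be illustrated with a block-wise FFT using 50% overlap, followed by an inverse FFT and overlap-add. This is a minimal sketch, not the normative transform: the function names, the block size of 8 samples, and the periodic Hann window are assumptions (the text above specifies only the FFT and the 50% overlap), and a small pure-Python radix-2 FFT stands in for an optimised library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT computed through the conjugation identity."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def hann(n):
    """Periodic Hann window (assumed); copies shifted by n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def analysis_transform(channels, block=8):
    """Multichannel Audio -> per channel, a list of complex FFT blocks."""
    hop = block // 2  # 50% overlap between successive blocks
    win = hann(block)
    transformed = []
    for ch in channels:
        frames = [fft([ch[s + i] * win[i] for i in range(block)])
                  for s in range(0, len(ch) - block + 1, hop)]
        transformed.append(frames)
    return transformed


def synthesis_transform(transformed, block=8):
    """Inverse of analysis_transform: IFFT each block, then overlap-add."""
    hop = block // 2
    channels = []
    for frames in transformed:
        if not frames:
            channels.append([])
            continue
        signal = [0.0] * (hop * (len(frames) - 1) + block)
        for f, spectrum in enumerate(frames):
            for i, v in enumerate(ifft(spectrum)):
                signal[f * hop + i] += v.real
        channels.append(signal)
    return channels
```

Under these assumptions every sample covered by two overlapping blocks is reconstructed exactly (up to rounding), because the two window copies add up to one. The first and last half-block are covered by a single block only and remain attenuated by the window.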

5       I/O Data of AIMs

Table 22 – AIMs of Audio Scene Description and their data

| AIM | Input Data | Output Data |
|---|---|---|
| Audio Analysis Transform | Multichannel Audio | Transform Multichannel Audio |
| Audio Source Localisation | Transform Multichannel Audio, Microphone Array Geometry | Audio Spatial Attitudes |
| Audio Separation and Enhancement | Audio Spatial Attitudes, Transform Multichannel Audio, Microphone Array Geometry | Transform Enhanced Audio, Audio Scene Geometry |
| Audio Synthesis Transform | Transform Enhanced Audio | Enhanced Audio |
| Audio Description Multiplexing | Enhanced Audio, Audio Scene Geometry, Microphone Array Geometry | Audio Scene Descriptors |
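The data dependencies in Table 22 form a simple processing graph. As a sketch, the table can be written down as a dictionary and checked mechanically: every AIM input must be either an input of the Composite AIM or the output of another AIM. The names `AIMS`, `COMPOSITE_INPUTS`, and `execution_order` are illustrative and not part of the specification.

```python
# AIM name -> (input data, output data), transcribed from Table 22.
AIMS = {
    "Audio Analysis Transform": (
        ["Multichannel Audio"],
        ["Transform Multichannel Audio"],
    ),
    "Audio Source Localisation": (
        ["Transform Multichannel Audio", "Microphone Array Geometry"],
        ["Audio Spatial Attitudes"],
    ),
    "Audio Separation and Enhancement": (
        ["Audio Spatial Attitudes", "Transform Multichannel Audio",
         "Microphone Array Geometry"],
        ["Transform Enhanced Audio", "Audio Scene Geometry"],
    ),
    "Audio Synthesis Transform": (
        ["Transform Enhanced Audio"],
        ["Enhanced Audio"],
    ),
    "Audio Description Multiplexing": (
        ["Enhanced Audio", "Audio Scene Geometry", "Microphone Array Geometry"],
        ["Audio Scene Descriptors"],
    ),
}

# Inputs of the Composite AIM (Table 20).
COMPOSITE_INPUTS = {"Microphone Array Geometry", "Multichannel Audio"}


def execution_order(aims, external_inputs):
    """Return an order in which each AIM runs only after its inputs exist."""
    available = set(external_inputs)
    pending = dict(aims)
    order = []
    while pending:
        ready = [name for name, (inputs, _) in pending.items()
                 if set(inputs) <= available]
        if not ready:
            raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
        for name in ready:
            _, outputs = pending.pop(name)
            order.append(name)
            available.update(outputs)
    return order
```

Running `execution_order(AIMS, COMPOSITE_INPUTS)` yields the AIMs in the order in which they appear in the table, which confirms that the table describes a feed-forward chain ending in Audio Scene Descriptors.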

6       AIMs and JSON Metadata

Table 23 – AIM and JSON Metadata

| AIW | AIMs | Names | JSON |
|---|---|---|---|
| CAE-ASD | | Audio Scene Description | File |
| | CAE-AAT | Audio Analysis Transform | File |
| | CAE-ASL | Audio Source Localisation | File |
| | CAE-ASE | Audio Separation and Enhancement | File |
| | CAE-AST | Audio Synthesis Transform | File |
| | CAE-ADM | Audio Description Multiplexing | File |