CAE-USC V2.2 AIMs Audio Scene Description (CAE-ASD)

1 Functions

2 Reference Model

3 I/O Data of Composite AIM

4 Functions of AIMs

5 I/O Data of AIMs

6 AIMs and JSON Metadata

1 Functions

Audio Scene Description (CAE-ASD):

Receives the Audio Scene composed of:
- Microphone Array Geometry.
- Multichannel Audio, i.e., the output of the Microphone Array.
Separates Audio Objects in the scene.
Produces Audio Scene Descriptors by multiplexing the Enhanced Audio with the Audio Scene Geometry.

2 Reference Model

Figure 8 depicts the Reference Model of CAE-ASD.

Figure 8 – Reference Model of Audio Scene Description Composite AIM

3 I/O Data of Composite AIM

Table 20 gives the Input/Output data of Audio Scene Description.

Table 20 – I/O data of Audio Scene Description

Input data	Comments
Microphone Array Geometry	The description of the spatial arrangement of the microphones.
Multichannel Audio	The Audio output of the Microphone Array.
Output data	Comments
Audio Scene Descriptors	The Descriptors of the Audio Scene.

4 Functions of AIMs

Table 21 lists the AIMs and their functions. Note that Audio Analysis Transform and Audio Synthesis Transform are the same AIMs used in the Enhanced Audioconference Experience Use Case.

Table 21 – AI Modules of Audio Scene Description

AIM	Function
Audio Analysis Transform	Transforms the Multichannel Audio from the Microphone Array into frequency bands via a Fast Fourier Transform (FFT), so that the subsequent operations are carried out in discrete frequency bands. When such a configuration is used, a 50% overlap between successive audio blocks must be employed. The output is a data structure comprising complex-valued audio samples in the frequency domain.
Audio Source Localisation	Detects the Audio Objects in the Audio Scene together with their Spatial Attitudes. It receives Transform Multichannel Audio and Microphone Array Geometry. Its output is the Audio Spatial Attitudes of the Audio Objects.
Audio Separation and Enhancement	Separates the Audio Objects by using their Spatial Attitudes. It receives Transform Multichannel Audio, Audio Spatial Attitudes, and Microphone Array Geometry. Its outputs are Transform Enhanced Audio and Audio Scene Geometry.
Audio Synthesis Transform	Transforms the Transform Enhanced Audio into the time domain via an Inverse Fast Fourier Transform (IFFT). It receives Transform Enhanced Audio and outputs Enhanced Audio by applying the inverse of the Audio Analysis Transform.
Audio Description Multiplexing	Receives Enhanced Audio, Microphone Array Geometry, and Audio Scene Geometry. It multiplexes the Enhanced Audio and the Audio Scene Geometry and then produces Audio Scene Descriptors.
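
The Audio Analysis Transform and Audio Synthesis Transform rows can be illustrated with a minimal Python sketch, assuming NumPy is available. The block size of 512 samples, the square-root Hann window, the zero padding at the edges, and the function names are assumptions made for illustration; the text above specifies only the FFT, the 50% overlap, and that the synthesis inverts the analysis.

```python
import numpy as np


def sqrt_hann(block_size):
    """Square root of a periodic Hann window (an assumed window choice)."""
    n = np.arange(block_size)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / block_size))


def analysis_transform(audio, block_size=512):
    """Multichannel Audio (channels, samples) -> complex spectra
    of shape (channels, frames, block_size // 2 + 1)."""
    audio = np.atleast_2d(np.asarray(audio, dtype=float))
    hop = block_size // 2  # 50% overlap between successive blocks
    n_channels, n_samples = audio.shape
    n_frames = int(np.ceil(n_samples / hop)) + 1
    # Zero-pad so that every input sample is covered by exactly two blocks.
    padded = np.zeros((n_channels, (n_frames + 1) * hop))
    padded[:, hop:hop + n_samples] = audio
    window = sqrt_hann(block_size)
    frames = np.stack(
        [padded[:, i * hop:i * hop + block_size] * window
         for i in range(n_frames)],
        axis=1,
    )
    # Real-input FFT: keeps only the non-negative frequency bands.
    return np.fft.rfft(frames, axis=-1)


def synthesis_transform(spectrum, n_samples, block_size=512):
    """Inverse of analysis_transform: IFFT followed by overlap-add."""
    hop = block_size // 2
    window = sqrt_hann(block_size)
    frames = np.fft.irfft(spectrum, n=block_size, axis=-1) * window
    n_channels, n_frames, _ = frames.shape
    out = np.zeros((n_channels, (n_frames + 1) * hop))
    for i in range(n_frames):
        out[:, i * hop:i * hop + block_size] += frames[:, i]
    # Drop the padding added by the analysis step.
    return out[:, hop:hop + n_samples]
```

With these choices the squared windows of overlapping blocks sum to one, so `synthesis_transform(analysis_transform(x), x.shape[1])` returns `x` up to floating-point error when no processing is applied in between.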

5 I/O Data of AIMs

Table 22 – AIMs of Audio Scene Description and their data

AIM	Input Data	Output Data
Audio Analysis Transform	Multichannel Audio	Transform Multichannel Audio
Audio Source Localisation	Transform Multichannel Audio; Microphone Array Geometry	Audio Spatial Attitudes
Audio Separation and Enhancement	Audio Spatial Attitudes; Transform Multichannel Audio; Microphone Array Geometry	Transform Enhanced Audio; Audio Scene Geometry
Audio Synthesis Transform	Transform Enhanced Audio	Enhanced Audio
Audio Description Multiplexing	Enhanced Audio; Audio Scene Geometry; Microphone Array Geometry	Audio Scene Descriptors
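
The data flow in Table 22 can be checked with a short Python sketch that orders the AIMs so that each one runs only once all of its inputs are available. The dictionary transcribes Table 22 and the workflow inputs come from Table 20; the function name `execution_order` and the scheduling approach are illustrative assumptions, not part of the specification.

```python
# AIM name -> (input data, output data), transcribed from Table 22.
AIMS = {
    "Audio Analysis Transform": (
        {"Multichannel Audio"},
        {"Transform Multichannel Audio"},
    ),
    "Audio Source Localisation": (
        {"Transform Multichannel Audio", "Microphone Array Geometry"},
        {"Audio Spatial Attitudes"},
    ),
    "Audio Separation and Enhancement": (
        {"Audio Spatial Attitudes", "Transform Multichannel Audio",
         "Microphone Array Geometry"},
        {"Transform Enhanced Audio", "Audio Scene Geometry"},
    ),
    "Audio Synthesis Transform": (
        {"Transform Enhanced Audio"},
        {"Enhanced Audio"},
    ),
    "Audio Description Multiplexing": (
        {"Enhanced Audio", "Audio Scene Geometry",
         "Microphone Array Geometry"},
        {"Audio Scene Descriptors"},
    ),
}

# Inputs of the composite AIM, from Table 20.
WORKFLOW_INPUTS = {"Microphone Array Geometry", "Multichannel Audio"}


def execution_order(aims, workflow_inputs):
    """Return (order, available): an order in which the AIMs can run and
    the set of all data available once they have run."""
    available = set(workflow_inputs)
    pending = dict(aims)
    order = []
    while pending:
        # AIMs whose inputs are all available at the start of this round.
        ready = [name for name, (inputs, _) in pending.items()
                 if inputs <= available]
        if not ready:
            raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
        for name in ready:
            order.append(name)
            available |= pending.pop(name)[1]
    return order, available


order, produced = execution_order(AIMS, WORKFLOW_INPUTS)
```

Under these assumptions the only feasible order is the one in which the table lists the AIMs, and the Audio Scene Descriptors are among the data produced.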

6 AIMs and JSON Metadata

Table 23 – AIM and JSON Metadata

AIW	AIMs	Names	JSON
CAE-ASD		Audio Scene Description	File
	CAE-AAT	Audio Analysis Transform	File
	CAE-ASL	Audio Source Localisation	File
	CAE-ASE	Audio Separation and Enhancement	File
	CAE-AST	Audio Synthesis Transform	File
	CAE-ADM	Audio Description Multiplexing	File
