CAE-6DF V1.0 AIWs - 6DF Decoder


1 Functions	2 Reference Model	3 I/O data of AI Workflow
4 Functions of AI Modules	5 I/O Data of AI Modules	6 AIW, AIMs, and JSON Metadata
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

A CAE-6DF receiver uses the 6DF transform-domain direct sound components and the time-domain diffuse field components to compute the first-order or higher-order Ambisonics representation of an arbitrary Point of View in the represented area of the reconstructed sound field. It does so through the following operations:

Demultiplexing of the input data (the transform-domain direct sound components and the time-domain diffuse field components)
Directional interpolation of transform-domain plane wave components (direct sound components)
Linear interpolation of time-domain diffuse components
Inverse transformation of transform-domain plane wave components into time-domain
Combination of the reconstructed plane wave and diffuse field components
(Transcoding the result into binaural signals if necessary)
The output can be first-order or higher-order Ambisonics, represented as multichannel audio data.
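
As an illustration of the linear interpolation of time-domain diffuse components listed above, the following Python sketch forms a weighted sum of the diffuse signals of three Microphone Arrays. It is not taken from the Reference Software: the function name `interpolate_diffuse`, the single-channel signal layout, and the example weights are assumptions made for illustration only.

```python
from typing import List, Sequence


def interpolate_diffuse(signals: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Return the sample-wise weighted sum of equally long signals.

    Hypothetical helper: one single-channel signal and one weight per
    Microphone Array. A real decoder would process every Ambisonics channel.
    """
    if len(signals) != len(weights):
        raise ValueError("one weight per signal is required")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    return [sum(w * s[n] for w, s in zip(weights, signals))
            for n in range(length)]


# Three short diffuse signals and illustrative weights that sum to 1.
diffuse = [[1.0, 1.0], [0.0, 2.0], [4.0, 0.0]]
print(interpolate_diffuse(diffuse, [0.5, 0.25, 0.25]))
```

In this sketch the weights are simply given. In the decoder they would come from the interpolation coefficients computed for the listener position.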

2 Reference Model and its operation

Figure 1 depicts the Six Degrees of Freedom Decoder (CAE-6DC) Reference Model.

Figure 1 – Six Degrees of Freedom Decoder (CAE-6DC)

The sequence of operations in CAE-6DC is as follows:

6DF Demultiplexing AIM unpacks the 6DF Audio Object produced by the CAE-6EN encoder to produce 1) the Device Scene Geometry; 2) the Time-domain Compressed Diffuse-dominant signals as a complex-valued tensor whose dimensions are determined by the number of Microphone Arrays in the Device Scene Geometry, the SH Decomposition order, the duration of the Audio Object in seconds, and the sampling rate; and 3) the parametrically represented Dominant Plane Wave time-frequency bins as a nested list of lists of tuples, where each tuple contains the time and frequency indices; the result of the residual energy test for the specific time-frequency bin; the index h of the direction on the HEALPix grid (which near-uniformly tessellates the unit sphere) that corresponds to the direction of the dominant source; and the real and imaginary parts of that source’s amplitude.
Audio Decompression AIM receives as input the time-domain compressed diffuse-dominant audio components and decompresses them using the decoder that matches the Audio Compression AIM in CAE-6EN, producing time-domain decompressed diffuse field components.
Interpolation Coefficients Calculation AIM receives the position of the listener as well as the Device Scene Geometry to produce the Microphone Array indices that are later to be selected for interpolation as well as the Interpolation Parameters, which are to be used in sound field interpolation. The interpolation parameters are calculated as barycentric interpolation coefficients that correspond to the relative position of the user with respect to the three closest vertices indicated by the positions of the selected Microphone Arrays.
Triplet Select AIM selects the direct-dominant time-frequency bins and the diffuse-dominant time-domain audio signals using the Microphone Array indices calculated in the Interpolation Coefficients Calculation AIM, only for those Microphone Arrays where the residual energy test (RENT) result is greater than a specified threshold. It produces three sets of plane wave parameters (i.e., direction and amplitude), as well as three diffuse field time-domain signals for use by the 6DF Interpolation AIM.
6DF Interpolation AIM is a composite AIM that carries out the following operations: 1) parametric interpolation of the selected dominant plane wave components on a per-time-frequency-bin basis, 2) linear interpolation of the time-domain diffuse-dominant components, and 3) transformation of the dominant plane wave components back to the time domain. This AIM outputs the Time-Domain Interpolated Diffuse Sound Field and the Time-Domain Interpolated Direct Sound Field.
Inverse Short-Time Fourier Transform (ISTFT) AIM is the reconstruction stage that converts the synthesized spatial audio channels from the time-frequency domain back into continuous time-domain waveforms using an overlap-add (OLA) synthesis method.
Direct-Diffuse Mix AIM uses a user-defined Direct Gain to set the relative gains of the direct and diffuse components of the interpolated sound field prior to rendering. This allows the user to control how reverberant the resulting environment will sound. The output is the Time Domain Interpolated Total Sound Field.
Audio Rendering Module AIM receives the Time-Domain Interpolated Total Sound Field in HOA format. It uses the Orientation of the user’s head to rotate the HOA signal accordingly. The user can choose to generate an Audio Object for delivery over headphones or over a loudspeaker setup. This choice is signalled via the Selection parameter, which is input to this AIM.
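
The barycentric coefficients described for the Interpolation Coefficients Calculation AIM can be sketched as follows. This is a minimal illustration, assuming 2D positions for the listener and the Microphone Arrays; the names `closest_three` and `barycentric_coefficients` are hypothetical and do not come from the Reference Software.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def closest_three(listener: Point, arrays: Sequence[Point]) -> List[int]:
    """Indices of the three arrays nearest to the listener (assumed 2D)."""
    order = sorted(range(len(arrays)),
                   key=lambda i: math.dist(listener, arrays[i]))
    return order[:3]


def barycentric_coefficients(p: Point, a: Point, b: Point,
                             c: Point) -> Tuple[float, float, float]:
    """Barycentric weights of p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        raise ValueError("the three array positions are collinear")
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


# Illustrative geometry: four arrays, listener inside the first triangle.
arrays = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
listener = (0.25, 0.25)
idx = closest_three(listener, arrays)
weights = barycentric_coefficients(listener, *(arrays[i] for i in idx))
print(idx, weights)
```

The weights always sum to 1. If the listener lies outside the triangle formed by the three closest arrays, one or more weights become negative; how the decoder handles that case is not covered by this sketch.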

3 I/O data of AI Workflow

Table 1 gives the input and output data of 6DF Decoder.

Table 1 – Input/Output Data of CAE-6DC

Input	Description
Audio Object	6DF Audio Object output by the 6DF Encoder
Direct Gain	Controls the ratio of the direct vs. the diffuse field
Position	Position of User’s Head
Orientation	Orientation of User’s Head
Output	Description
Audio Object	Output of 6DF Decoder which allows rendering in different formats (including 5.1, 7.1, 7.1.4, 11.4, 22.2, HOA, Binaural)
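
The role of the Direct Gain input can be pictured with the minimal sketch below. The text here does not state the mixing law, so the complementary weighting (g for the direct field, 1 - g for the diffuse field, with g in [0, 1]) and the name `mix_direct_diffuse` are assumptions made purely for illustration.

```python
from typing import List, Sequence


def mix_direct_diffuse(direct: Sequence[float], diffuse: Sequence[float],
                       direct_gain: float) -> List[float]:
    """Blend two equally long signals; assumed law: g*direct + (1-g)*diffuse."""
    if not 0.0 <= direct_gain <= 1.0:
        raise ValueError("direct_gain is assumed to lie in [0, 1]")
    if len(direct) != len(diffuse):
        raise ValueError("signals must have the same length")
    return [direct_gain * d + (1.0 - direct_gain) * r
            for d, r in zip(direct, diffuse)]


# A low direct gain favours the diffuse field, giving a more reverberant mix.
print(mix_direct_diffuse([1.0, 2.0], [3.0, 4.0], 0.25))
```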

4 Functions of AI Modules

The Functions of the AIMs required by the 6DF Decoder are specified in Table 2.

Table 2 – Functions of AI Modules of 6DF Decoder

AI Modules	Functions
6DF Demultiplexing	Demultiplexes the 6DF Bitstream.
Audio Decompression	Restores the audio waveforms pertaining to the diffuse field.
Triplet Select	Selects the three closest microphone arrays’ indices.
Interpolation Coefficients Calculation	Calculates the interpolation coefficients given the position of the listener’s head and the closest triplets.
Plane Wave Parameter Interpolation	Interpolates the amplitude and direction of plane waves based on the listener’s Position and Orientation.
Plane Wave Synthesis	Reconstructs the interpolated plane wave, based on 6DF interpolation (Frequency Domain).
Linear Interpolation	Interpolates the diffuse field audio waveform using a linear combination of individual waveforms of diffuse field.
Inverse STFT	Reconstructs the direct field audio waveform from frequency domain to time domain.
Direct/Diffuse Mix	Mixes the direct and diffuse field based on the Listener-selected direct gain.
6DF Interpolation	Generates an approximation of the Audio Scene Descriptors in a position that was not captured by a Microphone Array.
Sound Field Rotation	Rotates the sound field based on the Listener’s head orientation.
Audio Rendering Module	Converts time-domain sound field into the audio payload of loudspeakers/headphones.
Binaural Transcoding	Transcodes from the Ambisonics format to the binaural format when necessary.
Loudspeaker Transcoding	Transcodes from the Ambisonics format to the loudspeaker format when necessary.
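
To make the Sound Field Rotation function more concrete, the sketch below rotates one first-order Ambisonics sample to compensate for a head yaw. It rests on several assumptions not stated in this chapter: ACN channel order (W, Y, Z, X), an x-forward, y-left, z-up coordinate system, yaw measured counterclockwise, and rotation about the vertical axis only. The name `rotate_foa_yaw` is hypothetical, and higher orders would need full rotation matrices.

```python
import math
from typing import Tuple

Foa = Tuple[float, float, float, float]  # assumed ACN order: W, Y, Z, X


def rotate_foa_yaw(sample: Foa, head_yaw: float) -> Foa:
    """Counter-rotate a first-order sample by the head yaw (radians)."""
    w, y, z, x = sample
    c, s = math.cos(head_yaw), math.sin(head_yaw)
    # A source at azimuth phi is heard at phi - head_yaw after the head turns.
    x_rot = x * c + y * s
    y_rot = y * c - x * s
    return (w, y_rot, z, x_rot)  # W and Z are unaffected by a yaw rotation


# A frontal source heard after the head turns 90 degrees to the left
# ends up on the listener's right (negative Y under the assumed axes).
print(rotate_foa_yaw((1.0, 0.0, 0.0, 1.0), math.pi / 2))
```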

5 I/O Data of AI Modules

Table 3 – Input/Output Data of AI Modules

AI Modules	Input Data	Output Data
6DF Demultiplexing	Audio Object	Compressed Diffuse Dominant, Dominant Sparse Plane Wave, Device Scene Geometry
Audio Decompression	Compressed Diffuse Field	Decompressed Diffuse Field
Triplet Select	Decompressed Diffuse Field, Dominant Plane Wave, Microphone Array Indices	Selected Time-Domain Diffuse Field, Selected Dominant Plane Wave Parameters
Interpolation Coefficients Calculation	Listener/head Position, Scene Geometry	Interpolation Coefficients
Plane Wave Parameter Interpolation	Selected Direct Dominant Sparse Plane Wave Decomposition, Interpolation Coefficients	Interpolated Frequency-Domain Direct Dominant Sparse Plane Wave Decomposition
Plane Wave Synthesis	Interpolated Frequency-Domain Direct Dominant Sparse Plane Wave Decomposition	Interpolated Frequency-Domain Direct Dominant Field
Linear Interpolation	Selected Time-Domain Diffuse Dominant, Interpolation Coefficients	Interpolated Diffuse Field Dominant
Inverse STFT	Interpolated Frequency-Domain Direct Dominant Field	Interpolated Time-Domain Direct Dominant Field
Direct/Diffuse Mix	Interpolated Time-Domain Direct Dominant Field, Interpolated Diffuse Field Dominant, Direct Gain	Interpolated Sound Field
6DF Interpolation	Interpolation Parameters, Selected TD Diffuse Field, Selected Dominant Plane Wave Parameters	TD Interpolated Diffuse Sound Field, TD Interpolated Direct Sound Field
Sound Field Rotation	Interpolated Sound Field, Head Orientation	Rotated Sound Field
Audio Rendering Module	Selection, Point of View, TD Interpolated Total Sound Field	Audio Object (Headphone), Audio Object (Loudspeaker)
Binaural Transcoding	Rotated Sound Field, HRTF	Binaural Representation of the Sound Field
Loudspeaker Transcoding	Rotated Sound Field, HRTF	Loudspeaker Representation of the Sound Field
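
The overlap-add reconstruction performed by the Inverse STFT AIM listed above can be sketched in a few lines. This is a single-channel illustration under stated assumptions: a periodic Hann analysis window, no synthesis window, normalisation by the summed window, and a naive DFT in place of an FFT. The helper names (`hann`, `dft`, `stft`, `istft`) are hypothetical, and `stft` is included only so that the round trip can be checked.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) DFT, or inverse DFT scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out


def stft(signal: Sequence[float], window: Sequence[float],
         hop: int) -> List[List[complex]]:
    """Windowed frames transformed to the frequency domain."""
    n = len(window)
    return [dft([signal[start + i] * window[i] for i in range(n)])
            for start in range(0, len(signal) - n + 1, hop)]


def istft(frames: Sequence[Sequence[complex]], window: Sequence[float],
          hop: int) -> List[float]:
    """Overlap-add synthesis, normalised by the summed analysis window."""
    n = len(window)
    length = hop * (len(frames) - 1) + n
    out = [0.0] * length
    norm = [0.0] * length
    for m, spectrum in enumerate(frames):
        frame = dft(spectrum, inverse=True)
        for i in range(n):
            out[m * hop + i] += frame[i].real  # overlap-add of each frame
            norm[m * hop + i] += window[i]     # accumulated window weight
    # Samples with no window weight cannot be recovered and are set to zero.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


signal = [math.sin(0.3 * n) for n in range(32)]
window = hann(8)
reconstructed = istft(stft(signal, window, 4), window, 4)
print(max(abs(a - b) for a, b in zip(signal[1:], reconstructed[1:])))
```

With this window, the very first sample has zero window weight and is therefore not reconstructed; a practical implementation would typically pad the signal before analysis.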

6 AIW, AIMs, and JSON Metadata

Table 4 – AIW, AIMs, and JSON Metadata

AIW	AIM	Name	JSON
CAE-6DC		Six Degrees of Freedom Decoder	File
	CAE-6DX	6DF Demultiplexing	File
	CAE-DCM	Audio Decompression	File
	CAE-ICC	Interpolation Coefficients Calculation	File
	CAE-TPS	Triplet Select	File
	CAE-PWI	Plane Wave Parameter Interpolation	File
	CAE-PWS	Plane Wave Synthesis	File
	CAE-LIP	Linear Interpolation	File
	CAE-ISF	Inverse STFT	File
	CAE-DDX	Direct/Diffuse Mix	File
	CAE-SFR	Sound Field Rotation	File
	CAE-6DI	6DF Interpolation	File
	CAE-ARM	Audio Rendering Module	File

7 Reference Software

The 6DF Decoder Reference Software can be downloaded from the MPAI Git.

8 Conformance Testing

Receives	6DF Audio Object	Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema.
Produces	Audio Object (Headphone)	Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema.
	Audio Object (Loudspeaker)	Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema.

9 Performance Assessment