
1 Functions 2 Reference Model 3 I/O data of AI Workflow
4 Functions of AI Modules 5 I/O Data of AI Modules 6 AIW, AIMs, and JSON Metadata
7 Reference Software 8 Conformance Testing 9 Performance Assessment

1 Functions

A CAE-6DF receiver uses the 6DF transform-domain direct sound components and the time-domain diffuse field components to compute a first- or higher-order Ambisonics representation of the reconstructed sound field at an arbitrary Point of View within the represented area. It performs the following steps:

  1. Demultiplexing of the input data (the transform-domain direct sound components and the time-domain diffuse field components).
  2. Directional interpolation of the transform-domain plane wave components (direct sound components).
  3. Linear interpolation of the time-domain diffuse components.
  4. Inverse transformation of the transform-domain plane wave components into the time domain.
  5. Combination of the reconstructed plane wave and diffuse field components.
  6. Transcoding of the result into binaural signals, if necessary.
  7. Output of the result as first-order or higher-order Ambisonics, specified as multichannel audio data.
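
Step 3 above can be illustrated with a minimal Python sketch. It assumes three equal-length diffuse signals held as plain lists and three interpolation weights that sum to one. The function name `interpolate_diffuse` and the sample values are hypothetical and are not taken from the specification.

```python
def interpolate_diffuse(signals, weights):
    """Weighted sample-by-sample sum of equal-length time-domain signals.

    Assumption: `weights` are interpolation coefficients that sum to one
    (for example, barycentric coefficients), one per signal.
    """
    if len(signals) != len(weights):
        raise ValueError("need exactly one weight per signal")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("signals must have equal length")
    return [
        sum(w * s[n] for w, s in zip(weights, signals))
        for n in range(length)
    ]


# Hypothetical diffuse signals from three microphone arrays (one channel each).
diffuse = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
    [4.0, 4.0, 4.0, 4.0],
]
interpolated = interpolate_diffuse(diffuse, [0.5, 0.25, 0.25])
```

In a real decoder each signal would carry several Ambisonics channels; the same weighted sum would then be applied channel by channel.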

2 Reference Model and its operation

Figure 1 depicts the Six Degrees of Freedom Decoder (CAE-6DC) Reference Model.

Figure 1 – Six Degrees of Freedom Decoder (CAE-6DC)

The sequence of operations in CAE-6DC is as follows:

  1. 6DF Demultiplexing AIM unpacks the 6DF Audio Object produced by the CAE-6EN encoder into 1) the Device Scene Geometry; 2) the Time-domain Compressed Diffuse-dominant signals, a complex-valued tensor whose dimensions are set by the number of Microphone Arrays in the Device Scene Geometry, the SH Decomposition order, the duration of the Audio Object in seconds, and the sampling rate; and 3) the parametrically represented Dominant Plane Wave time-frequency bins, a nested list of lists of tuples. Each tuple holds the time and frequency indices of the bin; the result of the residual energy test for that bin; h, the index of the direction on the HEALPix grid (which near-uniformly tessellates the unit sphere) that corresponds to the direction of the dominant source; and the real and imaginary parts of that source’s amplitude.
  2. Audio Decompression AIM receives the time-domain compressed diffuse-dominant audio components and decompresses them with the decoder that matches the audio compression AIM in CAE-6EN, producing the time-domain decompressed diffuse field components.
  3. Interpolation Coefficients Calculation AIM receives the position of the listener as well as the Device Scene Geometry to produce the Microphone Array indices that are later to be selected for interpolation as well as the Interpolation Parameters, which are to be used in sound field interpolation. The interpolation parameters are calculated as barycentric interpolation coefficients that correspond to the relative position of the user with respect to the three closest vertices indicated by the positions of the selected Microphone Arrays.
  4. Triplet Select AIM uses the Microphone Array indices calculated by the Interpolation Coefficients Calculation AIM to select the direct-dominant time-frequency bins and the diffuse-dominant time-domain audio signals, only for those Microphone Arrays where the residual energy test (RENT) result is greater than a specified threshold. It produces three sets of plane wave parameters (i.e., direction and amplitude) and three diffuse field time-domain signals for use by the 6DF Interpolation AIM.
  5. 6DF Interpolation AIM is a composite AIM that carries out the following operations: 1) parametric interpolation of the selected dominant plane wave components on a per-time-frequency-bin basis, 2) linear interpolation of the time-domain diffuse-dominant components, and 3) transformation of the dominant plane wave components back to the time domain. This AIM outputs the Time-Domain Interpolated Diffuse Sound Field and the Time-Domain Interpolated Direct Sound Field.
  6. Inverse Short-Time Fourier Transform (ISTFT) AIM is the reconstruction stage that converts the synthesized spatial audio channels from the time-frequency domain back into continuous time-domain waveforms, using an overlap-add (OLA) synthesis method.
  7. Direct-Diffuse Mix AIM uses a user-defined Direct Gain to set the relative gains of the direct and diffuse components of the interpolated sound field prior to rendering. This allows the user to control how reverberant the resulting environment will sound. The output is the Time-Domain Interpolated Total Sound Field.
  8. Audio Rendering Module AIM receives the Time-Domain Interpolated Total Sound Field in HOA format and rotates the HOA signal to match the Orientation of the user’s head. The user can choose to generate an Audio Object for delivery over headphones or over a loudspeaker setup; this choice is signalled via the Selection parameter, which is input to this AIM.
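
The barycentric interpolation coefficients described in step 3 can be sketched as follows. This is a minimal illustration, assuming the Microphone Array and listener positions are 2D points in a horizontal plane; the function names `closest_triplet` and `barycentric_weights` and the example coordinates are hypothetical and are not defined by the specification.

```python
import math


def closest_triplet(listener, array_positions):
    """Indices of the three microphone arrays nearest to the listener."""
    ranked = sorted(
        range(len(array_positions)),
        key=lambda i: math.dist(listener, array_positions[i]),
    )
    return ranked[:3]


def barycentric_weights(p, a, b, c):
    """Barycentric coordinates of the 2D point p in the triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("the three positions are collinear")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


# Hypothetical scene: four microphone arrays and one listener position.
arrays = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
listener = (0.25, 0.25)
triplet = closest_triplet(listener, arrays)
weights = barycentric_weights(listener, *(arrays[i] for i in triplet))
```

The weights sum to one, and weighting the three selected array positions by them reproduces the listener position, which is the property that makes them usable as interpolation coefficients.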

3 I/O data of AI Workflow

Table 1 gives the input and output data of the 6DF Decoder.

Table 1 – Input/Output Data of CAE-6DC

| Input | Description |
| --- | --- |
| Audio Object | 6DF Audio Object output by the 6DF Encoder |
| Direct Gain | Controls the ratio of the direct vs. the diffuse field |
| Position | Position of User’s Head |
| Orientation | Orientation of User’s Head |
| Output | Description |
| Audio Object | Output of the 6DF Decoder, which allows rendering in different formats (including 5.1, 7.1, 7.1.4, 11.4, 22.2, HOA, Binaural) |
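
To illustrate how the Direct Gain input might control the direct-to-diffuse ratio, here is a minimal Python sketch. The mixing law is an assumption: the direct field is scaled by the gain and the diffuse field by its complement, with the gain restricted to [0, 1]. The specification text above does not state the actual law, and the name `mix_direct_diffuse` is hypothetical.

```python
def mix_direct_diffuse(direct, diffuse, direct_gain):
    """Crossfade two equal-length single-channel signals.

    Assumption: direct_gain lies in [0, 1]; the direct field is scaled by
    direct_gain and the diffuse field by (1 - direct_gain). The mixing law
    actually used by the Direct-Diffuse Mix AIM may differ.
    """
    if not 0.0 <= direct_gain <= 1.0:
        raise ValueError("direct_gain must lie in [0, 1]")
    if len(direct) != len(diffuse):
        raise ValueError("signals must have equal length")
    return [
        direct_gain * d + (1.0 - direct_gain) * f
        for d, f in zip(direct, diffuse)
    ]


# Hypothetical samples: a higher gain gives a drier (less reverberant) result.
direct_field = [1.0, -1.0]
diffuse_field = [0.0, 0.5]
total_field = mix_direct_diffuse(direct_field, diffuse_field, 0.75)
```

Under this assumed law, a gain of 1 keeps only the direct field and a gain of 0 keeps only the diffuse field.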

4 Functions of AI Modules

The Functions of the AIMs required by the 6DF Decoder are specified in Table 2.

Table 2 – Functions of AI Modules of 6DF Decoder

| AI Module | Function |
| --- | --- |
| 6DF Demultiplexing | Demultiplexes the 6DF Bitstream. |
| Audio Decompression | Restores the audio waveforms pertaining to the diffuse field. |
| Triplet Select | Selects the three closest microphone arrays’ indices. |
| Interpolation Coefficients Calculation | Calculates the interpolation coefficients given the position of the listener’s head and the closest triplets. |
| Plane Wave Parameter Interpolation | Interpolates the amplitude and direction of plane waves based on the listener’s Position and Orientation. |
| Plane Wave Synthesis | Reconstructs the interpolated plane wave, based on 6DF interpolation (Frequency Domain). |
| Linear Interpolation | Interpolates the diffuse field audio waveform using a linear combination of individual waveforms of diffuse field. |
| Inverse STFT | Reconstructs the direct field audio waveform from frequency domain to time domain. |
| Direct/Diffuse Mix | Mixes the direct and diffuse field based on the Listener-selected direct gain. |
| 6DF Interpolation | Generates an approximation of the Audio Scene Descriptors in a position that was not captured by a Microphone Array. |
| Sound Field Rotation | Rotates the sound field based on the Listener’s head orientation. |
| Audio Rendering Module | Converts time-domain sound field into the audio payload of loudspeakers/headphones. |
| Binaural Transcoding | Transcodes from the Ambisonics format to the binaural format when necessary. |
| Loudspeaker Transcoding | Transcodes from the Ambisonics format to the loudspeaker format when necessary. |
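
The Sound Field Rotation function can be illustrated for the simplest case. The sketch below is a minimal example under several assumptions that the specification text does not confirm: first-order Ambisonics only, ACN channel order (W, Y, Z, X), rotation about the vertical axis only (yaw), and yaw measured in radians, positive when the head turns to the left. The name `compensate_head_yaw` is hypothetical. Higher orders and full three-axis Orientation need larger rotation matrices, which are not shown.

```python
import math


def compensate_head_yaw(frame, yaw):
    """Rotate one first-order Ambisonics sample frame for a given head yaw.

    Assumptions: `frame` is (W, Y, Z, X) in ACN order; `yaw` is in radians,
    positive when the head turns left (counterclockwise seen from above).
    The sound field is rotated by -yaw so that sources stay fixed in the room.
    """
    w, y, z, x = frame
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    x_rot = x * cos_yaw + y * sin_yaw
    y_rot = -x * sin_yaw + y * cos_yaw
    # W (omnidirectional) and Z (vertical) are unaffected by a yaw rotation.
    return (w, y_rot, z, x_rot)


# A source straight ahead: energy on W and X only.
front_source = (1.0, 0.0, 0.0, 1.0)
# The head turns 90 degrees to the left, so the source moves to the right ear.
rotated = compensate_head_yaw(front_source, math.pi / 2)
```

After the rotation the X component is approximately zero and the Y component is negative, which in this convention places the source on the listener's right.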

5 I/O Data of AI Modules

Table 3 – Input/Output Data of AI Modules

| AI Module | Input Data | Output Data |
| --- | --- | --- |
| 6DF Demultiplexing | Audio Object | Compressed Diffuse Dominant, Dominant Sparse Plane Wave, Device Scene Geometry |
| Audio Decompression | Compressed Diffuse Field | Decompressed Diffuse Field |
| Triplet Select | Decompressed Diffuse Field, Dominant Plane Wave, Microphone Array Indices | Selected Time-Domain Diffuse Field, Selected Dominant Plane Wave Parameters |
| Interpolation Coefficients Calculation | Listener/head Position, Scene Geometry | Interpolation Coefficients |
| Plane Wave Parameter Interpolation | Selected Direct Dominant Sparse Plane Wave Decomposition, Interpolation Coefficients | Interpolated Frequency-Domain Direct Dominant Sparse Plane Wave Decomposition |
| Plane Wave Synthesis | Interpolated Frequency-Domain Direct Dominant Sparse Plane Wave Decomposition | Interpolated Frequency-Domain Direct Dominant Field |
| Linear Interpolation | Selected Time-Domain Diffuse Dominant, Interpolation Coefficients | Interpolated Diffuse Field Dominant |
| Inverse STFT | Interpolated Frequency-Domain Direct Dominant Field | Interpolated Time-Domain Direct Dominant Field |
| Direct/Diffuse Mix | Interpolated Time-Domain Direct Dominant Field, Interpolated Diffuse Field Dominant, Direct Gain | Interpolated Sound Field |
| 6DF Interpolation | Interpolation Parameters, Selected TD Diffuse Field, Selected Dominant Plane Wave Parameters | TD Interpolated Diffuse Sound Field, TD Interpolated Direct Sound Field |
| Sound Field Rotation | Interpolated Sound Field, Head Orientation | Rotated Sound Field |
| Audio Rendering Module | Selection, Point of View, TD Interpolated Total Sound Field | Audio Object (Headphone), Audio Object (Loudspeaker) |
| Headphone Transcoding | Rotated Sound Field, HRTF | Binaural Representation of the Sound Field |
| Loudspeaker Transcoding | Rotated Sound Field, HRTF | Loudspeaker Representation of the Sound Field |
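
The Inverse STFT row maps a frequency-domain field to a time-domain one by overlap-add synthesis. The following Python sketch shows that technique on one channel. It is illustrative only and rests on assumptions that are not stated in the specification: a periodic Hann analysis window with 50% overlap, no synthesis window, and a naive DFT in place of an FFT. All function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive forward DFT of a real or complex frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Naive inverse DFT, keeping the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def stft(signal, frame_len, hop):
    """Analysis with a periodic Hann window (assumed, not from the spec)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    return [dft([signal[start + i] * window[i] for i in range(frame_len)])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def istft_ola(spectra, frame_len, hop):
    """Overlap-add synthesis: invert each frame and sum it at its offset."""
    out = [0.0] * (hop * (len(spectra) - 1) + frame_len)
    for m, spectrum in enumerate(spectra):
        frame = idft_real(spectrum)
        for i in range(frame_len):
            out[m * hop + i] += frame[i]
    return out


# Round trip on a short test tone: frames of 8 samples, hop of 4 (50% overlap).
signal = [math.sin(0.3 * n) for n in range(32)]
spectra = stft(signal, 8, 4)
rebuilt = istft_ola(spectra, 8, 4)
```

With this window and hop, overlapping window values sum to one, so every sample covered by two frames is reconstructed to within rounding error. The first and last half-frames are covered by a single frame and remain attenuated by the window.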

6 AIW, AIMs, and JSON Metadata

Table 4 – AIW, AIMs, and JSON Metadata

| AIW | AIM | Name | JSON |
| --- | --- | --- | --- |
| CAE-6DC | | Six Degrees of Freedom Decoder | File |
| | CAE-6DX | 6DF Demultiplexing | File |
| | CAE-DCM | 6DF Decompression | File |
| | CAE-ICC | Interpolation Coefficients Calculation | File |
| | CAE-TPS | Triplet Select | File |
| | CAE-PWI | Plane Wave Parameter Interpolation | File |
| | CAE-PWS | Plane Wave Synthesis | File |
| | CAE-LIP | Linear Interpolation | File |
| | CAE-ISF | Inverse STFT | File |
| | CAE-DDX | Direct/Diffuse Mix | File |
| | CAE-SFR | Sound Field Rotation | File |
| | CAE-6DI | 6DF Interpolation | File |
| | CAE-ARM | Audio Rendering Module | File |

7 Reference Software

The 6DF Decoder Reference Software can be downloaded from the MPAI Git.

8 Conformance Testing

| Receives | 6DF Audio Object | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema. |
| Produces | Audio Object (Headphone) | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema. |
| | Audio Object (Loudspeaker) | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema. |

9 Performance Assessment
