1 Functions
A CAE-6DF receiver uses the 6DF transform-domain direct sound components and the time-domain diffuse field components to compute the first- or higher-order Ambisonics representation of an arbitrary Point of View in the represented area of the reconstructed sound field. It performs the following operations:
- Demultiplexing of the input data (the transform-domain direct sound components and the time-domain diffuse field components).
- Directional interpolation of the transform-domain plane wave components (direct sound components).
- Linear interpolation of the time-domain diffuse components.
- Inverse transformation of the transform-domain plane wave components into the time domain.
- Combination of the reconstructed plane wave and diffuse field components.
- Transcoding of the result into binaural signals, if necessary.
- The output can be first-order or higher-order Ambisonics, specified as multichannel audio data.
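As a rough illustration of the linear interpolation step listed above, the following Python sketch forms a weighted sum of three diffuse-field signals. The function name `interpolate_diffuse`, the single-channel list representation, and the example weights are assumptions made for illustration; they are not taken from the specification.

```python
def interpolate_diffuse(signals, weights):
    """Return the sample-wise weighted sum of equally long time-domain signals.

    `signals` is a list of sample lists (one per Microphone Array) and
    `weights` holds one interpolation weight per signal. Both names are
    hypothetical and used only for this sketch.
    """
    if len(signals) != len(weights):
        raise ValueError("one weight is required per signal")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    return [
        sum(w * s[i] for w, s in zip(weights, signals))
        for i in range(length)
    ]


# Three short single-channel signals and weights that sum to one.
diffuse = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
interpolated = interpolate_diffuse(diffuse, [0.5, 0.25, 0.25])
```

In a real decoder each signal would carry several Ambisonics channels; the same weighted sum would then be applied channel by channel.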
2 Reference Model and its operation
Figure 1 depicts the Six Degrees of Freedom Decoder (CAE-6DC) Reference Model.

Figure 1 – Six Degrees of Freedom Decoder (CAE-6DC)
The sequence of operations in CAE-6DC is as follows:
- 6DF Demultiplexing AIM unpacks the 6DF Audio Object produced by the CAE-6EN encoder to produce: 1) the Device Scene Geometry; 2) the Time-domain Compressed Diffuse-dominant signals as a complex-valued tensor whose dimensions are determined by the number of Microphone Arrays in the Device Scene Geometry, the SH Decomposition order, the duration of the Audio Object in seconds, and the sampling rate; and 3) the parametrically represented Dominant Plane Wave time-frequency bins as a nested list of lists of tuples. Each tuple holds the time and frequency indices; the result of the residual energy test for that time-frequency bin; h, the index of the direction on the HEALPix grid (which near-uniformly tessellates the unit sphere) that corresponds to the direction of the dominant source; and the real and imaginary parts of that source’s amplitude.
- Audio Decompression AIM receives the time-domain compressed diffuse-dominant audio components and decompresses them using the decoder that matches the Audio Compression AIM in CAE-6EN, producing time-domain decompressed diffuse field components.
- Interpolation Coefficients Calculation AIM receives the position of the listener as well as the Device Scene Geometry to produce the Microphone Array indices that are later to be selected for interpolation as well as the Interpolation Parameters, which are to be used in sound field interpolation. The interpolation parameters are calculated as barycentric interpolation coefficients that correspond to the relative position of the user with respect to the three closest vertices indicated by the positions of the selected Microphone Arrays.
- Triplet Select AIM uses the Microphone Array indices calculated in the Interpolation Coefficients Calculation AIM to select the direct-dominant time-frequency bins and the diffuse-dominant time-domain audio signals, only for those Microphone Arrays where the residual energy test (RENT) result is greater than a specified threshold. It produces three sets of plane wave parameters (i.e., direction and amplitude) and three diffuse field time-domain signals for use by the 6DF Interpolation AIM.
- 6DF Interpolation AIM is a composite AIM that carries out the following operations: 1) parametric interpolation of the selected dominant plane wave components on a per-time-frequency-bin basis, 2) linear interpolation of the time-domain diffuse-dominant components, and 3) transformation of the dominant plane wave components back to the time domain. This AIM outputs the Time-Domain Interpolated Diffuse Sound Field and the Time-Domain Interpolated Direct Sound Field.
- The Inverse Short-Time Fourier Transform (ISTFT) AIM is the reconstruction stage that converts the synthesized spatial audio channels from the time-frequency domain back into continuous time-domain waveforms, using an overlap-add (OLA) synthesis method.
- Direct-Diffuse Mix AIM uses a user-defined Direct Gain to set the relative gains of the direct and diffuse components of the interpolated sound field prior to rendering. This allows the user to control how reverberant the resulting environment will sound. The output is the Time Domain Interpolated Total Sound Field.
- Audio Rendering Module AIM receives the Time-Domain Interpolated Total Sound Field in HOA format. It uses the user’s head Orientation to rotate the HOA signal so that it matches the Orientation of the user’s head. The user can choose whether to generate an Audio Object for delivery over headphones or over a loudspeaker setup; this choice is signalled via the Selection parameter, which is input to this AIM.
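The barycentric coefficients described for the Interpolation Coefficients Calculation AIM can be sketched as follows. This is a minimal illustration under stated assumptions: positions are 2-D points on a horizontal plane, and the helper names `closest_triplet` and `barycentric_weights` are hypothetical, not part of the specification.

```python
import math


def closest_triplet(listener, array_positions):
    """Return the indices of the three arrays nearest to the listener.

    Positions are assumed to be 2-D (x, y) tuples for this sketch.
    """
    order = sorted(
        range(len(array_positions)),
        key=lambda i: math.dist(listener, array_positions[i]),
    )
    return order[:3]


def barycentric_weights(p, a, b, c):
    """Return the barycentric coordinates of point p in triangle (a, b, c).

    The three weights sum to one, and p == w_a*a + w_b*b + w_c*c.
    """
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("the three array positions are collinear")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


# Hypothetical scene: four arrays and a listener inside the first triangle.
arrays = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
listener = (0.25, 0.25)
indices = closest_triplet(listener, arrays)
weights = barycentric_weights(listener, *(arrays[i] for i in indices))
```

Note that the three nearest arrays do not always enclose the listener. In that case one or more weights become negative, so a practical implementation would need a policy for positions outside the triangle.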
3 I/O data of AI Workflow
Table 1 gives the input and output data of the 6DF Decoder.
Table 1 – Input/Output Data of CAE-6DC
| Input | Description |
| Audio Object | 6DF Audio Object Output of 6DF Encoder |
| Direct Gain | Controls the ratio of the direct vs. the diffuse field |
| Position | Position of User’s Head |
| Orientation | Orientation of User’s Head |
| Output | Description |
| Audio Object | Output of 6DF Decoder which allows rendering in different formats (including 5.1, 7.1, 7.1.4, 11.4, 22.2, HOA, Binaural) |
4 Functions of AI Modules
The Functions of the AIMs required by the 6DF Decoder are specified in Table 2.
Table 2 – Functions of AI Modules of 6DF Decoder
| AI Modules | Functions |
| 6DF Demultiplexing | Demultiplexes the 6DF Bitstream. |
| Audio Decompression | Restores the audio waveforms pertaining to the diffuse field. |
| Triplet Select | Selects the three closest microphone arrays’ indices. |
| Interpolation Coefficients Calculation | Calculates the interpolation coefficients given the position of the listener’s head and the closest triplets. |
| Plane Wave Parameter Interpolation | Interpolates the amplitude and direction of plane waves based on the listener’s Position and Orientation. |
| Plane Wave Synthesis | Reconstructs the interpolated plane wave, based on 6DF interpolation (Frequency Domain). |
| Linear Interpolation | Interpolates the diffuse field audio waveform using a linear combination of individual waveforms of diffuse field. |
| Inverse STFT | Reconstructs the direct field audio waveform from frequency domain to time domain. |
| Direct/Diffuse Mix | Mixes the direct and diffuse field based on the Listener-selected direct gain. |
| 6DF Interpolation | Generates an approximation of the Audio Scene Descriptors in a position that was not captured by a Microphone Array. |
| Sound Field Rotation | Rotates the sound field based on the Listener’s head orientation. |
| Audio Rendering Module | Converts time-domain sound field into the audio payload of loudspeakers/headphones. |
| Binaural Transcoding | Transcodes from the Ambisonics format to the binaural format when necessary. |
| Loudspeaker Transcoding | Transcodes from the Ambisonics format to the loudspeaker format when necessary. |
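The Direct/Diffuse Mix function can be illustrated with a short Python sketch. The specification only states that a Listener-selected direct gain sets the relative level of the two fields; the complementary mixing law used here (gain `g` for the direct field, `1 - g` for the diffuse field, with `g` between 0 and 1) and the function name `mix_direct_diffuse` are assumptions made for illustration.

```python
def mix_direct_diffuse(direct, diffuse, direct_gain):
    """Mix two equally long signals with an assumed complementary gain law.

    direct_gain = 1.0 keeps only the direct field (driest result);
    direct_gain = 0.0 keeps only the diffuse field (most reverberant).
    """
    if not 0.0 <= direct_gain <= 1.0:
        raise ValueError("direct_gain is assumed to lie in [0, 1]")
    if len(direct) != len(diffuse):
        raise ValueError("direct and diffuse signals must have equal length")
    return [
        direct_gain * d + (1.0 - direct_gain) * r
        for d, r in zip(direct, diffuse)
    ]


# Equal contribution from both fields.
total = mix_direct_diffuse([1.0, 2.0], [3.0, 4.0], 0.5)
```

Other gain laws, such as applying the gain to the direct field only, are equally plausible readings of the text.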
5 I/O Data of AI Modules
Table 4 – Input/Output Data of AI Modules
| AI Modules | Input Data | Output Data |
| 6DF Demultiplexing | Audio Object | Compressed Diffuse Dominant, Dominant Sparse Plane Wave, Device Scene Geometry |
| Audio Decompression | Compressed Diffuse Field | Decompressed Diffuse Field |
| Triplet Select | Decompressed Diffuse Field, Dominant Plane Wave, Microphone Array Indices | Selected Time-Domain Diffuse Field, Selected Dominant Plane Wave Parameters |
| Interpolation Coefficients Calculation | Listener/head Position, Scene Geometry | Interpolation Coefficients |
| Plane Wave Parameter Interpolation | Selected Direct Dominant Sparse Plane Wave Decomposition, Interpolation Coefficients | Interpolated Frequency-Domain Direct Dominant Sparse Plane Wave Decomposition |
| Plane Wave Synthesis | Interpolated Frequency-Domain Direct Dominant Sparse Plane Wave Decomposition | Interpolated Frequency-Domain Direct Dominant Field |
| Linear Interpolation | Selected Time-Domain Diffuse Dominant, Interpolation Coefficients | Interpolated Diffuse Field Dominant |
| Inverse STFT | Interpolated Frequency-Domain Direct Dominant Field | Interpolated Time-Domain Direct Dominant Field |
| Direct/Diffuse Mix | Interpolated Time-Domain Direct Dominant Field, Interpolated Diffuse Field Dominant, Direct Gain | Interpolated Sound Field |
| 6DF Interpolation | Interpolation Parameters, Selected TD Diffuse Field, Selected Dominant Plane Wave Parameters | TD Interpolated Diffuse Sound Field, TD Interpolated Direct Sound Field |
| Sound Field Rotation | Interpolated Sound Field, Head Orientation | Rotated Sound Field |
| Audio Rendering Module | Selection, Point of View, TD Interpolated Total Sound Field | Audio Object (Headphone), Audio Object (Loudspeaker) |
| Binaural Transcoding | Rotated Sound Field, HRTF | Binaural Representation of the Sound Field |
| Loudspeaker Transcoding | Rotated Sound Field, HRTF | Loudspeaker Representation of the Sound Field |
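The Sound Field Rotation step can be sketched for the simplest case: a first-order Ambisonics signal and a head rotation about the vertical axis only. The function name `rotate_foa_yaw`, the named W/X/Y/Z channel arguments, and the convention that positive yaw means the head turns counterclockwise (to the left) are assumptions for this illustration. Higher orders and full three-axis rotation need larger rotation matrices that are not shown here.

```python
import math


def rotate_foa_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate a first-order Ambisonics signal for a given head yaw.

    W and Z are unaffected by a rotation about the vertical axis; X and Y
    are rotated by the negative of the head yaw so that sources stay fixed
    in the room while the head turns.
    """
    phi = -head_yaw_rad
    c, s = math.cos(phi), math.sin(phi)
    x_rot = [c * xi - s * yi for xi, yi in zip(x, y)]
    y_rot = [s * xi + c * yi for xi, yi in zip(x, y)]
    return list(w), x_rot, y_rot, list(z)


# A source straight ahead (X = 1, Y = 0) with the head turned 90 degrees left:
# relative to the head, the source should now lie on the right (negative Y).
w_r, x_r, y_r, z_r = rotate_foa_yaw([1.0], [1.0], [0.0], [0.0], math.pi / 2)
```

Passing the channels by name avoids committing to a particular channel ordering or normalization, which the text above does not specify.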
6 AIW, AIMs, and JSON Metadata
Table 5 – AIW, AIMs, and JSON Metadata
| AIW | AIM | Name | JSON |
| 6DF-6DC | | Six Degrees of Freedom Decoder | File |
| | CAE-6DX | 6DF Demultiplexing | File |
| | CAE-DCM | 6DF Decompression | File |
| | CAE-ICC | Interpolation Coefficients Calculation | File |
| | CAE-TPS | Triplet Select | File |
| | CAE-PWI | Plane Wave Parameter Interpolation | File |
| | CAE-PWS | Plane Wave Synthesis | File |
| | CAE-LIP | Linear Interpolation | File |
| | CAE-ISF | Inverse STFT | File |
| | CAE-DDX | Direct/Diffuse Mix | File |
| | CAE-SFR | Sound Field Rotation | File |
| | CAE-6DI | 6DF Interpolation | File |
| | CAE-ARM | Audio Rendering Module | File |
7 Reference Software
The 6DF Decoder Reference Software can be downloaded from the MPAI Git.
8 Conformance Testing
| Receives | 6DF Audio Object | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema. |
| Produces | Audio Object (Headphone) | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema. |
| | Audio Object (Loudspeaker) | Shall validate against the Audio Object schema. The Qualifier shall validate against the Audio Qualifier schema. The values of any Sub-Type, Format, and Attribute of the Qualifier shall correspond with the Sub-Type, Format, and Attributes of the Audio Object Qualifier schema. |