Abstract

This document is provided to help those who intend to submit a response to the Call for Technologies: Template for Responses – Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF). Text in:

  1. Blue (as in this sentence) provides guidance to submitters and should not be included in a submission.
  2. Green (as in this sentence) is text that must be present. If the submission comprises multiple files, each file shall include the statement marked as green. A submission that does not include the text marked as green will be rejected.
  3. Black (as in this sentence) is suggested text that respondents may use in a submission.

 

1          Introduction

This document is submitted by <organisation name> (if an MPAI Member) and/or by <organisation name>, a <company, university etc.> registered in <…> (if not an MPAI member) in response to the Call for Technologies: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [5] issued by Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) on 2024/05/15.

In this document, the submitter proposes technologies that satisfy the Functional Requirements of Use Cases and Functional Requirements: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [6] and the Framework Licence: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [7], both issued by MPAI on 2024/05/15.

 

This document also contains comments on the Use Cases and Functional Requirements as requested by N1763.

This document also contains proposed technologies that satisfy additional requirements as allowed by N1763.

 

<Company and/or Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N421) and [1]. In particular, <Company and/or Member> declares that <Company and/or Member> or its successors will make available the terms of the Licence related to its Essential IPR according to the Framework Licence: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [7], alone or jointly with other IPR holders, after the approval of Technical Specification: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) by the MPAI General Assembly and in no event after commercial implementations of the Technical Specification become available on the market.

 

<Company and/or Member> acknowledges the following points:

  1. MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.
  2. A representative of <Company and/or Member> shall present this submission at a Requirements (AIH) meeting communicated by the MPAI Secretariat. If no representative of <Company and/or Member> attends the meeting and presents the submission, the submission will be discarded.
  3. <Company and/or Member> shall make available a working implementation – including source code – for use in the development of Reference Software Specification: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) and eventual publication by MPAI before the technology submitted is published in Technical Specification: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF).
  4. The software submitted may be written in programming languages that can be compiled or interpreted and in hardware description languages, upon acceptance by MPAI for further evaluation of this submission in whole or in part.
  5. <Company> shall immediately join MPAI upon acceptance by MPAI for further evaluation of this submission in whole or in part.
  6. If <Company> does not join MPAI, this submission shall be discarded.

2          Information about this submission

This information corresponds to Annex A of N1761. It is included here for the submitter’s convenience.

 

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

3          Comments on/extensions to requirements (if any)

 

4          Overview of Functional Requirements supported by this submission

Please answer Y or N. Detail on the specific answers should be provided in Chapter 6.

CAE-6DF use cases Response
Use Case 1 – Immersive Concert Experience (Music plus Video) Y/N
Use Case 2 – Immersive Radio Drama (Speech plus Foley/Effects) Y/N
Use Case 3 – Virtual lecture (Audio plus Video) Y/N
Use Case 4 – Immersive Opera/Ballet/Dance/Theatre experience (Music, Drama plus 360° Video/6DoF Visual) Y/N
CAE-6DF Functional Requirements Response
1.     The Functional Requirements apply to the Audio experience and to the impact of visual conditions on the Audio experience. For instance: Y/N
a.     Audio-Visual Contract, i.e. alignment of audio scenes with visual scenes. Y/N
b.     Effects of locomotion on human audio-visual perception. Y/N
c.     Orientation response, i.e., turning toward a sound source of interest. Y/N
d.     Distance perception such that visual and auditory modalities affect each other. Y/N
2.     One or more of the following three content profiles should be addressed: Y/N
a.     Scene-based, i.e., the captured audio scene, for example Ambisonics, is accurately reconstructed so that the Audio Scene provides a high degree of correspondence to the acoustic ambient characteristics of the captured audio scene. Y/N
b.     Object-based, i.e., the Audio Scene comprises Audio Objects and associated metadata to allow synthesising a perceptually veridical, but not necessarily physically accurate, representation of the captured audio scene. Y/N
c.     Mixed, i.e., a combination of scene-based and object-based profiles where Audio Objects can be overlaid or mixed with scene-based content. Y/N
3.     One or both of the following rendering modalities should be addressed: Y/N
a.     Loudspeaker-based, i.e., the content is rendered through at least two loudspeakers. Y/N
b.     Headphone-based, i.e., the content is rendered through headphones. Y/N
4.     If the audio content is rendered through loudspeakers, the rendering space should have the following characteristics: Y/N
a.     Shape and dimensions: Y/N
       i.     Not larger than the captured space. Y/N
b.     Acoustic ambient characteristics: Y/N
       i.     Early decay time (EDT) lower than the captured space. Y/N
       ii.    Frequency mode density lower than the captured space. Y/N
       iii.   Echo density lower than the captured space. Y/N
       iv.    Reverberation time (T60) lower than the captured space. Y/N
       v.     Energy decay curve characteristics same or lower than the captured space. Y/N
       vi.    Background noise less than 50 dB(A) SPL. Y/N
5.     If the audio content is rendered through headphones that can successfully block the audibility of the ambient acoustical characteristics of the rendering space, the rendering space should have the following characteristics: Y/N
a.     Shape and dimensions: Y/N
       i.     Not larger than the captured space. Y/N
b.     Acoustic ambient characteristics: No constraints on the ambient characteristics defined in point 4.b. Y/N
6.     The User movement in the rendering space may be the result of actual or virtual locomotion or orientation. Y/N
a.     Actual locomotion/orientation of the User as tracked by sensors. Y/N
b.     Virtual locomotion/orientation is actuated by controlling devices. Y/N
7.     The maximum response latency of the audio system to user movement should be 20 ms or less; however, some applications may have higher latency. Y/N

5          New Proposed requirements (if any)

1. Y/N
2. Y/N
3. Y/N

6          Detailed description of submission

6.1        Proposal section #1

6.2        Proposal section #2

….

7          Conclusions

8          References

  1. MPAI Statutes
  2. MPAI Patent Policy
  3. MPAI Technical Specifications
  4. MPAI Patent Policy
  5. MPAI; Call for Technologies: Context-based Audio Enhancement (MPAI-CAE) – Six Degrees of Freedom Audio (CAE-6DF); Nxyz1
  6. MPAI; Use Cases and Functional Requirements: Context-based Audio Enhancement (MPAI-CAE) – Six Degrees of Freedom Audio (CAE-6DF); Nxyz2
  7. MPAI; Framework Licence: Context-based Audio Enhancement (MPAI-CAE) – Six Degrees of Freedom Audio (CAE-6DF); Nxyz3