Abstract
This document is provided to assist those who intend to submit a response to Call for Technologies: Template for Responses – Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF). Text in
- Blue (as in this sentence) provides guidance to submitters and should not be included in a submission.
- Green (as in this sentence) is text that must be present. If the submission consists of multiple files, each file shall include the statement marked in green. A submission that does not include the text marked in green will be rejected.
- Black (as in this sentence) is suggested text that respondents may use in a submission.
1 Introduction
This document is submitted by <organisation name> (if an MPAI Member) and/or by <organisation name>, a <company, university etc.> registered in <…> (if not an MPAI member) in response to the Call for Technologies: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [5] issued by Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) on 2024/05/15.
In this document, the submitter proposes technologies that satisfy the Functional Requirements of Use Cases and Functional Requirements: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [6] and the Framework Licence: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [7], both issued by MPAI on 2024/05/15.
This document also contains comments on the Use Cases and Functional Requirements as requested by N1763.
This document also contains proposed technologies that satisfy additional requirements as allowed by N1763.
<Company and/or Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N421) and [1], in particular <Company and/or Member> declares that <Company and/or Member> or its successors will make available the terms of the Licence related to its Essential IPR according to the Framework Licence: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) [7], alone or jointly with other IPR holders after the approval of Technical Specification: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) by the MPAI General Assembly and in no event after commercial implementations of the Technical Specification become available on the market.
<Company and/or Member> acknowledges the following points:
- MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.
- A representative of <Company and/or Member> shall present this submission at a Requirements (AIH) meeting communicated by the MPAI Secretariat. If no representative of <Company and/or Member> attends the meeting and presents the submission, the submission will be discarded.
- <Company and/or Member> shall make available a working implementation – including source code – for use in the development of Reference Software Specification: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF) and eventual publication by MPAI before the technology submitted is published in Technical Specification: Context-based Audio Enhancement (MPAI-CAE) – Six-DoF Audio (CAE-6DF).
- Upon acceptance by MPAI of this submission, in whole or in part, for further evaluation, the software submitted may be written in programming languages that can be compiled or interpreted and in hardware description languages.
- <Company> shall immediately join MPAI upon acceptance by MPAI for further evaluation of this submission in whole or in part.
- If <Company> does not join MPAI, this submission shall be discarded.
2 Information about this submission
This information corresponds to Annex A of N1761. It is included here for the submitter’s convenience.
- Title of the proposal
- Organisation: company name, position, e-mail of contact person
- What are the main functionalities of your proposal?
- Does your proposal provide or describe a formal specification and APIs?
- Will you provide a demonstration to show how your proposal meets the evaluation criteria?
3 Comments on/extensions to requirements (if any)
4 Overview of Functional Requirements supported by this submission
Please answer Y or N. Detail on the specific answers should be provided in Chapter 6.
CAE-6DF use cases | Response |
Use Case 1 – Immersive Concert Experience (Music plus Video) | Y/N |
Use Case 2 – Immersive Radio Drama (Speech plus Foley/Effects) | Y/N |
Use Case 3 – Virtual lecture (Audio plus Video) | Y/N |
Use Case 4 – Immersive Opera/Ballet/Dance/Theatre experience (Music, Drama plus 360° Video/6DoF Visual) | Y/N |
CAE-6DF Functional Requirements | Response |
1. The Functional Requirements apply to the Audio experience and to the impact of visual conditions on the Audio experience. For instance: | Y/N |
a. Audio-Visual Contract, i.e. alignment of audio scenes with visual scenes. | Y/N |
b. Effects of locomotion on human audio-visual perception. | Y/N |
c. Orientation response, i.e., turning toward a sound source of interest. | Y/N |
d. Distance perception such that visual and auditory modalities affect each other. | Y/N |
2. One or more of the following three content profiles should be addressed: | Y/N |
a. Scene-based, i.e., the captured audio scene, for example Ambisonics, is accurately reconstructed so that the Audio Scene provides a high degree of correspondence to the acoustic ambient characteristics of the captured audio scene. | Y/N |
b. Object-based, i.e., the Audio Scene comprises Audio Objects and associated metadata to allow synthesising a perceptually veridical, but not necessarily physically accurate, representation of the captured audio scene. | Y/N |
c. Mixed, i.e., a combination of scene-based and object-based profiles where Audio Objects can be overlaid or mixed with scene-based content. | Y/N |
3. One or both of the following rendering modalities should be addressed: | Y/N |
a. Loudspeaker-based, i.e., the content is rendered through at least two loudspeakers. | Y/N |
b. Headphone-based, i.e., the content is rendered through headphones. | Y/N |
4. If the audio content is rendered through loudspeakers, the rendering space should have the following characteristics: | Y/N |
a. Shape and dimensions: | Y/N |
i. Not larger than the captured space. | Y/N |
b. Acoustic ambient characteristics: | Y/N |
i. Early decay time (EDT) lower than the captured space. | Y/N |
ii. Frequency mode density lower than the captured space. | Y/N |
iii. Echo density lower than the captured space. | Y/N |
iv. Reverberation time (T60) lower than the captured space. | Y/N |
v. Energy decay curve characteristics same or lower than the captured space. | Y/N |
vi. Background noise less than 50 dB(A) SPL. | Y/N |
5. If the audio content is rendered through headphones that can successfully block the audibility of the ambient acoustical characteristics of the rendering space, the rendering space should have the following characteristics: | Y/N |
a. Shape and dimensions: | Y/N |
i. Not larger than the captured space. | Y/N |
b. Acoustic ambient characteristics: No constraints on the ambient characteristics defined in point 4.b. | Y/N |
6. The User movement in the rendering space may be the result of actual or virtual locomotion or orientation. | Y/N |
a. Actual locomotion/orientation of the User as tracked by sensors. | Y/N |
b. Virtual locomotion/orientation is actuated by controlling devices. | Y/N |
7. The maximum response latency of the audio system to user movement should be 20 ms or less; however, some applications may have higher latency. | Y/N |
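Submitters who wish to support their Y/N answers to points 4 and 7 with measurements may find a small self-check useful. The following Python sketch is illustrative only and is not part of the Call for Technologies. The names `SpaceAcoustics`, `check_loudspeaker_space`, `meets_latency_target` and `to_yn` are hypothetical, room volume is assumed as a stand-in for "shape and dimensions", the density values are assumed to be expressed in the same unit for both spaces, and the energy decay curve comparison (point 4.b.v) is omitted because it concerns a curve rather than a single value.

```python
from dataclasses import dataclass


@dataclass
class SpaceAcoustics:
    """Hypothetical container for the measured characteristics of a space."""
    volume_m3: float       # assumed stand-in for "shape and dimensions"
    edt_s: float           # early decay time (EDT), seconds
    t60_s: float           # reverberation time (T60), seconds
    mode_density: float    # frequency mode density, same unit for both spaces
    echo_density: float    # echo density, same unit for both spaces
    background_noise_dba: float = 0.0  # only used for the rendering space


def check_loudspeaker_space(captured, rendering):
    """Compare a rendering space with the captured space (point 4)."""
    return {
        "4.a.i size": rendering.volume_m3 <= captured.volume_m3,
        "4.b.i EDT": rendering.edt_s < captured.edt_s,
        "4.b.ii mode density": rendering.mode_density < captured.mode_density,
        "4.b.iii echo density": rendering.echo_density < captured.echo_density,
        "4.b.iv T60": rendering.t60_s < captured.t60_s,
        "4.b.vi background noise": rendering.background_noise_dba < 50.0,
    }


def meets_latency_target(latency_ms, limit_ms=20.0):
    """Point 7: response latency to user movement of 20 ms or less."""
    return latency_ms <= limit_ms


def to_yn(flag):
    """Map a boolean onto the Y/N convention used in the tables."""
    return "Y" if flag else "N"


# Example values are invented for illustration, not measured data.
captured = SpaceAcoustics(12000.0, 1.8, 2.1, 0.9, 1500.0)
rendering = SpaceAcoustics(60.0, 0.3, 0.4, 0.2, 400.0, 35.0)

report = {
    item: to_yn(ok)
    for item, ok in check_loudspeaker_space(captured, rendering).items()
}
report["7 latency"] = to_yn(meets_latency_target(18.0))

for item, answer in report.items():
    print(f"{item}: {answer}")
```

The strict comparisons mirror the word "lower" in point 4.b, while the size check uses "not larger", so equal volumes pass. A submitter using a different proxy for room size would replace `volume_m3` accordingly.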
5 New Proposed requirements (if any)
1. | Y/N |
2. | Y/N |
3. | Y/N |
6 Detailed description of submission
6.1 Proposal section #1
6.2 Proposal section #2
….
7 Conclusions
8 References
[1] MPAI Statutes
[2] MPAI Patent Policy
[3] MPAI Technical Specifications
[4] MPAI Patent Policy
[5] MPAI; Call for Technologies: Use Cases and Functional Requirements: Context-based Audio Enhancement (MPAI-CAE) – Six Degrees of Freedom Audio (CAE-6DF); Nxyz1
[6] MPAI; Use Cases and Functional Requirements: Context-based Audio Enhancement (MPAI-CAE) – Six Degrees of Freedom Audio (CAE-6DF); Nxyz2
[7] MPAI; Framework Licence: Use Cases and Functional Requirements: Context-based Audio Enhancement (MPAI-CAE) – Six Degrees of Freedom Audio (CAE-6DF); Nxyz3