Context-based Audio Enhancement (MPAI-CAE)
This document is also available in MS Word format as MPAI-CAE Call for Technologies
3 Evaluation Criteria and Procedure
4 Expected development timeline
Annex C: Requirements check list
Annex D: Technologies that may require specific testing
Annex E: Mandatory text in responses
1 Introduction
Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international non-profit organisation with the mission to develop standards for Artificial Intelligence (AI) enabled digital data coding and for technologies that facilitate integration of data coding components into ICT systems. With the mechanism of Framework Licences, MPAI seeks to attach clear IPR licensing frameworks to its standards.
MPAI has found that the application area called “Context-based Audio Enhancement” is particularly relevant for MPAI standardisation because using context information to act on the input audio content can substantially improve the user experience of a variety of audio-related applications that include entertainment, communication, teleconferencing, gaming, post-production, restoration etc. for a variety of contexts such as in the home, in the car, on-the-go, in the studio etc.
Therefore, MPAI intends to develop a standard – to be called MPAI-CAE – that will provide standard technologies to implement four Use Cases identified so far:
- Emotion-Enhanced Speech (EES)
- Audio Recording Preservation (ARP)
- Enhanced Audioconference Experience (EAC)
- Audio-on-the-go (AOG)
This document is a Call for Technologies (CfT) that
- Satisfy the MPAI-CAE Functional Requirements (N151) [4] and
- Are released according to the MPAI-CAE Framework Licence (N171) [6], if selected by MPAI for inclusion in the MPAI-CAE standard.
The standard will be developed with the following guidelines:
- To satisfy the Functional Requirements (N151) [4], available online. In the future, MPAI may decide to extend MPAI-CAE to support other Use Cases.
- To use, where feasible and desirable, the same basic technologies required by the companion document MPAI-MMC Use Cases and Functional Requirements [7].
- To be suitable for implementation as AI Modules (AIM) conforming to the emerging MPAI AI Framework (MPAI-AIF) standard based on the responses to the Call for Technologies (N100) [2] satisfying the MPAI-AIF Functional Requirements (N74) [1].
MPAI has decided to base its application standards on the AIM and AIF notions whose functional requirements have been identified in [1] rather than follow the approach of defining end-to-end systems. It has done so because:
- AIMs allow the reduction of a large problem to a set of smaller problems.
- AIMs can be independently developed and made available to an open competitive market.
- An implementor can build a sophisticated and complex system with potentially limited knowledge of all the technologies required by the system.
- An MPAI system has an inherent explainability.
- MPAI systems allow for competitive comparisons of functionally equivalent AIMs.
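The modular approach described above can be illustrated with a minimal sketch. It is not taken from MPAI-AIF or N151: the `AIM` type alias, the `Signals` dictionary, the three example modules and `run_chain` are all hypothetical names chosen for illustration. The sketch only shows how two independently developed modules with the same inputs and outputs could be swapped and compared inside a larger chain.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: named signals in, named signals out.
# The real AIM interfaces are defined by MPAI, not by this sketch.
Signals = Dict[str, List[float]]
AIM = Callable[[Signals], Signals]


def gain_aim_a(inputs: Signals) -> Signals:
    """One implementer's module: doubles every audio sample."""
    return {"audio": [2 * s for s in inputs["audio"]]}


def gain_aim_b(inputs: Signals) -> Signals:
    """A functionally equivalent module with different internals."""
    return {"audio": [s + s for s in inputs["audio"]]}


def clip_aim(inputs: Signals) -> Signals:
    """A second, smaller problem: limit samples to the range [-1.0, 1.0]."""
    return {"audio": [max(-1.0, min(1.0, s)) for s in inputs["audio"]]}


def run_chain(aims: List[AIM], inputs: Signals) -> Signals:
    """Feed the output of each module into the next one."""
    data = inputs
    for aim in aims:
        data = aim(data)
    return data


if __name__ == "__main__":
    source = {"audio": [0.25, 0.75, -0.9]}
    # Either gain module can be used without changing the rest of the chain.
    print(run_chain([gain_aim_a, clip_aim], source))
    print(run_chain([gain_aim_b, clip_aim], source))
```

Because both gain modules expose the same inputs and outputs, the chain does not need to know which one it is running; only the interface is shared, while the internals remain free.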
Respondents should be aware that:
- The Use Cases that make up MPAI-CAE and the AIM internals will be non-normative.
- The input and output interfaces of the AIMs, whose requirements have been derived to support the Use Cases, will be normative.
Therefore, the scope of this Call for Technologies is restricted to technologies required to implement the input and output interfaces of the AIMs identified in N151 [4].
However, MPAI invites comments on any technology or architectural component identified in N151, specifically,
- Additions or removals of input/output signals to the identified AIMs with justification of the changes and identification of data formats required by the new input/output signals.
- Possible alternative partitioning of the AIMs implementing the example cases providing:
- Arguments in support of the proposed partitioning
- Detailed specifications of the input and output data of the proposed new AIMs
- New Use Cases fully described as in N151.
All parties who believe they have relevant technologies satisfying all or most of the requirements of one or more Use Cases described in N151 are invited to submit proposals for consideration by MPAI. MPAI membership is not a prerequisite for responding to this CfT. However, proponents should be aware that, if their proposal or part thereof is accepted for inclusion in the MPAI-CAE standard, they shall immediately join MPAI, or their accepted technologies will be discarded.
MPAI will select the most suitable technologies based on their technical merits for inclusion in MPAI-CAE. However, MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.
Submissions are due on 2021/04/12T23:59 UTC and should be sent to the MPAI secretariat (secretariat@mpai.community). The secretariat will acknowledge receipt of the submission via email. Submissions will be reviewed according to the schedule that the 7th MPAI General Assembly (MPAI-7) will define at its online meeting on 2021/04/14. For details on how submitters who are not MPAI members can attend the said review please contact the MPAI secretariat (secretariat@mpai.community).
2 How to submit a response
Those planning to respond to this CfT:
- Are advised that online events will be held on 2021/02/24 and 2021/03/10 to present the MPAI-CAE CfT and respond to questions. Logistic information on these events will be posted on the MPAI web site.
- Are requested to communicate their intention to respond to this CfT with an initial version of the form of Annex A to the MPAI secretariat (secretariat@mpai.community) by 2021/03/16. A potential submitter making a communication using the said form is not required to actually make a submission. A submission will be accepted even if the submitter did not communicate their intention to submit a response by the said date.
- Are advised to regularly visit the https://mpai.community/how-to-join/calls-for-technologies/ web page, where relevant information will be posted.
Responses to this MPAI-CAE CfT shall/may include:
Table 1 – Mandatory and optional elements of a response
Item | Status |
Detailed documentation describing the proposed technologies | mandatory |
The final version of Annex A | mandatory |
The text of Annex B duly filled out with the table indicating which requirements identified in MPAI N151 [4] are satisfied. If not all the requirements of a Use Case are satisfied, this should be explained. | mandatory |
Comments on the completeness and appropriateness of the MPAI-CAE requirements and any motivated suggestion to amend or extend those requirements. | optional |
A preliminary demonstration, with a detailed document describing it. | optional |
Any other additional relevant information that may help evaluate the submission, such as additional use cases. | optional |
The text of Annex E. | mandatory |
Respondents are invited to take advantage of the check list of Annex C before submitting their response and filling out Annex A.
Respondents are requested to present their submission (mandatory) at a meeting by teleconference that will be properly announced to submitters by the MPAI Secretariat. If no presenter attends the meeting, the proposal will be discarded.
Respondents are advised that, upon acceptance by MPAI of their submission in whole or in part for further evaluation, MPAI will require that:
- A working implementation, including source code, be made available before the technology is accepted for inclusion in the MPAI-CAE standard, for use in the development of the MPAI-CAE Reference Software and later publication as a standard by MPAI. Software may be written in programming languages that can be compiled or interpreted and in hardware description languages.
- The working implementation be suitable for operation in the MPAI AI Framework (MPAI-AIF).
- A non-MPAI member immediately join MPAI. If the non-MPAI member elects not to do so, their submission will be discarded. Direction on how to join MPAI can be found online.
Further information on MPAI can be obtained from the MPAI website.
3 Evaluation Criteria and Procedure
Proposals will be assessed using the following process:
1. An Evaluation Panel is created from:
   1.1. All CAE-DC members attending.
   1.2. Non-MPAI members who are respondents.
   1.3. Non-respondent, non-MPAI-member experts invited in a consulting capacity.
   1.4. No one from 1.1.-1.2. will be denied membership in the Evaluation Panel.
2. Respondents present their proposals.
3. Evaluation Panel members ask questions.
4. If required, subjective and/or objective tests are carried out:
   4.1. Define required tests.
   4.2. Carry out the tests.
   4.3. Produce report.
5. At least 2 reviewers will be appointed to review and report on specific points in a proposal, if required.
6. Evaluation Panel members fill out Annex B for each proposal.
7. Respondents respond to evaluations.
8. A proposal evaluation report is produced.
4 Expected development timeline
Timeline of the CfT, deadlines and response evaluation:
Table 2 – Dates and deadlines
Step | Date |
Call for Technologies | 2021/02/17 |
CfT introduction conference call 1 | 2021/02/24T14:00 UTC |
CfT introduction conference call 2 | 2021/03/10T15:00 UTC |
Notification of intention to submit proposal | 2021/03/16T23:59 UTC |
Submission deadline | 2021/04/12T23:59 UTC |
Evaluation of responses will start | 2021/04/14 (MPAI-7) |
Evaluation to be carried out during 2-hour sessions according to the calendar agreed at MPAI-7.
5 References
[1] MPAI-AIF Use Cases & Functional Requirements, MPAI N74; https://mpai.community/standards/mpai-aif/
[2] MPAI-AIF Call for Technologies, MPAI N100; https://mpai.community/standards/mpai-aif/#Technologies
[3] MPAI-AIF Framework Licence, MPAI N171; https://mpai.community/standards/mpai-aif/#Licence
[4] MPAI-CAE Use Cases & Functional Requirements, MPAI N151; https://mpai.community/standards/mpai-cae/#UCFR
[5] MPAI-CAE Call for Technologies, MPAI N152; https://mpai.community/standards/mpai-cae/#Technologies
[6] MPAI-CAE Framework Licence, MPAI N171; https://mpai.community/standards/mpai-cae/#Licence
[7] MPAI-MMC Use Cases & Functional Requirements, MPAI N153; https://mpai.community/standards/mpai-mmc/#UCFR
[8] MPAI-MMC Call for Technologies, MPAI N154; https://mpai.community/standards/mpai-mmc/#Technologies
[9] MPAI-MMC Framework Licence, MPAI N173; https://mpai.community/standards/mpai-mmc/#Licence
Annex A: Information Form
This information form is to be filled in by a Respondent to the MPAI-CAE CfT
- Title of the proposal
- Organisation: company name, position, e-mail of contact person
- What are the main functionalities of your proposal?
- Does your proposal provide or describe a formal specification and APIs?
- Will you provide a demonstration to show how your proposal meets the evaluation criteria?
Annex B: Evaluation Sheet
NB: This evaluation sheet will be filled out by members of the Evaluation Team.
Proposal title:
Main Functionalities:
Response summary: (a few lines)
Comments on Relevance to the CfT (Requirements):
Comments on possible MPAI-CAE profiles[1]
Evaluation table:
Table 3 – Assessment of submission features
Note 1 | The semantics of Submission features is provided by Table 4 |
Note 2 | Evaluation elements indicate the elements used by the evaluator in assessing the submission |
Note 3 | Final Assessment indicates the ultimate assessment based on the Evaluation Elements |
Submission features | Evaluation elements | Final Assessment |
Completeness of description | | |
Understandability | | |
Extensibility | | |
Use of Standard Technology | | |
Efficiency | | |
Test cases | | |
Maturity of reference implementation | | |
Relative complexity | | |
Support of MPAI-CAE use cases | | |
Support of non-MPAI-CAE use cases | | |
Content of the criteria table cells:
Evaluation facts should mention:
- Not supported / partially supported / fully supported.
- What supported these facts: submission/presentation/demo.
- The summary of the facts themselves, e.g., very good in one way, but weak in another.
Final assessment should mention:
- Possibilities to improve or add to the proposal, e.g., any missing or weak features.
- How sure the evaluators are, i.e., evidence shown, very likely, very hard to tell, etc.
- Global evaluation (Not Applicable / -- / - / + / ++)
New Use Cases/Requirements Identified:
(please describe)
- Evaluation summary:
- Main strong points, qualitatively:
- Main weak points, qualitatively:
- Overall evaluation: (0/1/2/3/4/5)
0: could not be evaluated
1: proposal is not relevant
2: proposal is relevant, but requires significantly more work
3: proposal is relevant, but needs a few changes
4: proposal has some very good points, so it is a good candidate for standard
5: proposal is superior in its category, very strongly recommended for inclusion in standard
Additional remarks: (points of importance not covered above.)
The submission features in Table 3 are explained in the following Table 4.
Table 4 – Explanation of submission features
Submission features | Criteria |
Completeness of description | Evaluators should: 1. Compare the list of requirements (Annex C of the CfT) with the submission. 2. Check whether respondents have described in sufficient detail which part of the requirements their proposal refers to. NB1: Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated. NB2: Submissions will be judged for the merit of what is proposed. A submission on a single technology that is excellent may be considered instead of a submission that is complete but has a less performing technology. |
Understandability | Evaluators should identify items that are demonstrably unclear (inconsistencies, sentences with dubious meaning etc.) |
Extensibility | Evaluators should check if respondent has proposed extensions to the Use Cases. NB: Extensibility is the capability of the proposed solution to support use cases that are not supported by current requirements. |
Use of Standard Technology | Evaluators should check if new technologies are proposed where widely adopted technologies exist. If this is the case, the merit of the new technology shall be proved. |
Efficiency | Evaluators should assess power consumption, computational speed, computational complexity. |
Test cases | Evaluators should report whether a proposal contains suggestions for testing the technologies proposed |
Maturity of reference implementation | Evaluators should assess the maturity of the proposal. Note 1: Maturity is measured by the completeness, i.e., having all the necessary information and appropriate parts of the HW/SW implementation of the submission disclosed. Note 2: If there are parts of the implementation that are not disclosed but demonstrated, they will be considered if and only if such components are replicable. |
Relative complexity | Evaluators should identify issues that would make it difficult to implement the proposal compared to the state of the art. |
Support of MPAI-CAE use cases | Evaluators should check how many use cases are supported in the submission |
Support of non-MPAI-CAE use cases | Evaluators should check whether the technologies proposed can demonstrably be used in other significantly different use cases. |
Annex C: Requirements check list
Please note the following acronyms
KB | Knowledge Base |
QF | Query Format |
Table 5 – List of technologies identified in MPAI-CAE N151 [4]
Note: The numbers in the first column refer to the section numbers of N151 [4].
Technologies by Use Cases | Response | |
Emotion-Enhanced Speech | ||
4.2.4.1 | Digital Speech | Y/N |
4.2.4.2 | Emotion | Y/N |
4.2.4.3 | Emotion KB query format | Y/N |
4.2.4.4 | Emotion descriptors | Y/N |
Audio Recording Preservation | ||
4.3.4.1 | Digital Audio | Y/N |
4.3.4.2 | Digital Video | Y/N |
4.3.4.3 | Digital Image | Y/N |
4.3.4.4 | Tape irregularity KB query format | Y/N |
4.3.4.5 | Text | Y/N |
4.3.4.6 | Packager | Y/N |
Enhanced Audioconference Experience | ||
4.4.4.1 | Digital Speech | Y/N |
4.4.4.2 | Microphone geometry information | Y/N |
4.4.4.3 | Output device acoustic model metadata KB query format | Y/N |
4.4.4.4 | Delivery | Y/N |
Audio-on-the-go | ||
4.5.4.1 | Digital Audio | Y/N |
4.5.4.2 | Microphone geometry information | Y/N |
4.5.4.3 | Sound array | Y/N |
4.5.4.4 | Sound categorisation KB query format | Y/N |
4.5.4.5 | Sounds categorisation | Y/N |
4.5.4.6 | User Hearing Profiles KB query format | Y/N |
4.5.4.7 | Delivery | Y/N |
Respondents should consult the equivalent list in N154 [8] as some technologies are common or have a degree of similarity.
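As an illustration only, a respondent might keep the check list in machine-readable form while preparing Annex B, so that Use Cases with unsatisfied requirements (which the response must explain) are easy to spot. The sketch below is an assumption, not part of the CfT: the section numbers are copied from Table 5, but the `checklist` structure, the `missing_by_use_case` helper and the sample Y/N answers are hypothetical.

```python
from typing import Dict, List

# Section numbers come from Table 5; the True/False answers are invented sample data.
checklist: Dict[str, Dict[str, bool]] = {
    "Emotion-Enhanced Speech": {
        "4.2.4.1": True,
        "4.2.4.2": True,
        "4.2.4.3": True,
        "4.2.4.4": True,
    },
    "Enhanced Audioconference Experience": {
        "4.4.4.1": True,
        "4.4.4.2": True,
        "4.4.4.3": False,
        "4.4.4.4": True,
    },
}


def missing_by_use_case(answers: Dict[str, Dict[str, bool]]) -> Dict[str, List[str]]:
    """Return, for each partially covered Use Case, the sections answered 'N'."""
    return {
        use_case: [section for section, ok in items.items() if not ok]
        for use_case, items in answers.items()
        if not all(items.values())
    }


if __name__ == "__main__":
    for use_case, sections in missing_by_use_case(checklist).items():
        print(f"{use_case}: explain missing {', '.join(sections)}")
```

Use Cases that are fully covered do not appear in the result, so the output lists only the gaps that need an explanation in the response.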
Annex D: Technologies that may require specific testing
Emotion-Enhanced Speech | Speech features |
Emotion-Enhanced Speech | Emotion descriptors |
Audio Recording Preservation | Image features |
Additional technologies may be identified during the evaluation phase.
Annex E: Mandatory text in responses
A response to this MPAI-CAE CfT shall mandatorily include the following text
<Company/Member> submits this technical document in response to MPAI Call for Technologies for MPAI project MPAI-CAE (N151).
<Company/Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80), in particular <Company/Member> declares that <Company/Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-CAE (N171), alone or jointly with other IPR holders after the approval of the MPAI-CAE Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-CAE Technical Specification become available on the market.
In case the respondent is a non-MPAI member, the submission shall mandatorily include the following text
If (a part of) this submission is identified for inclusion in a specification, <Company> understands that <Company> will be requested to immediately join MPAI and that, if <Company> elects not to join MPAI, this submission will be discarded.
Subsequent technical contribution shall mandatorily include this text
<Member> submits this document to MPAI-CAE Development Committee (CAE-DC) as a contribution to the development of the MPAI-CAE Technical Specification.
<Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80), in particular <Member> declares that <Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-CAE (N171), alone or jointly with other IPR holders after the approval of the MPAI-CAE Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-CAE Technical Specification become available on the market.
[1] Profile of a standard is a particular subset of the technologies that are used in a standard and, where applicable, the classes, subsets, options and parameters relevant for the subset.
Clarifications of Call for Technologies
MPAI-5 has approved the MPAI-CAE Use Cases and Functional Requirements (N151) as an attachment to the Call for Technologies (N152). However, the source, CAE-DC, has identified some issues that are worth a clarification. This clarification is posted on the MPAI web site and will be communicated directly to those who have informed the Secretariat of their intention to respond.
General issue
MPAI understands that the scope of N151 is very broad. Therefore it reiterates the point made in N152 that:
Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated. However, submissions will be judged for the merit of what is proposed. A submission on a single technology that is excellent may be considered instead of a submission that is complete but has a less performing technology.
Emotion-Enhanced Speech (Use case #1 in N151)
The Functional Requirements of the Use Case do not explicitly indicate the form in which speech without emotion and Emotion enter the EES system. Possible modalities are:
- A speech file and a separate Emotion file where the sequence of Emotions carries time stamps
- An interleaved stream of speech and Emotions
MPAI welcomes proposals addressing these issues.
The assessment of submissions by Respondents who elect not to answer this point will not influence the assessment of the rest of their submission.
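To make the two modalities listed above concrete, the following sketch models both with hypothetical data structures and converts between them. Nothing here is specified by N151: `SpeechChunk`, `EmotionEvent`, the use of seconds for time stamps, the string emotion labels and the rule that an Emotion precedes a speech chunk with the same time stamp are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class EmotionEvent:
    start_s: float  # time stamp in seconds (assumed unit)
    emotion: str    # placeholder label; the real Emotion format is not defined here


@dataclass
class SpeechChunk:
    start_s: float        # time stamp of the first sample, in seconds
    samples: List[float]  # placeholder for digital speech data


StreamItem = Union[EmotionEvent, SpeechChunk]


def interleave(chunks: List[SpeechChunk], events: List[EmotionEvent]) -> List[StreamItem]:
    """Modality 2: merge separate speech and Emotion sequences into one stream.

    Items are ordered by time stamp; on a tie the Emotion comes first,
    so that it applies to the speech chunk that follows it.
    """
    tagged = [(e.start_s, 0, e) for e in events] + [(c.start_s, 1, c) for c in chunks]
    tagged.sort(key=lambda t: (t[0], t[1]))
    return [item for _, _, item in tagged]


def deinterleave(stream: List[StreamItem]) -> Tuple[List[SpeechChunk], List[EmotionEvent]]:
    """Modality 1: split a stream back into a speech sequence and a time-stamped Emotion sequence."""
    chunks = [x for x in stream if isinstance(x, SpeechChunk)]
    events = [x for x in stream if isinstance(x, EmotionEvent)]
    return chunks, events


if __name__ == "__main__":
    speech = [SpeechChunk(0.0, [0.1, 0.2]), SpeechChunk(1.0, [0.3, 0.4])]
    emotions = [EmotionEvent(0.0, "neutral"), EmotionEvent(1.0, "happy")]
    stream = interleave(speech, emotions)
    print([type(item).__name__ for item in stream])
    print(deinterleave(stream) == (speech, emotions))
```

Under these assumptions the two modalities carry the same information, so a proposal could pick either one as the interface format and derive the other.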
Audio Recording Preservation (Use case #2 in N151)
MPAI welcomes proposed semantics of the information conveyed in the Text output of the Musicological classifier. It is not clear what information the Musicological classifier is providing.
The assessment of submissions by Respondents who elect not to answer this point will not influence the assessment of the rest of their submission.
Enhanced Audioconference Experience (Use case #3 in N151)
The function of the Speech detection and separation AIM is described as “Separation of relevant speech vs non-speech signals”.
As the description is possibly misleading, we inform submitters that the correct description of the AIM should be “Separation of relevant speech vs other signals (including unwanted speech)”.
The assessment of submissions by Respondents who base their submission on the text in N151 will not be affected by this clarification.
References
- MPAI-CAE Use Cases & Functional Requirements; MPAI N151; https://mpai.community/standards/mpai-cae/#UCFR
- MPAI-CAE Call for Technologies, MPAI N152; https://mpai.community/standards/mpai-cae/#Technologies
- MPAI-CAE Framework Licence, MPAI N171; https://mpai.community/standards/mpai-cae/#Licence