Context-based Audio Enhancement (MPAI-CAE)

This document is also available in MS Word format as MPAI-CAE Call for Technologies

1       Introduction

2       How to submit a response

3       Evaluation Criteria and Procedure

4       Expected development timeline

5       References

Annex A: Information Form

Annex B: Evaluation Sheet

Annex C: Requirements check list

Annex D: Technologies that may require specific testing

Annex E: Mandatory text in responses

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international non-profit organisation with the mission to develop standards for Artificial Intelligence (AI) enabled digital data coding and for technologies that facilitate integration of data coding components into ICT systems. With the mechanism of Framework Licences, MPAI seeks to attach clear IPR licensing frameworks to its standards.

MPAI has found that the application area called “Context-based Audio Enhancement” is particularly relevant for MPAI standardisation because using context information to act on the input audio content can substantially improve the user experience of a variety of audio-related applications that include entertainment, communication, teleconferencing, gaming, post-production, restoration etc. for a variety of contexts such as in the home, in the car, on-the-go, in the studio etc.

Therefore, MPAI intends to develop a standard – to be called MPAI-CAE – that will provide standard technologies to implement four Use Cases identified so far:

  1. Emotion-Enhanced Speech (EES)
  2. Audio Recording Preservation (ARP)
  3. Enhanced Audioconference Experience (EAC)
  4. Audio-on-the-go (AOG)

This document is a Call for Technologies (CfT) that

  1. Satisfy the MPAI-CAE Functional Requirements (N151) [4] and
  2. Are released according to the MPAI-CAE Framework Licence (N171) [6], if selected by MPAI for inclusion in the MPAI-CAE standard.

The standard will be developed with the following guidelines:

  1. To satisfy the Functional Requirements (N151) [4], available online. In the future, MPAI may decide to extend MPAI-CAE to support other Use Cases.
  2. To use, where feasible and desirable, the same basic technologies required by the companion document MPAI-MMC Use Cases and Functional Requirements [7].
  3. To be suitable for implementation as AI Modules (AIM) conforming to the emerging MPAI AI Framework (MPAI-AIF) standard, which is based on the responses to the Call for Technologies (N100) [2] satisfying the MPAI-AIF Functional Requirements (N74) [1].

MPAI has decided to base its application standards on the AIM and AIF notions whose functional requirements have been identified in [1] rather than follow the approach of defining end-to-end systems. It has done so because:

  1. AIMs allow the reduction of a large problem to a set of smaller problems.
  2. AIMs can be independently developed and made available to an open competitive market.
  3. An implementor can build a sophisticated and complex system with potentially limited knowledge of all the technologies required by the system.
  4. An MPAI system has an inherent explainability.
  5. MPAI systems allow for competitive comparisons of functionally equivalent AIMs.

Respondents should be aware that:

  1. The set of Use Cases that make up MPAI-CAE, the Use Cases themselves and the AIM internals will be non-normative.
  2. The input and output interfaces of the AIMs, whose requirements have been derived to support the Use Cases, will be normative.

Therefore, the scope of this Call for Technologies is restricted to technologies required to implement the input and output interfaces of the AIMs identified in N151 [4].
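The split between normative interfaces and non-normative internals can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the names `AudioAIM`, `GainAIM`, `ClipAIM` and `run_chain` are hypothetical and are not part of MPAI-AIF or N151, and audio is reduced to a plain list of floating-point samples. The point is that two modules with different internals remain interchangeable as long as they expose the same input and output.

```python
from typing import Protocol, Sequence


class AudioAIM(Protocol):
    """Hypothetical AIM interface: only the input/output signature is fixed."""

    def process(self, samples: Sequence[float]) -> list[float]:
        ...


class GainAIM:
    """One illustrative implementation: scales every sample by a fixed gain."""

    def __init__(self, gain: float) -> None:
        self.gain = gain

    def process(self, samples: Sequence[float]) -> list[float]:
        return [s * self.gain for s in samples]


class ClipAIM:
    """A second implementation with different internals: hard-limits samples."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def process(self, samples: Sequence[float]) -> list[float]:
        return [max(-self.limit, min(self.limit, s)) for s in samples]


def run_chain(aims: Sequence[AudioAIM], samples: Sequence[float]) -> list[float]:
    """Feed the output of each AIM into the next one."""
    out = list(samples)
    for aim in aims:
        out = aim.process(out)
    return out
```

Because `run_chain` depends only on the shared `process` signature, either module could be replaced by a functionally equivalent one from another supplier without touching the rest of the chain.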

However, MPAI invites comments on any technology or architectural component identified in N151, specifically,

  1. Additions or removals of input/output signals to the identified AIMs with justification of the changes and identification of data formats required by the new input/output signals.
  2. Possible alternative partitioning of the AIMs implementing the example cases providing:
    1. Arguments in support of the proposed partitioning
    2. Detailed specifications of the input and output data of the proposed new AIMs
  3. New Use Cases fully described as in N151.

All parties who believe they have relevant technologies satisfying all or most of the requirements of one or more Use Cases described in N151 are invited to submit proposals for consideration by MPAI. MPAI membership is not a prerequisite for responding to this CfT. However, proponents should be aware that, if their proposal or part thereof is accepted for inclusion in the MPAI-CAE standard, they shall immediately join MPAI, or their accepted technologies will be discarded.

MPAI will select the most suitable technologies based on their technical merits for inclusion in MPAI-CAE. However, MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.

Submissions are due on 2021/04/12T23:59 UTC and should be sent to the MPAI secretariat (secretariat@mpai.community). The secretariat will acknowledge receipt of the submission via email. Submissions will be reviewed according to the schedule that the 7th MPAI General Assembly (MPAI-7) will define at its online meeting on 2021/04/14. For details on how submitters who are not MPAI members can attend the said review please contact the MPAI secretariat (secretariat@mpai.community).

2        How to submit a response

Those planning to respond to this CfT:

  1. Are advised that online events will be held on 2021/02/24 and 2021/03/10 to present the MPAI-CAE CfT and respond to questions. Logistic information on these events will be posted on the MPAI web site.
  2. Are requested to communicate their intention to respond to this CfT with an initial version of the form of Annex A to the MPAI secretariat (secretariat@mpai.community) by 2021/03/16. A potential submitter making a communication using the said form is not required to actually make a submission. A submission will be accepted even if the submitter did not communicate their intention to submit a response by the said date.
  3. Are advised to regularly visit the https://mpai.community/how-to-join/calls-for-technologies/ web page, where relevant information will be posted.

Responses to this MPAI-CAE CfT shall/may include:

Table 1 – Mandatory and optional elements of a response

Item | Status
Detailed documentation describing the proposed technologies | mandatory
The final version of Annex A | mandatory
The text of Annex B duly filled out with the table indicating which requirements identified in MPAI N151 [4] are satisfied. If not all the requirements of a Use Case are satisfied, this should be explained. | mandatory
Comments on the completeness and appropriateness of the MPAI-CAE requirements and any motivated suggestion to amend or extend those requirements. | optional
A preliminary demonstration, with a detailed document describing it. | optional
Any other additional relevant information that may help evaluate the submission, such as additional use cases. | optional
The text of Annex E. | mandatory

Respondents are invited to take advantage of the check list of Annex C before submitting their response and filling out Annex A.

Respondents are required to present their submission at a teleconference meeting that the MPAI Secretariat will announce to submitters. If no presenter attends the meeting, the proposal will be discarded.

Respondents are advised that, upon acceptance by MPAI of their submission in whole or in part for further evaluation, MPAI will require that:

  • A working implementation, including source code, be made available before the technology is accepted for inclusion in the MPAI-CAE standard, for use in the development of the MPAI-CAE Reference Software and later publication as a standard by MPAI. Software may be written in programming languages that can be compiled or interpreted and in hardware description languages.
  • The working implementation be suitable for operation in the MPAI AI Framework (MPAI-AIF).
  • A non-MPAI member immediately join MPAI. If the non-MPAI member elects not to do so, their submission will be discarded. Directions on how to join MPAI can be found online.

Further information on MPAI can be obtained from the MPAI website.

3        Evaluation Criteria and Procedure

Proposals will be assessed using the following process:

  1. Evaluation panel is created from:
    1. All CAE-DC members attending.
    2. Non-MPAI members who are respondents.
    3. Experts who are neither respondents nor MPAI members, invited in a consulting capacity.
  2. No one from 1.1.-1.2. will be denied membership in the Evaluation panel.
  3. Respondents present their proposals.
  4. Evaluation Panel members ask questions.
  5. If required, subjective and/or objective tests are carried out:
    1. Define required tests.
    2. Carry out the tests.
    3. Produce report.
  6. If required, at least 2 reviewers will be appointed to review and report on specific points in a proposal.
  7. Evaluation panel members fill out Annex B for each proposal.
  8. Respondents respond to evaluations.
  9. Proposal evaluation report is produced.

4        Expected development timeline

Timeline of the CfT, deadlines and response evaluation:

Table 2 – Dates and deadlines

Step | Date
Call for Technologies | 2021/02/17
CfT introduction conference call 1 | 2021/02/24T14:00 UTC
CfT introduction conference call 2 | 2021/03/10T15:00 UTC
Notification of intention to submit proposal | 2021/03/16T23:59 UTC
Submission deadline | 2021/04/12T23:59 UTC
Evaluation of responses will start | 2021/04/14 (MPAI-7)

Evaluation will be carried out in 2-hour sessions according to the calendar agreed at MPAI-7.

5        References

  1. MPAI-AIF Use Cases & Functional Requirements, MPAI N74; https://mpai.community/standards/mpai-aif/
  2. MPAI-AIF Call for Technologies, MPAI N100; https://mpai.community/standards/mpai-aif/#Technologies
  3. MPAI-AIF Framework Licence, MPAI N171; https://mpai.community/standards/mpai-aif/#Licence
  4. MPAI-CAE Use Cases & Functional Requirements, MPAI N151; https://mpai.community/standards/mpai-cae/#UCFR
  5. MPAI-CAE Call for Technologies, MPAI N152; https://mpai.community/standards/mpai-cae/#Technologies
  6. MPAI-CAE Framework Licence, MPAI N171; https://mpai.community/standards/mpai-cae/#Licence
  7. MPAI-MMC Use Cases & Functional Requirements, MPAI N153; https://mpai.community/standards/mpai-mmc/#UCFR
  8. MPAI-MMC Call for Technologies, MPAI N154; https://mpai.community/standards/mpai-mmc/#Technologies
  9. MPAI-MMC Framework Licence, MPAI N173; https://mpai.community/standards/mpai-mmc/#Licence

Annex A: Information Form

This information form is to be filled in by a Respondent to the MPAI-CAE CfT

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

Annex B: Evaluation Sheet

NB: This evaluation sheet will be filled out by members of the Evaluation Team.

Proposal title:

Main Functionalities:

Response summary: (a few lines)

Comments on Relevance to the CfT (Requirements):

Comments on possible MPAI-CAE profiles[1]

Evaluation table:

Table 3 – Assessment of submission features

Note 1: The semantics of the submission features is provided by Table 4.
Note 2: Evaluation elements indicate the elements used by the evaluator in assessing the submission.
Note 3: Final Assessment indicates the ultimate assessment based on the Evaluation Elements.

Submission features | Evaluation elements | Final Assessment
Completeness of description | |
Understandability | |
Extensibility | |
Use of Standard Technology | |
Efficiency | |
Test cases | |
Maturity of reference implementation | |
Relative complexity | |
Support of MPAI use cases | |
Support of non-MPAI use cases | |

Content of the criteria table cells:

Evaluation facts should mention:

  • Not supported / partially supported / fully supported.
  • What supported these facts: submission/presentation/demo.
  • The summary of the facts themselves, e.g., very good in one way, but weak in another.

Final assessment should mention:

  • Possibilities to improve or add to the proposal, e.g., any missing or weak features.
  • How sure the evaluators are, i.e., evidence shown, very likely, very hard to tell, etc.
  • Global evaluation (Not Applicable / -- / - / + / ++)

New Use Cases/Requirements Identified:

(please describe)

  •  Evaluation summary:
  •  Main strong points, qualitatively:
  •  Main weak points, qualitatively:
  • Overall evaluation: (0/1/2/3/4/5)

0: could not be evaluated

1: proposal is not relevant

2: proposal is relevant, but requires significantly more work

3: proposal is relevant, but requires a few changes

4: proposal has some very good points, so it is a good candidate for standard

5: proposal is superior in its category, very strongly recommended for inclusion in standard

Additional remarks: (points of importance not covered above.)

The submission features in Table 3 are explained in the following Table 4.

Table 4 – Explanation of submission features

Submission features Criteria
Completeness of description Evaluators should

1.     Compare the list of requirements (Annex C of the CfT) with the submission.

2.     Check if respondents have described in sufficient detail which part of the requirements their proposal refers to.

NB1: Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated.

NB2: Submissions will be judged for the merit of what is proposed. A submission on a single technology that is excellent may be considered instead of a submission that is complete but has a less performing technology.

Understandability Evaluators should identify items that are demonstrably unclear (inconsistencies, sentences with dubious meaning etc.)
Extensibility Evaluators should check if the respondent has proposed extensions to the Use Cases.

NB: Extensibility is the capability of the proposed solution to support use cases that are not supported by current requirements.

Use of Standard Technology Evaluators should check if new technologies are proposed where widely adopted technologies exist. If this is the case, the merit of the new technology shall be proved.
Efficiency Evaluators should assess power consumption, computational speed, computational complexity.
Test cases Evaluators should report whether a proposal contains suggestions for testing the technologies proposed.
Maturity of reference implementation Evaluators should assess the maturity of the proposal.

Note 1: Maturity is measured by the completeness, i.e., having all the necessary information and appropriate parts of the HW/SW implementation of the submission disclosed.

Note 2: If there are parts of the implementation that are not disclosed but demonstrated, they will be considered if and only if such components are replicable.

Relative complexity Evaluators should identify issues that would make it difficult to implement the proposal compared to the state of the art.
Support of MPAI-CAE use cases Evaluators should check how many use cases are supported in the submission
Support of non MPAI-CAE use cases Evaluators should check whether the technologies proposed can demonstrably be used in other significantly different use cases.

Annex C: Requirements check list

Please note the following acronyms

KB Knowledge Base
QF Query Format

Table 5 – List of technologies identified in MPAI-CAE N151 [4]

Note: The numbers in the first column refer to the section numbers of N151 [4].

Technologies by Use Cases | Response
Emotion-Enhanced Speech
4.2.4.1 Digital Speech | Y/N
4.2.4.2 Emotion | Y/N
4.2.4.3 Emotion KB query format | Y/N
4.2.4.4 Emotion descriptors | Y/N
Audio Recording Preservation
4.3.4.1 Digital Audio | Y/N
4.3.4.2 Digital Video | Y/N
4.3.4.3 Digital Image | Y/N
4.3.4.4 Tape irregularity KB query format | Y/N
4.3.4.5 Text | Y/N
4.3.4.6 Packager | Y/N
Enhanced Audioconference Experience
4.4.4.1 Digital Speech | Y/N
4.4.4.2 Microphone geometry information | Y/N
4.4.4.3 Output device acoustic model metadata KB query format | Y/N
4.4.4.4 Delivery | Y/N
Audio-on-the-go
4.5.4.1 Digital Audio | Y/N
4.5.4.2 Microphone geometry information | Y/N
4.5.4.3 Sound array | Y/N
4.5.4.4 Sound categorisation KB query format | Y/N
4.5.4.5 Sounds categorisation | Y/N
4.5.4.6 User Hearing Profiles KB query format | Y/N
4.5.4.7 Delivery | Y/N

Respondents should consult the equivalent list in N154 [8] as some technologies are common or have a degree of similarity.
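Since a response must explain any Use Case whose requirements are only partly satisfied, a respondent may find it convenient to derive the list of gaps from the Y/N answers. The following is a minimal sketch, not an MPAI tool: the function name `unsatisfied` and the dictionary layout are assumptions, and the example answers are invented; only the requirement numbers and names come from the check list above.

```python
def unsatisfied(responses: dict[str, dict[str, bool]]) -> dict[str, list[str]]:
    """Return, per addressed Use Case, the requirements answered N.

    A Use Case with no Y answer at all is treated as not addressed
    and is left out of the result.
    """
    gaps: dict[str, list[str]] = {}
    for use_case, items in responses.items():
        if not any(items.values()):
            continue  # Use Case not addressed by this submission
        missing = [name for name, ok in items.items() if not ok]
        if missing:
            gaps[use_case] = missing
    return gaps


# Invented example answers (True = Y, False = N).
example = {
    "Emotion-Enhanced Speech": {
        "4.2.4.1 Digital Speech": True,
        "4.2.4.2 Emotion": True,
        "4.2.4.3 Emotion KB query format": False,
        "4.2.4.4 Emotion descriptors": True,
    },
    "Enhanced Audioconference Experience": {
        "4.4.4.1 Digital Speech": False,
        "4.4.4.2 Microphone geometry information": False,
        "4.4.4.3 Output device acoustic model metadata KB query format": False,
        "4.4.4.4 Delivery": False,
    },
}

print(unsatisfied(example))
```

With these example answers, only the Emotion KB query format of Emotion-Enhanced Speech is reported as a gap to be explained; the audioconference Use Case is skipped because it is not addressed at all.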

Annex D: Technologies that may require specific testing

Emotion-Enhanced Speech Speech features
Emotion-Enhanced Speech Emotion descriptors
Audio Recording Preservation Image features

Additional technologies may be identified during the evaluation phase.

Annex E: Mandatory text in responses

A response to this MPAI-CAE CfT shall mandatorily include the following text:

<Company/Member> submits this technical document in response to MPAI Call for Technologies for MPAI project MPAI-CAE (N151).

<Company/Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80), in particular <Company/Member> declares that <Company/Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-CAE (N171), alone or jointly with other IPR holders after the approval of the MPAI-CAE Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-CAE Technical Specification become available on the market.

In case the respondent is a non-MPAI member, the submission shall mandatorily include the following text

If (a part of) this submission is identified for inclusion in a specification, <Company>  understands that  <Company> will be requested to immediately join MPAI and that, if  <Company> elects not to join MPAI, this submission will be discarded.

Subsequent technical contributions shall mandatorily include this text:

<Member> submits this document to MPAI-CAE Development Committee (CAE-DC) as a contribution to the development of the MPAI-CAE Technical Specification.

<Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80), in particular <Company> declares that <Company> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-CAE (N171), alone or jointly with other IPR holders after the approval of the MPAI-CAE Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-CAE Technical Specification become available on the market.

[1] A profile of a standard is a particular subset of the technologies that are used in a standard and, where applicable, the classes, subsets, options and parameters relevant for the subset.


Clarifications of Call for Technologies

MPAI-5 has approved the MPAI-CAE Use Cases and Functional Requirements (N151) as an attachment to the Call for Technologies (N152). However, CAE-DC, the source of the document, has identified some issues that are worth clarifying. This clarification is posted on the MPAI web site and will be communicated directly to those who have informed the Secretariat of their intention to respond.

General issue

MPAI understands that the scope of N151 is very broad. Therefore it reiterates the point made in N152 that:

Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated. However, submissions will be judged for the merit of what is proposed. A submission on a single technology that is excellent may be considered instead of a submission that is complete but has a less performing technology.

Emotion-Enhanced Speech (Use case #1 in N151)

The Functional Requirements of the Use Case do not explicitly indicate the form in which speech without emotion and Emotion enter the EES system. Possible modalities are:

  1. A speech file and a separate Emotion file where the sequence of Emotions carries time stamps
  2. An interleaved stream of speech and Emotions

MPAI welcomes proposals addressing these issues.
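The two modalities can be sketched as data structures to make the difference concrete. This is an illustration under stated assumptions, not a proposed format: the names `EmotionEvent` and `interleave`, the use of seconds for time stamps, a fixed frame duration, and the emotion labels in the usage note are all hypothetical and are not taken from N151.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EmotionEvent:
    """Modality 1: one entry of a separate, time-stamped Emotion file."""
    start_s: float  # time stamp in seconds (assumed unit)
    emotion: str    # placeholder label, not an N151 value


def interleave(
    speech_frames: Sequence[bytes],
    frame_s: float,
    events: Sequence[EmotionEvent],
) -> list[tuple[str, object]]:
    """Modality 2: merge speech frames and timed Emotions into one stream.

    Each Emotion is emitted just before the first speech frame whose start
    time is at or after the Emotion's time stamp. Emotions time-stamped
    after the start of the last frame are not emitted in this sketch.
    """
    stream: list[tuple[str, object]] = []
    pending = sorted(events, key=lambda e: e.start_s)
    i = 0
    for n, frame in enumerate(speech_frames):
        t = n * frame_s  # start time of frame n
        while i < len(pending) and pending[i].start_s <= t:
            stream.append(("emotion", pending[i].emotion))
            i += 1
        stream.append(("speech", frame))
    return stream
```

For example, three half-second frames with Emotions stamped at 0.0 s and 1.0 s yield a stream in which the first Emotion precedes the first frame and the second Emotion precedes the third frame. The separate-file modality keeps the speech file untouched, while the interleaved modality removes the need to synchronise two inputs.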

A Respondent's choice not to address this point will not influence the assessment of the rest of their submission.

Audio Recording Preservation (Use case #2 in N151)

MPAI welcomes proposed semantics of the information conveyed in the Text output of the Musicological classifier. It is not clear what information the Musicological classifier is providing.

A Respondent's choice not to address this point will not influence the assessment of the rest of their submission.

Enhanced Audioconference Experience (Use case #3 in N151)

The function of the Speech detection and separation AIM is described as “Separation of relevant speech vs non-speech signals”.

As the description is possibly misleading, we inform submitters that the correct description of the AIM should be “Separation of relevant speech vs other signals (including unwanted speech)”.

The assessment of submissions by Respondents who base their submission on the text in N151 will not be affected by this clarification.

References

  1. MPAI-CAE Use Cases & Functional Requirements; MPAI N151; https://mpai.community/standards/mpai-cae/#UCFR
  2. MPAI-CAE Call for Technologies, MPAI N152; https://mpai.community/standards/mpai-cae/#Technologies
  3. MPAI-CAE Framework Licence, MPAI N171; https://mpai.community/standards/mpai-cae/#Licence