
MPAI receives technologies for its AI framework standard and calls for technologies supporting audio and human-machine conversation

Geneva, Switzerland – 17 February 2021. At its 5th General Assembly, Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international, unaffiliated standards association

  1. Has kicked off work on its AI Framework (MPAI-AIF) standard after receiving substantial proposed technologies.
  2. Is calling for technologies to develop two standards related to audio (MPAI-CAE) and multimodal conversation (MPAI-MMC).
  3. Will soon be developing the Framework Licence for the next maturing project “Compression and Understanding of Industrial Data” (MPAI-CUI).

MPAI has reviewed responses to the call issued 2 months ago for technologies supporting its AI Framework (MPAI-AIF) standard. The goal is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware or mixed software and hardware. MPAI-AIF will offer extended explainability to applications conforming to MPAI standards. The submissions received are enabling MPAI to develop the intended standard, whose publication is planned for July 2021.

MPAI has issued two Calls for Technologies supporting two new standards:

  1. The Context-based Audio Enhancement (MPAI-CAE) standard will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples of use are adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones to a user walking in the street.
  2. The Multimodal Conversation (MPAI-MMC) standard will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples of use are an audio-visual conversation with a machine where the machine is impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, a human question to a machine translated using a voice preserving the speech features of the human.

The content of the two Calls for Technologies will be introduced at two online conferences. Attendance is open to interested parties.

MPAI has developed the functional requirements for the Compression and Understanding of Industrial Data (MPAI-CUI) standard and, by decision of the General Assembly, MPAI Active Members may now develop the Framework Licence to facilitate the actual licences – to be developed outside of MPAI. The standard will be used to assess the risks faced by a company using information extracted from the flow of data it produces.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer game players.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.

 


An introduction to the MPAI-AIF Call for Technologies

On 2020/12/21 MPAI held a teleconference to illustrate the MPAI-AIF Call for Technologies (CfT) and associated Framework Licence (FWL). This article summarises the main points illustrated at the teleconference: what MPAI is and why it exists, the MPAI-AIF Functional Requirements, the MPAI-AIF Call for Technologies and the MPAI-AIF Framework Licence.

Miran Choi, an MPAI Director and Chair of the Communication Advisory Committee, recalled the reasons that led to the establishment of MPAI.

Over the past 3 decades, media compression standards have allowed manufacturing and services to boom. However, the technology momentum is progressively slowing while AI technologies are taking centre stage by offering more capabilities than traditional technologies, by being applicable to data other than audio/video and by being supported by a global research effort. In addition, industry has recently suffered from the inadequacy of the Fair, Reasonable and Non-Discriminatory (FRAND) model to deal with the tectonic changes of technology-intensive standards.

Miran then summarised the main characteristics of MPAI: a non-profit, unaffiliated and international association that develops

  • Standards for
    • AI-enabled data coding
    • Technologies that facilitate integration of data coding components in ICT systems and
  • Associated clear IPR licensing frameworks.

MPAI is the only standards organisation that has set AI as the key enabling technology for data coding standards. MPAI members come from industry, research and academia of 15 countries, representing a broad spectrum of technologies and applications.

The development of standards must obey rules of openness and due process, and MPAI has a rigorous process to develop standards in 6 steps:

  1. Use cases – Collect/aggregate use cases in cohesive projects applicable across industries
  2. Functional Requirements – Identify the functional requirements the standard should satisfy
  3. Commercial Requirements – Develop and approve the framework licence of the standard
  4. Call for Technologies – Publish a call for technologies supporting the functional and commercial requirements
  5. Standard development – Develop the standard in an especially established Development Committee (DC)
  6. MPAI standard – Complete the standard and obtain declarations from all Members

The transitions from one stage to the next are approved by the General Assembly.

The MPAI-AIF standard project is at the Call for Technologies stage.

Andrea Basso, Chair of the MPAI-AIF Development Committee (AIF-DC) in charge of the development of the MPAI-AIF standard, introduced the motivations and functional requirements of the MPAI-AIF standard.

MPAI has developed several Use Cases for disparate applications, coming to the conclusion that they can all be implemented with a combination of AI-based modules concurring to the achievement of the intended result. The history of media standards has shown the benefits of standardisation. Therefore, to avoid the danger of incompatible implementations of modules put on the market, where costs multiply at all levels and mass adoption of AI technology is delayed, MPAI seeks to standardise AI Modules (AIMs) with standard interfaces, combined and executed within an MPAI-specified AI Framework. AIMs with standard interfaces will reduce overall design costs and increase component reusability, create favourable conditions leading to horizontal markets of competing implementations, and promote adoption and incite progress of AI technologies.

AIMs need an environment where they can be combined and executed. This is what MPAI-AIF (where AIF stands for AI Framework) is about. The AI Framework is depicted in the figure.

The AI Framework has 6 components: Management and Control, Execution, AI Modules, Communication, Storage and Access.
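As an illustration only, the interplay of some of these components can be sketched in Python. MPAI-AIF interfaces were not yet standardised at the time of this announcement, so every class and method name below is hypothetical; the sketch merely shows the idea of pluggable AIMs driven by a framework that owns management, execution and storage:

```python
from abc import ABC, abstractmethod


class AIM(ABC):
    """An AI Module (AIM): a processing unit exposing standard interfaces."""

    @abstractmethod
    def process(self, inputs: dict) -> dict:
        """Consume named inputs and produce named outputs."""


class UpperCaseAIM(AIM):
    """Toy AIM standing in for a real ML or DP module."""

    def process(self, inputs: dict) -> dict:
        return {"text": inputs["text"].upper()}


class AIFramework:
    """Reduces the six AIF components to two dicts and three methods:
    Management and Control (instantiate/remove), Execution (execute),
    AI Modules (self.modules), Storage and Access (self.storage);
    Communication is implied by passing results through Storage."""

    def __init__(self):
        self.modules: dict[str, AIM] = {}
        self.storage: dict[str, dict] = {}

    # Management and Control: instantiate or remove AIMs
    def instantiate(self, name: str, aim: AIM) -> None:
        self.modules[name] = aim

    def remove(self, name: str) -> None:
        del self.modules[name]

    # Execution: run one AIM on data held in Storage, write the result back
    def execute(self, name: str, key: str) -> dict:
        result = self.modules[name].process(self.storage[key])
        self.storage[key] = result
        return result


fw = AIFramework()
fw.instantiate("upper", UpperCaseAIM())
fw.storage["job1"] = {"text": "hello mpai"}
print(fw.execute("upper", "job1"))  # {'text': 'HELLO MPAI'}
```

Because every AIM hides its internals behind `process`, the framework can swap one implementation for another without changing the workflow, which is the point of standardising AIM interfaces.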

The MPAI-AIF functional requirements are:

  1. Possibility to establish general Machine Learning and/or Data Processing life cycles
    1. for single AIMs to
      1. instantiate-configure-remove
      2. dump/retrieve internal state
      3. start-suspend-stop
      4. train-retrain-update
      5. enforce resource limits
      6. implement auto-configuration/reconfiguration of ML-based computational models
    2. for multiple AIMs to
      1. initialise the overall computational model
      2. instantiate-remove-configure AIMs
      3. manually, automatically, dynamically and adaptively configure interfaces with Com­ponents
      4. one- and two-way signalling for computational workflow initialisation and control of combinations of AIMs
  2. Application-scenario dependent hierarchical execution of workflows
  3. Topology of networked AIMs that can be synchronised according to a given time base and full ML life cycles
  4. Supervised, unsupervised and reinforcement-based learning paradigms
  5. Computational graphs, such as Directed Acyclic Graph (DAG) as a minimum
  6. Initialisation of signalling patterns, communication and security policies between AIMs
  7. Protocols to specify storage access time, retention, read/write throughput etc.
  8. Storage of Components’ data
  9. Access to
    1. Static or slowly changing data with standard formats
    2. Data with proprietary formats
  10. Possibility to implement AI Frameworks featuring
    1. Asynchronous and time-based synchronous operation depending on application
    2. Dynamic update of the ML models with seamless or minimal impact on its operation
    3. Time-sharing operation of ML-based AIMs shall enable use of the same ML-based AIM in multiple concurrent applications
    4. AIMs which are aggregations of AIMs exposing new interfaces
    5. Workflows that are a mixture of AI/ML-based and DP technology-based AIMs.
    6. Scalability of complexity and performance to cope with different scenarios, e.g. from small MCUs to complex distributed systems
  11. Possibility to create MPAI-AIF profiles
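Requirements 1 and 5 lend themselves to a small illustration. The sketch below (hypothetical names throughout; the standard prescribes none of this code) models the single-AIM life cycle of requirement 1.1 as a state machine and derives an execution order for a DAG of AIMs using Python's standard-library topological sorter:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class AIMLifeCycle:
    """Toy model of the single-AIM life cycle (requirement 1.1):
    instantiate -> configure -> start, plus suspend/stop transitions."""

    ALLOWED = {
        ("instantiated", "configured"),
        ("configured", "started"),
        ("started", "suspended"),
        ("suspended", "started"),
        ("started", "stopped"),
    }

    def __init__(self):
        self.state = "instantiated"

    def to(self, new_state: str) -> None:
        if (self.state, new_state) not in self.ALLOWED:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


# A hypothetical multimodal workflow as a DAG: each AIM maps to the set of
# AIMs whose outputs it consumes (requirement 5: DAG as a minimum).
workflow = {
    "denoise": set(),
    "speech_recognition": {"denoise"},
    "emotion_detection": {"denoise"},
    "dialogue": {"speech_recognition", "emotion_detection"},
}

# An execution order that respects all dependencies.
order = list(TopologicalSorter(workflow).static_order())

# Drive one AIM through a legal life-cycle path.
aim = AIMLifeCycle()
for s in ("configured", "started", "suspended", "started", "stopped"):
    aim.to(s)

print(order, aim.state)
```

A real AI Framework would also cover training, resource limits and time-base synchronisation, but the state-machine plus DAG-scheduling core is the minimum the requirements above ask for.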

Panos Kudumakis, MPAI member, explained the MPAI-AIF Call for Technologies:

  1. Who can submit
    1. All parties, including non-members, who believe they have relevant technologies
    2. Responses are submitted to the secretariat, which acknowledges receipt via email
    3. Technologies submitted must
      1. Support the requirements of N74
      2. Be released according to the MPAI-AIF Framework Licence (N101) – if selected by MPAI for inclusion in MPAI-AIF
    4. MPAI will select the most suitable technologies on the basis of their technical merits for inclusion in MPAI-AIF.
    5. MPAI is not obligated to select a particular technology or to select any technology if those submitted are found inadequate.
  2. A submission shall contain
    1. Detailed documentation describing the proposed technologies.
    2. Annex A: Information Form (contact info, proposal summary).
    3. Annex B: Evaluation Sheet, to be used for quantitative and qualitative self-evaluation of the submission; it will also be filled out during the peer-to-peer evaluation phase.
    4. Annex C: Requirements Check List (N74) to be duly filled out indicating (using a table) which requirements identified are satisfied. If a requirement is not satisfied, the submission shall indicate the reason.
    5. Annex D: Mandatory text in responses
  3. A submission may contain
    1. Comments on the completeness and appropriateness of the MPAI-AIF requirements and any motivated suggestion to extend those requirements.
    2. A preliminary demonstration, with a detailed document describing it.
    3. Any other additional relevant information that may help evaluate the submission, such as additional use cases.
  4. Assessment
    1. Respondents must present their submission, otherwise the proposal is discarded.
    2. If a submission is accepted in whole or in part, the submitter shall make available a working implementation, including source code (for use in the MPAI-AIF Reference Software), before the technology is accepted for the MPAI-AIF standard.
    3. Software can be written in compiled or interpreted programming languages and in hardware description languages.
    4. A submitter who is not an MPAI member shall immediately join MPAI, otherwise the submission is discarded.
    5. An assessment guidelines form to aid the peer-to-peer evaluation phase is being finalised.
  5. Calendar
    1. Call for Technologies 16 Dec (MPAI-3)
    2. Presentation Conference Calls 21 Dec/07 Jan
    3. Notification of intention to submit 15 Jan
    4. Assessment form 20 Jan (MPAI-4)
    5. Submission deadline 15 Feb
    6. Calendar of evaluation of responses 17 Feb (MPAI-5)
    7. Approval of MPAI-AIF standard 19 July (estimate)

Davide Ferri, MPAI Director and Chair of AIF-FWL, the committee that developed the MPAI-AIF Framework Licence (FWL), explained that the FWL covers the MPAI-AIF technology that specifies a generic execution environment, possibly integrating Machine Learning, Artificial Intelligence and legacy Data Processing components, implementing application areas such as

  1. Context-based Audio Enhancement (MPAI-CAE)
  2. Integrative Genomic/Sensor Analysis (MPAI-GSA)
  3. AI-Enhanced Video Coding (MPAI-EVC)
  4. Server-based Predictive Multiplayer Gaming (MPAI-SPG)
  5. Multi-Modal Conversation (MPAI-MMC)
  6. Compression and Understanding of Industrial data (MPAI-CUI)

These six application areas are expected to become MPAI standards.

The FWL includes a set of definitions that are omitted here, in particular the definition of Licence, namely the Framework Licence to which values, e.g. currency, percentages, dates etc., related to a specific Intellectual Property will be added.

The FWL is expressed in concise form as follows:

  1. The Licence will:
    1. be in compliance with generally accepted principles of competition law and the MPAI Statutes
    2. cover all of Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-AIF standard
    3. cover Development Rights and Implementation Rights
    4. apply to a baseline MPAI-AIF profile and to other profiles containing additional technologies
  2. Grant access to Essential IPRs of the MPAI-AIF standard in a non-discriminatory fashion.
  3. Have a scope subject to legal, bias, ethical and moral limitations
  4. Royalties will:
    1. apply to Implementations that are based on the MPAI-AIF standard
    2. not be based on the computational time nor on the number of API calls
    3. apply on a worldwide basis
    4. apply to any Implementation
  5. An MPAI-AIF Implementation may use other IPR to extend the MPAI-AIF Implementation or to provide additional functionalities
  6. The Licence may be granted free of charge for particular uses if so decided by the licensors
  7. The Licences will specify
    1. a threshold below which a Licence will be granted free of charge and/or
    2. a grace period during which a Licence will be granted free of charge and/or
    3. an annual in-compliance royalty cap applying to total royalties due on worldwide revenues for a single Enterprise
  8. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-AIF standard
  9. The total cost of the Licences issued by IPR holders will be in line with the total cost of the licences for similar technologies standardised in the context of Standard Development Organisations
  10. The total cost of the Licences will take into account the value on the market of the AI Framework technology Standardised by MPAI.

Miran recalled how easily legal entities, or individuals representing a technical department of a university, that support the MPAI mission and are able to contribute to the development of MPAI standards can join MPAI. They should:

  1. Choose one of the two classes of membership (until 2021/12/31):
    1. Principal Members, with the right to vote (2400 €)
    2. Associate Members, without the right to vote (480 €)
  2. Send secretariat@mpai.community
    1. a signed copy of Template for MPAI Membership applications
    2. a signed copy of the MPAI Statutes. Each page should be signed and initialled
    3. a copy of the bank transfer

MPAI issues a Call for Technologies supporting its AI Framework standard

Geneva, Switzerland – 16 December 2020. Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international unaffiliated standards association, has approved a Call for Technologies (CfT) for publication at its 3rd General Assembly MPAI-3. The CfT concerns technologies for MPAI-AIF, acronym of the MPAI AI Framework standard.

The goal of MPAI-AIF is to enable set-up and execution of mixed processing and inference workflows made of Machine Learning, Artificial Intelligence and legacy Data Processing components called AI Modules (AIMs).

The MPAI AI Framework standard will facilitate integration of AI and legacy data processing components through standard interfaces and methods. MPAI experts have already validated MPAI's innovative approach in a sample microcontroller-based implementation that is synergistic with MPAI-AIF standard development.
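The kind of integration described here can be illustrated with a toy sketch (all names are hypothetical; MPAI-AIF had not yet defined its interfaces at this point): a legacy Data Processing routine and an ML stand-in are wrapped behind one common interface, so a framework can drive either without knowing which it has:

```python
def legacy_normalise(samples):
    """Legacy Data Processing: deterministic peak normalisation."""
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]


class LegacyAIM:
    """Legacy DP component wrapped as an AIM."""

    def process(self, inputs: dict) -> dict:
        return {"audio": legacy_normalise(inputs["audio"])}


class ModelAIM:
    """Stand-in for an ML-based AIM exposing the same interface;
    a real module would run model inference in process()."""

    def process(self, inputs: dict) -> dict:
        return {"audio": inputs["audio"]}


# The framework can drive either module through the one interface.
for aim in (LegacyAIM(), ModelAIM()):
    out = aim.process({"audio": [0.5, -2.0, 1.0]})
    assert "audio" in out
```

Standardising only the interface, as the paragraph above describes, is what lets AI and legacy components coexist in the same workflow.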

In line with its statutes, MPAI has developed the Framework Licence associated with the MPAI-AIF standard. Responses to the CfT shall be in line with the requirements laid down in the CfT and shall be supported by a statement that the respondent will licence their technologies, if adopted in the standard, according to the framework licence.

MPAI is also working on a range of standards for AIM input/output interfaces used in several application areas. Two candidate standards have completed the definition of Functional Requirements and have been promoted to the Commercial Requirements stage.

The two candidates are

  1. MPAI-CAE – Context-based Audio Enhancement uses AI to improve the user experience for a variety of uses such as entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in the contexts of the home, the car, on-the-go, the studio etc. allowing a dynamically optimised user experience.
  2. MPAI-MMC – Multi-Modal Conversation uses AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity.

MPAI adopts a light approach to the definition of AIM standardisation. Different implementors can produce AIMs of different performance while still exposing the same standard interfaces. MPAI AIMs with different features from a variety of sources will promote horizontal markets of AI solutions that tap into and further promote AI innovation.

The MPAI web site provides more information about other MPAI standards: MPAI-CUI uses AI to compress and understand industrial data, MPAI-EVC to improve the performance of existing video codecs, MPAI-GSA to understand and compress the results of combining genomic experiments with those produced by related devices, e.g. video, motion, location, weather, medical sensors, and MPAI-SPG to improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of Data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.

 

 


MPAI commences development of the Framework Licence for the MPAI AI Framework

Geneva, Switzerland – 18 November 2020. The Geneva-based international Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) has concluded its second General Assembly, making a major step toward the development of its first standard, called MPAI AI Framework, acronym MPAI-AIF.

MPAI-AIF has been designed to enable creation and automation of mixed processing and inference workflows made of Machine Learning, Artificial Intelligence and traditional Data Processing components.

MPAI wishes to give as much information as possible to users of its standards. After approving the Functional Requirements, MPAI is now developing the Commercial Requirements, to be embodied in the MPAI-AIF Framework Licence. This will collect the set of conditions of use of the eventual licence(s), without values, e.g. currency, percentage, dates etc.

An optimal implementation of the MPAI use cases requires a coordinated combination of processing modules. MPAI has assessed that, by standardising the interfaces of Processing Modules to be executed in the MPAI AI Framework, horizontal markets of competing standard implementations of processing modules will emerge.

The MPAI-AIF standard, which MPAI plans to deliver in July 2021, will reduce costs, promote adoption and incite progress of AI technologies; if instead the market develops incompatible implementations, costs will multiply and the adoption of AI technologies will be delayed.

MPAI-AIF is the first of a series of standards MPAI has in its development pipeline. The following three work areas, promoted to Functional Requirements stage, will build on top of MPAI-AIF:

  1. MPAI-CAE – Context-based Audio Enhancement uses AI to improve the user experience for a variety of uses such as entertainment, communication, teleconferencing, gaming, post-prod­uction, restoration etc. in the contexts of the home, the car, on-the-go, the studio etc. allowing a dynamically optimised user experience.
  2. MPAI-GSA – Integrative Genomic/Sensor Analysis uses AI to understand and compress the results of high-throughput experiments combining genomic/proteomic and other data – for in­stance from video, motion, location, weather, medical sensors. The target use cases range from personalised medicine to smart farming.
  3. MPAI-MMC – Multi-Modal Conversation uses AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity.

The MPAI web site provides more information about other MPAI standards: MPAI-EVC uses AI to improve the performance of existing video codecs, MPAI-SPG to improve the user experience of online multiplayer games and MPAI-CUI to compress and understand industrial data.

MPAI seeks the involvement of companies who can benefit from international data coding standards and calls for proposals of standards. In a unique arrangement for a standards organisation, MPAI gives the opportunity, even to non-members, to accompany a proposal through the definition of its goals and the development of functional requirements. More details here.

MPAI develops data coding standards for a range of applications with Artificial Intelligence (AI) as its core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of Technical Specifications for the efficient use of Data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.

 


MPAI launches 6 standard projects on audio, genomics, video, AI framework, multiuser online gaming and multimodal conversation

Geneva, Switzerland – 21 October 2020. The Geneva-based international Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) has concluded its first operational General Assembly adopting 6 areas of work, due to become standardisation projects.

MPAI-CAE – Context-based Audio Enhancement is an area that uses AI to improve the user experience for a variety of uses such as entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in such contexts as in the home, in the car, on-the-go, in the studio etc. allowing a dynamically optimized user experience.

MPAI-GSA – Integrative Genomic/Sensor Analysis is an area that uses AI to understand and compress the results of high-throughput experiments combining genomic/proteomic and other data – for instance from video, motion, location, weather, medical sensors. The target use cases range from personalised medicine to smart farming.

MPAI-SPG – Server-based Predictive Multiplayer Gaming uses AI to minimise the audio-visual and gameplay disruptions during an online real-time game caused by missing information at the server or at the client because of high latency and packet losses.

MPAI-EVC – AI-Enhanced Video Coding plans on using AI to further reduce the bitrate required to store and transmit video information for a variety of consumer and professional applications. One user of the MPAI-EVC standard is likely to be MPAI-SPG for improved compression and higher quality of cloud-gaming content.

MPAI-MMC – Multi-Modal Conversation aims to use AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity.

MPAI-AIF – Artificial Intelligence Framework is an area based on the notion of a framework populated by AI-based or traditional Processing Modules. As this is a foundational standard on which other planned MPAI standards, such as MPAI-CAE, MPAI-GSA and MPAI-MMC, will be built, MPAI intends to move at an accelerated pace: Functional Requirements ready in November 2020, Commercial Requirements ready in December 2020 and Call for Technologies issued in January 2021. The MPAI-AIF standard is planned to be ready before the summer holidays in 2021.

You can find more information about MPAI standards.

MPAI covers its Commercial Requirements needs with Framework Licences (FWL). These are the set of conditions of use of a licence of a specific MPAI standard without the values, e.g. currency, percentages, dates, etc. MPAI expects that FWLs will accelerate the practical use of its standards.

MPAI develops data coding standards for a range of applications with Artificial Intelligence (AI) as its core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of Technical Specifications for the efficient use of Data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.


MPAI-AIF

Artificial Intelligence Framework

To enable creation and automation of mixed Machine Learning – Artificial Intelligence – Data Processing and inference workflows for the application areas currently included in the MPAI work plan.



MPAI-AIF AI Framework Announcement

By the 15th of January 2021, those intending to submit a response to the MPAI-AIF Call for Technologies (CfT) should send secretariat@mpai.community an email containing the following data (Annex A to the CfT)

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

Your response, but not your identity, will be posted to this web page. While this is a competitive CfT, we wish to give as much information as possible about how well the CfT Functional Requirements are covered by responses.



Call for Technologies

1 Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international non-profit organisation with the mission to develop digital data coding standards, especially using new technologies such as Artificial Intelligence (AI), and standards for technologies that facilitate integration of data coding components into ICT systems. With the mechanism of Framework Licences, MPAI seeks to attach clear IPR licensing frameworks to its standards.

As a result of the analysis of several use cases, MPAI has identified the need for a common AI Framework that can support the implementation of Use Cases. MPAI expects that most future use cases will benefit from the use of the MPAI AI Framework or extensions thereof. For this reason, MPAI has decided that a standard satisfying the requirements contained in MPAI document N74 available online would benefit use case implementors.

This document is a Call for Technologies (CfT) for technologies that 1) satisfy the requirements of N74 and 2) are released according to the Framework Licence of N101, if selected by MPAI for inclusion in the MPAI AI Framework standard called MPAI-AIF. MPAI will select the most suitable technologies on the basis of their technical merits for inclusion in MPAI-AIF.

All parties who believe they have relevant technologies satisfying all or most of the requirements mentioned in MPAI N74 are invited to submit proposals for consideration by MPAI. The parties do not necessarily have to be MPAI members.

MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.

Submissions are due on 2021/02/15T23:59 UTC and will be reviewed according to the schedule that the 5th MPAI General Assembly (MPAI-5) will define at its online meeting on 2021/02/17. Non-MPAI members should contact the MPAI secretariat (secretariat@mpai.community) for further details on how they can attend the said review.

2 How to submit a response

Those planning to respond to this CfT

  1. Are advised that online events will be held on 2020/12/21 and 2021/01/07 to present the MPAI-AIF CfT and respond to questions. Logistics information for these events will be posted on the MPAI web site.
  2. Are requested to communicate their intention to respond to this CfT, with an initial version of the form of Annex A, to the MPAI secretariat (secretariat@mpai.community) by 2021/01/15. A potential submitter making a communication using the said form is not required to actually make a submission. Submissions will be accepted even if the submitter did not communicate their intention to submit a response.

Responses to this MPAI-AIF CfT shall/may include:

Item – Status

Detailed documentation describing the proposed technologies – mandatory
The final version of Annex A – mandatory
The text of Annex B duly filled out with the table indicating which requirements identified in MPAI N74 are satisfied; if a requirement is not satisfied, the submission shall indicate the reason – mandatory
Comments on the completeness and appropriateness of the MPAI-AIF requirements and any motivated suggestion to extend those requirements – optional
A preliminary demonstration, with a detailed document describing it – optional
Any other additional relevant information that may help evaluate the submission, such as additional use cases – optional
The text of Annex D – mandatory

Respondents are invited to review the check list of Annex C before submitting their response and filling out Annex B.

Responses to this MPAI-AIF CfT shall be submitted to secretariat@mpai.community (MPAI secretariat) by 2021/02/15T23:59 UTC. The secretariat will acknowledge receipt of the submission via email.

Respondents to this CfT are requested to present their submission (mandatory). If no presenter attends the meeting, the proposal will be discarded.

Respondents are advised that, upon acceptance by MPAI for further evaluation of their submission in whole or in part, MPAI will require that

  • A working implementation, including source code, – for use in the development of the MPAI-AIF Reference Software – be made available before the technology is accepted for the MPAI-AIF standard. Software may be written in programming languages that can be compiled or interpreted and in hardware description languages.
  • A non-MPAI member immediately join MPAI. If the non-MPAI member elects not to do so, their submission will be discarded. Direction on how to join MPAI can be found online.

Further information on MPAI can be obtained from the MPAI website.

3 Evaluation Criteria and Procedure

Submissions will be evaluated on the basis of the criteria identified in Annex B and with the following steps:

1) Presentation (mandatory) / Demonstration (optional)

Goal: To assess the submission based on a presentation and possible demonstration that:

  1. Demonstrates the appropriateness and discloses the appropriate range of use.
  2. Provides evidence of the functionalities claimed, and of how the submission satisfies the evaluation criteria.

NB1: A respondent may opt to select a particular use case to demonstrate their functionalities. MPAI encourages respondents to select one of the existing Use Cases. A respondent may demonstrate a new use case; however, they should provide a complete description of the use case, of the inputs and outputs of the implemented AIMs, and of the interaction between the AIMs and Management and Control.

NB2: Both demo and presentation will each have a time limit (to be determined).

Output: Complete proposal evaluation sheet in Annex B.

2) Produce a conclusion

Goal To summarise the results. This should enable MPAI to identify

·       The strong points of the proposal.

·       How the proposal might be adapted or combined with other proposals to enter the Working Draft, and/or be further tested.

Output  Proposed evaluation results.

4 Expected development timeline

Timeline of the call, deadlines and evaluation of the answers:

Call for Technologies 2020/12/16
Conference Calls 2020/12/21 and 2021/01/07
Notification of intention to submit a proposal 2021/01/15
Submission deadline 2021/02/15T23:59 UTC
Evaluation of responses Calendar determined at MPAI-5 2021/02/17

Evaluation is to be carried out during 2-hour sessions according to the calendar agreed at MPAI-5.

5 References

[1] Use Cases & Functional Requirements of MPAI-AIF, MPAI N74; https://mpai.community/standards/mpai-aif/

[2] Use Case-Requirements-candidate technologies for MPAI-CAE CfT, MPAI N96

[3] Use Case-Requirements-candidate technologies for MPAI-MMC CfT, MPAI N97

[4] MPAI-CUI Use Cases and Functional Requirements, MPAI N95

Annex A: Information Form

This information form is to be filled in by a respondent to the MPAI-AIF CfT

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

Annex B: Evaluation Sheet

This evaluation sheet is to be used for self-evaluation in the submission and is to be filled out during the evaluation phase.

Title of the Proposal:

Main Functionalities:

 Summary of Response: (a few lines)

Comments on Relevance to the CfT (Requirements):

Evaluation table:

Submission features Evaluation elements Final Assessment
Completeness of description

Understandability

Adaptability

Extensibility

Use of Standard Technology

Efficiency

Test cases

Maturity of reference implementation

Relative complexity

Support of MPAI use cases

Support of non-MPAI use cases

Content of the criteria table cells:

Evaluation facts should mention:

  • Not supported / partially supported / fully supported.
  • What supported these facts: submission/presentation/demo.
  • The summary of the facts themselves, e.g., very good in one way, but weak in another.

Final assessment should mention:

  • Possibilities of improving or adding to the proposal, e.g., any missing or weak features.
  • How sure the experts are, i.e., evidence shown, very likely, very hard to tell, etc.
  • Global evaluation (Not Applicable / – – / – / + / ++)

 New Use Cases/Requirements Identified:

Summary of the evaluation:

  • Main strong points, qualitatively: 
  • Main weak points, qualitatively:
  • Overall evaluation: (0/1/2/3/4/5)

0: could not be evaluated

1: proposal is not relevant

2: proposal is relevant, but requires much more work

3: proposal is relevant, but with a few changes

4: proposal has some very good points, so it is a good candidate for standard

5: proposal is superior in its category, very strongly recommended for inclusion in standard

Additional remarks: (points of importance not covered above.)

Annex C: Requirements check list

This list has been derived from the Requirements of N74. It is not intended to be a replacement of those Requirements.

The submission shall support the following requirements

  1. General Machine Learning and/or Data Processing life cycles, with the possibility, for single AIMs, to
    1. instantiate-configure-remove
    2. dump/retrieve internal state
    3. start-suspend-stop
    4. train-retrain-update
    5. enforce resource limits
    6. implement auto-configuration/reconfiguration of ML-based computational models
  and, for combinations of AIMs, to
    1. initialise the overall computational model
    2. instantiate-remove-configure AIMs
    3. manually, automatically, dynamically and adaptively configure interfaces with Components
    4. signal one- and two-way for computational workflow initialisation and control
  2. Application-scenario dependent hierarchical execution of workflows
  3. Topology of networked AIMs that can be synchronised according to a given time base and full ML life cycles
  4. Supervised, unsupervised and reinforcement-based learning paradigms
  5. Computational graphs, such as Directed Acyclic Graph (DAG) as a minimum
  6. Initialisation of signalling patterns, communication and security policies between AIMs
  7. Protocols to specify storage access time, retention, read/write throughput, etc.
  8. Storage of Components’ data
  9. Access to
    1. Static or slowly changing data with standard formats
    2. Data with proprietary formats
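The single-AIM life-cycle operations in item 1 above (instantiate-configure-remove, state dump/retrieval, start-suspend-stop, resource limits) can be pictured as a minimal control interface. The following Python sketch is purely illustrative; every name in it (`AIMController`, `dump_state`, etc.) is hypothetical and not part of any MPAI specification.

```python
# Illustrative sketch of the single-AIM life-cycle operations listed above.
# All class and method names are hypothetical, not taken from MPAI-AIF.

class AIMController:
    """Minimal life-cycle controller for one AI Module (AIM)."""

    def __init__(self, name, config=None):   # instantiate-configure
        self.name = name
        self.config = dict(config or {})
        self.state = "instantiated"
        self.resource_limits = {}

    def start(self):                         # start-suspend-stop
        self.state = "running"

    def suspend(self):
        self.state = "suspended"

    def stop(self):
        self.state = "stopped"

    def dump_state(self):                    # dump internal state
        return {"state": self.state, "config": dict(self.config)}

    def retrieve_state(self, snapshot):      # retrieve internal state
        self.state = snapshot["state"]
        self.config = dict(snapshot["config"])

    def enforce_limits(self, **limits):      # enforce resource limits
        self.resource_limits.update(limits)


aim = AIMController("noise-suppressor", {"model": "v1"})
aim.start()
snapshot = aim.dump_state()
aim.stop()
aim.retrieve_state(snapshot)                 # back to the running state
print(aim.state)                             # running
```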

The submission shall support the implementation of AI Frameworks featuring

  1. Asynchronous and time-based synchronous operation depending on application
  2. Dynamic update of the ML models with seamless or minimal impact on its operation
  3. Time-sharing operation of ML-based AIMs to enable use of the same ML-based AIM in multiple concurrent applications
  4. AIMs which are aggregations of AIMs exposing new interfaces
  5. Workflows that are a mixture of AI/ML-based and DP technology-based AIMs.
  6. Scalability of complexity and performance to cope with different scenarios, e.g. from small MCUs to complex distributed systems
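Item 4 above (AIMs which are aggregations of AIMs exposing new interfaces) can be illustrated with a small sketch. This is hypothetical Python, not an MPAI-AIF API; `AIM` and `CompositeAIM` are invented names.

```python
# Hypothetical sketch: an AIM that aggregates other AIMs behind one new
# interface. Names are illustrative only, not from the MPAI-AIF standard.

class AIM:
    def __init__(self, fn):
        self.fn = fn

    def process(self, data):
        return self.fn(data)


class CompositeAIM(AIM):
    """Aggregates a pipeline of AIMs behind a single process() interface."""

    def __init__(self, stages):
        self.stages = list(stages)

    def process(self, data):
        for stage in self.stages:
            data = stage.process(data)
        return data


# A toy two-stage workflow: normalise, then scale.
normalise = AIM(lambda xs: [x - min(xs) for x in xs])
scale = AIM(lambda xs: [10 * x for x in xs])
pipeline = CompositeAIM([normalise, scale])
print(pipeline.process([3, 4, 5]))   # [0, 10, 20]
```

The aggregate exposes the same `process()` interface as its parts, so callers need not know whether an AIM is elementary or composite.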

The submission shall not inhibit the creation of MPAI-AIF profiles.

Annex D: Mandatory text in responses

A response to this MPAI-AIF CfT shall mandatorily include the following text

<Company/Member> submits this technical document in response to MPAI Call for Technologies for MPAI project MPAI-AIF (MPAI document N100).

<Company/Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes, in particular <Company/Member> declares that <Company/Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-AIF (MPAI document N101), alone or jointly with other IPR holders after the approval of the MPAI-AIF Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-AIF Technical Specification become available on the market.

In case the respondent is a non-MPAI member, the submission shall mandatorily include the following text

If (a part of) this submission is identified for inclusion in a specification, <Company>  understands that  <Company> will be requested to immediately join MPAI and that, if  <Company> elects not to join MPAI, this submission will be discarded.

Subsequent technical contribution shall mandatorily include this text

<Member> submits this document to MPAI Development Committee AIF as a contribution to the development of the MPAI-AIF Technical Specification.

 <Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes, in particular  <Company> declares that <Company> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-AIF (MPAI document N101), alone or jointly with other IPR holders after the approval of the MPAI-AIF Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-AIF Technical Specification become available on the market.



Framework Licence

1 Coverage

The MPAI AI Framework (MPAI-AIF) standard as it will be defined in document Nxyz of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI).

MPAI-AIF specifies a generic execution environment possibly integrating Machine Learning, Artificial Intelligence and legacy Data Processing components implementing application areas such as

  1. Context-based Audio Enhancement (MPAI-CAE)
  2. Integrative Genomic/Sensor Analysis (MPAI-GSA)
  3. AI-Enhanced Video Coding (MPAI-EVC)
  4. Server-based Predictive Multiplayer Gaming (MPAI-SPG)
  5. Multi-Modal Conversation (MPAI-MMC)
  6. Compression and Understanding of Industrial data (MPAI-CUI)

The six application areas are expected to become MPAI standards.

2 Definitions

Term: Definition
Data: Any digital representation of a real or computer-generated entity, such as moving pictures, audio, point cloud, computer graphics, sensor and actuator. Data includes, but is not restricted to, media, manufacturing, automotive, health and generic data.
Development Rights: Licence to use MPAI-AIF Essential IPRs to develop Implementations
Enterprise: Any commercial entity that develops or implements the MPAI-AIF standard
Essential IPR: Any Proprietary Rights (such as patents) without which it is not possible, on technical (but not commercial) grounds, to make, sell, lease, otherwise dispose of, repair, use or operate Implementations without infringing those Proprietary Rights
Framework Licence: A document, developed in compliance with the generally accepted principles of competition law, which contains the conditions of use of the Licence without the values, e.g., currency, percent, dates etc.
Implementation: A hardware and/or software reification of the MPAI-AIF standard serving the needs of a professional or consumer user directly or through a service
Implementation Rights: Licence to reify the MPAI-AIF standard
Licence: This Framework Licence to which values, e.g., currency, percent, dates etc., related to a specific Intellectual Property will be added. In this Framework Licence, the word Licence will be used as singular. However, multiple Licences from different IPR holders may be issued
Profile: A particular subset of the technologies used in the MPAI-AIF standard and, where applicable, the classes, subsets, options and parameters relevant to that subset

3 Conditions of use of the Licence

  1. The Licence will be in compliance with generally accepted principles of competition law and the MPAI Statutes
  2. The Licence will cover all of the Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-AIF standard.
  3. The Licence will cover Development Rights and Implementation Rights
  4. The Licence will apply to a baseline MPAI-AIF profile and to other profiles containing additional technologies
  5. Access to Essential IPRs of the MPAI-AIF standard will be granted in a non-discriminatory fashion.
  6. The scope of the Licence will be subject to legal, bias, ethical and moral limitations
  7. Royalties will apply to Implementations that are based on the MPAI-AIF standard
  8. Royalties will not be based on the computational time nor on the number of API calls
  9. Royalties will apply on a worldwide basis
  10. Royalties will apply to any Implementation
  11. An MPAI-AIF Implementation may use other IPR to extend the MPAI-AIF Implementation or to provide additional functionalities
  12. The Licence may be granted free of charge for particular uses if so decided by the licensors
  13. The Licences will specify
    1. a threshold below which a Licence will be granted free of charge and/or
    2. a grace period during which a Licence will be granted free of charge and/or
    3. an annual in-compliance royalty cap applying to total royalties due on worldwide rev­enues for a single Enterprise
  14. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-AIF standard
  15. The total cost of the Licences issued by IPR holders will be in line with the total cost of the licences for similar technologies standardised in the context of Standard Development Organisations

The total cost of the Licences will take into account the value on the market of the AI Framework



Use Cases and Functional Requirements

1 Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international association with the mission to develop AI-enabled data coding standards. Artificial Intelligence (AI) technologies have shown that they can offer more efficient data coding than existing technologies.

MPAI has analysed six use cases covering application areas that benefit from AI technologies. Even though the use cases are disparate, each of them can be implemented with a combination of processing modules performing functions that contribute to achieving the intended result.

MPAI has assessed that leaving it to the market to develop individual implementations would multiply costs and delay adoption of AI technologies. By contrast, modules with standard interfaces, combined and executed within the MPAI-specified AI Framework, will favour the emergence of horizontal markets where proprietary, competing module implementations exposing standard interfaces will reduce cost, promote adoption and spur progress of AI technologies. MPAI calls these modules AI Modules (AIM).

MPAI calls the planned AI Framework standard MPAI-AIF. As AI is a fast-moving field, MPAI expects that MPAI-AIF will be extended as new use cases bring new requirements and new technologies reach maturity.

To avoid the deadlock experienced in other high-technology fields, before engaging in the development of the MPAI-AIF standard, MPAI will develop a Framework Licence (FWL) associated with the MPAI-AIF Architecture and Functional Requirements defined in this document. The FWL, essentially the business model that standard-essential patent (SEP) holders will apply to monetise their Intellectual Property, but without values such as the amount or percentage of royalties or due dates, will act as the Commercial Requirements for the standard and provide a clear IPR licensing framework.

This document contains a summary description of the six use cases (Section 2) followed by a section describing the architecture expected to become normative (Section 3). Section 4 lists the normative requirements identified so far.

2 Use Cases

The six use cases considered cover a broad area of application. Therefore, it is expected that the MPAI-AIF architecture can support a wide range of use cases of practical interest.

Each case is identified by its name and the acronym identifying the future MPAI standard. More information about MPAI-AIF can be found in [1].

2.1 Context-based Audio Enhancement (MPAI-CAE)

2.2 Integrative Genomic/Sensor Analysis (MPAI-GSA)

2.3 AI-Enhanced Video Coding (MPAI-EVC)

2.4 Server-based Predictive Multiplayer Gaming (MPAI-SPG)

2.5 Multi-Modal Conversation (MPAI-MMC)

2.6 Compression and Understanding of Industrial data (MPAI-CUI)

3 Architecture

The normative MPAI-AIF architecture enables the creation and automation of mixed ML-AI-DP processing and inference workflows at scale for the use cases considered above. It includes six basic normative elements of the Architecture, called Components, addressing different modalities of operation – AI, Machine Learning (ML) and Data Processing (DP) – data pipeline jungles and computing resource allocations, including constrained hardware scenarios of edge AI devices.

The normative reference diagram of MPAI-AIF is given in the following figure, where the APIs between different Components at different levels are shown.

Figure 3 – Proposed normative MPAI-AIF Architecture

  1. Management and Control

Management concerns the activation/deactivation/suspension of AIMs, while Control supports complex application scenarios.

Management and Control handles simple orchestration tasks (i.e. represented by the execution of a script) and much more complex tasks with a topology of networked AIMs that can be synchronised according to a given time base and full ML life cycles.
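As a rough illustration of the simple end of that range, a scripted orchestration task can be modelled as an ordered list of control operations. The sketch below is hypothetical Python; `ManagementAndControl`, `activate` and `run_script` are invented names, not MPAI-AIF APIs.

```python
# Hypothetical sketch of a "simple orchestration task": a script of AIM
# control operations executed in order. Names are invented for illustration.

class ManagementAndControl:
    def __init__(self):
        self.log = []

    def activate(self, aim_name):
        self.log.append(("activate", aim_name))

    def deactivate(self, aim_name):
        self.log.append(("deactivate", aim_name))

    def run_script(self, script):
        """Execute a list of (operation, aim_name) pairs in order."""
        for op, name in script:
            getattr(self, op)(name)


mc = ManagementAndControl()
mc.run_script([("activate", "denoiser"),
               ("activate", "equalizer"),
               ("deactivate", "denoiser")])
print(mc.log[-1])   # ('deactivate', 'denoiser')
```

A complex task would instead drive a networked topology of AIMs against a common time base, which this toy script model does not attempt to capture.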

  2. Execution

The environment where AIMs operate. It is interfaced with Management and Control and with Communication and Storage. It receives external inputs and produces the requested outputs, both of which are application-specific.

  3. AI Modules (AIM)

AIMs are units comprising at least the following three functions:

  1. The processing element (ML or traditional DP)
  2. Interface to Communication and Storage
  3. Input and output interfaces (function specific)

AIMs can implement auto-configuration or reconfiguration of their ML-based computational models.
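The three functions listed above can be sketched as follows. This is a hypothetical Python illustration, not the MPAI-AIF specification; `AIModule`, `on_input` and the dictionary standing in for Storage are all invented.

```python
# Sketch of the three AIM functions listed above: a processing element,
# an interface to Storage, and function-specific input/output interfaces.
# All names are hypothetical, not from the MPAI-AIF specification.

class AIModule:
    def __init__(self, name, processing_element, storage):
        self.name = name
        self.processing_element = processing_element  # ML model or DP routine
        self.storage = storage                        # shared Storage interface

    def on_input(self, data):
        """Input interface: receive data, process it, emit the output."""
        result = self.processing_element(data)
        self.storage[self.name] = result              # persist intermediary result
        return self.on_output(result)

    def on_output(self, result):
        """Output interface: hand the result to the next Component."""
        return result


storage = {}
gain = AIModule("gain", lambda samples: [2 * s for s in samples], storage)
print(gain.on_input([1, 2, 3]))   # [2, 4, 6]
print(storage["gain"])            # [2, 4, 6]
```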

  4. Communication

Communication is required in several cases and can be implemented accordingly, e.g. by means of a service bus. Components can communicate among themselves and with outputs and Storage.

The Management and Control API implements one- and two-way signalling for computational workflow initialisation and control.
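The difference between one- and two-way signalling can be sketched as follows: a one-way signal is fire-and-forget, while a two-way signal returns a reply. The Python below is a hypothetical illustration; `ControlChannel`, `notify` and `request` are invented names, not MPAI-AIF APIs.

```python
# Hypothetical sketch of one- and two-way signalling between Components.

class ControlChannel:
    def __init__(self):
        self.handlers = {}

    def register(self, signal, handler):
        self.handlers[signal] = handler

    def notify(self, signal, payload=None):
        """One-way signalling: deliver the signal and ignore any result."""
        self.handlers[signal](payload)

    def request(self, signal, payload=None):
        """Two-way signalling: deliver the signal and return the reply."""
        return self.handlers[signal](payload)


channel = ControlChannel()
started = []
channel.register("init", lambda p: started.append(p) or "ready")
channel.notify("init", "workflow-1")          # one-way: no reply used
print(channel.request("init", "workflow-2"))  # two-way: prints "ready"
```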

  5. Storage

Storage encompasses traditional storage and covers a variety of data types, e.g.:

  1. Inputs and outputs of the individual AIMs
  2. Data from the AIM’s state, e.g. with respect to traditional and continuous learning
  3. Data from the AIM’s intermediary results
  4. Shared data among AIMs
  5. Information used by Management and Control.

  6. Access

Access represents the access to static or slowly changing data that are required by the application such as domain knowledge data, data models, etc.

4 Requirements

4.1 Component requirements

  1. The MPAI-AIF standard shall include specifications of the interfaces of 6 Components
    1. Management and Control
    2. Execution
    3. AI Modules (AIM)
    4. Communication
    5. Storage
    6. Access
  2. MPAI-AIF shall support configurations where Components are distributed in the cloud and at the edge
  3. Management and Control shall enable operations on the general ML life cycle and/or the traditional data processing life cycle of
    1. Single AIMs, e.g. instantiation-configuration-removal, internal state dumping/retrieval, start-suspend-stop, train-retrain-update, enforcement of resource limits
    2. Combinations of AIMs, e.g. initialisation of the overall computational model, instan­tiation-removal-configuration of AIMs, manual, automatic, dynamic and adaptive configuration of interfaces with Components.
  4. Management and Control shall support
    1. Architectures that allow application-scenario dependent hierarchical execution of workflows, i.e. a combination of AIMs into computational graphs
    2. Supervised, unsupervised and reinforcement-based learning paradigms
    3. Computational graphs, such as Directed Acyclic Graph (DAG) as a minimum
    4. Initialisation of signalling patterns, communication and security policies between AIMs
  5. Storage shall support protocols to specify application-dependent requirements such as access time, retention, read/write throughput
  6. Access shall provide
    1. Static or slowly changing data with standard formats
    2. Data with proprietary formats
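The computational-graph requirement above (a DAG as a minimum) guarantees that AIMs can always be executed in a dependency-respecting order. A minimal sketch using Python's standard-library `graphlib`, with an invented four-AIM workflow:

```python
# A DAG of AIMs can always be scheduled with a topological sort.
# The workflow below (node -> set of nodes it depends on) is invented
# purely for illustration.

from graphlib import TopologicalSorter

workflow = {
    "capture": set(),
    "analysis": {"capture"},
    "metadata": {"capture"},
    "render": {"analysis", "metadata"},
}

order = list(TopologicalSorter(workflow).static_order())
print(order)   # every AIM appears after all of its dependencies
```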

4.2 Systems requirements

The following requirements are not intended to apply to the MPAI-AIF standard, but should be used for assessing technologies

  1. Management and Control shall support asynchronous and time-based synchronous operation depending on application
  2. The Architecture shall support dynamic update of the ML models with seamless or minimal impact on its operation
  3. ML-based AIMs shall support time sharing operation enabling use of the same ML-based AIM in multiple concurrent applications
  4. AIMs may be aggregations of AIMs exposing new interfaces
  5. Complexity and performance shall be scalable to cope with different scenarios, e.g. from small MCUs to complex distributed systems
  6. The Architecture shall support workflows of a mixture of AI/ML-based and DP technology-based AIMs.

4.3 General requirements

The MPAI-AIF standard may include profiles for specific (sets of) requirements

5 Conclusions

When the definition of the MPAI-AIF Framework Licence is completed, MPAI will issue a Call for Technologies that support the AI Framework with the requirements given in this document.

Respondents will be requested to state in their submissions their intention to adhere to the Framework Licence developed for MPAI-AIF when licensing their technologies, if these have been included in the MPAI-AIF standard.

The MPAI-AIF Framework Licence will be developed, as for all other MPAI Framework Licences, in compliance with the generally accepted principles of competition law.

6 References

[1] MPAI Application Note#4 – MPAI-AIF Artificial Intelligence Framework

[2] MPAI Application Note#1 R1 – MPAI-CAE Context-based Audio Enhancement

[3] MPAI Application Note#2 R1 – MPAI-GSA Integrative Genomic/Sensor Analysis

[4] MPAI Application Note#3 R1 – MPAI-EVC AI-Enhanced Video Coding

[5] MPAI Application Note#5 R1 – MPAI-SPG Server-based Predictive Multiplayer Gaming

[6] MPAI Application Note#6 R1 – MPAI-MMC Multi-Modal Conversation

[7] MPAI-CAE Functional Requirements work programme

[8] MPAI-GSA Functional Requirements work programme

[9] MPAI-MMC Functional Requirements work programme

[10] MPAI-EVC Use Cases and Requirements

[11] Collaborative Evidence Conditions for MPAI-EVC Evidence Project R1

[12] Operational Guidelines for MPAI-EVC Evidence Project



MPAI Application Note #4

Artificial Intelligence Framework (MPAI-AIF)

Proponent: Andrea Basso.

 Description: The purpose of the MPAI framework is to enable the creation and automation of mixed ML-AI-DP processing and inference workflows at scale. The key components of the framework should address different modalities of operation (AI, ML and DP), data pipeline jungles and computing resource allocations including constrained hardware scenarios of edge AI devices.

The framework is depicted in Figure 1. It is composed of:

  1. Data Storage component
  2. Orchestrator
  3. Processing modules (PM), traditional or ML algorithms

Figure 1 – MPAI Framework

  1. MPAI Processing Modules (PM)

PMs are composed of the 4 components indicated in Figure 1:

  1. The processing element PM (ML or Traditional Data Processor)
  2. Interface to the common data storage format
  3. Input interfaces
  4. Output interfaces

  2. Orchestrator

The PMs in Figure 1 need to be executed and orchestrated so that they run in the correct order and, where needed, the required timing is respected, e.g. the inputs to a PM are computed before the PM is executed. As reported in [6], the glue code between the PMs of a complex ML system is often brittle, and custom scripts do not scale beyond specific cases.

Therefore, it is important to have a specific orchestrator component that supports the implementation of several MPAI application scenarios and eases the efficient usage of PMs.

As shown in Figure 1, the orchestrator is characterized by:

  1. Interface to PMs
  2. Interface to the input
  3. Interface to the common data storage format

Note that the Orchestrator should be able to handle anything from simple orchestration tasks (i.e. represented by the execution of a script) to much more complex tasks with a topology of networked PMs that need to be synchronized according to a given time base.
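The ordering guarantee described above (inputs to a PM are computed before the PM is executed) can be sketched as a recursive evaluation of a dependency graph. This is hypothetical Python; `run_workflow` and the toy PMs are invented for illustration.

```python
# Sketch of dependency-driven execution: a PM runs only after all of its
# input PMs have produced their results. Names are invented.

def run_workflow(pms, dependencies, target, results=None):
    """Recursively compute each PM's inputs before executing it."""
    results = {} if results is None else results
    if target in results:
        return results[target]
    inputs = [run_workflow(pms, dependencies, d, results)
              for d in dependencies.get(target, [])]
    results[target] = pms[target](inputs)
    return results[target]


# Toy PMs: two sources and a PM that sums its inputs.
pms = {
    "mic_left": lambda inputs: 1.0,
    "mic_right": lambda inputs: 2.0,
    "mixer": lambda inputs: sum(inputs),
}
dependencies = {"mixer": ["mic_left", "mic_right"]}
print(run_workflow(pms, dependencies, "mixer"))   # 3.0
```

Memoising results in `results` also means a PM shared by several downstream PMs is executed only once per workflow run.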

  3. Data Storage

The Data Storage component encompasses traditional storage and covers a variety of data types. Some examples: it stores the inputs and outputs of the individual PMs, data from the PMs’ state and intermediary results, shared data among PMs, information used by the orchestrator, orchestrator procedures, as well as domain knowledge data, data models, etc.

Comments:

Examples

1. MPAI-CAE

Examples of PMs in the MPAI-CAE application scenario are

 

  • processing the signals from the environment captured by the microphones
  • performing dynamic signal equalization
  • selectively recognizing and allowing relevant environment sounds (e.g. the horn of a car)

2. MPAI-MMC

Examples of PMs in the MPAI-MMC scenario are

  • Speech recognition
  • Speech synthesis
  • Emotion recognition
  • Intention understanding
  • Gesture recognition
  • Knowledge fusion from different sources such as speech, facial expression, gestures, etc
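The last PM above, knowledge fusion, can be illustrated with a toy weighted vote over per-modality emotion scores. The function, weights and labels below are invented for illustration and do not come from MPAI-MMC.

```python
# Hypothetical sketch of knowledge fusion across modalities: combine
# per-modality emotion scores into one label via a weighted vote.
# All names, weights and scores are invented.

def fuse_emotion(scores_per_modality, weights):
    """Combine {modality: {label: score}} dicts into a single label."""
    combined = {}
    for modality, scores in scores_per_modality.items():
        w = weights.get(modality, 1.0)
        for label, score in scores.items():
            combined[label] = combined.get(label, 0.0) + w * score
    return max(combined, key=combined.get)


cues = {
    "speech": {"happy": 0.6, "neutral": 0.4},
    "face": {"happy": 0.7, "neutral": 0.3},
}
print(fuse_emotion(cues, {"speech": 1.0, "face": 0.5}))   # happy
```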

An illustrative application of the MPAI-AIF Framework is given below

Figure 2 – MPAI-MMC in the MPAI-AIF Framework

Requirements:

MPAI has identified the following initial requirements:

  • Shall allow orchestrating and automating the execution of mixed AI workflows for a variety of application scenarios
  • Agnostic to AI, ML or DP technology: the architecture should be general enough to avoid imposing limitations in terms of algorithmic structure, storage and communication, and to allow full interoperability of its components.
  • Allow uninterrupted functionality while algorithms or ML models are updated or retrained. Once deployed, ML models may need frequent updates as more data becomes available through their usage. Deployment of the updated models should happen seamlessly, with no or minimal impact.
  • Support for distributed computing including combination of cloud and edge AI components.
  • Scalable: can be used in scenarios of different complexity
  • Efficient use of the computing and communication resources e.g. by supporting task sharing also for ML processing modules.
  • Support a common interface for processing modules
  • Support common data representation for storage
  • Support parallel and sequential combination of PMs
  • Support real time processing

 Object of standard:

MPAI has identified the following areas of possible standardization:

  • Architecture that specifies the roles of the components of the architecture
  • The modalities of the creation of the pipelines of PMs
  • Common PM interfaces specified in the application standards such as MPAI-CAE, etc.
  • Data storage interfaces

The current framework has been made as general as possible taking into consideration also current MPAI application standards. We are expecting the architecture to be enriched and extended according to the proposals of the MPAI contributors.

Benefits:

MPAI-AIF will bring benefits positively affecting

  1. Technology providers need not develop full applications to put their technologies to good use. They can concentrate on improving the AI technologies of the PMs. Further, their technologies can find much broader use in application domains beyond those they are accustomed to dealing with.
  2. Equipment manufacturers and application vendors can tap from the set of technologies made available according to the MPAI-AIF standard from different competing sources, integrate them and satisfy their specific needs
  3. Service providers can deliver complex optimizations and thus superior user experience with minimal time to market as the MPAI-AIF framework enables easy combination of 3rd party components from both a technical and licensing perspective. Their services can deliver a high quality, consistent user audio experience with minimal dependency on the source by selecting the optimal delivery method
  4. End users enjoy a competitive market that provides constantly improved user exper­iences and controlled cost of AI-based products.

 Bottlenecks: the full potential of AI in MPAI-AIF could be limited by the availability of AI-friendly processing units and by the pace of introduction of the vast amount of AI technologies into products and services.

 Social aspects:

MPAI-AIF will enable the availability of a variety of AI-based products faster and at lower prices.

Success criteria:

MPAI-AIF should create a competitive market of AI-based components exposing standard interfaces and of processing units available to manufacturers, support a variety of end-user devices, and trigger the implicit need felt by users to have the best experience whatever the context.

References

[1] W. Wang, J. Gao, M. Zhang, S. Wang, G. Chen, T. K. Ng, B. C. Ooi, J. Shao, and M. Reyad, “Rafiki: machine learning as an analytics service system,” Proceedings of the VLDB Endowment, vol. 12, no. 2, pp. 128–140, 2018.

[2] Y. Lee, A. Scolari, B.-G. Chun, M. D. Santambrogio, M. Weimer, and M. Interlandi, “PRETZEL: Opening the black box of machine learning prediction serving systems,” in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI18), pp. 611–626, 2018.

[3] “ML.NET [ONLINE].” https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet.

[69] D. Crankshaw, X. Wang, G. Zhou, M. J. Franklin, J. E. Gonzalez, and I. Stoica, “Clipper: A low-latency online prediction serving system.,” in NSDI, pp. 613–627, 2017.

[4] S. Zhao, M. Talasila, G. Jacobson, C. Borcea, S. A. Aftab, and J. F. Murray, “Packaging and sharing machine learning models via the acumos ai open platform,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 841–846, IEEE, 2018.

[5] “Apache Prediction I/O.” https://predictionio.apache.org/.

[6] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, D. Dennison, “Hidden Technical Debt in Machine Learning Systems,” in NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems – Volume 2, December 2015, pp. 2503–2511.


MPAI-3

The 3rd General Assembly has approved the following public documents


MPAI-2

The 2nd MPAI General Assembly (MPAI-2) was held by teleconference on 2020/11/18T14:00-16:20 UTC. Here are the public documents approved


MPAI-CUI

Compression and Understanding of Industrial Data


Draft Table of Contents of Standard

1 Introduction
2 Scope of standard
3 Terms and definitions
4 Normative references
5 Use Case Architecture
5.1 AI-based Performance Prediction
6 AI modules
6.1 AI-based Performance Prediction
6.1.1 Governance data (raw)
6.1.2 Financial statement data (raw)
6.1.3 Risk assessment technical data (raw)
6.1.4 Governance
6.1.5 Financial statement
6.1.6 Risk assessment technical data
6.1.7 Financial features
6.1.8 Severity
6.1.9 Decision Tree
6.1.10 Default probability
6.1.11 Business continuity index
7 References

1 Introduction

2 Scope of standard

3 Terms and definitions

Table 6 – MPAI-CUI terms

 

Term: Definition
Access: Static or slowly changing data that are required by an application, such as domain knowledge data, data models, etc.
AI Framework (AIF): The environment where AIM-based workflows are executed.
AI Module (AIM): The basic processing element receiving processing-specific inputs and producing processing-specific outputs.
Communication: The infrastructure that connects the Components of an AIF.
Data Processing (DP): A legacy technology that may be used to implement AIMs.
Decision Tree: A decision support tool that uses a tree-like model of decisions, given the financial and governance features.
Delivery: An AIM that wraps data for transport.
Execution: The environment in which AIM workflows are executed. It receives external inputs and produces the requested outputs, both of which are application-specific.
Financial features: A set of indexes and ratios computed using financial statement data.
Financial statement: Data produced based on a set of accounting principles driving maintenance and reporting of company accounts, so that financial statements can be consistent, transparent, and comparable across companies.
Governance features: A set of indexes/parameters that are used to assess the adequacy of the organizational model.
Knowledge Base: Structured and unstructured information made accessible to AIMs (especially DP-based ones).
Management and Control: Manages and controls the AIMs in the AIF, so that they execute in the correct order and at the time when they are needed.
Risk assessment: Attributes that indicate the internal assessment that the company performs to identify and measure potential or existing vertical risks, and their impact on business continuity.
Severity: A set of values, each reflecting the level of risk for a specific vertical risk as evaluated by the company.
Storage: Storage used, e.g., to store the inputs and outputs of the individual AIMs, data from the AIMs’ state and intermediary results, and shared data among AIMs.

4        Normative references

  1. International Financial Reporting Standard. List of IFRS Standards. Available online: https://www.ifrs.org/issued-standards/list-of-standards/
  2. International Organization for Standardization. ISO 37000 Guidance for the Governance of Organizations. Available online: https://committee.iso.org/sites/tc309/home/projects/ongoing/ongoing-1.html
  3. International Organization for Standardization. ISO 31000 Risk Management. Available online: https://www.iso.org/files/live/sites/isoorg/files/store/en/PUB100426.pdf
  4. International Organization for Standardization. ISO 27005 Information technology — Security techniques — Information security risk management

5        Use Case Architecture

5.1       AI-based Performance Prediction

A company may need to access the flow of internal (i.e., financial and governance data) and exter­nal data related to the activity of the company to assess and mon­itor its financial and organizational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.), according to the current standards (e.g., ISO 31000 on risk assessment and management). The current version of MPAI-CUI takes into account only cyber and seismic risks that have an impact on financial per­formance. Other risks will be considered in future versions of the standard.

MPAI-CUI may be used by:

  1. The company generating the data flow to perform compression and understanding of the data for its own needs (e.g., to identify core and non-core data), to analyse its financial performance, identifying possible clues to a crisis or risk of bankruptcy years in advance. It may help the board of directors and decision-makers to make the proper decisions to avoid these situations, conduct what-if analysis, and devise efficient strategies.
  2. A financial institution that receives a request for financial help from a troubled company to access its financial and organizational data and make an AI-based assessment of that company, as well as a prediction of future performance. By having a better insight of its situation, a financial institution can make the right decision in funding or not a company.

This Use Case can be implemented as in Figure 1.

Figure 1 – Compression and understanding of Industrial Data

The AI Modules of Figure 2 perform the functions described in Table 2.

Table 2 – AI Modules of Industrial Data Compression and Understanding

AIM Function
Data Conversion Gathers data needed for the assessment from several sources (internal and external), in different formats and coverts it to a unique format (e.g., json).
Financial assessment Analyses the data generated by a company (i.e., financial statements) to assess the preliminary financial performances in the form of indexes.

Builds and extracts the financial features for the Decision tree and Pred­iction AIMs.

Governance assessment Builds and extracts the features related to the adequacy of the governance asset for the Decision tree and Pred­iction AIMs.
Risk matrix Builds the risk matrix to assess the impact of vertical risks (i.e., in this Use Case cyber and seismic).
Decision Creates the decision trees for making decisions.
Prediction Predicts values of the probability of company default in a time horizon of 36 months and of the adequacy of the organizational model.
Perturbation Perturbs the probability value of company crisis computed by Prediction, considering the impact of vertical risks on company performan­ce.

6        AI modules

6.1       AI-based Performance Prediction

The I/O data of Data Compression and Understanding AIMs are given in Table 3.

Table 3 – I/O data of Use Case AIMs 

AI Module Input Output
Data Conversion Financial statement data

Governance data

Risk assessment data

Financial statement data (converted)

Governance data (converted)

Financial assessment Financial statement data Financial features
Governance assessment Governance data Governance features
Risk matrix Technical data from internal risk assessment (i.e., cyber security) Severity
Decision Financial features, Governance features Ranking of features importance
Prediction Financial features, Governance features Probability of company crisis

Adequacy of organizational model

Perturbation Probability of company crisis (index); severity from Risk Matrix Index of business continuity

6.1.1      Governance data (raw)

6.1.2      Financial statement data (raw)

6.1.3      Risk assessment technical data (raw)

6.1.4      Governance

6.1.5      Financial statement

6.1.6      Risk assessment technical data

6.1.7      Financial features

6.1.8      Severity

6.1.9      Decision Tree

6.1.10   Default probability

6.1.11   Business continuity index

7        References


Use Cases and Functional Requirements

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international association with the mission to develop AI-enabled data coding standards. Research has shown that data coding with AI-based technologies is generally more efficient than with existing technol­ogies. Compression is a notable example of coding as is feature-based description.

The MPAI approach to developing AI data coding standards is based on the definition of standard interfaces of AI Modules (AIM). The Modules operate on input and output data with standard formats. AIMs can be combined and executed in an MPAI-specified AI Framework according to the emerging MPAI-AIF standard, which is being developed based on the responses to the Call for MPAI-AIF Technologies [1] and the associated Use Cases and Functional Requirements [2].

By exposing standard interfaces, AIMs are able to operate in an MPAI AI Framework. However, their performance may differ depending on the technologies used to implement them. Therefore, MPAI believes that competing developers, striving to provide higher-performing yet interoperable proprietary AIMs, will naturally create horizontal markets of AI solutions that build on and further promote AI innovation.

This document, titled Compression and understanding of industrial data (MPAI-CUI), contains the “AI-based Performance Prediction” Use Case and associated Functional Requirements. The MPAI-CUI standard uses AI substantially to extract the most relevant information from the industrial data, with the aim of assessing the performance of a company and predicting the risk of bankruptcy long before it may happen.

It should be noted that the AI-based Performance Prediction Use Case will be non-normative. The internals of the AIMs will also be non-normative. However, the input and output interfaces of the AIMs whose requirements have been derived to support the Use Cases will be normative.

This document includes this Introduction and

Chapter 2 briefly introduces the AI Framework Reference Model and its six Components.
Chapter 3 briefly introduces the Use Case.
Chapter 4 presents the MPAI-CUI Use Case with the following structure:

  1. Reference architecture
  2. Description of AI Modules and their I/O data
  3. Technologies and Functional Requirements
  4. Interfaces of AIM I/O Data

Chapter 5 gives a basic list of relevant terms and their definitions.
Chapter 6 gives suggested references.

Acronyms are defined in Table 1 below. Terms are defined in Table 6.

Table 1 – MPAI-CUI acronyms

Acronym Meaning
AI Artificial Intelligence
AIF AI Framework
AIM AI Module
CfT Call for Technologies
DP Data Processing
ML Machine Learning

2        The MPAI AI Framework (MPAI-AIF)

Most MPAI applications considered so far can be implemented as a set of AIMs – AI, ML and, possibly, traditional DP-based units with standard interfaces assembled in suitable topologies and executed in an MPAI-defined AI Framework to achieve the specific goal of an application. MPAI is making all efforts to identify processing modules that are re-usable and upgradable without necessarily changing their internal logic. MPAI plans on completing the development of a 1st generation MPAI-AIF AI Framework in July 2021.

The MPAI-AIF Architecture is given by Figure 1.

Figure 1 – The MPAI-AIF Architecture

MPAI-AIF is made up of 6 Components:

  1. Management and Control manages and controls the AIMs, so that they execute in the correct order and at the time when they are needed.
  2. Execution is the environment in which combinations of AIMs operate. It receives external inputs and produces the requested outputs, both of which are Use Case-specific, activates the AIMs, exposes interfaces with Management and Control and accesses Communication, Storage and Access.
  3. AI Modules (AIM) are the basic processing elements receiving specific inputs and producing specific outputs.
  4. Communication is the basic infrastructure used to connect possibly remote Components and AIMs. It can be implemented, e.g., by means of a service bus.
  5. Storage encompasses traditional storage and is used to e.g., store the inputs and outputs of the individual AIMs, intermediary results, data from the AIM states and data shared by AIMs.
  6. Access represents the access to static or slowly changing data that are required by the application such as domain knowledge data, data models, etc.
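The interplay of these Components can be sketched in a few lines of Python. The class names and the shared-dictionary mechanism below are illustrative assumptions, not part of the MPAI-AIF specification:

```python
# Minimal sketch of how the six MPAI-AIF Components could fit together.
# All names here are illustrative, not defined by the standard.

class AIM:
    """AI Module: consumes named inputs and produces named outputs."""
    def __init__(self, name, inputs, outputs, fn):
        self.name, self.inputs, self.outputs, self.fn = name, inputs, outputs, fn

    def process(self, data):
        # Map this AIM's declared outputs onto the values its function returns.
        return dict(zip(self.outputs, self.fn(*(data[i] for i in self.inputs))))

class ManagementAndControl:
    """Executes AIMs as soon as their inputs are available.

    The shared dictionary plays the role of Storage; passing values
    through it stands in for Communication between AIMs."""
    def __init__(self, aims):
        self.aims = aims

    def run(self, external_inputs):
        storage = dict(external_inputs)
        pending = list(self.aims)
        while pending:
            ready = [a for a in pending if all(i in storage for i in a.inputs)]
            if not ready:
                raise RuntimeError("workflow cannot progress: missing inputs")
            for aim in ready:
                storage.update(aim.process(storage))
                pending.remove(aim)
        return storage
```

Here Management and Control simply runs any AIM whose declared inputs are present, so the workflow topology is implied by the input/output names rather than configured explicitly.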

3        Use Cases

3.1       AI-based Performance Prediction

A company may need to access the flow of internal data (i.e., financial and governance data) and external data related to its activity in order to assess and monitor its financial and organizational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.), according to current standards (e.g., ISO 31000 on risk assessment and management). The current version of MPAI-CUI takes into account only cyber and seismic risks that have an impact on financial performance. Other risks will be considered in future versions of the standard.

MPAI-CUI may be used by:

  1. The company generating the data flow, to perform compression and understanding of the data for its own needs (e.g., to identify core and non-core data) and to analyse its financial performance, identifying possible clues to a crisis or risk of bankruptcy years in advance. It may help the board of directors and decision-makers make the proper decisions to avoid these situations, conduct what-if analyses, and devise efficient strategies.
  2. A financial institution that receives a request for financial help from a troubled company, to access its financial and organizational data and make an AI-based assessment of that company, as well as a prediction of its future performance. With better insight into the company's situation, the financial institution can make an informed decision on whether to fund it.

4        Functional Requirements

4.1       AI-based Performance Prediction

4.1.1      Reference architecture

This Use Case can be implemented as in Figure 2.

Figure 2 – Compression and understanding of Industrial Data

4.1.2      AI Modules and their I/O data

The AI Modules of Figure 2 perform the functions described in Table 2.

Table 2 – AI Modules of Industrial Data Compression and Understanding

AIM Function
Data Conversion Gathers the data needed for the assessment from several sources (internal and external), in different formats, and converts it to a single format (e.g., JSON).
Financial assessment Analyses the data generated by a company (i.e., financial statements) to assess its preliminary financial performance in the form of indexes.

Builds and extracts the financial features for the Decision tree and Prediction AIMs.

Governance assessment Builds and extracts the features related to the adequacy of the governance structure for the Decision tree and Prediction AIMs.
Risk matrix Builds the risk matrix to assess the impact of vertical risks (i.e., in this Use Case cyber and seismic).
Decision Creates the decision trees for making decisions.
Prediction Predicts the probability of company default within a time horizon of 36 months and the adequacy of the organizational model.
Perturbation Perturbs the probability value of company crisis computed by Prediction, considering the impact of vertical risks on company performance.

4.1.3      I/O interfaces of AI Modules

The I/O data of Data Compression and Understanding AIMs are given in Table 3.

Table 3 – I/O data of Use Case AIMs

AI Module Input Output
Data Conversion Financial statement data; Governance data; Risk assessment data Financial statement data (converted); Governance data (converted)
Financial assessment Financial statement data Financial features
Governance assessment Governance data Governance features
Risk matrix Technical data from internal risk assessment (i.e., cyber security) Severity
Decision Financial features; Governance features Ranking of feature importance
Prediction Financial features; Governance features Probability of company crisis; Adequacy of organizational model
Perturbation Probability of company crisis (index); Severity from Risk matrix Index of business continuity
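The dataflow of Table 3 can be sketched as a chain of function calls. All function bodies and returned values below are invented stubs; only the wiring between AIMs follows the table:

```python
# Illustrative wiring of the Table 3 dataflow. Every function body is a
# stand-in stub with made-up values; only the connections mirror the table.

def data_conversion(financial_raw, governance_raw, risk_raw):
    # Convert heterogeneous sources into one common format (e.g., JSON-like dicts).
    return dict(financial_raw), dict(governance_raw), dict(risk_raw)

def financial_assessment(financial):
    return {"liquidity": 1.2}                    # Financial features

def governance_assessment(governance):
    return {"board_size": 5}                     # Governance features

def risk_matrix(risk):
    return 0.3                                   # Severity of vertical risks

def decision(fin_feat, gov_feat):
    return sorted(fin_feat) + sorted(gov_feat)   # Ranking of feature importance

def prediction(fin_feat, gov_feat):
    return 0.10, 0.8                             # Default probability, adequacy

def perturbation(default_prob, severity):
    # Business continuity index: default probability perturbed by severity.
    # The combination rule is a guess, not specified by MPAI-CUI.
    return min(1.0, default_prob * (1.0 + severity))

fin, gov, risk = data_conversion({"revenue": 1e6}, {"ceo": "A"}, {"cyber": "high"})
default_prob, adequacy = prediction(financial_assessment(fin),
                                    governance_assessment(gov))
bci = perturbation(default_prob, risk_matrix(risk))
```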

4.1.4      Technologies and Functional Requirements

4.1.4.1     Governance data (raw)

By Governance data we mean attributes that describe the governance structure of a company and the roles of its key personnel.

The most basic roles are shareholder, manager, sole administrator, president/member of the board of directors, auditor, and president/member of the statutory board of directors. They can be considered “universal”, as they are commonly recognized across all countries. ISO 37000 (still under development) [6] aims at proposing a consistent set of recommendations, including definitions, for organizations in terms of governance. However, a governance data ontology is missing.

To Respondents

Respondents are invited to propose a governance data ontology that captures today’s practice at the global level.

4.1.4.2     Financial statement data (raw)

The Financial statement data (raw) are produced based on a set of accounting principles driving the maintenance and reporting of company accounts, so that financial statements are consistent, transparent, and comparable across companies.

A set of principles [3], identified by the International Accounting Standard/International Financial Reporting Standard (IAS/IFRS), can be considered as “universal”, as they are commonly recognized across all countries. Indeed, although different countries can consider different accounting principles based on their jurisdictions, they are endorsed and standardised by the IAS/IFRS to guarantee their convergence [5].

An example of a tool that provides a digital representation of Financial statement data is the eXtensible Business Reporting Language (XBRL). The requirement on any such language is that it should reflect the balance-sheet structure in terms of assets, liabilities, and shareholders’ or owners’ equity.

The Financial statement (raw data) are converted to a standard format by the Data conversion AIM.
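As a purely illustrative example of such a converted representation (the field names are our assumptions, not a proposed schema), a JSON-encoded statement might mirror the balance-sheet structure like this:

```python
import json

# Hypothetical JSON shape for converted Financial statement data, mirroring
# the balance-sheet structure (assets, liabilities, equity) described above.
statement = {
    "company_id": "EX-001",          # illustrative identifier
    "period": "2020",
    "balance_sheet": {
        "assets": {"current": 420000, "non_current": 580000},
        "liabilities": {"current": 250000, "non_current": 350000},
        "equity": {"share_capital": 300000, "retained_earnings": 100000},
    },
}

def totals(bs):
    # Sum each side of the balance sheet.
    return tuple(sum(bs[k].values()) for k in ("assets", "liabilities", "equity"))

assets, liabilities, equity = totals(statement["balance_sheet"])
# A well-formed statement balances: assets = liabilities + equity.
assert assets == liabilities + equity
encoded = json.dumps(statement)      # ready for exchange between AIMs
```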

To Respondents

Respondents are invited to propose a digital representation of Financial statement data that is applicable to a minimum set of Financial statements having universally valid semantics. JSON and XBRL are primary examples of such digital representations. However, other representations are possible and may be proposed.

Respondents are invited to either select one of the two choices above or suggest alternative formats. In all cases justification of a proposal is requested.

Preference will be given to formats that have been standardised or are in wide use.

4.1.4.3     Risk assessment technical data (raw)

By Risk assessment technical data, we mean attributes that indicate the internal assessment that the company performs to identify and measure potential or existing vertical risks, and their impact on business continuity.

This data contains values of likelihood, impact, gravity, residual risk and treatments. All are described in ISO 31000 – “Risk management — Principles and guidelines” [7].

To Respondents

Respondents are invited to propose a digital representation of Risk assessment technical data.

4.1.4.4     Governance

This is the Governance data (raw) after conversion. JSON appears to be a convenient format.

To Respondents

Respondents are requested to comment on this choice.

4.1.4.5     Financial statement

This is the Financial statement data (raw) after conversion. JSON appears to be a convenient format.

To Respondents

Respondents are requested to comment on this choice.

4.1.4.6     Risk assessment technical data

This is the Risk assessment technical data (raw) after conversion. JSON appears to be a convenient format.

To Respondents

Respondents are requested to comment on this choice.

4.1.4.7     Financial features

Financial features are a set of indexes and ratios computed using financial statement data. Examples of Financial features are given by Table 4.

 

Table 4 – Financial features

Feature Feature value Feature type
1 Absolute value Revenue/Profit
2 Index/Percentage (%) Revenue/Profit
3 Absolute value Revenue/Profit
4 Absolute value Revenue/Profit
5 Index/Percentage (%) Revenue/Profit
6 Index/Percentage (%) Cost/Debt
7 Absolute value Cost/Debt
8 Index/Percentage (%) Cost/Debt
9 Absolute value Cost/Debt
10 Index/Percentage (%) Cost/Debt
11 Absolute value Production
12 Absolute value Production
13 Index/Percentage (%) Revenue/Profit
14 Absolute value Production
15 Index/Percentage (%) Cost/Debt

To Respondents

Respondents are requested to propose Financial features suitable for financial assessment. The Financial features should include those given by Table 4 and may also include other features that satisfy the requirement of being extracted or computed from Financial statement data.
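Table 4 lists feature values and types without naming the features. Purely to illustrate what “indexes and ratios computed using financial statement data” can look like, here are a few commonly used ratios; the selection and names are ours, not the standard's:

```python
# Illustrative financial ratios; the choice of ratios is an assumption made
# for this sketch, since Table 4 does not name its features.

def current_ratio(current_assets, current_liabilities):
    return current_assets / current_liabilities   # Index/Percentage type

def debt_to_equity(total_debt, equity):
    return total_debt / equity                    # Cost/Debt family

def ebitda_margin(ebitda, revenue):
    return ebitda / revenue                       # Revenue/Profit family

ratios = {
    "current_ratio": current_ratio(420000, 250000),
    "debt_to_equity": debt_to_equity(600000, 400000),
    "ebitda_margin": ebitda_margin(150000, 1000000),
}
```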

4.1.4.8     Governance features

Governance features are a set of indexes/parameters that are used to assess the adequacy of the organizational model. Table 5 gives examples of Governance features.

Table 5 – Governance features

Feature Feature value Feature type
1 Absolute value Decision maker data
2 Index/Percentage (%) Shareholder data
3 Absolute value Shareholder data
4 Absolute value Decision maker data
5 Absolute value Decision maker data

To Respondents

Respondents are requested to propose Governance features suitable for assessing the suitability of governance, e.g., those reported in Table 5. Proposed Governance features shall satisfy the requirements of:

  1. Being extracted or computed from the Governance data.
  2. Being expressed by numerical values.
  3. Adding insight to the data of Table 5.

4.1.4.9     Severity

A set of values, each reflecting the level of risk for a specific vertical risk (cyber and seismic in this phase) as evaluated by the company. This severity is computed according to ISO 27005 [8], considering the levels of probability of occurrence, business impact and gravity of a specific risk.
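A hypothetical severity computation in the spirit of ISO 27005 might combine the three levels on discrete scales and normalise the product to the 0 to 1 range. The 1-to-5 scale and the multiplicative rule are assumptions made for illustration only:

```python
# Hypothetical severity computation inspired by ISO 27005-style risk rating.
# The 1..5 scale and the product/normalisation rule are NOT from the standard.

SCALE = 5  # each factor rated 1..5 by the company

def severity(probability, impact, gravity):
    for v in (probability, impact, gravity):
        if not 1 <= v <= SCALE:
            raise ValueError("factors must be rated on the 1..5 scale")
    # Normalise the product so the result lies in (0, 1].
    return (probability * impact * gravity) / SCALE**3

cyber = severity(probability=4, impact=5, gravity=3)    # one vertical risk
seismic = severity(probability=2, impact=4, gravity=2)  # another vertical risk
```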

To Respondents

Respondents are invited to comment on this choice or to propose and motivate alternative solutions.

4.1.4.10  Decision Tree

It is a tree-like decision model built from the financial and governance features. An example is provided by [9], where the Random Forest supervised learning method has been used to predict the probability of company crisis and bankruptcy.
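A minimal sketch of such a tree over financial and governance features, with invented thresholds and leaf probabilities (reference [9] uses Random Forest, i.e., an ensemble of such trees whose outputs are averaged):

```python
# Toy decision tree over financial/governance features. Thresholds and leaf
# probabilities are invented for illustration; they do not come from [9].

def crisis_probability(features):
    # Financial branch: high leverage dominates the decision.
    if features["debt_to_equity"] > 2.0:
        return 0.7 if features["current_ratio"] < 1.0 else 0.4
    # Governance branch: weak board independence raises the risk.
    if features["board_independence"] < 0.3:
        return 0.3
    return 0.05

p = crisis_probability({"debt_to_equity": 1.5,
                        "current_ratio": 1.68,
                        "board_independence": 0.5})
```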

To Respondents

Respondents are invited to propose a decision support tool.

4.1.4.11  Default probability

It is a score in the 0 to 1 range that represents the likelihood of company default in a specified number of future months dependent on financial data. It is computed by Prediction using the financial features and the decision tree.

To respondents

Respondents are requested to comment on the description above and to propose extensions.

4.1.4.12  Adequacy of organisational model

It is a score in the 0 to 1 range that represents the adequacy of the organisational model. Its value can be used to identify potential critical points or conflicts of interest that can lead to an increase in the risk of default. It is computed by Prediction using the governance and financial features.

To respondents

Respondents are requested to comment on the description above. Suggestions about multidimensional measures of adequacy are welcome.

4.1.4.13  Business continuity index

It is a score in the 0 to 1 range that represents the likelihood of company default in a specified number of months in the future dependent on financial or non-financial data. It is computed by Perturbation using default probability and severity.

To Respondents

Respondents are requested to comment on the description above and to propose extensions.

5        Terminology

Table 6 identifies and defines the terms used in the MPAI-CUI context.

 

Table 6 – MPAI-CUI terms

Term Definition
Access Static or slowly changing data that are required by an application such as domain knowledge data, data models, etc.
AI Framework (AIF) The environment where AIM-based workflows are executed.
AI Module (AIM) The basic processing elements receiving processing specific inputs and producing processing specific outputs.
Communication The infrastructure that connects the Components of an AIF.
Data Processing (DP) A legacy technology that may be used to implement AIMs.
Decision Tree A decision support tool that uses a tree-like model of decision, given the financial and governance features.
Delivery An AIM that wraps data for transport.
Execution The environment in which AIM workflows are executed. It receives external inputs and produces the requested outputs both of which are application specific.
Financial features A set of indexes and ratios computed using financial statement data.
Financial statement Data produced based on a set of accounting principles driving maintenance and reporting of company accounts, so that financial statements can be consistent, transparent, and comparable across companies.
Governance features A set of indexes/parameters that are used to assess the adequacy of the organizational model.
Knowledge Base Structured and unstructured information made accessible to AIM (especially DP-based).
Management and Control Manages and controls the AIMs in the AIF, so that they execute in the correct order and at the time when they are needed.
Risk assessment Attributes that indicate the internal assessment that the company performs to identify and measure potential or existing vertical risks, and their impact on business continuity.
Severity A set of values, each reflecting the level of risk for that specific vertical risk as evaluated by the company.
Storage Storage used to e.g., store the inputs and outputs of the individual AIMs, data from the AIM’s state and intermediary results, shared data among AIMs.

6        References

  1. MPAI-AIF Call for Technologies, MPAI N100; https://mpai.community/standards/mpai-aif/#Technologies
  2. MPAI-AIF Use Cases and Functional Requirements, MPAI N74; https://mpai.community/standards/mpai-aif/#Requirements
  3. MPAI-AIF Framework Licence, MPAI N101; https://mpai.community/standards/mpai-aif/#Licence
  4. International Financial Reporting Standard. List of IFRS Standards. Available online: https://www.ifrs.org/issued-standards/list-of-standards/
  5. European Commission. Financial reporting. Available online: https://ec.europa.eu/info/business-economy-euro/company-reporting-and-auditing/company-reporting/financial-reporting_en
  6. International Organization for Standardization. ISO 37000 Guidance for the Governance of Organizations. Available online: https://committee.iso.org/sites/tc309/home/projects/ongoing/ongoing-1.html
  7. International Organization for Standardization. ISO 31000 Risk Management. Available online: https://www.iso.org/files/live/sites/isoorg/files/store/en/PUB100426.pdf
  8. International Organization for Standardization. ISO 27005 Information technology — Security techniques — Information security risk management
  9. Perboli G., Arabnezhad E., A Machine Learning-based DSS for Mid and Long-Term Company Crisis Prediction. To be published in Expert Systems with Applications. 2021.
  10. MPAI-CUI Use Cases & Functional Requirements; MPAI N200; https://mpai.community/standards/mpai-cui/#UCFR
  11. MPAI-CUI Framework Licence, MPAI N201; https://mpai.community/standards/mpai-cui/#Licence
  12. MPAI-CUI Call for Technologies, MPAI N202; https://mpai.community/standards/mpai-cui/#Technologies


Framework Licence

1        Coverage

MPAI has found that the application area called “Compression and Understanding of Industrial Data” is particularly relevant for MPAI standardisation because AI allows for substantial reduction of the amount of information produced by companies and for more in-depth analysis of the data to be carried out.

Therefore, MPAI intends to develop a standard – to be called MPAI-CUI – that will provide standard technologies to implement several Use Cases, the first of which is:

  1. AI-based Performance Prediction (APP).

The MPAI Compression and Understanding of industrial data (MPAI-CUI) standard will be defined in document NNN of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI).

2        Definitions

Term Definition
Data Any digital representation of a real or computer-generated entity, such as moving pictures, audio, point cloud, computer graphics, sensor and actu­ator. Data includes, but is not restricted to, media, manufacturing, auto­mot­ive, health and generic data.
Development Rights License to use MPAI-CUI Essential IPRs to develop Implementations
Enterprise Any commercial entity that develops or implements the MPAI-CUI standard
Essential IPR Any Proprietary Rights (such as patents) without which it is not possible on technical (but not commercial) grounds to make, sell, lease, otherwise dispose of, repair, use or operate Implementations without infringing those Proprietary Rights
Framework License A document, developed in compliance with the generally accepted principles of competition law, which contains the conditions of use of the License without the values, e.g., currency, percent, dates etc.
Implementation A hardware and/or software reification of the MPAI-CUI standard serving the needs of a professional or consumer user directly or through a service
Implementation Rights License to reify the MPAI-CUI standard
License This Framework License to which values, e.g., currency, percent, dates etc., related to a specific Intellectual Property will be added. In this Framework License, the word License will be used as singular. However, multiple Licenses from different IPR holders may be issued
Profile A particular subset of the technologies that are used in MPAI-CUI standard and, where applicable, the classes, subsets, options and parameters relevant to the subset

3        Conditions of use of the License

  1. The License will be in compliance with generally accepted principles of competition law and the MPAI Statutes
  2. The License will cover all of Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-CUI standard.
  3. The License will cover Development Rights and Implementation Rights
  4. The License for Development and Implementation Rights, to the extent it is developed and implemented only for the purpose of evaluation or demo solutions or for technical trials, will be free of charge
  5. The License will apply to a baseline MPAI-CUI profile and to other profiles containing additional technologies
  6. Access to Essential IPRs of the MPAI-CUI standard will be granted in a non-discriminatory fashion.
  7. The scope of the License will be subject to legal, bias, ethical and moral limitations
  8. Royalties will apply to Implementations that are based on the MPAI-CUI standard
  9. Royalties will apply on a worldwide basis
  10. Royalties will apply to any Implementation, with the exclusion of the type of implementations specified in clause 4
  11. An MPAI-CUI Implementation may use other IPR to extend the MPAI-CUI Implementation or to provide additional functionalities
  12. The License may be granted free of charge for particular uses if so decided by the licensors
  13. A license free of charge for limited time and a limited amount of forfeited royalties will be granted on request
  14. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-CUI standard
  15. The total cost of the Licenses issued by IPR holders will be in line with the total cost of the Licenses for similar technologies standardised in the context of Standard Development Organisations
  16. The total cost of the Licenses will take into account the value on the market of the AI Framework technology Standardised by MPAI.


Call for Technologies

1       Introduction

2       How to submit a response

3       Evaluation Criteria and Procedure

4       Expected development timeline

5       References

Annex A: Information Form

Annex B: Evaluation Sheet

Annex C: Requirements check list

Annex D: Technologies that may require specific testing

Annex E: Mandatory text in responses

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international non-profit organisation with the mission to develop standards for Artificial Intelligence (AI) enabled digital data coding and for technologies that facilitate integration of data coding components into ICT systems. With the mechanism of Framework Licences, MPAI seeks to attach clear IPR licensing frameworks to its standards.

MPAI has found that the application area called “Compression and Understanding of Industrial Data” is particularly relevant for MPAI standardisation because AI allows for substantial reduction of the amount of information produced by companies and for more in-depth analysis of the data to be carried out.

Therefore, MPAI intends to develop a standard – to be called MPAI-CUI – that will provide standard technologies to implement several Use Cases, the first of which is:

  1. AI-based Performance Prediction (APP)

This document is a Call for Technologies (CfT) for technologies that

  1. Satisfy the MPAI-CUI Functional Requirements (N200) [4] and
  2. Are released according to the MPAI-CUI Framework Licence (N201) [5], if selected by MPAI for inclusion in the MPAI-CUI standard.

The standard will be developed with the following guidelines:

  1. To satisfy the Functional Requirements (N200) [4]. In the future, MPAI may decide to extend MPAI-CUI to support other Use Cases.
  2. To be suitable for implementation as AI Modules (AIM) conforming to the MPAI AI Framework (MPAI-AIF) standard, which is being developed based on the responses to the Call for Technologies (N100) [1] satisfying the MPAI-AIF Functional Requirements (N74) [2].

Rather than follow the approach of defining end-to-end systems, MPAI has decided to base its application standards on the AIM and AIF notions. The AIF functional requirements have been identified in [1], while the AIM requirements are Use Case-specific. MPAI has done so because:

  1. AIMs allow the reduction of a large problem to a set of smaller problems.
  2. AIMs can be independently developed and made available to an open competitive market.
  3. An implementor can build a sophisticated and complex system with potentially limited knowledge of all the technologies required by the system.
  4. MPAI systems are inherently explainable.
  5. MPAI systems allow for competitive comparisons of functionally equivalent AIMs.
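As an illustration of the AIM notion above, consider the following minimal sketch. All class and method names are hypothetical and not part of any MPAI specification; only the idea that an AIM exposes defined input/output interfaces while keeping its internals implementation-specific is taken from the text.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class AIM(ABC):
    """Hypothetical AI Module: the named inputs and outputs mirror the
    normative interfaces; the internal processing is non-normative."""

    @abstractmethod
    def process(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Map named input data to named output data."""

class DefaultProbabilityAIM(AIM):
    """Toy stand-in for an MPAI-CUI-style AIM: consumes financial
    features and emits a default probability (logic is a placeholder)."""

    def process(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        features = inputs["financial_features"]
        # Placeholder scoring: any ML/AI/DP internals could sit here.
        score = min(1.0, sum(features) / (10 * len(features)))
        return {"default_probability": score}

aim = DefaultProbabilityAIM()
result = aim.process({"financial_features": [0.2, 0.5, 0.3]})
```

Because only the interfaces are fixed, two such modules with the same input/output contract are interchangeable, which is what enables the competitive comparison of functionally equivalent AIMs noted above.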

Respondents should be aware that:

  1. The currently addressed MPAI-CUI Use Case and the AIM internals will be non-normative.
  2. The input and output interfaces of the AIMs, whose requirements have been derived to support the Use Case, will be normative.

Therefore, the scope of this Call for Technologies is restricted to technologies required to implement the input and output interfaces of the AIMs identified in N200 [4].

However, MPAI invites comments on any technology or architectural component identified in N200, specifically,

  1. Additions or removals of input/output data to the identified AIMs with justification of the changes and identification of data formats required by the new input/output signals.
  2. Possible alternative partitioning of the AIMs implementing the Use Case providing:
    1. Arguments in support of the proposed partitioning.
    2. Detailed specifications of the input and output data of the proposed new AIMs.
  3. New fully described Use Cases.

All parties who believe they have relevant technologies satisfying all or most of the requirements of the Use Case described in N200 are invited to submit proposals for consideration by MPAI. MPAI membership is not a prerequisite for responding to this CfT. However, proponents should be aware that, if their proposal or part thereof is accepted for inclusion in the MPAI-CUI standard, they shall immediately join MPAI, or their accepted technologies will be discarded.

MPAI will select the most suitable technologies based on their technical merits for inclusion in MPAI-CUI. However, MPAI is not obligated, by virtue of this CfT, to select a particular technology, or to select any technology if those submitted are found inadequate.

Submissions are due on 2021/05/10T23:59 UTC and should be sent to the MPAI secretariat (secretariat@mpai.community). The secretariat will acknowledge receipt of the submissions via email. Submissions will be reviewed according to the schedule that the 8th MPAI General Assembly (MPAI-8) will define at its online meeting on 2021/05/12. Please contact the MPAI secretariat (secretariat@mpai.community) for details on how submitters who are not MPAI members can attend the said review.

2        How to submit a response

Those planning to respond to this Call for Technologies are:

  1. Advised that online events will be held on 2021/03/31 and 2021/04/07 to present the MPAI-CUI Call for Technologies and respond to questions. Logistic information on these events will be posted on the MPAI web site.
  2. Requested to communicate their intention to respond to this CfT with an initial version of the form of Annex A to the MPAI secretariat (secretariat@mpai.community) by 2021/04/13. A potential submitter making a communication using the said form is expected but not required to actually make a submission. A submission will be accepted even if the submitter did not communicate their intention to submit a response by the said date.
  3. Advised to visit regularly the MPAI web site where relevant information will be posted.

Responses to this MPAI-CUI CfT shall/may include the elements described in Table 1:

Table 1 – Mandatory and optional elements of a response

Item Status
Detailed documentation describing the proposed technologies mandatory
The final version of Annex A mandatory
The text of Annex B duly filled out with the table indicating which requirements identified in MPAI N200 [4] are satisfied. If not all Functional Requirements are satisfied, this should be explained. mandatory
Comments on the completeness and appropriateness of the Functional Requirements and any motivated suggestion to amend or extend them. optional
A preliminary demonstration, with a detailed document describing it. optional
Any other additional relevant information that may help evaluate the submission, such as additional use cases. optional
The text of Annex E. mandatory

Respondents are invited to take advantage of the check list of Annex C before submitting their response and filling out Annex A.

Respondents are requested to present their submission (mandatory) at a meeting by teleconference that the MPAI Secretariat will announce to submitters. If no presenter attends the meeting, the proposal will be discarded.

Respondents are advised that, upon acceptance by MPAI of their submission in whole or in part for further evaluation, MPAI will require that:

  • A working implementation, including source code – to be used in the development of the MPAI-CUI Reference Software and later publication as a standard by MPAI – be made available before the technology is accepted for inclusion in the MPAI-CUI standard. Software may be written in a programming language that can be compiled or interpreted or in a hardware description language.
  • The working implementation be suitable for operation in the MPAI AI Framework (MPAI-AIF).
  • A non-MPAI member immediately join MPAI. If the non-MPAI member elects not to do so, their submission will be discarded. Direction on how to join MPAI can be found online.

Further information on MPAI can be obtained from the MPAI website.

3        Evaluation Criteria and Procedure

Proposals will be assessed using the following process:

  1. Evaluation panel is created from:
    1. All CUI-DC members attending.
    2. Non-MPAI members who are respondents.
    3. Non-respondent, non-MPAI-member experts invited in a consulting capacity.
  2. No one from 1.1.-1.2. will be denied membership in the Evaluation panel.
  3. Respondents present their proposals.
  4. Evaluation Panel members ask questions.
  5. If required, subjective and/or objective tests are carried out:
    1. Define required tests.
    2. Carry out the tests.
    3. Produce report.
  6. At least 2 reviewers will be appointed to review & report about specific points in a proposal if required.
  7. Evaluation panel members fill out Annex B for each proposal.
  8. Respondents respond to evaluations.
  9. Proposal evaluation report is produced.

4        Expected development timeline

Timeline of the CfT, deadlines and response evaluation:

Table 2 – Dates and deadlines

Step Date Meeting
Call for Technologies 2021/03/17 MPAI-6
CfT introduction conference call 1 2021/03/31T15:00 UTC
CfT introduction conference call 2 2021/04/07T15:00 UTC
Notification of intention to submit proposal 2021/04/13T23:59 UTC
Submission deadline 2021/05/10T23:59 UTC
Evaluation of responses will start 2021/05/12 MPAI-8

Evaluation to be carried out during 2-hour sessions according to the calendar agreed at MPAI-8.

5        References

  1. MPAI-AIF Use Cases & Functional Requirements, N74; https://mpai.community/standards/mpai-aif/
  2. MPAI-AIF Framework Licence, MPAI N101; https://mpai.community/standards/mpai-aif/#Licence
  3. MPAI-AIF Call for Technologies, N100; https://mpai.community/standards/mpai-aif/#Technologies
  4. MPAI-CUI Use Cases & Functional Requirements; MPAI N200; https://mpai.community/standards/mpai-cui/#UCFR
  5. MPAI-CUI Framework Licence, MPAI N201; https://mpai.community/standards/mpai-cui/#Licence
  6. MPAI-CUI Call for Technologies, MPAI N202; https://mpai.community/standards/mpai-cui/#Technologies

Annex A: Information Form

This information form is to be filled in by a Respondent to the MPAI-CUI CfT.

The purpose of this Annex is to collect data that facilitate the organisation of submission evaluation. Therefore, submitters are requested to provide only such data as Use Case(s) considered, types of technologies proposed, special requirements for the (optional) demonstration and any other information that is functional to the evaluation of the submission.

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

Parties sending this Annex A should be aware that:

  1. This Annex A should be sent only to the Secretariat.
  2. Points 1., 3., 4., and 5. above will be made known to MPAI members. Point 2. will not be disclosed.
  3. The full submissions will be made available to MPAI members after the submission deadline of 2021/05/10.
  4. The Secretariat will not accept any confidential information at the time the expression of interest is communicated to the Secretariat.

Annex B: Evaluation Sheet

NB: This evaluation sheet will be filled out by members of the Evaluation Team.

Proposal title:

Main Functionalities:

Response summary: (a few lines)

Comments on Relevance to the CfT (Requirements):

Comments on possible MPAI-CUI profiles[1]

Evaluation table:

Table 3 – Assessment of submission features

Note 1 The semantics of Submission features is provided by Table 4
Note 2 Evaluation elements indicate the elements used by the evaluator in assessing the submission
Note 3 Final Assessment indicates the ultimate assessment based on the Evaluation Elements

 

Submission features Evaluation elements Final Assessment
Completeness of description

Understandability

Extensibility

Use of Standard Technology

Efficiency

Test cases

Maturity of reference implementation

Relative complexity

Support of non-MPAI use cases

Content of the criteria table cells:

Evaluation facts should mention:

  • Not supported / partially supported / fully supported.
  • What supported these facts: submission/presentation/demo.
  • The summary of the facts themselves, e.g., very good in one way, but weak in another.

Final assessment should mention:

  • Possibilities to improve or add to the proposal, e.g., any missing or weak features.
  • How sure the evaluators are, i.e., evidence shown, very likely, very hard to tell, etc.
  • Global evaluation (Not Applicable / – – / – / + / ++)

New Use Cases/Requirements Identified:

(please describe)

Evaluation summary:

  • Main strong points, qualitatively:
  • Main weak points, qualitatively:
  • Overall evaluation: (0/1/2/3/4/5)

0: could not be evaluated

1: proposal is not relevant

2: proposal is relevant, but requires significantly more work

3: proposal is relevant, but with a few changes

4: proposal has some very good points, so it is a good candidate for the standard

5: proposal is superior in its category, very strongly recommended for inclusion in the standard

Additional remarks: (points of importance not covered above.)

The submission features in Table 3 are explained in the following Table 4.

Table 4 – Explanation of submission features

Submission features Criteria
Completeness of description Evaluators should

1.     Compare the list of requirements (Annex C of the CfT) with the submission.

2.     Check if respondents have described in sufficient detail which parts of the requirements their proposal addresses.

NB1: Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated.

NB2: Submissions will be judged on the merit of what is proposed. A submission on a single excellent technology may be considered instead of a submission that is complete but has a less performing technology.

Understandability Evaluators should identify items that are demonstrably unclear (inconsistencies, sentences with dubious meaning etc.)
Extensibility Evaluators should check if respondent has proposed extensions to the Use Cases.

NB: Extensibility is the capability of the proposed solution to support use cases that are not supported by current requirements.

Use of standard technology Evaluators should check if new technologies are proposed while widely adopted technologies exist. If this is the case, the merit of the new technology shall be proved.
Efficiency Evaluators should assess power consumption, computational speed, computational complexity.
Test cases Evaluators should report whether a proposal contains suggestions for testing the technologies proposed
Maturity of reference implementation Evaluators should assess the maturity of the proposal.

Note 1: Maturity is measured by its completeness, i.e., by disclosing all the necessary information and appropriate parts of the HW/SW implem­entation of the submission.

Note 2: If there are parts of the implementation that are not disclosed but demonstrated, they will be considered if and only if such com­ponents are replicable.

Relative complexity Evaluators should identify issues that would make it difficult to implement the proposal compared to the state of the art.
Support of non MPAI-CUI use cases Evaluators should check whether the technologies proposed can demonstrably be used in other significantly different use cases.

Annex C: Requirements check list

Please note the following acronyms

KB Knowledge Base
QF Query Format

Table 5 – List of technologies identified in MPAI-CUI N200 [4]

Note: The numbers in the first column refer to the section numbers of N200 [4].

Technologies Response
4.1.4.1 Governance data (raw) Y/N
4.1.4.2 Financial statement data (raw) Y/N
4.1.4.3 Risk assessment technical data (raw) Y/N
4.1.4.4 Governance Y/N
4.1.4.5 Financial statement Y/N
4.1.4.6 Risk assessment technical data Y/N
4.1.4.7 Financial features Y/N
4.1.4.8 Governance features Y/N
4.1.4.9 Severity Y/N
4.1.4.10 Decision Tree Y/N
4.1.4.11 Default probability Y/N
4.1.4.12 Adequacy of organisational model Y/N
4.1.4.13 Business continuity index Y/N

Annex D: Technologies that may require specific testing

Financial features
Governance features
Decision Tree

Additional technologies may be identified during the evaluation phase.

Annex E: Mandatory text in responses

A response to this MPAI-CUI CfT shall mandatorily include the following text

<Company/Member> submits this technical document in response to MPAI Call for Technologies for MPAI project MPAI-CUI (N202).

<Company/Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80); in particular, <Company/Member> declares that <Company/Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-CUI (N201), alone or jointly with other IPR holders, after the approval of the MPAI-CUI Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-CUI Technical Specification become available on the market.

In case the respondent is a non-MPAI member, the submission shall mandatorily include the following text

If (a part of) this submission is identified for inclusion in a specification, <Company>  understands that  <Company> will be requested to immediately join MPAI and that, if  <Company> elects not to join MPAI, this submission will be discarded.

Subsequent technical contribution shall mandatorily include this text

<Member> submits this document to the MPAI-CUI Development Committee (CUI-DC) as a contribution to the development of the MPAI-CUI Technical Specification.

<Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80); in particular, <Member> declares that <Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-CUI (N201), alone or jointly with other IPR holders, after the approval of the MPAI-CUI Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-CUI Technical Specification become available on the market.

[1] A profile of a standard is a particular subset of the technologies used in the standard and, where applicable, the classes, subsets, options and parameters relevant for that subset.



Template for responses to the MPAI-CUI Call for Technologies

Abstract

This document is provided as a help to those who intend to submit responses to the MPAI-CUI Call for Technologies. Text in red (as in this sentence) provides guidance to submitters and should not be included in a submission. Text in green shall be mandatorily included in a submission. If a submission does not include the green text, the submission will be rejected.

If the submission is in multiple files, each file shall include the green statement.

Text in white is the text suggested to respondents for use in a submission.

1        Introduction

This document is submitted by <organisation name> (if an MPAI Member) and/or by <organisation name>, a <company, university etc.> registered in … (if a non-MPAI member), in response to the MPAI-CUI Call for Technologies issued by Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) on 2021/03/17 as MPAI document N202.

In the opinion of the submitter, this document proposes technologies that satisfy the requirements of the MPAI-CUI Use Cases & Functional Requirements document issued by MPAI on 2021/03/17 as MPAI document N200.

Possible additions

This document also contains comments on the requirements as requested by N200.

This document also contains proposed technologies that satisfy additional requirements as allowed by N202.

<Company and/or Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80); in particular, <Company and/or Member> declares that <Company and/or Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the MPAI-CUI Framework Licence (N201), alone or jointly with other IPR holders, after the approval of the MPAI-CUI Technical Specification by the MPAI General Assembly and in no event after commercial implementations of the MPAI-CUI Technical Specification become available on the market.

<Company and/or Member> acknowledges the following points:

  1. MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.
  2. <Company and/or Member> plans on having a representative present this submission at a CUI-DC meeting communicated by the MPAI Secretariat (secretariat@mpai.community). <Company and/or Member> acknowledges that, if no representative attends the meeting and presents the submission, this submission will be discarded.
  3. <Company and/or Member> plans on making available a working implementation, including source code – for use in the development of the MPAI-CUI Reference Software and eventual publication by MPAI as a normative standard – before the technology submitted is accepted for the MPAI-CUI standard.
  4. The software submitted may be written in a programming language that can be compiled or interpreted or in a hardware description language, upon acceptance by MPAI for further eval­uation of their submission in whole or in part.
  5. <Company> shall immediately join MPAI upon acceptance by MPAI for further evaluation of this submission in whole or in part.
  6. If <Company> does not join MPAI, this submission shall be discarded.

2        Information about the submission

This information corresponds to Annex A of N202. It is included here for the submitter’s convenience.

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

3        Comments on/extensions to requirements (if any)

 

4        Overview of Requirements supported by the submission

Please answer Y or N. Detail on the specific answers can be provided in the submission.

Technologies Response
4.1.4.1 Governance data (raw) Y/N
4.1.4.2 Financial statement data (raw) Y/N
4.1.4.3 Risk assessment technical data (raw) Y/N
4.1.4.4 Governance Y/N
4.1.4.5 Financial statement Y/N
4.1.4.6 Risk assessment technical data Y/N
4.1.4.7 Financial features Y/N
4.1.4.8 Governance features Y/N
4.1.4.9 Severity Y/N
4.1.4.10 Decision Tree Y/N
4.1.4.11 Default probability Y/N
4.1.4.12 Adequacy of organisational model Y/N
4.1.4.13 Business continuity index Y/N

5        New Proposed requirements (if any)

 

1. Y/N
2. Y/N
3. Y/N

 

 

6        Detailed description of the submission

6.1       Proposal chapter #1

6.2       Proposal chapter #2

….

7        Conclusions



MPAI Application Note #7

Compression and Understanding of Industrial Data (MPAI-CUI)

Proponents: Guido Perboli (POLITO), Valeria Lazzaroli (Arisk), Mariangela Rosano (POLITO)

Description: Most economic organizations, e.g., companies, produce large quantities of data, often because these are required by regulation. Users of these data may be the company itself, or Fintech and Insurtech services that need to access the flow of company data to assess and monitor financial and organizational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.). For example, companies nowadays heavily rely on the security and dependability of their Information Systems for all categories of workers, including the management of Industrial Control Systems. Adding cybersecurity-related parameters to the risk analysis process will enable a more precise estimation of the actual risk exposure, and cybersecurity data will support a reassessment of financial parameters based on risk analysis data.

The sheer amount of data that needs to be exchanged is an issue. Analysing those data by humans is typically onerous and may miss vitally important information. Artificial Intelligence (AI) may help reduce the amount of data with a controlled loss of information and extract the most relevant information from the data. AI is considered the most promising means to achieve this goal.

Unfortunately, the syntax and semantics of the flow of data are highly dependent on who has produced the data. The format of the data is typically a text file with a structure not designed for indexing, search and extraction. Therefore, in order to be able to apply AI technologies to meaningfully reduce the data flow, it is necessary to standardise the formats of the components of the data flow and make the data “AI friendly”.

Comments:

Recent regulations impose constant monitoring (ideally monthly). Thus, similar blocks of data are likely to occur in temporally consecutive sequences of data.

The company generating the data flow may need to perform compression and understanding for its own needs (e.g., to identify core and non-core data). Subsequent entities may perform further data compression and transformation.

In general, compressed data should allow for easy data search and extraction.
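The similarity between temporally consecutive data blocks can be exploited by context-based compression. A minimal sketch follows, using generic zlib dictionary priming (not an MPAI-defined mechanism) and invented data, to show why a compressor seeded with the previous month's block produces a smaller stream than compressing each block in isolation:

```python
import json
import zlib

# Hypothetical consecutive monthly data blocks: largely identical content.
january = {"revenue": 120000, "employees": 42, "sector": "manufacturing"}
february = {"revenue": 121500, "employees": 42, "sector": "manufacturing"}

def compress_alone(block: dict) -> bytes:
    """Compress a block with no knowledge of earlier blocks."""
    return zlib.compress(json.dumps(block).encode())

def compress_with_context(block: dict, context: dict) -> bytes:
    """Prime the compressor with the previous block, so repeated
    structure is encoded by reference instead of being re-emitted."""
    c = zlib.compressobj(zdict=json.dumps(context).encode())
    return c.compress(json.dumps(block).encode()) + c.flush()

standalone = len(compress_alone(february))
contextual = len(compress_with_context(february, january))
# contextual < standalone, since February mostly repeats January.
```

Decompression uses the same dictionary (`zlib.decompressobj(zdict=...)`), illustrating the point above that compression choices must not prevent later search and extraction of the data.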

In a first phase, MPAI-CUI primarily addresses risk identification.

Examples

MPAI-CUI may be used in a variety of contexts, such as:

  1. To support the company’s board in deploying efficient strategies. A company can analyse its financial performance, identifying possible clues to the crisis or risk of bankruptcy years in advance. It may help the board of directors and decision-makers to make the proper decisions to avoid these situations, conduct what-if analysis, and devise efficient strategies.
  2. To assess the financial health of companies that apply for funds/financial help. A financial institution that receives a request for financial help from a troubled company can access its financial and organizational data and make an AI-based assessment of that company, as well as a prediction of its future performance. This helps the financial institution decide, with a broad view of the company’s situation, whether or not to fund it.

  3. To assess risk in different fields considering non-core data (e.g., non-financial data). This entails accurate and targeted sharing of core and non-core data ranging from financial and organizational information to other types of risks that affect business continuity (e.g., environmental, seismic, infrastructure, and cyber). As an example, the cybersecurity preparedness status of a company allows a better estimation of the average production parameters affected by cyberattacks, such as the expected number of days of production lost (e.g., days the industrial plants are stopped, days personnel cannot perform their work due to unavailability of the information system). Several parameters need to be considered, which are obtained by direct acquisition of data from the target companies that perform a cybersecurity risk analysis.
  4. To analyse the effects of disruptions on the national economy, e.g., performance evaluation by pre/post-pandemic analysis [1].

Requirements:

  1. The formats of the data in the data flow should be AI friendly. In other words, the different data required to predict a crisis/bankruptcy of a company should be gathered, carefully selected and, where needed, completed so as to be suitable for automatic analysis and processing by an AI-based algorithm.
  2. The standard shall ensure efficiency of data structure, indexing and search, according to specific syntax and semantics.
  3. The standard shall allow the extraction of the main parameters with an indication of their semantics.
  4. The standard shall support context-based compression (i.e., depending on the sequence of data).
  5. The standard shall support lossless compression.
  6. The standard shall support context-based filtering with different levels of details.

Object of standard:

Two main areas of standardization are identified:

  1. Input objects:
    1. Financial data input:
      1. Financial statements and fiscal yearly report data (usually expressed in xls or xbrl formats). Their contents follow the accounting standards defined by the Organismo Italiano Contabilità (OIC) at the Italian level and by the International Accounting Standards Board (IAS/IFRS) at the international level.
      2. Invoices. In Italy, the FatturaPA format is expressed in xml; more in general, invoices have to be compliant with the European standard EN 16931-1:2017.
    2. Semantics of governance elements.
    3. Other economic data, such as company size uniformly recognised according to the number of employees or to the economic activities (e.g., classifications elaborated by Eurostat and OECD data), imports and exports, etc.
    4. Vertical risk data input: in a preliminary phase, we will consider seismic and cyber as the vertical risks of primary interest. In the future, the object of the standard could be extended to cover other risks (e.g., related to infrastructures, sustainability). Generally, at the international level, the ISO 31000 standard defines the principles and guidelines related to the input data to consider for risk assessment and management.
      1. Seismic risk. AI algorithms may help define a socio-economic and technological model that will support companies and institutions in properly defining reconstruction plans. In this direction, data input to assess seismic risk according to ASTM standards will integrate:
        1. Technical data related to the existing/needed infrastructures (i.e., geolocation coordinates), architectural and urban planning data, as well as output data from Building Information Modelling;
        2. Socio-demographic data, i.e., statistical data collected by certified sources (e.g., ISTAT in Italy, World Bank, International Financial Statistics, World Economic Outlook Databases, International Monetary Fund Statistics Data) about population figures, and their characteristics and distribution.
      2. Cyber risk. Considering cybersecurity-related parameters in the risk assessment will help to understand and estimate the impact of the actual risk exposure on the company’s performance, financial health and business continuity.

As an example, an effective system to back up sensitive data, with periodic testing of its effectiveness, can be of help. Moreover, having well-defined incident responses and a team prepared to deal with them can help minimise the effects of attacks and the time to recover. Therefore, an initial set of internationally recognised (ISO/IEC 27000, Information security management) inputs to consider are:

  1. Data related to assessment of organizational cyber management:
    1. organizational-level incident management (enumerate): no, simple plans available, detailed plan + IR Team, fully integrated management (e.g., with a security operations center)
    2. backup management: no, user requested, automatic, automatic and tested
    3. vulnerability management (enumerate): no, assessment, management plans, with automatic tools
    4. enterprise patch management (enumerate): no, manual, automatic, testing
    5. specific cybersecurity and testing personnel (enumerate): no cybersecurity tasks, IT personnel with cybersecurity tasks, cybersecurity-trained roles
    6. cybersecurity procedure and mitigation testing (enumerate): no, occasional, planned, planned & frequent
    7. risk analysis: no, threats identified, assessment available, some mitigations implemented, full (mitigations implemented or justified, risks quantified)
  2. Data related to prevention of cyber-attacks. Being able to detect anomalies in an information system can allow preventing some attacks or discovering them before attackers do, which can be estimated considering:
    1. monitoring: no, basic detection, integrated detection, organization-level Security Information and Event Management (SIEM) software
    2. reaction: no, planned pre-configured responses, organized (some automatic), integrated tool-based & human supervised
  3. Data related to training. Studies report that personnel who have followed specific training on the cybersecurity aspects relevant for their roles are more likely to avoid errors that may compromise the information systems of their companies (e.g., they use better passwords and are less likely to click on phishing emails). Training is even more important for personnel with cybersecurity-related responsibilities, hence:
    1. awareness/training of cybersecurity personnel: no, occasional, frequent training
    2. awareness/training of other categories: no, occasional, frequent, frequent and tailored per task
  4. Data related to legal issues. An additional field where preparedness should be measured is the risk of losses due to legal issues (e.g., GDPR fines):
    1. availability of cybersecurity certification (enumerate): no, certifications relevant for the company business obtained
    2. compliance to regulations (enumerate): no, minimum, adequate
  5. Data for quantifying exposure. Whenever available, the following aggregated values can help to quantify the overall exposure to cybersecurity risks:
    1. overall risk exposure: monetisation value from the risk assessment phase (number, directly obtained from the target company)
    2. mitigated risk exposure: monetisation value after the risk mitigation phase (number, directly obtained from the target company)
    3. value of assets by security requirement (Confidentiality-asset value / Integrity-asset value / Availability-asset value)
    4. percentage of the value of assets-by-security-requirement on the overall value of the company assets (Confidentiality-asset value / Integrity-asset value / Availability-asset value).
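Purely for illustration, the ordered maturity levels above could be modelled as integer enumerations; the class and member names below are hypothetical and not drawn from N200. Ordered values make it easy to flag low-maturity areas when aggregating an overall score:

```python
from enum import IntEnum

class BackupManagement(IntEnum):
    """Illustrative maturity scale for backup management."""
    NONE = 0
    USER_REQUESTED = 1
    AUTOMATIC = 2
    AUTOMATIC_AND_TESTED = 3

class IncidentManagement(IntEnum):
    """Illustrative maturity scale for organizational incident management."""
    NONE = 0
    SIMPLE_PLANS = 1
    DETAILED_PLAN_WITH_IR_TEAM = 2
    FULLY_INTEGRATED = 3  # e.g., with a security operations center

# A hypothetical company's answers; the ordering enables simple comparisons.
answers = {
    "backup_management": BackupManagement.AUTOMATIC,
    "incident_management": IncidentManagement.SIMPLE_PLANS,
}
low_maturity = [name for name, level in answers.items() if level < 2]
```

Encoding the enumerated answers as ordered levels is one way of making such questionnaire data “AI friendly”, since a downstream model can consume them directly as numeric features.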
  16. Output objects: represented by the outcome of the AI-based assessment in a format known by the user (format: Json)? This format is expressed in terms of a set of indexes that reflect the health of a company, the appropriateness of the governance and the impact of risks on economic-financial parameters. Some of these indexes are in response to legal requirements on business bankruptcy and crisis (e.g., DSCR), others are computed by means of a proprietary machine learning algorithm.

In more detail, the output parameters are:

  • Risk index of the likelihood of company default over a time horizon of 36 months. It reflects the company's performance based on its financial data.
  • Index of business continuity, reflecting the impact of other risks on the previous measure.
  • Index of adequacy of the organizational model, considering the impact of governance on the performance of the company and highlighting possible issues in terms of conflict of interest or familiarity.
  • Debt service coverage ratio (DSCR), a measurement of a firm's available cash flow to pay current debt obligations.
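The output object and the DSCR computation can be sketched as follows. All field names and the example figures are illustrative assumptions, not the normative JSON schema; DSCR is conventionally net operating income divided by total debt service.

```python
# Hypothetical sketch of the MPAI-CUI output object; field names are
# illustrative, not the normative JSON schema.
import json

def dscr(net_operating_income: float, total_debt_service: float) -> float:
    """Debt Service Coverage Ratio: available cash flow per unit of debt due."""
    return net_operating_income / total_debt_service

assessment = {
    "default_risk_index_36m": 0.12,     # likelihood-of-default index
    "business_continuity_index": 0.85,  # impact of other risks on the above
    "governance_adequacy_index": 0.78,  # organizational-model adequacy
    "dscr": round(dscr(240_000.0, 150_000.0), 2),
}

print(json.dumps(assessment, indent=2))
```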

This is depicted in Figure 1, where the objects of the standard are identified as the intermediate format and the AI machine output.

Figure 1 – MPAI-CUI model

In some cases, internationally agreed input data formats exist. In several other cases a variety of formats exists; for these, meta-formats to which existing formats can be converted should be defined.

The current framework has been made as general as possible, taking into consideration the wide range of issues related to risk management. We expect the architecture to be enriched and extended as other risks are included and synergies with other MPAI applications emerge.

Data confidentiality, privacy issues, etc. are for further consideration.

Benefits: MPAI-CUI will bring benefits to several stakeholders:

  1. Technology providers need not develop full applications to put their technologies to good use. They can concentrate on gradually introducing AI-based technologies, allowing a transition from traditional approaches based on statistical methods and overcoming their limitations. This will enhance the accuracy of prediction and improve user experience.
  2. Service providers (e.g., Fintech and Insurtech companies, advisors, banks) can deliver accurate products and services supporting the decision-making process while minimising time to market, as the MPAI-CUI framework enables easy combination of internal and external components.
  3. End users, such as companies and local governments, can obtain an AI-based decision-support system to assess financial health and deploy efficient strategies and action plans.
  4. Processing modules can be reused for different risk management applications.

Bottlenecks: The full potential of AI in MPAI-CUI may be limited by the availability of a market of AI-friendly data providers and by the need to adopt a vast amount of information and data strictly dependent on the company and its context.

Social aspects:

Simplified access to the technologies under the MPAI-CUI standard will offer end users AI-based products that support prediction and decision making in different contexts, reducing the effort users spend analysing data and improving their experience, which becomes more personal while retaining a wide vision (e.g., through benchmarking).

Moreover, the MPAI-CUI standard and the introduction of AI-based technologies will enable a transition from present human-readable systems to machine-readable technologies and services.

At the national level, governments can simulate the effects of public interventions and deploy appropriate strategies and plans to support companies and the economy.

Success criteria:

MPAI-CUI becomes the bridge between traditional approaches, compliant with current regulations on the prediction of business crises, and fully AI-based systems.

References

[1] Perboli G., Arabnezhad E., A Machine Learning-based DSS for Mid and Long-Term Company Crisis Prediction. CIRRELT-2020-29. July 2020.


MPAI-MMC

Multimodal Conversation

Multi-modal conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI.



Clarifications of the Call for Use Cases and Functional Requirements

MPAI-5 has approved the MPAI-MMC Use Cases and Functional Requirements as an attachment to the Call for Technologies N173. However, MMC-DC has identified some issues that are worth clarifying. This clarification is posted on the MPAI web site and will be communicated to those who have informed the Secretariat of their intention to respond.

General issue

MPAI understands that the scope of both N151 and N153 is very broad. Therefore it reiterates the point made in N152 and N154 that:

Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated. However, submissions will be judged for the merit of what is proposed. A submission on a single technology that is excellent may be considered instead of a submission that is complete but has a less performing technology.

Multimodal Question Answering (Use case #2 in N153)

MPAI welcomes submissions that propose a standard set of "types of question intention" and the means to indicate the language used in the Query Format.

MPAI welcomes proposals of a concept format for Reply in addition to, or instead of, a text format.

The assessment of submissions by Respondents who elect not to consider this point in their submission will not influence the assessment of the rest of their submission.

References

  1. MPAI-MMC Use Cases & Functional Requirements; MPAI N153; https://mpai.community/standards/mpai-mmc/#UCFR
  2. MPAI-MMC Call for Technologies; MPAI N154; https://mpai.community/standards/mpai-mmc/#Technologies
  3. MPAI-MMC Framework Licence; MPAI N173; https://mpai.community/standards/mpai-mmc/#Licence

Use Cases and Functional Requirements

This document is also available in MS format MPAI-MMC Use Cases and Functional Requirements

1       Introduction

2       The MPAI AI Framework (MPAI-AIF)

3       Use Cases

3.1       Conversation with emotion (CWE)

3.1.1       Multimodal Question Answering (MQA)

3.1.2       Personalised Automatic Speech Translation (PST)

4       Functional Requirements

4.1       Introduction

4.2       Conversation with Emotion

4.2.1       Implementation architecture

4.2.2       AI Modules

4.2.3       I/O interfaces of AI Modules

4.2.4       Technologies and Functional Requirements

4.3       Multimodal Question Answering

4.3.1       Implementation Architecture

4.3.2       AI Modules

4.3.3       I/O interfaces of AI Modules

4.3.4       Technologies and Functional Requirements

4.4       Personalized Automatic Speech Translation

4.4.1       Implementation Architecture

4.4.2       AI Modules

4.4.3       I/O interfaces of AI Modules

4.4.4       Technologies and Functional Requirements

5       Potential common technologies

6       Terminology

7       References

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international association with the mission to develop AI-enabled data coding standards. Research has shown that data coding with AI-based technologies is more efficient than with existing technologies.

The MPAI approach to developing AI data coding standards is based on the definition of standard interfaces of AI Modules (AIM). AIMs operate on input data having a standard format to provide output data having a standard format. AIMs can be combined and executed in an MPAI-specified AI-Framework called MPAI-AIF. The MPAI-AIF standard is being developed based on the responses to the Call for MPAI-AIF Technologies (N100) [2] satisfying the MPAI-AIF Use Cases and Functional Requirements (N74) [1].

While AIMs must expose standard interfaces to be able to operate in an MPAI AI Framework, the technologies used to implement them may influence their performance. MPAI believes that competing developers striving to provide more performing proprietary and interoperable AIMs will promote horizontal markets of AI solutions that build on and further promote AI innovation.

This document is a collection of Use Cases and Functional Requirements for the MPAI Multimodal Conversation (MPAI-MMC) application area. The MPAI-MMC Use Cases enable human-machine conversation that emulates human-human conversation in completeness and intensity. Currently MPAI has identified three Use Cases falling in the Multimodal Conversation area:

  1. Conversation with emotion (CWE)
  2. Multimodal Question Answering (MQA)
  3. Personalized Automatic Speech Translation (PST)

This document is to be read in conjunction with the document MPAI-MMC Call for Technologies (CfT) (N154) [4], as it provides the functional requirements of all the technologies that have been identified as required to implement the current MPAI-MMC Use Cases. Respondents to the MPAI-MMC CfT should make sure that their responses are aligned with the functional requirements expressed in this document.

In the future, MPAI may issue other Calls for Technologies falling in the scope of MPAI-MMC to support identified Use Cases.

It should also be noted that some technologies identified in this document are the same as, similar to, or related to technologies required to implement some of the Use Cases of the companion document MPAI-CAE Use Cases and Functional Requirements (N151) [3]. Readers of this document are advised that familiarity with the content of the said companion document is a prerequisite for a proper understanding of this document.

This document is structured in 7 chapters, including this Introduction.

Chapter 2 briefly introduces the AI Framework Reference Model and its six Components.
Chapter 3 briefly introduces the 3 Use Cases.
Chapter 4 presents the 3 MPAI-MMC Use Cases with the following structure:

1.     Reference architecture

2.     AI Modules

3.     I/O data of AI Modules

4.     Technologies and Functional Requirements

Chapter 5 identifies the technologies likely to be common across MPAI-MMC and MPAI-CAE, a companion standard project whose Call for Technologies is issued simultaneously with MPAI-MMC's.
Chapter 6 gives a basic list of relevant terms and their definitions.
Chapter 7 gives suggested references.

For the reader’s convenience, the meaning of the acronyms of this document is given in Table 1.

Table 1 – Acronyms used in this document

 

Acronym Meaning
AI Artificial Intelligence
AIF AI Framework
AIM AI Module
CfT Call for Technologies
CWE Conversation with emotion
DP Data Processing
KB Knowledge Base
ML Machine Learning
MQA Multimodal Question Answering
PST Personalized Automatic Speech Translation

2        The MPAI AI Framework (MPAI-AIF)

Most MPAI applications considered so far can be implemented as a set of AIMs – AI, ML and even traditional Data Processing (DP)-based units with standard interfaces assembled in suitable topologies to achieve the specific goal of an application and executed in an MPAI-defined AI Framework. MPAI is making all efforts to identify processing modules that are re-usable and upgradable without necessarily changing their internal logic. MPAI plans on completing the development of a 1st generation AI Framework, called MPAI-AIF, in July 2021.

The MPAI-AIF Architecture is given by Figure 1.

Figure 1 – The MPAI-AIF Architecture

MPAI-AIF is made up of 6 Components:

  1. Management and Control manages and controls the AIMs, so that they execute in the correct order and at the time when they are needed.
  2. Execution is the environment in which combinations of AIMs operate. It receives external inputs and produces the requested outputs, both of which are Use Case specific, activates the AIMs, exposes interfaces with Management and Control and interfaces with Communication, Storage and Access.
  3. AI Modules (AIM) are the basic processing elements receiving processing specific inputs and producing processing specific outputs.
  4. Communication is the basic infrastructure used to connect possibly remote Components and AIMs. It can be implemented, e.g., by means of a service bus.
  5. Storage encompasses traditional storage and is used, e.g., to store the inputs and outputs of the individual AIMs, intermediate results, data from the AIM states and data shared by AIMs.
  6. Access represents the access to static or slowly changing data that are required by the application such as domain knowledge data, data models, etc.
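The six Components can be illustrated with a minimal workflow skeleton. Class and method names here are assumptions for illustration, not the MPAI-AIF APIs.

```python
# Illustrative skeleton of an AI Framework: AIMs with standard I/O,
# executed in order by Management and Control. Names are assumptions.
from abc import ABC, abstractmethod

class AIM(ABC):
    """An AI Module: standard-format inputs in, standard-format outputs out."""
    @abstractmethod
    def process(self, inputs: dict) -> dict: ...

class UpperCaseAIM(AIM):
    """Toy AIM standing in for a real processing element."""
    def process(self, inputs: dict) -> dict:
        return {"text": inputs["text"].upper()}

class ManagementAndControl:
    """Executes AIMs in the order required by the workflow topology."""
    def __init__(self, workflow):
        self.workflow = workflow

    def run(self, data: dict) -> dict:
        for aim in self.workflow:   # correct order, one AIM at a time
            data = aim.process(data)
        return data

result = ManagementAndControl([UpperCaseAIM()]).run({"text": "hello"})
print(result)  # {'text': 'HELLO'}
```

Communication, Storage and Access would sit behind the `process` calls in a real deployment; they are omitted here to keep the sketch minimal.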

3        Use Cases

3.1       Conversation with emotion (CWE)

When people talk, they use multiple modalities. Emotion is one of the key features to understand the meaning of the utterances made by the speaker. Therefore, a conversation system with the capability to recognize emotion can better understand the user and produce a better reply.

This MPAI-MMC Use Case handles conversation with emotion. It is a human-machine conversation system where the computer can recognize emotion in the user's speech and/or text, also using video information of the user's face, to produce a reply.

Emotion is recognised in the following way and reflected in the speech production side. First, a set of emotion-related cues is extracted from text, voice and video. Then, the recognition modules for text, voice and video each recognise emotion independently. The emotion recognition module determines the final emotion from these per-modality emotions and transfers it to the dialog processing module. The dialog processing module then produces the reply based on the final emotion and the meaning obtained from the text and video analysis. Finally, the speech synthesis module produces speech from the reply text.
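The recognition-and-fusion flow can be sketched as follows. The module stubs and the majority-vote fusion rule are illustrative assumptions, not normative behaviour.

```python
# Toy sketch of the Conversation-with-Emotion flow: per-modality emotion
# recognition, fusion into a final emotion, and a reply that reflects it.
from collections import Counter

def recognise_text_emotion(text):    return "happiness" if "great" in text else "neutral"
def recognise_speech_emotion(feats): return feats.get("emotion", "neutral")
def recognise_video_emotion(feats):  return feats.get("emotion", "neutral")

def fuse_emotions(*emotions):
    """Emotion recognition AIM: final emotion from per-modality emotions
    (majority vote, an assumed fusion rule)."""
    return Counter(emotions).most_common(1)[0][0]

def dialog_processing(meaning, emotion):
    return f"I hear {emotion} in what you said about {meaning}."

final = fuse_emotions(
    recognise_text_emotion("this is great"),
    recognise_speech_emotion({"emotion": "happiness"}),
    recognise_video_emotion({}),
)
reply = dialog_processing("your day", final)  # reply text feeds speech synthesis
print(final, "->", reply)
```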

3.1.1      Multimodal Question Answering (MQA)

Question Answering Systems (QA) answer a user's question presented in natural language. Current QA systems only deal with the case where the input is in text or speech form. However, more attention is paid these days to the case where mixed inputs, such as speech with an image, are presented to the system. For example, a user asks a question: "Where can I buy this tool?" showing a picture of the tool. In that case, the QA system should process the question in text along with the image and find the answer to the question.

The question and image are recognised and analysed in the following way, and the answer is produced as output speech. The meaning of the question is recognised from the text or voice input. The image is analysed to find the object name, which is sent to the language understanding module. The integrated meaning of the multimodal inputs is then generated by the language understanding module. The intention analysis module determines the intention of the question and sends it to the QA module. The QA module produces the answer based on the intention of the question and the meaning from the language understanding module. The speech synthesis module produces speech from the answer text.
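The MQA flow can be sketched in the same spirit. The image lookup, intention labels and QA logic below are hypothetical stand-ins for the AIMs named above.

```python
# Toy sketch of the Multimodal Question Answering flow described above.
def image_analysis(image_id):
    # stand-in for an image classifier producing the object name
    return {"img-001": "hammer"}.get(image_id, "unknown object")

def language_understanding(question, object_name):
    # integrate the object name into the meaning of the question
    return question.replace("this tool", f"the {object_name}")

def intention_analysis(meaning):
    return "where_to_buy" if meaning.lower().startswith("where can i buy") else "other"

def question_answering(intention, meaning):
    if intention == "where_to_buy":
        return "You can buy it at a hardware store."
    return "I do not know."

meaning = language_understanding("Where can I buy this tool?", image_analysis("img-001"))
answer = question_answering(intention_analysis(meaning), meaning)
print(answer)  # answer text feeds speech synthesis
```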

3.1.2      Personalised Automatic Speech Translation (PST)

Automatic speech translation technology recognizes a voice uttered in one language by a speaker, converts the recognized speech into another language through automatic translation, and outputs the result as text-type subtitles or as a synthesized voice preserving the speaker's features in the translated speech. Recently, as interest in voice synthesis among the main technologies for automatic interpretation has increased, research has concentrated on personalized voice synthesis: technology that outputs the target language, obtained through voice recognition and automatic translation, as a synthesized voice similar to the tone (or utterance style) of the speaker.

An automatic interpretation system generating a synthetic sound with characteristics similar to those of the original speaker's voice includes a speech recognition module that generates text data for the original speech signal and extracts characteristic information such as pitch, vocal intensity, speech speed, and vocal tract characteristics of the original speech. The text data produced by the speech recognition module then go through an automatic translation module, which generates the synthesis-target translation, and a speech synthesis module, which generates a synthetic sound resembling the original speaker using the extracted characteristic information.

4        Functional Requirements

4.1       Introduction

The Functional Requirements developed in this document refer to the individual technologies identified as necessary to implement Use Cases belonging to MPAI-MMC application areas using AIMs operating in an MPAI-AIF AI Framework. The Functional Requirements developed adhere to the following guidelines:

  1. AIMs are defined to allow implementations by multiple technologies (AI, ML, DP).
  2. DP-based AIMs need interfaces, e.g., to a Knowledge Base. AI-based AIMs will typically require a learning process; however, support for this process is not included in the document. MPAI may develop further requirements covering that process in a future document.
  3. AIMs can be aggregated into larger AIMs. Some data flows of aggregated AIMs may no longer be exposed.
  4. AIMs may be influenced by the companion MPAI-CAE Use Cases and Functional Requirements [3], as some technologies needed by some MPAI-MMC AIMs share a significant number of functional requirements.
  5. Current AIMs do not feed information back to AIMs upstream. Respondents to the MPAI-MMC Call for Technologies [5] are welcome to motivate such feedback data flows and propose associated requirements.

The Functional Requirements described in the following sections are the result of a dedicated effort by MPAI experts over many meetings where different AIM partitionings have been proposed, discussed and revised. MPAI is aware that alternative partitionings or alternative I/O data to/from AIMs are possible, and those reading this document for the purpose of submitting a response to the MPAI-MMC Call for Technologies (N154) [5] are welcome to propose in their submissions alternative partitionings or alternative I/O data. However, they are required to justify the proposed new partitioning and to determine the functional requirements of the relevant technologies. The evaluation team will study the proposed alternative arrangement and may decide to accept all or part of the proposed new arrangement.

4.2       Conversation with Emotion

4.2.1      Implementation architecture

Possible architectures of this Use Case are given by Figure 2 and Figure 3. The two figures differ in the use of legacy DP technology vs AI technology:

  1. In Figure 2 some AIMs need a Knowledge Base to perform their tasks.
  2. In Figure 3 Knowledge Bases may not be required as the relevant information is embedded in neural networks that are part of an AIM.

Intermediate arrangements with only some Knowledge Bases are also possible, but not represented in a figure.

Figure 2 – Conversation with emotion (using legacy DP technologies)

Figure 3 – Conversation with emotion (fully AI-based)

4.2.2      AI Modules

The AI Modules of Conversation with Emotion are given in Table 2.

Table 2 – AI Modules of Conversation with Emotion

AIM Function
Language understanding Analyses natural language in a text format to produce its meaning and emotion included in the text
Speech Recognition Analyses the voice input and generates text output and emotion carried by it
Video analysis Analyses the video and recognises the emotion it carries
Emotion recognition Determines the final emotion from multi-source emotions
Dialog processing Analyses user’s Meaning and produces Reply based on the meaning and emotion implied by the user’s text
Speech synthesis Produces speech from Reply (the input text)
Face animation Produces an animated face consistent with the machine-generated Reply
Emotion KB (text) Contains words/phrases with associated emotions. Language understanding queries Emotion KB (text) to obtain the emotion associated with a text
Emotion KB (speech) Contains features extracted from speech recordings of different speakers reading/reciting the same corpus of texts with an agreed set of emotions and without emotion, for a set of languages and for different genders.

Speech recognition queries Emotion KB (speech) to obtain emotions corresponding to the features provided as input.

Emotion KB (video) Contains features extracted from video recordings of different people speaking with an agreed set of emotions and without emotion for different genders.

Video analysis queries Emotion KB (video) to obtain emotions corres­ponding to the features provided as input.

Dialog KB Contains sentences with associated dialogue acts. Dialog processing queries Dialog KB to obtain dialogue acts with associated sentences.

4.2.3      I/O interfaces of AI Modules

The I/O data of AIMs used in Conversation with Emotion are given in Table 3.

Table 3 – I/O data of Conversation with Emotion AIMs

Video analysis – Input: Video – Output: Emotion, Meaning, Time stamp
Speech recognition – Input: Input Speech, Response from Emotion KB (Speech) – Output: Text, Emotion, Query to Emotion KB (Speech)
Language understanding – Input: Input Text, Recognised Text, Response from Emotion KB (Text) – Output: Text, Emotion, Meaning, Query to Emotion KB (Text)
Emotion recognition – Input: Emotion (from text), Emotion (from speech), Emotion (from image) – Output: Final Emotion
Dialog processing – Input: Text, Meaning, Final emotion, Meaning, Response from Dialogue KB – Output: Reply (Text), Animation, Query to Dialogue KB
Speech synthesis – Input: Reply – Output: Speech
Face animation – Input: Animation parameters – Output: Video
Emotion KB (text) – Input: Query – Output: Response
Emotion KB (speech) – Input: Query – Output: Response
Emotion KB (video) – Input: Query – Output: Response
Dialog KB – Input: Query – Output: Response

4.2.4      Technologies and Functional Requirements

4.2.4.1     Text

Text should be encoded according to ISO/IEC 10646, Information technology – Universal Coded Character Set (UCS) to support most languages in use [6].

To Respondents

Respondents are invited to comment on this choice.

4.2.4.2     Digital Speech

Speech is sampled at a frequency between 8 kHz and 96 kHz and digitally represented between 16 bits/sample and 24 bits/sample (both linear).
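For a sense of scale, the stated ranges imply the following raw data rates (mono, linear PCM assumed):

```python
# Raw PCM data rate for the extremes of the stated sampling ranges.
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    return sample_rate_hz * bits_per_sample // 8 * channels

low  = pcm_bytes_per_second(8_000, 16)    # 16,000 B/s at the low end
high = pcm_bytes_per_second(96_000, 24)   # 288,000 B/s at the high end
print(low, high)
```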

To Respondents

Respondents are invited to comment on these two choices.

4.2.4.3     Digital Video

Digital video has the following features.

  1. Pixel shape: square
  2. Bit depth: 8-10 bits/pixel
  3. Aspect ratio: 4/3 and 16/9
  4. 640 < # of horizontal pixels < 1920
  5. 480 < # of vertical pixels < 1080
  6. Frame frequency 50-120 Hz
  7. Scanning: progressive
  8. Colorimetry: ITU-R BT709 and BT2020
  9. Colour format: RGB and YUV
  10. Compression: uncompressed; if compressed, AVC or HEVC.
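For a sense of scale, the extremes of these ranges imply the following uncompressed data rates (assuming three colour components per pixel):

```python
# Uncompressed video data rate for the extreme points of the stated ranges.
def video_bits_per_second(w, h, bits, fps, components=3):
    return w * h * bits * components * fps

low  = video_bits_per_second(640, 480, 8, 50)      # ~0.37 Gbit/s
high = video_bits_per_second(1920, 1080, 10, 120)  # ~7.46 Gbit/s
print(low / 1e9, high / 1e9)
```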

 To Respondents

Respondents are invited to comment on these choices.

4.2.4.4     Emotion

By Emotion we mean an attribute that indicates an emotion out of a finite set of Emotions.

Emotion is extracted from text, speech and video and digitally represented as Emotion.

The most basic emotions are described by the set: “anger, disgust, fear, happiness, sadness, and surprise” [6], or “joy versus sadness, anger versus fear, trust versus disgust, and surprise versus anticipation” [8]. One of these sets can be taken as “universal” in the sense that they are common across all cultures. An Emotion may have different Grades [9,10].
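For illustration, one conceivable digital representation pairing an Emotion label with a Grade; the label set and the 1-to-3 grade scale are assumptions, not a representation MPAI has adopted.

```python
# Illustrative Emotion-with-Grade representation; label set and grade
# scale are assumptions for the sketch.
from dataclasses import dataclass

BASIC_EMOTIONS = {"anger", "disgust", "fear", "happiness", "sadness", "surprise"}

@dataclass(frozen=True)
class Emotion:
    label: str      # one of a finite, extensible set
    grade: int = 1  # intensity, e.g. 1 (mild) to 3 (strong), an assumed scale

    def __post_init__(self):
        if self.label not in BASIC_EMOTIONS:
            raise ValueError(f"unknown emotion: {self.label}")

e = Emotion("happiness", grade=2)
print(e)
```

Culture-specific Emotions could be supported by extending `BASIC_EMOTIONS`, which is the kind of extensibility the requirements below ask for.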

 To Respondents

Respondents are invited to propose:

  1. A minimal set of Emotions whose semantics are shared across cultures.
  2. A set of Grades that can be associated to Emotions.
  3. A digital representation of Emotions and their Grades [11].

This CfT does not specifically address culture-specific Emotions. However, the proposed digital representation of Emotions and their Grades should either accommodate, or be extensible to support, culture-specific Emotions.

4.2.4.5     Emotion KB (speech) query format

Emotion KB (speech) contains features extracted from speech recordings of different speakers reading/reciting the same corpus of texts with an agreed set of emotions and without emotion, for a set of languages and for different genders.

The Emotion KB (speech) is queried with a list of speech features. The Emotion KB responds with the emotions of the speech.

Speech features are extracted from the input speech and are used to determine the Emotion of the input speech.

Examples of features that have information about emotion are:

  1. Features to detect the arousal level of emotions: sequences of short-time prosody acoustic features (features estimated on a frame basis), e.g., short-term speech energy [15].
  2. Features related to the pitch signal (i.e., the glottal waveform), which depends on the tension of the vocal folds and the subglottal air pressure. Two parameters related to the pitch signal can be considered: pitch frequency and glottal air velocity. For example, high velocity indicates an emotion like happiness, while low velocity is associated with harsher styles such as anger [17].
  3. Features related to the shape of the vocal tract, which is modified by emotional states. The formants (characterized by a center frequency and a bandwidth) can represent the vocal tract resonances, and the number of harmonics reflects the non-linear airflow in the vocal tract. For example, in the emotional state of anger the fast airflow causes additional excitation signals other than the pitch. Teager Energy Operator-based (TEO) features are an example of a measure of the harmonics and cross-harmonics in the spectrum [18].

An example representation of such features is the Mel-frequency cepstrum (MFC) [19].
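Short-term energy, the frame-based arousal cue mentioned in point 1, can be sketched in a few lines of pure Python (the 20 ms frame length at 8 kHz is an assumption):

```python
# Short-term energy computed on a frame basis: a simple arousal-related
# prosody feature of the kind discussed above (no DSP library needed).
def short_term_energy(samples, frame_len=160):  # 20 ms frames at 8 kHz (assumed)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [sum(s * s for s in f) / len(f) for f in frames if f]

loud = [0.5] * 320   # sustained high amplitude: a high-arousal cue
soft = [0.05] * 320
print(short_term_energy(loud), short_term_energy(soft))
```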

To Respondents

Respondents are requested to propose an Emotion KB (speech) query format that satisfies the following requirements:

  1. Capable of querying by specific speech features
  2. Speech features should be:
    1. Suitable for extraction of Emotion information from natural speech containing emotion.
    2. Extensible, i.e., capable of including additional speech features.

When assessing proposed Speech features, MPAI may resort to objective testing.

Note: An AI-based implementation may not need Emotion KB (Speech).

4.2.4.6     Emotion KB (text) query format

Emotion KB (text) contains text features extracted from a text corpus with an agreed set of Emotions, for a set of languages and for different genders.

The Emotion KB (text) is queried with a list of Text features. Text features considered are:

  1. grammatical features, e.g., parts of speech.
  2. named entities, places, people, organisations.
  3. semantic features, e.g., roles, such as agent [21].

The Emotion KB (text) responds by giving Emotions correlated with the text features provided as input.
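A hypothetical query/response shape along these lines, with all feature names chosen for illustration only:

```python
# Illustrative Emotion KB (text) query: named text features in, correlated
# emotions out. The lookup is a stand-in, not a real KB.
query = {
    "language": "en",
    "features": {
        "parts_of_speech": ["ADJ", "NOUN"],   # grammatical features
        "named_entities": ["ORG"],            # places, people, organisations
        "semantic_roles": ["agent"],          # semantic features
    },
}

def emotion_kb_text(q):
    # stand-in: a real KB would correlate the features with emotions
    has_adj = "ADJ" in q["features"].get("parts_of_speech", [])
    return {"emotions": [{"label": "happiness", "grade": 1}] if has_adj else []}

print(emotion_kb_text(query))
```

New feature kinds can be added as new keys under `"features"`, which is one way to meet the extensibility requirement.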

To Respondents

Respondents are requested to propose an Emotion KB (text) query format that satisfies the fol­lowing requirements:

  1. Capable of querying by specific Text features.
  2. Text features should be:
    1. Suitable for extraction of Emotion information from natural language text containing Emotion.
    2. Extensible, i.e., capable of including additional text features.

When assessing the proposed Text features, MPAI may resort to objective testing.

Note: An AI-based implementation may not need Emotion KB (Text).

4.2.4.7     Emotion KB (video) query format

Emotion KB (video) contains features extracted from the video recordings of different speakers reading/reciting the same corpus of texts with and without an agreed set of emotions and meanings, for different genders.

Emotion KB (video) is queried with a list of Video features. Emotion KB responds with the associated Emotion, its Grade, and Meaning.

To Respondents

Respondents are requested to propose an Emotion KB (video) query format that satisfies the following requirements:

  1. Capable of querying by specific Video features.
  2. Video features should be:
    1. Suitable for extraction of emotion information from a video con­taining the face of a human expressing emotion.
    2. Extensible, i.e., capable of including additional Video features.

When assessing proposed Video features, MPAI may resort to objective testing.

Note: An AI-based implementation may not need Emotion KB (video).

4.2.4.8     Meaning

Meaning is information extracted from input text, speech and video, such as question, statement, exclamation, expression of doubt, request, invitation [18].

To Respondents

Respondents are requested to propose an extensible list of meanings and their digital representations satisfying the following requir­ements:

  1. The meaning extracted from the input text shall have a structure that includes grammatical information and semantic information.
  2. The digital representation of meaning shall allow for the addition of new features to be used in different applications.
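One conceivable structure for a Meaning object combining grammatical and semantic information, in the spirit of requirement 1; every field name here is an assumption, not a proposed format.

```python
# Illustrative Meaning structure: grammatical plus semantic information,
# with room for application-specific extensions. Field names are assumed.
meaning = {
    "utterance_type": "question",   # question, statement, request, ...
    "grammar": {"subject": "I", "verb": "buy", "object": "tool"},
    "semantics": {"agent": "user", "theme": "tool"},
    "extensions": {},               # new features for different applications
}
print(meaning["utterance_type"], meaning["semantics"]["agent"])
```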

4.2.4.9     Dialog KB query format

Dialog KB contains sentence features with associated dialogue acts. Dialog processing AIM queries Dialog KB to obtain dialogue acts with associated sentence features.

The Dialog KB is queried with sentence features. The sentence features considered are:

  1. Sentences analysed by the language understanding AIM.
  2. Sentence structures.
  3. Sentences with semantic features for the words composing sentences, e.g., roles, such as agent [21].

The Dialog KB responds by giving dialog acts correlated with the sentence provided as input.

To Respondents

Respondents are requested to propose a Dialog KB query format that satisfies the fol­lowing requirements:

  1. Capable of querying by specific sentence features.
  2. Sentence features should be:
    1. Suitable for extraction of sentence structures and meaning.
    2. Extensible, i.e., capable of including additional sentence features.

When assessing the proposed Sentence features, MPAI may resort to objective testing.

Note: An AI-based implementation may not need Dialog KB.

4.2.4.10  Input to speech synthesis (Reply)

Respondents should propose suitable technology for driving the speech synthesiser. Note that “Text with emotion” and “Concept with emotion” are both candidates for consideration.

To Respondents

Text with emotion

A standard format for text with Emotions attached to different portions of the text. An example of how emotion could be added to text is offered by emoticons.

Text should be encoded according to ISO/IEC 10646, Information technology – Universal Coded Character Set (UCS) to support most languages in use.

Respondents are requested to comment on the choice of the character set and to propose a solution for emotion added to a text satisfying the following requirements:

  1. It should include a scheme for annotating text with emotion either as text with emotion expressed with text or with additional characters.
  2. It should include an extensible emotion annotation representation scheme for basic emotions.
  3. The emotion annotation representation scheme should be language independent.
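One conceivable inline annotation scheme meeting requirements of this kind; the tag syntax below is purely illustrative, not a proposed standard.

```python
# Illustrative emotion-annotation scheme: emotion spans marked with
# language-independent tags that a synthesiser can honour or strip.
import re

annotated = "I passed the exam! <emo label='happiness' grade='2'>Fantastic!</emo>"

def strip_emotion(text):
    """Plain text for a synthesiser that ignores emotion markup."""
    return re.sub(r"</?emo[^>]*>", "", text)

def extract_emotions(text):
    """(label, grade, span) tuples for an emotion-aware synthesiser."""
    return re.findall(r"<emo label='([a-z]+)' grade='(\d)'>(.*?)</emo>", text)

print(strip_emotion(annotated))
print(extract_emotions(annotated))
```

Because the tags carry only a label and a grade, the scheme stays language independent and can be extended with new labels.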

Concept with emotion

Respondents are requested to propose a digital representation of concepts that enables going straight from meaning and emotion to a "concept to speech" synthesiser, as, e.g., in [28].

4.2.4.11  Input to face animation

A face can be animated using the same parameters used to synthesise speech.

To respondents

Respondents are requested to provide the same types of data format as for speech, or to propose and justify a different data format.

4.3       Multimodal Question Answering

4.3.1      Implementation Architecture

Possible architectures of this Use Case are given by Figure 4 and Figure 5. In the former case some AIMs need a Knowledge Base to perform their tasks. In the latter case Knowledge Bases may not be required, as the relevant information is embedded in neural networks that are part of an AIM. Intermediate arrangements where only some Knowledge Bases are used are also possible but not represented by a figure.

Figure 4 Multimodal Question Answering (using legacy DP technologies)

Figure 5 Multimodal Question Answering (fully AI-based)

4.3.2      AI Modules

The AI Modules of Multimodal Question Answering are given in Table 4.

Table 4 – AI Modules of Multimodal Question Answering

AIM Function
Language understanding Analyses natural language expressed as text using a language model to produce the meaning of the text
Speech Recognition Analyses the voice input and generates text output
Speech synthesis Converts input text to speech
Image analysis Analyses the image and produces the name of the object in focus
Question analysis Analyses the meaning of the sentence and determines the Intention
Question Answering Analyses the user’s question and produces a reply based on the user’s Intention
Intention KB Responds to queries using a question ontology to provide the features of the question
Image KB Responds to Image analysis’s queries providing the name of the object in the image
Online dictionary Allows the Question Answering AIM to find answers to the question

4.3.3      I/O interfaces of AI Modules

The AI Modules of Multimodal Question Answering are given in Table 5.

Table 5 – I/O data of Multimodal Question Answering AIMs

 

AIM Input Data Output Data
Speech Recognition Digital Speech Text
Image analysis Image, Image KB response Image KB query, Text
Language understanding Text, Text Meaning, Meaning
Question analysis Meaning, Intention KB response Intention, Intention KB query
QA Meaning, Text, Intention, Online dictionary response Online dictionary query, Text
Speech synthesis Text Digital speech
Intention KB Query Response
Image KB Query Response
Online dictionary Query Response
Dialog KB Query Response

4.3.4      Technologies and Functional Requirements

4.3.4.1     Text

Text should be encoded according to ISO/IEC 10646, Information technology – Universal Coded Character Set (UCS) to support most languages in use [6].

To Respondents

Respondents are invited to comment on this choice.

4.3.4.2     Digital Speech

Multimodal QA (MQA) requires that speech be sampled at a frequency between 8 kHz and 96 kHz and digitally represented with between 16 and 24 bits/sample (linear).

To Respondents

Respondents are invited to comment on these two choices.

4.3.4.3     Digital Image

A Digital Image is an uncompressed or a JPEG-compressed picture [19].

To Respondents

Respondents are invited to comment on this choice.

4.3.4.4     Image KB query format

The Image KB contains feature vectors extracted from different images of the objects intended to be used in this Use Case [29].

The Image KB is queried with a vector of image features extracted from the input image repres­enting an object [21]. The Image KB responds by giving the identifier of the object.

To Respondents

Respondents are requested to propose an Image KB query format that satisfies the following requirements:

  1. Capable of querying by specific Image features.
  2. Image features should be:
    1. Suitable for querying the Image KB.
    2. Extensible to include additional image features and additional object types.

When assessing proposed Image features, MPAI may resort to objective testing.

An AI-based implementation may not need an Image KB.
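By way of illustration only (the object identifiers, vector length and distance metric are assumptions, not a proposal), a feature-vector query against an Image KB could work as follows: the KB stores one feature vector per known object and answers a query vector with the identifier of the nearest object.

```python
# Illustrative Image KB query: the KB stores one feature vector per known
# object; a query vector is resolved by nearest neighbour (Euclidean).
import math

image_kb = {                       # object identifier -> feature vector
    "obj:cup":    [0.9, 0.1, 0.0],
    "obj:laptop": [0.1, 0.8, 0.3],
}

def query_image_kb(features):
    """Return the identifier of the KB object closest to the query vector."""
    return min(image_kb, key=lambda k: math.dist(features, image_kb[k]))
```

Adding a new object type is just a new KB entry, which is what the extensibility requirement above asks for.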

4.3.4.5     Object identifier

The object must be uniquely identified.

To Respondents

Respondents are requested to propose a universally applicable object classification scheme.

4.3.4.6     Meaning

Meaning is information extracted from the input text such as question, statement, exclamation, expression of doubt, request, invitation [18].

To Respondents

Respondents are requested to propose an extensible list of meanings and their digital repres­en­tations satisfying the following requirements:

  1. The meaning extracted from the input text shall have a structure that includes grammatical information and semantic information.
  2. The digital representation of meaning shall allow for the addition of new features to be used in different applications.
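A minimal sketch of one possible digital representation of Meaning satisfying these two requirements (all field names are hypothetical, not proposed normative technology): a grammatical layer, a semantic layer, and an open slot for application-specific extensions.

```python
# Hypothetical Meaning record: grammatical information, semantic
# information, and an extensible "features" slot for new applications.
meaning = {
    "type": "question",            # question, statement, request, ...
    "grammar": {"subject": "this", "verb": "be", "object": "what"},
    "semantics": {"predicate": "identify", "focus": "object-in-image"},
    "features": {},                # extensible per application
}
```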

4.3.4.7     Intention KB query format

The Intention KB contains question patterns, extracted from user questions, that denote intention types resulting from question analysis.

For instance: what, where, from where, for whom, by whom, how… [22].

The Intention KB is queried by giving text as input. The Intention KB responds with the type of question intention.

To Respondents

Respondents are requested to propose an Intention KB query format satisfying the following requirements:

  1. Capable of querying by questions with meaning provided by the Language Understanding AIM.
  2. Extensible, i.e., capable of including additional intention features.

Respondents are requested to propose an extensible classification of Intentions and their digital representations satisfying the following requirements:

  1. The intention of the question shall be represented as including question types, question focus and question topics.
  2. The digital representation of intention shall be extensible, i.e., allow for the addition of new features to be used in different applications.

An AI-based implementation may not need an Intention KB.
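As a toy illustration (the patterns, category labels and record fields are assumptions, not a proposal), an Intention KB lookup could map question patterns to intention categories while keeping the extensible structure the requirements ask for:

```python
# Illustrative Intention KB: question patterns mapped to intention
# categories; a query matches the first pattern found in the question text.
intention_kb = {"what": "definition", "where": "location", "who": "person"}

def query_intention_kb(question: str) -> dict:
    """Return an extensible Intention record for a question text."""
    qtype = next((p for p in intention_kb if p in question.lower()), "other")
    return {
        "question_type": qtype,                      # what / where / who / other
        "category": intention_kb.get(qtype, "unknown"),
        "focus": None,                               # to be filled by analysis
        "topics": [],                                # question topics
        "features": {},                              # room for new applications
    }
```

The record covers question type, focus and topics, and the open `features` dictionary leaves room for the additional intention features mentioned above.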

4.3.4.8     Online dictionary query format

Online dictionary contains structured data that include topics and related information in the form of summaries, table of contents and natural language text [23].

The Online dictionary is queried by giving text as input. The Online dictionary responds with the paragraphs in which answers highly correlated with the user’s question can be found.

To Respondents

Respondents are requested to propose an Online dictionary query format satisfying the following requirements:

  1. Capable of querying by text as keywords.
  2. Extensible, i.e., capable of including additional text features.
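A toy sketch of such a keyword query (the dictionary entries and the overlap-based scoring are illustrative assumptions, not proposed technology): paragraphs are ranked by keyword overlap with the query text and the best-correlated ones are returned.

```python
# Illustrative Online dictionary query: paragraphs are ranked by how many
# query keywords they share; real systems would use stemming and weighting.
dictionary = [
    "The Eiffel Tower is a wrought-iron tower in Paris.",
    "Python is a programming language created by Guido van Rossum.",
]

def query_dictionary(text: str, top_k: int = 1):
    """Return the top_k paragraphs most correlated with the query keywords."""
    keywords = set(text.lower().split())
    scored = sorted(dictionary,
                    key=lambda p: len(keywords & set(p.lower().split())),
                    reverse=True)
    return scored[:top_k]
```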

4.4       Personalized Automatic Speech Translation

4.4.1      Implementation Architecture

The AI Modules of a personalized automatic speech translator are configured as in Figure 6.

Figure 6 Personalized Automatic Speech Translation

4.4.2      AI Modules

The AI Modules of Personalized Automatic Speech Translation are given in Table 6.

Table 6 – AI Modules of Personalized Automatic Speech Translation

AIM Function
Speech Recognition Converts Speech into Text
Translation Translates the user text input in source language to the target language
Speech feature extraction Extracts speaker-specific Speech features such as tone, intonation, intensity, pitch, emotion or speed from the input voice.
Speech synthesis Produces Speech from the text resulting from translation with the speech features extracted from the speaker of the source language

4.4.3      I/O interfaces of AI Modules

The AI Modules of Personalized Automatic Speech Translation are given in Table 7.

Table 7 – I/O data of Personalized Automatic Speech Translation AIMs

AIM Input Data Output Data
Speech Recognition Digital Speech Text
Translation Text, Speech Translation result
Speech feature extraction Digital speech Speech features
Speech synthesis Translation result, Speech features Digital speech
4.4.4      Technologies and Functional Requirements

4.4.4.1     Text

Text should be encoded according to ISO/IEC 10646, Information technology – Universal Coded Character Set (UCS) to support most languages in use [6].

To Respondents

Respondents are invited to comment on this choice.

4.4.4.2     Digital Speech

Speech should be sampled at a frequency between 8 kHz and 96 kHz and digitally represented with between 16 and 24 bits/sample (both linear).

To Respondents

Respondents are invited to comment on these two choices.

4.4.4.3     Speech features

Speech features such as tone, intonation, intensity, pitch, emotion or speed encode the characteristics of the speaker’s voice.

The following features should be included in the speech features to describe the speaker’s voice: pitch, prosodic structures per intonation phrase, vocal intensity, speed of the utterance per word/sentence/intonation phrase, vocal tract characteristics of the speaker of the source language, and additional speech features associated with hidden variables. The vocal tract characteristics can be expressed as characteristic parameters of Mel-frequency cepstral coefficients (MFCC) and the glottal wave.

To Respondents

Respondents are requested to propose a set of speech features that shall be suitable for:

  1. Extracting voice characteristic information from natural speech containing personal features.
  2. Producing synthesized speech reflecting the original user’s voice characteristics.

When assessing proposed Speech features, MPAI may resort to subjective/objective testing.
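As a small illustration of extracting one such feature (the method and all parameters are assumptions, not a proposed technology), the fundamental frequency of digital speech can be estimated from the dominant autocorrelation peak; a complete extractor would add MFCC, intensity and prosodic features.

```python
# Minimal sketch: estimate the fundamental frequency (pitch) of a speech
# frame by finding the strongest autocorrelation peak in the 50-500 Hz range.
import math

def estimate_pitch(signal, sample_rate):
    """Estimate F0 in Hz from the dominant autocorrelation lag."""
    lo, hi = sample_rate // 500, sample_rate // 50   # lags for 500..50 Hz

    def autocorr(lag):
        return sum(signal[i] * signal[i + lag]
                   for i in range(len(signal) - lag))

    best_lag = max(range(lo, hi), key=autocorr)
    return sample_rate / best_lag

sr = 16000                                            # 16 kHz sampling
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(2048)]
f0 = estimate_pitch(tone, sr)                         # close to 200 Hz
```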

4.4.4.4     Language identification

ISO 639 – Codes for the Representation of Names of Languages — Part 1: Alpha-2 Code.

To Respondents

Respondents are requested to comment on this choice.
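For illustration, a language identifier carried as an ISO 639-1 alpha-2 code can be resolved with a simple lookup (the helper and the three-entry subset are hypothetical; the authoritative code list is maintained by the ISO 639 registration authority):

```python
# Tiny subset of ISO 639-1 alpha-2 codes, for illustration only.
iso639_1 = {"en": "English", "ko": "Korean", "it": "Italian"}

def describe(code: str) -> str:
    """Resolve an alpha-2 language code; codes are case-insensitive."""
    return iso639_1.get(code.lower(), "unknown")
```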

4.4.4.5     Translation results

Respondents should propose suitable technology for driving the speech synthesiser. “Text to speech” and “concept to speech” are both considered.

To Respondents

Text to speech

Text should be encoded according to ISO/IEC 10646, Information technology – Universal Coded Character Set (UCS) to support most languages in use.

Respondents are requested to comment on the choice of character set.

Concept to speech

Respondents are requested to propose a digital representation of concept that makes it possible to go straight from the translation result to a “concept to speech” synthesiser, as, e.g., in [28].

5        Potential common technologies

Table 8 introduces the MPAI-CAE and MPAI-MMC acronyms.

Table 8 – Acronyms of MPAI-CAE and MPAI-MMC Use Cases

Acronym App. Area Use Case
EES MPAI-CAE Emotion-Enhanced Speech
ARP MPAI-CAE Audio Recording Preservation
EAE MPAI-CAE Enhanced Audioconference Experience
AOG MPAI-CAE Audio-on-the-go
CWE MPAI-MMC Conversation with emotion
MQA MPAI-MMC Multimodal Question Answering
PST MPAI-MMC Personalized Automatic Speech Translation

Table 9 gives all MPAI-CAE and MPAI-MMC technologies in alphabetical order.

Please note the following acronyms:

KB Knowledge Base
QF Query Format

Table 9 – Alphabetically ordered MPAI-CAE and MPAI-MMC technologies

Notes: UC = Use Case
UCFR = Use Cases and Functional Requirements document number
Section = section of the above document
Technology = name of the technology

UC UCFR Section Technology
EAE N151 4.4.4.4 Delivery
AOG N151 4.5.4.7 Delivery
CWE N153 4.2.4.9 Dialog KB query format
ARP N151 4.3.4.1 Digital Audio
AOG N151 4.5.4.1 Digital Audio
ARP N151 4.3.4.3 Digital Image
MQA N153 4.3.4.3 Digital Image
EES N151 4.2.4.1 Digital Speech
EAE N151 4.4.4.1 Digital Speech
CWE N153 4.2.4.2 Digital Speech
MQA N153 4.3.4.2 Digital Speech
PST N153 4.4.4.2 Digital Speech
ARP N151 4.3.4.2 Digital Video
CWE N153 4.2.4.3 Digital Video
EES N151 4.2.4.2 Emotion
CWE N153 4.2.4.4 Emotion
EES N151 4.2.4.4 Emotion descriptors
CWE N153 4.2.4.5 Emotion KB (speech) query format
CWE N153 4.2.4.6 Emotion KB (text) query format
CWE N153 4.2.4.7 Emotion KB (video) query format
EES N151 4.2.4.3 Emotion KB query format
MQA N153 4.3.4.4 Image KB query format
CWE N153 4.2.4.11 Input to face animation
CWE N153 4.2.4.10 Input to speech synthesis
MQA N153 4.3.4.7 Intention KB query format
PST N153 4.4.4.4 Language identification
CWE N153 4.2.4.8 Meaning
MQA N153 4.3.4.6 Meaning
EAE N151 4.4.4.2 Microphone geometry information
AOG N151 4.5.4.2 Microphone geometry information
MQA N153 4.3.4.5 Object identifier
MQA N153 4.3.4.8 Online dictionary query format
EAE N151 4.4.4.3 Output device acoustic model metadata KB query format
ARP N151 4.3.4.6 Packager
AOG N151 4.5.4.3 Sound array
AOG N151 4.5.4.4 Sound categorisation KB query format
AOG N151 4.5.4.5 Sounds categorisation
PST N153 4.4.4.3 Speech features
ARP N151 4.3.4.4 Tape irregularity KB query format
ARP N151 4.3.4.5 Text
CWE N153 4.2.4.1 Text
MQA N153 4.3.4.1 Text
PST N153 4.4.4.1 Text
PST N153 4.4.4.5 Translation results
AOG N151 4.5.4.6 User Hearing Profiles KB query format

The following technologies are shared or shareable across Use Cases:

  1. Delivery
  2. Digital speech
  3. Digital audio
  4. Digital image
  5. Digital video
  6. Emotion
  7. Meaning
  8. Microphone geometry information
  9. Text

Image features apply to different visual objects in MPAI-CAE and MPAI-MMC.

The Speech features in Use Cases of both standards are different. However, respondents may consider the possibility of proposing a unified set of Speech features, e.g., as proposed in [30].

6        Terminology

Table 10 – MPAI-MMC terms

Term Definition
Access Static or slowly changing data that are required by an application such as domain knowledge data, data models, etc.
AI Framework (AIF) The environment where AIM-based workflows are executed
AI Module (AIM) The basic processing elements receiving processing specific inputs and producing processing specific outputs
Communication The infrastructure that connects the Components of an AIF
Dialog processing An AIM that produces a reply based on the input speech/text
Digital Speech Digitised speech as specified by MPAI
Emotion An attribute that indicates an emotion out of a finite set of Emotions
Emotion Grade The intensity of an Emotion
Emotion Recognition An AIM that decides the final Emotion out of Emotions from different sources
Emotion KB (text) A dataset of Text features with corresponding emotion
Emotion KB (speech) A dataset of Speech features with corresponding emotion
Emotion KB (Video) A dataset of Video features with corresponding emotion
Emotion KB query format The format used to interrogate a KB to find relevant emotion
Execution The environment in which AIM workflows are executed. It receives external inputs and produces the requested outputs both of which are application specific
Image analysis An AIM that extracts Image features
Image KB A dataset of Image features with corresponding object identifiers
Intention The result of question analysis, denoting information on the input question
Intention KB A question classification providing the features of a question
Language Understanding An AIM that analyses natural language as Text to produce its meaning and emotion included in the text
Management and Control Manages and controls the AIMs in the AIF, so that they execute in the correct order and at the time when they are needed
Meaning Information extracted from the input text such as syntactic and semantic information
Online Dictionary A dataset that includes topics and related information in the form of summaries, table of contents and natural language text
Question Analysis An AIM that analyses the meaning of a question sentence and determines its Intention
Question Answering An AIM that analyses the user’s question and produces a reply based on the user’s Inten­tion
Speech features Features used to extract Emotion from Digital Speech
Speech feature extraction An AIM that extracts Speech features from Digital speech
Speech Recognition An AIM that converts Digital speech to Text
Speech Synthesis An AIM that converts Text or concept to Digital speech
Storage Storage used to, e.g., store the inputs and outputs of the individual AIMs, data from the AIM’s state and intermediary results, shared data among AIMs
Text A collection of characters drawn from a finite alphabet
Translation An AIM that converts Text in a source language to Text in a target language

7        References

  1. MPAI-AIF Use Cases and Functional Requirements, N74; https://mpai.community/standards/mpai-aif/#Requirements
  2. MPAI-AIF Call for Technologies, N100; https://mpai.community/standards/mpai-aif/#Technologies
  3. MPAI-CAE Use Cases and Functional Requirements, N151; https://mpai.community/standards/mpai-cae/#Requirements
  4. MPAI-CAE Call for Technologies, N152; https://mpai.community/standards/mpai-cae/#Technologies
  5. MPAI-MMC Call for Technologies, N154; https://mpai.community/standards/mpai-mmc/#Technologies
  6. ISO/IEC 10646:2003 Information Technology — Universal Multiple-Octet Coded Character Set (UCS)
  7. Ekman, P. (1999). Basic Emotions. In T. Dalgleish and T. Power (Eds.) The Handbook of Cognition and Emotion pp. 45–60. Sussex, U.K.: John Wiley & Sons, Ltd.
  8. Plutchik R., Emotion: a psychoevolutionary synthesis, New York Harper and Row, 1980
  9. Russell, James (1980). “A circumplex model of affect”. Journal of Personality and Social Psychology. 39 (6): 1161–1178. doi:10.1037/h0077714
  10. Cahn, J. E., The Generation of Affect in Synthesized Speech, Journal of the American Voice I/O Society, 8, July 1990, p. 1-19
  11. https://www.w3.org/TR/2014/REC-emotionml-20140522/
  12. Burkhardt, F., & Sendlmeier, W. F., Verification of Acoustical Correlates of Emotional Speech using Formant-Synthesis, ISCA Workshop on Speech & Emotion, Northern Ireland 2000, p. 151-156.
  13. Scherer, K. R., Ladd, D. R., & Silverman, K., Vocal cues to speaker affect: Testing two models, Journal of the Acoustic Society of America, 76(5), 1984, p. 1346-1356
  14. Kasuya, H., Maekawa, K., & Kiritani, S., Joint Estimation of Voice Source and Vocal Tract Parameters as Applied to the Study of Voice Source Dynamics, ICPhS 99, p. 2505-2512
  15. Mozziconacci, S. J. L., Speech Variability and Emotion: Production and Perception, PhD Thesis, Technical University Eindhoven, 1998
  16. Burkhardt, F., & Sendlmeier, W. F., Verification of Acoustical Correlates of Emotional Speech using Formant-Synthesis, ISCA Workshop on Speech & Emotion, Northern Ireland 2000, p. 151-156.
  17. Cahn, J. E., The Generation of Affect in Synthesized Speech, Journal of the American Voice I/O Society, 8, July 1990, p. 1-19
  18. Hamed Beyramienanlou, Nasser Lotfivand, “An Efficient Teager Energy Operator-Based Automated QRS Complex Detection”, Journal of Healthcare Engineering, vol. 2018, Article ID 8360475, 11 pages, 2018. https://doi.org/10.1155/2018/8360475
  19. Davis S B. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 1980, 28(4):65-74
  20. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, pp. 3501–3504, May 2014; Moataz El Ayadi, Mohamed S. Kamel, Fakhri Karray, Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition, Elsevier, 44 (2011) 572–587
  21. Mohamed Zakaria Kurdi (2017). Natural Language Processing and Computational Linguistics: semantics, discourse, and applications, Volume 2. ISTE-Wiley.
  22. Semaan, P. (2012). Natural Language Generation: An Overview. Journal of Computer Science & Research (JCSCR)-ISSN, 50-57
  23. Hudson, Graham; Léger, Alain; Niss, Birger; Sebestyén, István; Vaaben, Jørgen (31 August 2018). “JPEG-1 standard 25 years: past, present, and future reasons for a success”. Journal of Electronic Imaging. 27 (4)
  24. Hobbs, Jerry R.; Walker, Donald E.; Amsler, Robert A. (1982). “Natural language access to structured text”. Proceedings of the 9th conference on Computational linguistics. 1. pp. 127–32.
  25. M. Petrou, C. Petrou, Image Processing: The Fundamentals, Wiley, 2010
  26. Suman Kalyan Maity, Aman Kharb, Animesh Mukherjee, Language Use Matters: Analysis of the Linguistic Structure of Question Texts Can Characterize Answerability in Quora, ICWSM 2017
  27. Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, Akiko Aizawa, Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps, COLING 2020
  28. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.433.7322&rep=rep1&type=pdf
  29. Mohamed Elgendy, Deep Learning for Vision Systems, Manning Publication, 2020
  30. Problem Agnostic Speech Encoder; https://github.com/santi-pdp/pase


Framework Licence

This document is also available in MS Word format MPAI-MMC Framework Licence

1        Coverage

The MPAI Multimodal Conversation (MPAI-MMC) standard as will be defined in document Nxyz of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI).

MPAI-MMC specifies the input and output interface of AIMs defined for 3 use cases in N153 that satisfy the requirements in N153.

2        Definitions

Term Definition
Data Any digital representation of a real or computer-generated entity, such as moving pictures, audio, point cloud, computer graphics, sensor and actu­ator. Data includes, but is not restricted to, media, manufacturing, auto­mot­ive, health and generic data.
Development Rights License to use MPAI-MMC Essential IPRs to develop Implementations
Enterprise Any commercial entity that develops or implements the MPAI-MMC standard
Essential IPR Any Proprietary Rights (such as patents) without which it is not possible, on technical (but not commercial) grounds, to make, sell, lease, otherwise dispose of, repair, use or operate Implementations without infringing those Proprietary Rights
Framework License A document, developed in compliance with the gener­ally accepted principles of competition law, which contains the conditions of use of the License without the values, e.g., currency, percent, dates etc.
Implementation A hardware and/or software reification of the MPAI-MMC standard serving the needs of a professional or consumer user directly or through a service
Implementation Rights License to reify the MPAI-MMC standard
License This Framework License to which values, e.g., currency, percent, dates etc., related to a specific Intellectual Property will be added. In this Framework License, the word License will be used as singular. However, multiple Licenses from different IPR holders may be issued
Profile A particular subset of the technologies that are used in MPAI-MMC standard and, where applicable, the classes, subsets, options and parameters relevant to the subset

3        Conditions of use of the License

  1. The License will be in compliance with generally accepted principles of competition law and the MPAI Statutes
  2. The License will cover all of Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-MMC standard.
  3. The License will cover Development Rights and Implementation Rights
  4. The License for Development and Implementation Rights, to the extent it is developed and implemented only for the purpose of evaluation or demo solutions or for technical trials, will be free of charge
  5. The License will apply to a baseline MPAI-MMC profile and to other profiles containing additional technologies
  6. Access to Essential IPRs of the MPAI-MMC standard will be granted in a non-discriminatory fashion.
  7. The scope of the License will be subject to legal, bias, ethical and moral limitations
  8. Royalties will apply to Implementations that are based on the MPAI-MMC standard
  9. Royalties will apply on a worldwide basis
  10. Royalties will apply to any Implementation, with the exclusion of the type of implementations specified in clause 4
  11. An MPAI-MMC Implementation may use other IPR to extend the MPAI-MMC Implementation or to provide additional functionalities
  12. The License may be granted free of charge for particular uses if so decided by the licensors
  13. A license free of charge for limited time and a limited amount of forfeited royalties will be granted on request
  14. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-MMC standard
  15. The total cost of the Licenses issued by IPR holders will be in line with the total cost of the Licenses for similar technologies standardised in the context of Standard Development Organisations
  16. The total cost of the Licenses will take into account the value on the market of the AI Framework technology Standardised by MPAI.


Call for Technologies

This document is also available in MS Word format MPAI-MMC Call for Technologies

1       Introduction

2       How to submit a response

3       Evaluation Criteria and Procedure

4       Expected development timeline

5       References

Annex A: Information Form

Annex B: Evaluation Sheet

Annex C: Requirements check list

Annex D: Technologies that may require specific testing

Annex E: Mandatory text in responses

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international non-profit organisation with the mission to develop standards for Artificial Intelligence (AI) enabled digital data coding and for technologies that facilitate integration of data coding components into ICT systems. With the mechanism of Framework Licences, MPAI seeks to attach clear IPR licensing frameworks to its standards.

MPAI has found that the application area called “Multimodal Conversation” is particularly relevant for MPAI standardisation because using AI to emulate human-human conversation in completeness and intensity can substantially improve the user experience of a variety of human-machine conversation applications, such as audio-visual conversation with a machine, requests for information about a displayed object, and speech-to-speech translation preserving the speech features of the human.

Therefore, MPAI intends to develop a standard – to be called MPAI-MMC – that will provide standard technologies to implement the three Use Cases identified so far:

  1. Conversation with emotion
  2. Multimodal Question Answering
  3. Personalized Automatic Speech Translation

This document is a Call for Technologies (CfT) for technologies that

  1. Satisfy the MPAI-MMC Functional Requirements of N153 [6] and
  2. Are released according to the MPAI-MMC Framework Licence (N173) [9], if selected by MPAI for in­clusion in the MPAI-MMC standard.

The standard will be developed with the following guidelines:

  1. To satisfy the MPAI-MMC Functional Requirements (N153) [6]. In the future, MPAI may decide to extend MPAI-MMC to support other Use Cases as a part of the MPAI-MMC standard or as a future extension of it.
  2. To use, where feasible and desirable, the same basic tech­nol­ogies required by the companion document MPAI-CAE Use Cases and Functional Requir­ements [3].
  3. To be suitable for implementation as AI Modules (AIM) conforming to the emerging MPAI AI Framework (MPAI-AIF) standard, based on the responses to the MPAI-AIF Call for Technologies (N100) [2] and the MPAI-AIF Use Cases and Functional Requirements (N74) [1].

MPAI has decided to base its application standards on the AIM and AIF notions whose functional requirements have been identified in [1] rather than follow the approach of defining end-to-end systems. It has done so because:

  1. AIMs allow the reduction of large problems to sets of smaller problems.
  2. AIMs can be independently developed and made available to an open competitive market.
  3. An application developer can build a sophisticated and complex MPAI system with potentially limited knowledge of all the technologies required by the system.
  4. An MPAI system has a high-level of inherent explainability.
  5. MPAI systems allow for competitive comparisons of functionally equivalent AIMs.

Respondents should be aware that:

  1. The Use Cases that make up MPAI-MMC and the AIM internals will be non-normative.
  2. The input and output interfaces of the AIMs, whose requirements have been derived to support the Use Cases, will be normative.

Therefore, the scope of this Call for Technologies is restricted to technologies required to implement the input and output interfaces of the AIMs identified in N153 [6].

However, MPAI invites comments on any technology or architectural component identified in N153, specifically,

  1. Additions or removals of input/output signals to the identified AIMs with justification of the changes and identification of data formats required by the new input/output signals.
  2. Possible alternative partitioning of the AIMs implementing the example cases providing:
    1. Arguments in support of the proposed partitioning
    2. Detailed specifications of the input and output data of the proposed new AIMs
  3. New Use Cases fully described as in N153.

All parties who believe they have relevant technologies satisfying all or most of the requirements of one or more than one Use Case described in N153 are invited to submit proposals for consid­eration by MPAI. MPAI membership is not a prerequisite for responding to this CfT. However, proponents should be aware that, if their proposal or part thereof is accepted for inclusion in the MPAI-MMC standard, they shall immediately join MPAI, or their accepted technologies will be discarded.

MPAI will select the most suitable technologies based on their technical merits for inclusion in MPAI-MMC. However, MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology if those submitted are found inadequate.

Submissions are due on 2021/04/12T23:59 UTC and should be sent to the MPAI secretariat (secretariat@mpai.community). The secretariat will acknowledge receipt of the submission via email. Submissions will be reviewed according to the schedule that the 7th MPAI General Assembly (MPAI-7) will define at its online meeting on 2021/04/14. For details on how submitters who are not MPAI members can attend the said review please contact the MPAI secretariat (secretariat@mpai.community).

2        How to submit a response

Those planning to respond to this CfT:

  1. Are advised that online events will be held on 2021/02/24 and 2021/03/10 to present the MPAI-MMC CfT and respond to questions. Logistic information on these events will be posted on the MPAI web site.
  2. Are requested to communicate their intention to respond to this CfT with an initial version of the form of Annex A to the MPAI secretariat (secretariat@mpai.community) by 2021/03/16. A potential submitter making a communication using the said form is not required to actually make a submission. A submission will be accepted even if the submitter did not communicate their intention to submit a response by the said date.
  3. Are advised to visit regularly the https://mpai.community/how-to-join/calls-for-technologies/ web site where relevant information will be posted.

Responses to this MPAI-MMC CfT shall/may include:

Table 1 – Mandatory and optional elements of a response

Item Status
Detailed documentation describing the proposed technologies mandatory
The final version of Annex A mandatory
The text of Annex B duly filled out with the table indicating which requirements identified in MPAI N153 [6] are satisfied. If not all the requirements of a Use Case are satisfied, this should be explained. mandatory
Comments on the completeness and appropriateness of the MPAI-MMC functional requirem­ents and any motivated suggestion to amend and/or extend those requir­ements. optional
A preliminary demonstration, with a detailed document describing it. optional
Any other additional relevant information that may help evaluate the submission, such as additional use cases. optional
The text of Annex E. mandatory

Respondents are invited to take advantage of the check list of Annex C before submitting their response and filling out Annex A.

Respondents are mandatorily requested to present their submission at a teleconference meeting to be announced to submitters by the MPAI Secretariat. If no presenter of a submission attends the meeting, the submission will be discarded.

Respondents are advised that, upon acceptance by MPAI of their submission in whole or in part for further evaluation, MPAI will require that:

  • A working implementation, including source code – for use in the development of the MPAI-MMC Reference Software and later publication as an MPAI standard – be made available before the technology is accepted for inclusion in the MPAI-MMC standard. Software may be written in programming languages that can be compiled or interpreted and in hardware description languages.
  • The working implementation be suitable for operation in the MPAI AIF Framework (MPAI-AIF).
  • A non-MPAI member immediately join MPAI. If the non-MPAI member elects not to do so, their submission will be discarded. Directions on how to join MPAI can be found online.

Further information on MPAI can be obtained from the MPAI website.

3        Evaluation Criteria and Procedure

Proposals will be assessed using the following process:

  1. An Evaluation Panel is created from:
    1. All MMC-DC members attending.
    2. Non-MPAI members who are respondents.
    3. Non-respondent, non-MPAI-member experts invited in a consulting capacity.
  2. No one from 1.1–1.2 will be denied membership in the Evaluation Panel.
  3. Respondents present their proposals.
  4. Evaluation Panel members ask questions.
  5. If required, subjective and/or objective tests are carried out:
    1. Define required tests.
    2. Carry out the tests.
    3. Produce report.
  6. If required, at least 2 reviewers will be appointed to review and report on specific points in a proposal.
  7. Evaluation panel members fill out Annex B for each proposal.
  8. Respondents respond to evaluations.
  9. Proposal evaluation report is produced.

4        Expected development timeline

Timeline of the CfT, deadlines and response evaluation:

Table 2 – Dates and deadlines

Step Date
Call for Technologies 2021/02/17
CfT introduction conference call 1 2021/02/24T14:00 UTC
CfT introduction conference call 2 2021/03/10T15:00 UTC
Notification of intention to submit proposal 2021/03/16T23:59 UTC
Submission deadline 2021/04/12T23:59 UTC
Evaluation of responses will start 2021/04/14 (MPAI-7)

Evaluation to be carried out during 2-hour sessions according to the calendar agreed at MPAI-7.

5        References

  1. MPAI-AIF Use Cases and Functional Requirements, MPAI N74; https://mpai.community/standards/mpai-aif/
  2. MPAI-AIF Call for Technologies, MPAI N100; https://mpai.community/standards/mpai-aif/#Technologies
  3. MPAI-AIF Framework Licence, MPAI N171; https://mpai.community/standards/mpai-aif/#Licence
  4. MPAI-CAE Use Cases & Functional Requirements, MPAI N151; https://mpai.community/standards/mpai-cae/#UCFR
  5. MPAI-CAE Call for Technologies, MPAI N152; https://mpai.community/standards/mpai-cae/#Technologies
  6. MPAI-CAE Framework Licence, MPAI N171; https://mpai.community/standards/mpai-cae/#Licence
  7. Draft MPAI-MMC Use Cases & Functional Requirements, MPAI N153; https://mpai.community/standards/mpai-mmc/#UCFR
  8. Draft MPAI-MMC Call for Technologies, MPAI N154; https://mpai.community/standards/mpai-mmc/#Technologies
  9. MPAI-MMC Framework Licence, MPAI N173; https://mpai.community/standards/mpai-mmc/#Licence

Annex A: Information Form

This information form is to be filled in by a Respondent to the MPAI-MMC CfT

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

Annex B: Evaluation Sheet

NB: This evaluation sheet will be filled out by members of the Evaluation Team.

Proposal title:

Main Functionalities:

Response summary: (a few lines)

Comments on Relevance to the CfT (Requirements):

Comments on possible MPAI-MMC profiles[1]

Evaluation table:

Table 3 – Assessment of submission features

Note 1: The semantics of Submission features is provided by Table 4.
Note 2: Evaluation elements indicate the elements used by the evaluator in assessing the submission.
Note 3: Final Assessment indicates the ultimate assessment based on the Evaluation elements.

 

Submission features Evaluation elements Final Assessment
Completeness of description

Understandability

Extensibility

Use of Standard Technology

Efficiency

Test cases

Maturity of reference implementation

Relative complexity

Support of MPAI use cases

Support of non-MPAI use cases

Content of the criteria table cells:

Evaluation facts should mention:

  • Not supported / partially supported / fully supported.
  • What supported these facts: submission/presentation/demo.
  • The summary of the facts themselves, e.g., very good in one way, but weak in another.

Final assessment should mention:

  • Possibilities to improve or add to the proposal, e.g., any missing or weak features.
  • How sure the evaluators are, i.e., evidence shown, very likely, very hard to tell, etc.
  • Global evaluation (Not Applicable / –– / – / + / ++)

 New Use Cases/Requirements Identified:

(please describe)

Evaluation summary:

  • Main strong points, qualitatively:
  •  Main weak points, qualitatively:
  • Overall evaluation: (0/1/2/3/4/5)

0: could not be evaluated

1: proposal is not relevant

2: proposal is relevant, but requires significantly more work

3: proposal is relevant, but with a few changes

4: proposal has some very good points, so it is a good candidate for the standard

5: proposal is superior in its category, very strongly recommended for inclusion in the standard

Additional remarks: (points of importance not covered above.)

The submission features in Table 3 are explained in the following Table 4.

Table 4 – Explanation of submission features

Submission features Criteria
Completeness of description Evaluators should

1.     Compare the list of requirements (Annex C of the CfT) with the submission.

2.     Check if respondents have described in sufficient detail which parts of the requirements their proposal addresses.

NB1: Completeness of a proposal for a Use Case is a merit because reviewers can assess that the components are integrated.

NB2: Submissions will be judged for the merit of what is proposed. A submission on a single technology that is excellent may be considered over a submission that is complete but contains lower-performing technology.

Understandability Evaluators should identify items that are demonstrably unclear (inconsistencies, sentences with dubious meaning, etc.)
Extensibility Evaluators should check if respondent has proposed extensions to the Use Cases.

NB: Extensibility is the capability of the proposed solution to support use cases that are not supported by current requirements.

Use of Standard Technology Evaluators should check if new technologies are proposed where widely adopted technologies exist. If this is the case, the merit of the new technology shall be proved.
Efficiency Evaluators should assess power consumption, computational speed, computational complexity.
Test cases Evaluators should report whether a proposal contains suggestions for testing the technologies proposed.
Maturity of reference implementation Evaluators should assess the maturity of the proposal.

Note 1: Maturity is measured by the completeness, i.e., having all the necessary information and appropriate parts of the HW/SW implementation of the submission disclosed.

Note 2: If there are parts of the implementation that are not disclosed but demonstrated, they will be considered if and only if such components are replicable.

Relative complexity Evaluators should identify issues that would make it difficult to implement the proposal compared to the state of the art.
Support of MPAI-MMC use cases Evaluators should check how many use cases are supported in the submission.
Support of non MPAI-MMC use cases Evaluators should check whether the technologies proposed can demonstrably be used in other significantly different use cases.

Annex C: Requirements check list

Please note the following acronyms

KB Knowledge Base
QF Query Format

Table 5 – List of technologies identified in MPAI-MMC N153 [7]

Note: The numbers in the first column refer to the section numbers of N153 [7].

Technologies by Use Cases Response
Conversation with Emotion
4.2.4.1 Text Y/N
4.2.4.2 Digital Speech Y/N
4.2.4.3 Digital Video Y/N
4.2.4.4 Emotion Y/N
4.2.4.5 Emotion KB (speech) query format Y/N
4.2.4.6 Emotion KB (text) query format Y/N
4.2.4.7 Emotion KB (video) query format Y/N
4.2.4.8 Meaning Y/N
4.2.4.9 Dialog KB query format Y/N
4.2.4.10 Input to speech synthesis (Reply) Y/N
4.2.4.11 Input to face animation Y/N
Multimodal Question Answering
4.3.4.1 Text Y/N
4.3.4.2 Digital Speech Y/N
4.3.4.3 Digital Image Y/N
4.3.4.4 Image KB query format Y/N
4.3.4.5 Object identifier Y/N
4.3.4.6 Meaning Y/N
4.3.4.7 Intention KB query format Y/N
4.3.4.8 Online dictionary query format Y/N
Personalized Automatic Speech Translation
4.4.4.1 Text Y/N
4.4.4.2 Digital Speech Y/N
4.4.4.3 Speech features Y/N
4.4.4.4 Language identification Y/N
4.4.4.5 Translation results Y/N

Respondents should consult the equivalent list in N152 [5].
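Several checklist items above ask respondents to propose a query format for a knowledge base (e.g., the Emotion KB). No format is defined in this document; the sketch below is purely a hypothetical illustration of the kind of structured, serialisable query a submission might specify, with every field name being an assumption of this example.

```python
import json

# Hypothetical sketch only: the CfT asks respondents to PROPOSE formats such
# as the "Emotion KB (speech) query format"; all field names are assumptions.
emotion_kb_speech_query = {
    "query_type": "emotion_kb_speech",
    "features": {                     # acoustic features extracted upstream
        "pitch_mean_hz": 212.5,
        "energy_db": -23.1,
        "speaking_rate_sps": 4.2,     # syllables per second
    },
    "language": "en",
    "top_k": 3,                       # number of candidate emotions requested
}

# At a minimum, a query format should round-trip through its wire encoding.
decoded = json.loads(json.dumps(emotion_kb_speech_query))
assert decoded == emotion_kb_speech_query
```

A proposal would of course also need to specify the response format and the semantics of each field; this sketch only shows the level of detail evaluators can check a submission against.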

Annex D: Technologies that may require specific testing

Conversation with Emotion Speech features
Conversation with Emotion Text features
Conversation with Emotion Video features
Multimodal Question Answering Image features
Personalised Automatic Speech Translation Speech features

 Additional technologies may be identified during the evaluation phase.

Annex E: Mandatory text in responses

A response to this MPAI-MMC CfT shall mandatorily include the following text

<Company/Member> submits this technical document in response to MPAI Call for Technologies for MPAI project MPAI-MMC (N153).

<Company/Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80); in particular, <Company/Member> declares that <Company/Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the Framework Licence of MPAI-MMC (N173), alone or jointly with other IPR holders, after the approval of the MPAI-MMC Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-MMC Technical Specification become available on the market.

In case the respondent is a non-MPAI member, the submission shall mandatorily include the following text

If (a part of) this submission is identified for inclusion in a specification, <Company> understands that <Company> will be requested to immediately join MPAI and that, if <Company> elects not to join MPAI, this submission will be discarded.

Subsequent technical contributions shall mandatorily include this text

<Member> submits this document to the MPAI-MMC Development Committee (MMC-DC) as a contribution to the development of the MPAI-MMC Technical Specification.

<Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80); in particular, <Member> declares that <Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the MPAI-MMC Framework Licence (N173), alone or jointly with other IPR holders, after the approval of the MPAI-MMC Technical Specification by the General Assembly and in no event after commercial implementations of the MPAI-MMC Technical Specification become available on the market.

[1] A profile of a standard is a particular subset of the technologies that are used in a standard and, where applicable, the classes, subsets, options and parameters relevant to that subset.



Template for responses to the Call for Technologies

This document is also available in MS Word format Template for responses to the MPAI-MMC Call for Technologies

Abstract

This document is provided as a help to those who intend to submit responses to the MPAI-MMC Call for Technologies. Text in red (as in this sentence) provides guidance to submitters and should not be included in a submission. Text in green shall be mandatorily included in a submission. If a submission does not include the green text, the submission will be rejected.

If the submission is in multiple files, each file shall include the green statement.

Text in white is the text suggested to respondents for use in a submission.

1        Introduction

This document is submitted by <organisation name> (if an MPAI Member) or by <organisation name>, a <company, university, etc.> registered in … (if a non-MPAI member), in response to the MPAI-MMC Call for Technologies issued by Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) on 2021/02/17 as MPAI document N154.

In the opinion of the submitter, this document proposes technologies that satisfy the requirements of MPAI-MMC Use Cases & Functional Requirements, issued by MPAI on 2021/02/17 as MPAI document N153.

Possible additions

This document also contains comments on the requirements as requested by N153.

This document also contains proposed technologies that satisfy additional requirements as allowed by N153.

<Company and/or Member> explicitly agrees to the steps of the MPAI standards development process defined in Annex 1 to the MPAI Statutes (N80); in particular, <Company and/or Member> declares that <Company and/or Member> or its successors will make available the terms of the Licence related to its Essential Patents according to the MPAI-MMC Framework Licence (N173), alone or jointly with other IPR holders, after the approval of the MPAI-MMC Technical Specification by the MPAI General Assembly and in no event after commercial implementations of the MPAI-MMC Technical Specification become available on the market.

<Company and/or Member> acknowledges the following points:

  1. MPAI is not obligated, by virtue of this CfT, to select a particular technology or to select any technology, if those submitted are found inadequate.
  2. MPAI may decide to use the same technology for functionalities also requested in the MPAI-CAE Call for Technologies (N152) and associated Functional Requirements (N151).
  3. A representative of <Company and/or Member> shall present this submission at an MMC-DC meeting communicated by the MPAI Secretariat (mailto:secretariat@mpai.community). If no representative of <Company and/or Member> attends the meeting and presents the submission, this submission will be discarded.
  4. <Company and/or Member> shall make available a working implementation, including source code – for use in the development of the MPAI-MMC Reference Software and eventual publication by MPAI as a normative standard – before the technology submitted is accepted for the MPAI-MMC standard.
  5. The software submitted may be written in programming languages that can be compiled or interpreted and in hardware description languages, upon acceptance by MPAI of this submission, in whole or in part, for further evaluation.
  6. <Company> shall immediately join MPAI upon acceptance by MPAI of this submission, in whole or in part, for further evaluation.
  7. If <Company> does not join MPAI, this submission shall be discarded.

2        Information about the submission

This information corresponds to Annex A of N154. It is included here for the submitter’s convenience.

  1. Title of the proposal
  2. Organisation: company name, position, e-mail of contact person
  3. What are the main functionalities of your proposal?
  4. Does your proposal provide or describe a formal specification and APIs?
  5. Will you provide a demonstration to show how your proposal meets the evaluation criteria?

3        Comments on/extensions to requirements (if any)

 

4        Overview of Requirements supported by submission

Please answer Y or N. Detail on the specific answers can be provided in the submission.

Technologies by Use Cases Response
Conversation with Emotion
4.2.4.1 Text Y/N
4.2.4.2 Digital Speech Y/N
4.2.4.3 Digital Video Y/N
4.2.4.4 Emotion Y/N
4.2.4.5 Emotion KB (speech) query format Y/N
4.2.4.6 Emotion KB (text) query format Y/N
4.2.4.7 Emotion KB (video) query format Y/N
4.2.4.8 Meaning Y/N
4.2.4.9 Dialog KB query format Y/N
4.2.4.10 Input to speech synthesis (Reply) Y/N
4.2.4.11 Input to face animation Y/N
Multimodal Question Answering
4.3.4.1 Text Y/N
4.3.4.2 Digital Speech Y/N
4.3.4.3 Digital Image Y/N
4.3.4.4 Image KB query format Y/N
4.3.4.5 Object identifier Y/N
4.3.4.6 Meaning Y/N
4.3.4.7 Intention KB query format Y/N
4.3.4.8 Online dictionary query format Y/N
Personalized Automatic Speech Translation
4.4.4.1 Text Y/N
4.4.4.2 Digital Speech Y/N
4.4.4.3 Speech features Y/N
4.4.4.4 Language identification Y/N
4.4.4.5 Translation results Y/N

5        New Proposed requirements (if any)

1. Y/N
2. Y/N
3. Y/N

6        Detailed description of submission

6.1       Proposal chapter #1

6.2       Proposal chapter #2

….

7        Conclusions

 



 

MPAI Application Note #6

Multi-Modal Conversation (MPAI-MMC)

Proponent: Miran Choi (ETRI)

Description: Owing to recent advances in AI technologies, natural language processing has come to be widely used in various applications. One useful application is the conversational partner, which provides the user with information, entertains, chats and answers questions through a speech interface. However, an application should include more than just a speech interface to provide a better service to the user. For example, an emotion recognizer and a gesture interpreter are needed for better multi-modal interfaces.

Multi-modal conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI.

The interaction of AI processing modules implied by a multi-modal conversation system would look approximately as presented in Figure 1, where one can see a language understanding module, a speech recognition module, an image analysis module, a dialog processing module, and a speech synthesis module.

Figure 1 – Multi-Modal Conversation (emotion-focused)

Comments: The processing modules of the MPAI-MMC instance of Figure 1 would be operated in the MPAI-AIF framework.
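The module chain of Figure 1 can be sketched in code. This is a minimal illustration only, assuming the module names listed above (speech recognition, emotion recognition, language understanding, dialog processing, speech synthesis); all function signatures and stub outputs are assumptions of this example, not part of any MPAI specification.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    emotion: str = "neutral"

def speech_recognition(audio: bytes) -> str:
    return "i am bored what should i do now"   # stub standing in for real ASR

def emotion_recognition(audio: bytes) -> str:
    return "bored"                             # stub emotion label

def language_understanding(text: str) -> dict:
    return {"intent": "chat", "text": text}

def dialog_processing(meaning: dict, emotion: str) -> Utterance:
    # Reply selection conditioned on the recognised emotion.
    if emotion == "bored":
        return Utterance("You look tired. Why don't you take a walk?",
                         emotion="sympathetic")
    return Utterance("How can I help?")

def speech_synthesis(utterance: Utterance) -> bytes:
    return utterance.text.encode()             # stand-in for a waveform

audio_in = b"..."  # placeholder for microphone capture
meaning = language_understanding(speech_recognition(audio_in))
reply = dialog_processing(meaning, emotion_recognition(audio_in))
waveform = speech_synthesis(reply)
```

In a conforming implementation, each of these functions would be a separate AI Module operating inside the MPAI-AIF framework rather than a plain Python function.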

Examples

An example of MMC is a conversation between a human user and a computer/robot, as in the following list. The input from the user can be voice, text, an image, or a combination of these. Taking the emotion of the human user into account, MMC will output responses as text, speech or music, depending on the user’s needs.

  • Chats: “I am bored. What should I do now?” – “You look tired. Why don’t you take a walk?”
  • Question Answering: “Who is the famous artist in Barcelona?” – “Do you mean Gaudi?”
  • Information Request: “What’s the weather today?” – “It is a little cloudy and cold.”
  • Action Request: “Play some classical music, please” – “OK. Do you like Brahms?”

Processing modules involved in MMC:

A preliminary list of processing modules is given below:

  1. Fusion of multi-modal input information
  2. Natural language understanding
  3. Natural language generation
  4. Speech recognition
  5. Speech synthesis
  6. Emotion recognition
  7. Intention understanding
  8. Image analysis
  9. Knowledge fusion from different sources such as speech, facial expression, gestures, etc.
  10. Dialog processing
  11. Question Answering
  12. Machine Reading Comprehension (MRC)

Requirements:

These are the initial functional requirements; the full set will be developed in the Functional Requirements (FR) phase.

  1. The standard shall specify the following natural input signals:
  • Sound signals from a microphone
  • Text from a keyboard or keypad
  • Images from a camera
  2. The standard shall specify a user profile format (e.g., gender, age, specific needs, etc.)
  3. The standard shall support emotion-based dialog processing that uses the emotion result from emotion recognition as input and decides the replies based on the user’s intention as output.
  4. The standard should provide means to carry emotion and user preferences in the speech synthesis processing module.
  5. Processing modules should be agnostic to AI, ML or DP technology: they should be general enough to avoid limitations in terms of algorithmic structure, storage and communication, and allow full interoperability with other processing modules.
  6. The standard should provide support for the storage of, and access to:
  • Unprocessed data in speech, text or image form
  • Processed data in the form of annotations (semantic labelling). Such annotations can be produced as the result of primary analysis of the unprocessed data or come from external sources such as a knowledge base.
  • Metadata (such as collection date and place; classification data)
  • Structured data produced from the raw data.
  7. The standard should also provide support for:
  • The combination into a general analysis workflow of a number of computational blocks that access processed, and possibly unprocessed, data such as input channels, and produce output as a sequence of vectors in a space of arbitrary dimension.
  • The possibility of defining and implementing a novel processing block from scratch, in terms of either source code or a proprietary binary codec.
  • A number of pre-defined blocks that implement well-known analysis methods (such as NN-based methods).
  • The parallel and sequential combination of processing modules that comprise different services.
  • Real-time processing of the conversation between the user and the robot/computer.
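The storage model described above (unprocessed data, annotations produced by primary analysis or external knowledge bases, and metadata) can be sketched as simple data structures. This is an illustrative assumption of what such a model might look like; none of the field names below come from an MPAI document.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Annotation:
    label: str    # semantic label, e.g. "emotion:happy" (assumed naming)
    start: float  # span within the raw data, in seconds
    end: float
    source: str   # "primary_analysis" or the name of an external knowledge base

@dataclass
class DataItem:
    raw: bytes    # unprocessed data in speech, text or image form
    modality: str # "speech" | "text" | "image"
    metadata: dict = field(default_factory=dict)  # collection date/place, classification
    annotations: List[Annotation] = field(default_factory=list)

# A speech recording annotated by a primary emotion-analysis pass.
item = DataItem(raw=b"...", modality="speech",
                metadata={"collected": "2021-02-17", "place": "studio"})
item.annotations.append(Annotation("emotion:bored", 0.0, 2.4, "primary_analysis"))
```

The same structure accommodates annotations imported from a knowledge base simply by setting a different `source`.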

 Object of standard: Interfaces of processing components utilized in multimodal communication.

  • Input interfaces: how to deal with inputs in different formats
  • Processing component interfaces: interfaces between a set of updatable and extensible processing modules
  • Delivery protocol interfaces: Interfaces of the processed data signal to a variety of delivery protocols
  • Framework: the glue keeping the pieces together => mapping to MPAI-AIF
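A uniform processing-component interface of the kind described above could look like the following sketch: every module exposes the same call so that modules remain updatable and can be combined sequentially (or in parallel). The interface itself is an assumption of this example; the CfT asks respondents to propose the actual interfaces.

```python
from abc import ABC, abstractmethod

class ProcessingModule(ABC):
    """Hypothetical uniform interface for an updatable processing component."""

    @abstractmethod
    def process(self, inputs: dict) -> dict:
        """Consume named inputs and produce named outputs."""

class UpperCaseNLG(ProcessingModule):
    # Toy stand-in for a natural language generation module.
    def process(self, inputs: dict) -> dict:
        return {"text": inputs["text"].upper()}

def run_sequence(modules, inputs: dict) -> dict:
    # Sequential combination: each module's output feeds the next module.
    for module in modules:
        inputs = module.process(inputs)
    return inputs

out = run_sequence([UpperCaseNLG()], {"text": "hello"})
```

In MPAI-AIF terms, the framework would play the role of `run_sequence`, wiring module inputs and outputs together according to a workflow description.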

Benefits:

  1. Decisively improve communication between humans and machines and the user experience
  2. Reuse of processing components for different applications
  3. Create a horizontal market of multimodal conversational components
  4. Make market more competitive

 Bottlenecks:

Some processing modules need improvement, because end-to-end processing currently has lower performance than modular approaches. Therefore, the standard should be able to cover the traditional method as well as hybrid approaches.

 Social aspects:

Enhanced user interfaces will provide accessibility for people with disabilities. MMC can also be used in care giving services for elderly and patients.

Success criteria:

  • How easily MMC can be extended to different services by combining several processing modules.
  • The performance of multi-modality compared to uni-modality in the user interface.
  • Interconnection and integration among different processing modules