Established on 30 September 2020, MPAI spent its first 3 months giving itself a structure to execute its mission of developing Artificial Intelligence (AI)-based data coding standards.

Its first full year of operation – 2021 – has been engaging and rewarding:

  • 5 Technical Specifications (TS) have been approved and released in the following domains:
    • Finance.
    • Human-machine communication.
    • Audio enhancement.
    • AI Framework.
    • Ecosystem Governance.
  • The Company Performance Prediction TS was complemented by 3 additional specifications:
    • Reference Software (RS), a conforming implementation of the TS,
    • Conformance Testing (CT), to test that an implementation is technically correct and provides an adequate user experience,
    • Performance Assessment (PA), to assess implementation reliability and trustworthiness.

A goal can be declared as reached only if the next goal is known, and the purpose of this post is to disclose exactly that.

The AI Framework (AIF), depicted in Figure 1, is a cornerstone of the MPAI architecture.

Figure 1 – The AI Framework (AIF) Reference Model and its Components

  • The AIF
    • Is Operating System-independent.
    • Has a component-based Zero-Trust architecture that can be deployed locally or distributed.
    • Can create AI Workflows (AIW) made of elementary units called AI Modules (AIM).
    • Can access validated AIWs and AIMs by interfacing to the MPAI Store.
    • Can execute in a range of computing environments, from microcontrollers (MCUs) to high-performance computers (HPCs).
    • Can interact with other AIFs operating in proximity.
    • Supports Machine Learning functionalities.
  • Its AIMs
    • Encapsulate components to abstract them from the development environment.
    • Call the Controller via standard interfaces (see the sketch after this list).
    • Can be AI-based or data processing-based.
    • Can be in software or in hardware.
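
To make the architecture above more concrete, here is a minimal, purely illustrative sketch of how AIMs, an AIW and the Controller could relate to each other. It is not the normative MPAI-AIF API: all class and method names (e.g. AIM.process, AIW.run, Controller.register_aiw, Controller.execute) are assumptions introduced only for this post.

```python
# Illustrative sketch only: the real interfaces are defined in the MPAI-AIF
# Technical Specification; every name below is hypothetical.
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class AIM(ABC):
    """An AI Module: an elementary unit, AI-based or data-processing-based."""

    @abstractmethod
    def process(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Consume input data and produce output data via a standard interface."""


class AIW:
    """An AI Workflow: a composition of AIMs that implements a Use Case."""

    def __init__(self, aims: List[AIM]):
        self.aims = aims

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for aim in self.aims:            # each AIM's output feeds the next AIM
            data = aim.process(data)
        return data


class Controller:
    """Orchestrates AIWs on behalf of the AI Framework."""

    def __init__(self) -> None:
        self.workflows: Dict[str, AIW] = {}

    def register_aiw(self, name: str, aiw: AIW) -> None:
        self.workflows[name] = aiw       # in practice, validated via the MPAI Store

    def execute(self, name: str, data: Dict[str, Any]) -> Dict[str, Any]:
        return self.workflows[name].run(data)
```

In a real AIF implementation the Controller would additionally handle access to validated AIWs and AIMs through the MPAI Store, support the Machine Learning functionalities and interact with other AIFs operating in proximity, as listed above.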

2022 MPAI Goal #1: AI Framework (MPAI-AIF)

  1. Development of the Reference Software (RS).
  2. Development of the Conformance Testing (CT) specification.

MPAI has already developed 3 application-oriented Technical Specifications: MPAI-CAE (enhanced audio), MPAI-CUI (company performance prediction) and MPAI-MMC (multimodal human-machine conversation). In total there are 10 AIWs and some 20 AIMs (several of them are used in different AIWs).

An active MPAI generates an ecosystem with the following actors:

  1. MPAI develops standards.
  2. Implementers develop implementations of MPAI standards.
  3. Users access such implementations.

MPAI is all about facilitating a market of AI applications. Releasing standards enables a market but does not ensure that the market is functional. How can a user be sure that an implementation is secure, technically correct and unbiased? Note that by “user” we do not necessarily mean an end user, but also an app (i.e., AIW) developer who may need an AIM and does not have the resources or the competence to answer these 3 questions.

In its Governance of the MPAI Ecosystem TS, MPAI has envisaged two more players:

  1. Performance Assessors, who assess whether implementations are reliable and trustworthy.
  2. The MPAI Store, where uploaded implementations are:
    1. Checked for security,
    2. Tested for conformance,
    3. Posted to the Store with a clear indication of their level of performance (a conceptual sketch of this flow follows).
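
As a purely conceptual illustration of this three-step flow (the actual Store procedures are those defined in the Governance of the MPAI Ecosystem TS; all names below are hypothetical):

```python
# Hypothetical sketch of the submission flow described above; the actual
# procedures are defined in the Governance of the MPAI Ecosystem TS.
from dataclasses import dataclass


@dataclass
class Submission:
    implementation_id: str
    performance_level: str = "unassessed"  # set by a Performance Assessor


class MPAIStore:
    """Stub of the Store-side checks; the real checks are far more elaborate."""

    def check_security(self, s: Submission) -> bool:
        return True  # placeholder for the security check

    def test_conformance(self, s: Submission) -> bool:
        return True  # placeholder for running the relevant Conformance Testing

    def post(self, s: Submission) -> None:
        print(f"{s.implementation_id} posted (performance: {s.performance_level})")


def publish(store: MPAIStore, submission: Submission) -> bool:
    """Apply the three steps in order: security, conformance, posting."""
    if not store.check_security(submission):
        return False
    if not store.test_conformance(submission):
        return False
    store.post(submission)
    return True
```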

Note that MPAI appoints Performance Assessors, and establishes and controls the MPAI Store, a not-for-profit commercial entity.

Figure 2 depicts the operation of the MPAI Ecosystem.

Figure 2 – The MPAI Ecosystem and its Governance

2022 MPAI Goal #2: Governance of the MPAI Ecosystem (MPAI-GME)

  1. Design the MPAI Store corporate structure.
  2. Design and operate the MPAI Store.
  3. Develop and run the MPAI Store IT service.
  4. Design and operate the Performance Assessor network.

In 2021 MPAI developed 3 application-oriented TSs:

  • Compression and Understanding of Industrial Data (MPAI-CUI) with 1 use case.
  • Multimodal Conversation (MPAI-MMC) with 5 use cases.
  • Context-based Audio Enhancement (MPAI-CAE) with 4 use cases.


Figure 3 depicts the reference model of the Company Performance Prediction Use Case.

AI-based Company Performance Prediction measures the performance of a Company by providing the Default Probability, Organisational Model Index and Business Discontinuity Probability of the Company within a given Prediction Horizon, using the Company’s Governance, Financial and Risk data (a conceptual sketch of these inputs and outputs follows Figure 3).
Figure 3 – The Company Performance Prediction (CUI-CPP) Reference Model
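
The sketch below illustrates the inputs and outputs just described. It is not the normative data representation of the MPAI-CUI TS; all field and function names are hypothetical.

```python
# Hypothetical sketch of the CPP AIW inputs and outputs described above;
# the normative data types are those of the MPAI-CUI Technical Specification.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CompanyData:
    governance: Dict[str, Any]  # the Company's Governance data
    financial: Dict[str, Any]   # the Company's Financial data
    risk: Dict[str, Any]        # the Company's Risk data


@dataclass
class CompanyPerformancePrediction:
    prediction_horizon: int                     # e.g. in months (assumption)
    default_probability: float                  # probability in [0, 1]
    organisational_model_index: float           # index assessing the organisational model
    business_discontinuity_probability: float   # probability in [0, 1]


def predict_company_performance(data: CompanyData,
                                prediction_horizon: int) -> CompanyPerformancePrediction:
    """Placeholder for the prediction performed by the CPP AIW."""
    raise NotImplementedError
```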

MPAI-CUI includes the Reference Software (RS), Conformance Testing (CT) and Performance Assessment (PA) Specifications of the AI-based Company Performance Prediction (CPP).

2022 MPAI Goal #3: Compression and Understanding of Industrial Data (MPAI-CUI)

  1. Integration of the RS in MPAI-AIF.
  2. Submission of the RS to the MPAI Store.
  3. Development of Version 2 (extension of functionality of existing AIMs and new AIWs to support more risks).

Multimodal Conversation (MPAI-MMC) uses AI to enable human-machine conversation emulating human-human conversation in completeness and intensity. It includes 5 Use Cases: Conversation with Emotion, Multimodal Question Answering, Unidirectional Speech Translation, Bidirectional Speech Translation and One-to-Many Speech Translation.

The figures below show the reference models of the MPAI-MMC Use Cases.

Conversation with Emotion (CWE) enables a human to hold an audio-visual conversation with a computational system impersonated by a synthetic voice and an animated face, both expressing emotion appropriate to the emotional state of the human.
Figure 4 – Conversation with Emotion
Multimodal Question Answering (MQA) enables a user to request, via speech, information about an object the user displays, and to receive the requested information from a computational system via synthetic speech.
Figure 5 – Multimodal Question Answering
Unidirectional Speech Translation (UST) allows a user to select a language different from the one s/he uses and to get a spoken utterance translated into the desired language with a synthetic voice that optionally preserves the personal vocal traits of the spoken utterance (a conceptual sketch of this pipeline follows Figure 8).
Figure 6 – Unidirectional Speech Translation
Bidirectional Speech Translation (BST) allows a human to hold a dialogue with another human. Both speak their own language, and their translated speech is synthetic speech that optionally preserves their personal vocal traits.
Figure 7 – Bidirectional Speech Translation
One-to-Many Speech Translation (MST) enables a human to select a number of languages and have their speech translated into the selected languages using synthetic speech that optionally preserves their personal vocal traits.
Figure 8 – One-to-Many Speech Translation
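
The sketch below illustrates, for the unidirectional case, the recognise-translate-synthesise flow that the translation Use Cases above suggest. It is a conceptual sketch only: the actual AIW topologies and AIM interfaces are those of the MPAI-MMC TS, and all function and type names are hypothetical.

```python
# Conceptual sketch only: the actual UST AIW topology and AIM interfaces are
# defined in the MPAI-MMC TS; all names below are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    text: str
    language: str
    vocal_traits: Optional[dict] = None  # optional features enabling voice preservation


def recognise(speech: bytes, language: str) -> Utterance:
    """Hypothetical speech-recognition step: audio samples in, recognised text out."""
    raise NotImplementedError


def translate(utterance: Utterance, target_language: str) -> Utterance:
    """Hypothetical translation step: text in one language, text in another."""
    raise NotImplementedError


def synthesise(utterance: Utterance, preserve_traits: bool = False) -> bytes:
    """Hypothetical speech-synthesis step, optionally reusing the speaker's vocal traits."""
    raise NotImplementedError


def unidirectional_speech_translation(speech: bytes, source_language: str,
                                      target_language: str,
                                      preserve_traits: bool = False) -> bytes:
    """Chain the hypothetical steps in the order suggested by the UST description."""
    recognised = recognise(speech, source_language)
    translated = translate(recognised, target_language)
    return synthesise(translated, preserve_traits)
```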

Currently, only the MPAI-MMC TS is available. Therefore:

2022 MPAI Goal #4: Multimodal Conversation (MPAI-MMC)

  1. Development of the RS of the 5 Use Cases, integration in AIF and submission to the Store.
  2. Development of the CT specification of the 5 Use Cases.
  3. Development of the PA specification of the 5 Use Cases.
  4. Development of Version 2 that includes extension of functionality of existing AIMs and new AIWs, some coming from projects under development such as MPAI-CAV (Connected Autonomous Vehicles) and MPAI-MCS (Mixed-reality Collaborative Spaces).

The 4 MPAI-CAE Use Cases are: Emotion-Enhanced Speech, Audio Recording Preservation, Speech Restoration System and Enhanced Audioconference Experience.

The figures below show the reference models of the MPAI-CAE Use Cases. Note that an Implementation is supposed to run in the MPAI-specified AI Framework (MPAI-AIF).

Emotion-Enhanced Speech (EES) enables a user to indicate a model utterance or an Emotion to obtain an emotionally charged version of a given utterance.

In many use cases, emotional force can usefully be added to speech that would by default be neutral or emotionless.

Figure 9 – Emotion Enhanced Speech
The Audio Recording Preservation (ARP) Use Case enables a user to create digital copies of digitised audio from open-reel magnetic tapes, suitable for long-term preservation and for correct playback of the digitised recording (restored, if necessary).
Figure 10 – Audio Recording Preservation
Speech Restoration System (SRS) enables a user to restore a Damaged Segment of an Audio Segment containing only speech from a single speaker. No filtering or signal processing is involved. Instead, replacements for the damaged vocal elements are synthesised using a speech model.
Figure 11 – Speech Restoration System
Enhanced Audioconference Experience (EAE) enables a user to improve the auditory quality of the audioconference experience by processing speech signals recorded by microphone arrays to provide speech signals free from background noise and acoustics-related artefacts.
Figure 12 – Enhanced Audioconference Experience

Currently, only the MPAI-CAE TS is available. Therefore:

2022 MPAI Goal #5: Context-based Audio Enhancement (MPAI-CAE)

  1. Development of the RS of the 4 Use Cases, integration in AIF and submission to the Store.
  2. Development of the CT specification of the 4 Use Cases.
  3. Development of the PA specification of the 4 Use Cases.
  4. Development of Version 2 that will include extension of functionality of existing AIMs and new AIWs, some coming from projects under development such as MPAI-CAV (Connected Autonomous Vehicles) and MPAI-MCS (Mixed-reality Collaborative Spaces).

MPAI has 7 other projects at different levels of development. A Goal is assigned to each of them.

2022 MPAI Goal #6: Server-based Predictive Multiplayer Gaming (MPAI-SPG)

  1. TS, RS, CT, PA of Server-based Predictive Multiplayer Gaming.
2022 MPAI Goal #7: Connected Autonomous Vehicles (MPAI-CAV)

  1. TS, RS, CT, PA of Connected Autonomous Vehicles. This will include interactions with MPAI-MMC and MPAI-CAE.
2022 MPAI Goal #8: Mixed-reality Collaborative Spaces (MPAI-MCS)

  1. TS, RS, CT, PA of Mixed-reality Collaborative Spaces. This will include interactions with MPAI-MMC and MPAI-CAE.
2022 MPAI Goal #9: Integrative Genomic/Sensor Analysis (MPAI-GSA)

  1. TS, RS, CT, PA of Integrative Genomic/Sensor Analysis.
2022 MPAI Goal #10: AI-Enhanced Video Coding (MPAI-EVC)

  1. The AI-Enhanced Video Coding (MPAI-EVC) Evidence Project will continue toward reaching the goal of 25% improvement over MPEG-5 EVC.
2022 MPAI Goal #11: AI-based End-to-End Video Coding (MPAI-EEV)

  1. AI-based End-to-End Video Coding (MPAI-EEV) will continue harnessing the potential of an unconstrained approach to AI-based Video Coding.
2022 MPAI Goal #12: Visual Object and Scene Description (MPAI-OSD)

  1. Visual Object and Scene Description (MPAI-OSD) will continue collecting use cases where visual information coding is required.