Moving Picture, Audio and Data Coding
by Artificial Intelligence

All posts by Leonardo Chiariglione

MPAI talks to industry

Those who have been following this blog should already know quite a few things about MPAI. It is fair to say, however, that one can never claim to know everything about any topic, especially one as vast and articulated as MPAI. That is why I recommend watching a series of videos in which some of the major MPAI players introduce the areas MPAI is engaged in. The daunting side is that there are some 3 hours of videos in total :-(, but the relaxing side is that the videos are organised as a playlist :-).

To save you a click, here is the list of videos with a guide to each entry.

| # | Title | Speaker | Country |
|---|-------|---------|---------|
| 1 | Introduction to MPAI | Leonardo | CH |
| 2 | MPAI-AIF – AI Framework | Andrea | IT |
| 3 | MPAI-CAE – Context-based Audio Enhancement | Marina | US |
| 4 | MPAI-MMC – Multimodal Conversation | Miran | KR |
| 5 | MPAI-CUI – Compression and Understanding of Industrial Data | Guido | IT |
| 6 | Reference Software, Conformance and Performance | Panos | UK |
| 7 | MPAI-GME – Governance of the MPAI Ecosystem | Paolo | UK |
| 8 | MPAI-SPG – Server-based Predictive Multiplayer Gaming | Marco | IT |
| 9 | MPAI-EVC – AI-Enhanced Video Coding | Roberto | IT |
| 10 | MPAI-EEV – AI-based End-to-End Video Coding | Chuanmin | CN |
| 11 | MPAI-CAV – Connected Autonomous Vehicles | Gianluca | IT |
| 12 | Conclusions | Leonardo | CH |

Introduction to MPAI, presented by Leonardo, covers the 4 pillars on which MPAI's work rests: the standards development process, the setting of IPR Guidelines before developing a standard, component-based AI-based data coding standards, and the Governance of the MPAI Ecosystem. A very short introduction to the 5 standards and to some of the current projects follows.

Andrea introduces MPAI-AIF – AI Framework, the standard that sets the context for all MPAI application standards: AI Modules (AIM), AI Workflows (AIW) and, indeed, the AI Framework (AIF), and how an AIF implementation can execute AI applications.
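To make these concepts concrete, here is a minimal conceptual sketch, in Python, of an AIW as a chain of AIMs executed by a framework. All class names and the execution model are illustrative assumptions, not the interfaces defined by MPAI-AIF.

```python
# Minimal conceptual sketch: an AI Workflow (AIW) as a chain of AI Modules
# (AIM) executed by a framework. All names here are illustrative, not taken
# from the MPAI-AIF specification.
from typing import Any, Callable, List

class AIM:
    """An AI Module: a named processing step."""
    def __init__(self, name: str, process: Callable[[Any], Any]):
        self.name = name
        self.process = process

class AIW:
    """An AI Workflow: an ordered chain of AIMs."""
    def __init__(self, aims: List[AIM]):
        self.aims = aims

    def run(self, data: Any) -> Any:
        for aim in self.aims:        # the framework executes each AIM in turn
            data = aim.process(data)
        return data

# Example: a two-AIM workflow (recognition followed by translation)
workflow = AIW([
    AIM("SpeechRecognition", lambda audio: f"text({audio})"),
    AIM("Translation", lambda text: f"translated({text})"),
])
print(workflow.run("utterance.wav"))  # -> translated(text(utterance.wav))
```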

Marina introduces MPAI-CAE – Context-based Audio Enhancement, the standard collecting 4 use cases where AI enhances audio to offer an improved user experience.

Miran introduces MPAI-MMC – Multimodal Conversation, the standard collecting 5 use cases where conversation between human and machine is enhanced beyond traditional speech recognition/synthesis by adding emotion and by representing an avatar whose speech and face are enhanced by emotion.

MPAI-CUI – Compression and Understanding of Industrial Data, presented by Guido, introduces the Company Performance Prediction Use Case, where a machine containing a critically important neural network receives financial, organisational and risk data of a company, predicts the company’s probability of default and business continuity, and assigns a governance adequacy index to the company.
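To visualise that data flow, here is a hedged sketch of the use case's inputs and outputs. The field names and the predictor stub are hypothetical, not taken from the MPAI-CUI specification.

```python
# Hypothetical shape of the Company Performance Prediction data flow.
from dataclasses import dataclass

@dataclass
class CompanyData:
    financial: dict       # e.g., balance-sheet figures (assumed fields)
    organisational: dict  # e.g., governance structure
    risk: dict            # e.g., seismic and cyber risk indicators

@dataclass
class Prediction:
    default_probability: float        # probability of default
    business_discontinuity: float     # probability of business discontinuity
    governance_adequacy_index: float  # adequacy of the organisational model

def predict(company: CompanyData, horizon_years: int) -> Prediction:
    # Stand-in for the neural-network AIM described in the use case.
    return Prediction(0.0, 0.0, 1.0)  # placeholder values
```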

Panos presents the foundational notions of Reference Software, Conformance and Performance. Reference Software and Conformance are established notions, with specific variations introduced by MPAI. Performance is an entirely new notion in standards, introduced because of the particular nature of artificial intelligence.

Paolo introduces MPAI-GME – Governance of the MPAI Ecosystem, a foundational standard strictly connected with the preceding presentation: it lays down the rules that govern submission of, and access to, MPAI standard implementations with the attributes of Reliability, Robustness, Replicability and Fairness.

Marco presents MPAI-SPG – Server-based Predictive Multiplayer Gaming, a project seeking to mitigate discontinuities caused by high latency or packet losses in online gaming and to detect fake data sent by game players to get an unfair advantage.

Roberto presents MPAI-EVC – AI-Enhanced Video Coding, a video compression project for a standard that substantially enhances the performance of a traditional video codec by improving or replacing traditional tools with AI-based tools.

Chuanmin describes MPAI-EEV – AI-based End-to-End Video Coding, a project seeking to compress video by exploiting AI-based data coding technologies (so-called end-to-end coding) without the constraints that apply to MPAI-EVC, namely how data processing technologies have traditionally been applied to video coding.

Gianluca walks you through MPAI-CAV – Connected Autonomous Vehicles, a project aiming to standardise all IT components required to implement a Connected Autonomous Vehicle (CAV), i.e., a system capable of moving autonomously based on the data from a range of sensors exploring the environment and the information transmitted by other sources in range, e.g., other CAVs.

Finally, presenting the Conclusions, Leonardo summarises the gist of the presentations and invites you to join MPAI, share the fun, build the future.

 


MPAI outlines plans for the MPAI Store Foundation

Geneva, Switzerland – 23 February 2022. Today the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 17th General Assembly. Among the outcomes are: progress towards the establishment of a patent pool for its published standards and a roadmap to establish the MPAI Store Foundation.

The MPAI Statutes define a standard development process whereby holders of standard essential patents (SEP) select their preferred patent pool administrator. The General Assembly was informed that SEP holders in approved MPAI standards are currently engaged in this activity.

The Governance of the MPAI Ecosystem (MPAI-GME) standard envisions an “MPAI Store” tasked with receiving submissions of implementations, verifying their security and conformance, and making them available to other implementers and consumers. Because of the specific characteristics of AI technologies, the MPAI Store coordinates with MPAI-appointed performance assessors that guarantee that implementations are reliable, robust, replicable and fair. The MPAI Store will be a not-for-profit commercial entity where both MPAI members and associations representing the society at large are present.

The General Assembly approved a set of documents guiding the development of use cases and functional requirements for Neural Network Watermarking and the publication of a series of short videos with the title “MPAI talks to industry” illustrating the various aspects of MPAI activities.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

MPAI is currently engaged in extending some of the already approved standards and in developing 9 more (shown in italics in the table below).

| Name of standard | Acronym | Brief description |
|---|---|---|
| AI Framework | MPAI-AIF | Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store. |
| Context-based Audio Enhancement | MPAI-CAE | Improves the user experience of audio-related applications in a variety of contexts. |
| Multimodal Conversation | MPAI-MMC | Enables human-machine conversation emulating human-human conversation. |
| Compression and Understanding of Industrial Data | MPAI-CUI | Predicts company performance from governance, financial and risk data. |
| Governance of the MPAI Ecosystem | MPAI-GME | Establishes the rules governing submission of and access to interoperable implementations. |
| *Server-based Predictive Multiplayer Gaming* | MPAI-SPG | Trains a network to compensate data losses and detect false data in online multiplayer gaming. |
| *AI-Enhanced Video Coding* | MPAI-EVC | Improves existing video coding with AI tools for short-to-medium term applications. |
| *End-to-End Video Coding* | MPAI-EEV | Explores the promising area of AI-based “end-to-end” video coding for longer-term applications. |
| *Connected Autonomous Vehicles* | MPAI-CAV | Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation. |
| *Avatar Representation and Animation* | MPAI-ARA | Specifies descriptors of avatars impersonating real humans. |
| *Neural Network Watermarking* | MPAI-NNW | Measures the impact of adding ownership and licensing information to models and inferences. |
| *Integrative Genomic/Sensor Analysis* | MPAI-GSA | Compresses data from high-throughput experiments combining genomic/proteomic and other data. |
| *Mixed-reality Collaborative Spaces* | MPAI-MCS | Supports collaboration of humans represented by avatars in virtual-reality spaces called Ambients. |
| *Visual Object and Scene Description* | MPAI-OSD | Describes objects and their attributes in a visual scene and the semantic description of the objects. |

Visit the MPAI web site, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram and YouTube.

Most important: join MPAI, share the fun, build the future.


Conclusions

This is a small book talking about a big adventure: standards for the most sensitive objects of all – data – using the most prominent technology of all – AI – for pervasive and trustworthy use by billions of people. At the end of this book, it is thus appropriate to assess what the authors think will be the likely impact of MPAI on industry and society.

The first impact will be the availability of standards for a technology that is ideally suited to transforming data but has so far seen no standards. These standards are driven by the same principles that guided another great adventure – MPEG – which replaced standards meant to be exclusively used by certain countries or industries with standards serving humankind.

The second impact will be the restoration of a right that implementers used to take for granted but lost some time ago: an implementer wishing to use a published standard should be allowed to do so, of course after remunerating those who invested money and talent to produce the technology enabling the standard.

The third impact is a direct consequence of the preceding two. In the mid-18th century, trade did not develop as it could because feudal traditions allowed petty lords to erect barriers to trade for the sake of a few livres or pounds. Today we do not live in a feudal age, but we still see petty lords here and there obstructing progress for the sake of a few dollars.

The fourth impact is the mirror of the third. An industry freed from shackles, with access to global AI-based data coding standards and operating in an open competitive market will be able to churn out interoperable AI-based products, services, and applications in response to consumer needs which are known today and the many more which are not yet known.

The fifth impact is a direct consequence of the fourth. An industry using sophisticated technologies such as AI and forced to be maximally competitive will need to foster accelerated progress of those technologies. We can confidently look forward to a new spring of research and advancement of science in a field on which it is difficult today to place boundaries.

The sixth impact will be caused by MPAI’s practical, Performance-Assessor-based solution to a widespread concern: AI technologies are as potentially harmful to humankind as they are powerful. The ability of AI technologies to hold vast knowledge, without simple means for users to check how representative of the world that knowledge is – when they are used to handle information and possibly make decisions – opens our minds to apocalyptic scenarios.

The seventh impact is speculative, but no less important. The idea of intelligent machines able to deal with humans has always attracted the intellectual interest of writers. Machines dealing with humans are no longer speculation but fact. As objects embedding AI – physical and virtual – extend their ramifications into our lives, more issues than “Performance” will come to the surface and will have to be addressed. MPAI, with its holistic view of AI as the technology enabling a universal data representation, proposes itself as the body where the issues raised by the progress of technology can be addressed and ways forward found.

Figure 36 – The expected MPAI impacts

The results achieved by MPAI in 15 months of activity and the plans laid down for the future demonstrate that the seven impacts identified above are not just wishful thinking. MPAI invites people of good will to join and make the potential real.


With 5 standards approved, MPAI enters a new phase

Geneva, Switzerland – 26 January 2022. Today the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 16th General Assembly, the first of 2022, approving its 2022 work program.

The work program includes the development of reference software, conformance testing and performance assessment for 2 application standards (Context-based Audio Enhancement and Multimodal Conversation); reference software and conformance testing for 1 infrastructure standard (AI Framework); and the establishment of the MPAI Store, a non-profit foundation with the mission to distribute verified implementations of MPAI standards, as specified in another MPAI infrastructure standard (Governance of the MPAI Ecosystem).

An important part of the work program addresses the development of performance assessment specifications for the 2 application standards. The purpose of performance assessment is to enable MPAI-appointed entities to assess the grade of reliability, robustness, replicability and fairness of implementations. While performance assessment will not be mandatory for an implementation to be posted to the MPAI Store, users downloading an implementation will be informed of its status.

Another section of the work program concerns the development of extensions of existing standards. Company Performance Prediction (part of Compression and Understanding of Industrial Data) will include more risks in addition to seismic and cyber; Multimodal Conversation will enhance the features of some of its use cases, e.g., by applying them to the interaction of a human with a connected autonomous vehicle; and Context-based Audio Enhancement will enter the domain of separation of useful sounds from the environment.

An important part of the work program is assigned to developing new standards for the areas that have been explored in the last few months, such as:

  1. Server-based Predictive Multiplayer Gaming (MPAI-SPG) using AI to train a network that compensates data losses and detects false data in online multiplayer gaming.
  2. AI-Enhanced Video Coding (MPAI-EVC) improving existing video coding with AI tools for short-to-medium term applications.
  3. End-to-End Video Coding (MPAI-EEV) exploring the promising area of AI-based “end-to-end” video coding for longer-term applications.
  4. Connected Autonomous Vehicles (MPAI-CAV) using AI for such features as Environment Sensing, Autonomous Motion, and Motion Actuation.

Finally, MPAI welcomes new activities proposed by its members to its work program:

  1. Avatar Representation and Animation (MPAI-ARA) targeting the specification of avatar descriptors.
  2. Neural Network Watermarking (MPAI-NNW) developing measures of the impact of adding ownership and licensing information inside a neural network.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram and YouTube.

Most important: join MPAI, share the fun, build the future.


Digital humans and MPAI

“Digital human” has recently become a trendy expression, and different meanings can be attached to it. MPAI defines it as “a digital object able to receive text/audio/video/commands (“Information”) and generate Information that is congruent with the received Information”.

MPAI has been developing several standards for “digital humans” and plans on extending them and developing more.

Let’s have an overview.

In Conversation with Emotion a digital human perceives text or speech from, and video of, a human. It then generates text or speech that is congruent with the content and emotion of the perceived media, and displays itself as an avatar whose lips move in sync with its speech and according to the emotion embedded in the synthetic speech.

In Multimodal Question Answering a digital human perceives text or speech from a human asking a question about an object held by the human, and the video of the human holding the object. In response it generates text or speech that is a response to the human question and is congruent with the perceived media data including the emotional state of the human.

Adding an avatar whose lips move in sync with the generated speech could provide a more satisfactory rendering of the speech generated by the digital human.

In Automatic Speech Translation a digital human is told to translate speech or text generated by the human into a specified language and, in case the input is speech, to preserve or not the speech features of the input speech. The digital human then generates translated text if the input is text, and translated speech, preserving or not the input speech features, if the input is speech.

Adding an avatar whose lips move in sync with the generated speech and according to its embedded emotion could provide a more satisfactory rendering of the digital human speech.

In Emotion Enhanced Speech a digital human is told to add an emotion to an emotion-less speech segment by giving it either:

  1. A model utterance: the digital human extracts and adds the speech features of the model utterance to the emotion-less speech segment.
  2. An emotion taken from the MPAI standard list of emotions: the digital human adds to the emotion-less speech the speech features obtained by combining the speech features proper of the selected emotion with the speech features of the emotion-less speech.

In both cases an avatar can be animated by the emotion-enhanced speech.
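The two modes can be summarised in a short sketch. Every function below is a placeholder standing in for the corresponding MPAI-CAE processing, not an actual AIM.

```python
# A hedged sketch of the two Emotion Enhanced Speech modes; every function
# below is a placeholder, not an actual MPAI-CAE AIM.

def extract_speech_features(speech) -> dict:
    return {"pitch": 1.0, "energy": 1.0}   # placeholder feature extraction

def emotion_features(emotion: str) -> dict:
    table = {"happy": {"pitch": 1.2, "energy": 1.3},
             "sad":   {"pitch": 0.8, "energy": 0.7}}
    return table[emotion]                   # features proper of the emotion

def combine(a: dict, b: dict) -> dict:
    return {k: a[k] * b[k] for k in a}      # naive feature combination

def apply_speech_features(speech, feats: dict):
    return speech                           # a vocoder would act here

def enhance(speech, model_utterance=None, emotion=None):
    if model_utterance is not None:         # mode 1: mimic a model utterance
        feats = extract_speech_features(model_utterance)
    elif emotion is not None:               # mode 2: emotion from the list
        feats = combine(emotion_features(emotion),
                        extract_speech_features(speech))
    else:
        raise ValueError("give a model utterance or an emotion")
    return apply_speech_features(speech, feats)

enhanced = enhance("neutral.wav", emotion="happy")
```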

MPAI has more digital human use cases under development:

  1. Human-CAV Interaction: a digital human (a face) speaks to a group of humans, gazing at the human it is responding to.
  2. Mixed-reality Collaborative Spaces: a digital human (a torso) utters the speech of a participant in a virtual videoconference while its torso moves in sync with the participant’s torso.
  3. Conversation about a Scene: a digital human (a face) converses with a human about the objects of a scene of which the human is part, gazing at the human or at an object.


What does MPAI do in a week?

The year 2021 was very productive for MPAI. In January it started with a Call for Technologies on the AI Framework and ended in December with 5 standards approved and 7 projects in the pipeline.

How was that possible? Simple: intense collaborative work.

OK, but exactly how?

So far MPAI has not held physical meetings. MPAI does all current work online in 1, 2 or 3 one-hour sessions a day in the 13-18 UTC time frame. The purpose of this post is to describe how an MPAI work week unfolds. All times are UTC. All meetings last 1 hour.

Monday @15 – Mixed-reality Collaborative Spaces (MCS) is a project finalising the MCS Use Cases and Functional Requirements document. The use cases considered are “Avatar-Based Videoconference” and “Virtual eLearning”, where avatars with varying levels of similarity to the persons they represent hold meetings or attend lectures. Extension of avatars to volumetric video is being considered. To know more, visit mcs.mpai.community.

Monday @16 – Governance of the MPAI Ecosystem (GME) is the name of an MPAI standard developed, approved and published by MPAI. It envisages the establishment of the MPAI Store, a non-profit commercial organisation whose role is to receive implementations of MPAI standards, verify that they are secure, test that they conform with MPAI specifications, possibly collect the results proving that an implementation is reliable, robust, replicable and fair – attributes that MPAI labels as Performance – and post the implementation on the MPAI Store web site for users to download. MPAI is moving to the implementation of the MPAI Store. To know more, visit gme.mpai.community.

Monday @17 – AI Framework (AIF) is the name of an MPAI standard developed, approved and published by MPAI. MPAI is now developing a software implementation of the standard. To know more, visit aif.mpai.community.

Tuesday @14 – Multimodal Conversation (MMC) is the name of an MPAI standard developed, approved, and published by MPAI. MPAI is now developing software implementations of the MPAI-MMC use cases, conformance testing and performance assessment, and is extending the current V1 specification to support more use cases. To know more, visit mmc.mpai.community.

Tuesday @16 – Context-based Audio Enhancement (CAE) is the name of an MPAI standard developed, approved, and published by MPAI. MPAI is now developing software implementations of the four MPAI-CAE use cases, conformance testing and performance assessment, and is extending the current V1 specification to support more use cases. To know more, visit cae.mpai.community.

Wednesday @13 – Connected Autonomous Vehicles (CAV) is a project finalising the CAV Use Cases and Functional Requirements document. The MPAI CAV is composed of 4 technology-laden subsystems. To know more, visit cav.mpai.community.

Wednesday @15 – AI-Enhanced Video Coding (EVC) is a project investigating the enhancement or replacement of existing video coding tools in the MPEG-5 EVC codec with AI tools. When MPAI reaches an improvement of 25%, a Call for Technologies will be issued. To know more, visit evc.mpai.community. On alternate weeks, the session is taken by End-to-End Video Coding (EEV), a project addressing video coding without the constraints of traditional video coding architectures. To know more, visit eev.mpai.community.

Wednesday @16 – Compression and Understanding of Industrial Data (CUI) is the name of an MPAI standard developed, approved, and published by MPAI. As all 4 components of an MPAI standard – technical specification, reference software, conformance testing and performance assessment – have been published, the group is preparing a new version of the standard that includes additional risks beyond seismic and cyber.

Thursday @14 – Server-based Predictive Multiplayer Gaming (SPG) is a project completing the validation of the SPG model. Once the validation is completed, it will finalise the SPG Use Cases and Functional Requirements. To know more, visit spg.mpai.community.

In addition to these technical meeting sessions, MPAI holds a General Assembly to discuss and ratify any proposed results from the technical groups.

The Board of Directors typically meets twice between two General Assemblies. Two advisory groups hold meetings:

Thursday @15 – The Communication Advisory Committee manages the manifold MPAI communication activities, such as press releases, the newsletter, online presentations and social media.

Friday @15 – The Industry and Standards Advisory Committee manages the relationships of MPAI with all external entities that are relevant to MPAI activities.

A lot has been happening in MPAI, and a lot is now happening so that a lot can happen next outside MPAI.

 


A better experience for audioconference users

Today a video/audio conference is a virtual space where many of us spend our working hours. Still, the conference experience suffers from many deficiencies that stem from the inadequate way audio is captured and conveyed to the virtual space. When we share the same physical environment, our brains can separate the voices of competing speakers and remove the effects of the non-ideal acoustic properties of the physical space and/or the background noise.

However, when the acoustic signals from different physical environments are merged in the virtual space, our brains may well not be as effective. The result is a reduction in the intelligibility of speech, causing participants to not fully understand what their interlocutors are saying. The very purpose of the conference may be harmed and, at the end of the day, participants may very well feel more stressed than when they could meet people in person.

Many of these problems can be alleviated or resolved if a microphone array is used to capture the speakers’ speech signals: the individual speech signals can be separated, the effect of the non-ideal acoustics of the space can be reduced, and any background noise can be substantially suppressed.

The fourth context of Context-based Audio Enhancement (MPAI-CAE), called Enhanced Audioconference Experience (CAE-EAE), aims to provide a complete solution for processing speech signals from a microphone array to deliver clear speech signals free from background noise and acoustics-related artefacts, thus improving the auditory quality of the audioconference experience. Specifically, CAE-EAE addresses the situation where one or more speakers are active in a noisy meeting room and try to communicate using speech with one or more interlocutors over the network.

The AI Modules (AIM) required by this use case extract the speech signals of the individual speakers from the microphone array signals and reduce background noise and reverberation. CAE-EAE can also extract the spatial attributes of the speakers with respect to the position of the microphone array. This information, multiplexed with the multichannel audio, can be used at the receiving side to create a spatial representation of the speech signals.

Figure 1 depicts the AI Module structure whose operation is described below.

Figure 1 – Enhanced Audioconference Experience Reference Model

  1. Analysis Transform AIM performs a time-frequency transformation to enable the operations downstream to be carried out in the frequency domain.
  2. Sound Field Description AIM converts the output from the Analysis Transform AIM into the spherical frequency domain.
  3. Speech Detection and Separation AIM detects the directions of active sound sources and separates the sources using the Source Model KB, which provides simple acoustic source models. The separated sources can be either speech or non-speech.
  4. Noise Cancellation AIM eliminates background noise and reverberation, producing Denoised Speech in the frequency domain.
  5. Synthesis Transform AIM applies the inverse transform to Denoised Speech.
  6. Packager AIM produces a multiplexed stream which contains separated Multichannel Speech Streams and Audio Scene Geometry.
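To give a feel for the chain, here is a deliberately simplified, single-channel sketch: an STFT stands in for the Analysis and Synthesis Transform AIMs, and a crude spectral gate stands in for the Noise Cancellation AIM. The real use case operates on microphone-array signals and includes the Sound Field Description, Speech Detection and Separation, and Packager steps omitted here.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_speech(x: np.ndarray, fs: int) -> np.ndarray:
    # Analysis Transform: move to the time-frequency domain
    _, _, X = stft(x, fs=fs, nperseg=512)
    # Noise Cancellation (toy version): estimate a per-bin noise floor
    # and keep only bins that stand clearly above it
    noise_floor = np.percentile(np.abs(X), 20, axis=1, keepdims=True)
    X = X * (np.abs(X) > 2 * noise_floor)
    # Synthesis Transform: back to the time domain
    _, y = istft(X, fs=fs, nperseg=512)
    return y

fs = 16000
noisy = np.random.randn(fs)      # stand-in for one second of recorded audio
denoised = enhance_speech(noisy, fs)
```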

The MPAI CAE-EAE standard can change the experience of audio/video teleconference users.

 

 


Restoring damaged speech

The third context of the Context-based Audio Enhancement (MPAI-CAE) standard is restoration of damaged speech.

Unlike Audio Recording Preservation, where the audio has a clear provenance – the magnetic tape of an open reel whose analogue audio has been digitised – the Speech Restoration System does not refer to anything analogue. It assumes that there is a file containing digital speech. For whatever reason, portions of the file may be damaged – maybe the physical medium from which the file was created was partly corrupted. However, the use case assumes that the text that the creator used to make their speech is available.

Figure 1 shows the AI Modules (AIM), i.e., the components of the system.

Figure 1 – The Speech Restoration System Reference Model

The basic idea is to create a speech model using a sufficient number of undamaged audio segments. The model then drives a neural network acting as a speech synthesiser, which re-synthesises each segment in the list of damaged speech segments using the text corresponding to that segment.

The result is an entirely restored speech file where the damaged segments have been replaced by the best estimate of the speech produced by the speaker.
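The flow can be sketched as follows. The model-building and synthesis functions are placeholders standing in for the neural speech model, not the actual SRS AIMs.

```python
from typing import List, Tuple

Segment = Tuple[int, int]   # (start_sample, end_sample)

def build_speaker_model(audio: List[float], undamaged: List[Segment]):
    # Stand-in for fitting a neural speech model to the clean segments
    return {"segments_used": len(undamaged)}

def synthesise(model, text: str, n_samples: int) -> List[float]:
    # Stand-in for the synthesiser producing speaker-like speech from text
    return [0.0] * n_samples

def restore(audio: List[float], damaged: List[Segment],
            texts: List[str], undamaged: List[Segment]) -> List[float]:
    model = build_speaker_model(audio, undamaged)
    for (start, end), text in zip(damaged, texts):
        # replace each damaged segment with speech synthesised from its text
        audio[start:end] = synthesise(model, text, end - start)
    return audio
```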

It is time to become an MPAI member https://mpai.community/how-to-join/join/. Join the fun – build the future!


MPAI springs forward to an intense 2022

Established on 30 September 2020, MPAI spent its first 3 months giving itself a structure to execute its mission of developing Artificial Intelligence (AI)-based data coding standards.

Its first full year of operation – 2021 – has been engaging but rewarding:

  • 5 Technical Specifications (TS) have been approved and released in the following domains:
    • Finance.
    • Human-machine communication.
    • Audio enhancement.
    • AI Framework.
    • Ecosystem Governance.
  • The Company Performance Prediction TS was complemented by 3 additional specifications:
    • Reference Software (RS), a conforming implementation of the TS.
    • Conformance Testing (CT), to test that an implementation is technically correct and provides an adequate user experience.
    • Performance Assessment (PA), to assess implementation reliability and trustworthiness.

A goal can be declared as reached only if the next goal is known, and the purpose of this post is to disclose exactly that.

The AI Framework (AIF), depicted in Figure 1, is a cornerstone of the MPAI architecture.

Figure 1 – The AI Framework (AIF) Reference Model and its Components

  • The AIF
    • Is Operating System-independent.
    • Has a local and distributed component-based Zero-Trust architecture.
    • Can create AI Workflows (AIW) made of elementary units called AI Modules (AIM).
    • Can access validated AIWs and AIMs by interfacing to the MPAI Store.
    • Can execute in a range of computing environments: from MCUs to HPCs.
    • Can interact with other AIFs operating in proximity.
    • Supports Machine Learning functionalities.
  • Its AIMs
    • Encapsulate components to abstract them from the development environment.
    • Call the Controller via standard interfaces.
    • Can be AI-based or data processing-based.
    • Can be in software or in hardware.
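A hedged sketch of the encapsulation idea: each AIM, whether AI-based or not, hides its implementation behind a uniform interface that the Controller drives. The class and method names are illustrative, not the interfaces specified by the standard.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

class AIModule(ABC):
    """Uniform interface hiding how a module is implemented."""
    @abstractmethod
    def start(self) -> None: ...
    @abstractmethod
    def stop(self) -> None: ...
    @abstractmethod
    def process(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...

class LegacyDenoiser(AIModule):
    """A data-processing (non-AI) module behind the same interface."""
    def start(self) -> None: pass
    def stop(self) -> None: pass
    def process(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"audio": inputs["audio"]}   # real filtering omitted

class Controller:
    """Minimal stand-in for the AIF Controller driving registered AIMs."""
    def __init__(self) -> None:
        self.aims: Dict[str, AIModule] = {}

    def register(self, name: str, aim: AIModule) -> None:
        self.aims[name] = aim
        aim.start()

    def invoke(self, name: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return self.aims[name].process(inputs)

ctrl = Controller()
ctrl.register("denoiser", LegacyDenoiser())
out = ctrl.invoke("denoiser", {"audio": [0.1, 0.2]})
```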

2022 MPAI Goal #1: AI Framework (MPAI-AIF)

  1. Development of the Reference Software (RS).
  2. Development of the Conformance Testing.

MPAI has already developed 3 application-oriented Technical Specifications: MPAI-CAE (Enhanced audio), MPAI-CUI (Company Performance Prediction) and MPAI-MMC (Multimodal human-machine conversation). In total there are 10 AIWs and some 20 AIMs (several of them are used in different AIWs).

An active MPAI generates an ecosystem with the following actors:

  1. MPAI develops standards.
  2. Implementers develop MPAI standard implementations.
  3. Users access such implementations.

MPAI is all about facilitating a market of AI applications. Releasing standards enables a market but does not ensure that the market is functional. How can a user be sure that an implementation is secure, technically correct and unbiased? Note that by “user” we do not necessarily mean an end user, but also an app (i.e., AIW) developer who may need an AIM and does not have the resources or the competence to answer these 3 questions.

In its Governance of the MPAI Ecosystem TS, MPAI has envisaged two more players:

  1. Performance Assessors who assess that implementations are reliable and trustworthy.
  2. The MPAI Store where uploaded implementations are:
    1. Checked for security
    2. Tested for conformance
    3. Posted to the Store with a clear indication of level of performance.

Note that MPAI appoints Performance Assessors, and establishes and controls the MPAI Store, a not-for-profit commercial entity.
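The Store’s gatekeeping sequence can be sketched as a simple pipeline. The check functions are placeholders for the actual security verification, conformance testing and performance assessment procedures.

```python
def security_check(implementation) -> bool:
    return True     # placeholder: e.g., malware scan, signature verification

def conformance_test(implementation) -> bool:
    return True     # placeholder: run the standard's conformance tests

def performance_level(implementation) -> str:
    # Performance is assessed by an MPAI-appointed Performance Assessor;
    # it is not mandatory, so an implementation may remain "not assessed".
    return "assessed"

def submit_to_store(implementation) -> dict:
    if not security_check(implementation):
        raise ValueError("rejected: failed the security check")
    if not conformance_test(implementation):
        raise ValueError("rejected: not conforming")
    return {"posted": True, "performance": performance_level(implementation)}

print(submit_to_store(object()))  # {'posted': True, 'performance': 'assessed'}
```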

Figure 2 depicts the operation of the MPAI Ecosystem.

Figure 2 – The MPAI Ecosystem and its Governance

2022 MPAI Goal #2: Governance of the MPAI Ecosystem (MPAI-GME)

  1. Design the MPAI Store corporate structure
  2. Design and operate the MPAI Store
  3. Develop and run the MPAI Store IT service
  4. Design and operate the Performance Assessor network.

In 2021 MPAI developed 3 application-oriented TSs:

Compression and Understanding of Industrial Data (MPAI-CUI) with 1 use case.

Multimodal Conversation (MPAI-MMC) with 5 use cases.

Context-based Audio Enhancement (MPAI-CAE) with 4 use cases.


Figure 3 depicts the reference model of the Company Performance Prediction Use Case.

AI-based Company Performance Prediction measures the performance of a Company by providing the Default Probability, Organisational Model Index, and Business Discontinuity Probability of the Company within a given Prediction Horizon, using the Company’s Governance, Financial and Risk data.
Figure 3 – The Company Performance Prediction (CUI-CPP) Reference Model

MPAI-CUI includes the Reference Software (RS), Conformance Testing (CT) and Performance Assessment (PA) Specifications of the AI-based Company Performance Prediction (CPP).

2022 MPAI Goal #3: Compression and Understanding of Industrial Data (MPAI-CUI)

  1. Integration of the RS in MPAI-AIF
  2. Submission of RS to MPAI Store
  3. Development of Version 2 (extension of functionality of existing AIMs and new AIWs to support more risks).

Multimodal Conversation (MPAI-MMC) uses AI to enable human-machine conversation emulating human-human conversation in completeness and intensity. It includes 5 Use Cases: Conversation with Emotion, Multimodal Question Answering, Unidirectional Speech Translation, Bidirectional Speech Translation and One-to-Many Unidirectional Speech Translation.

The figures below show the reference models of the MPAI-MMC Use Cases.

Conversation with Emotion (CWE) enables a human to hold an audio-visual conversation, using audio and video, with a computational system that is impersonated by a synthetic voice and an animated face, both expressing emotion appropriate to the emotional state of the human.
Figure 4 – Conversation with Emotion
Multimodal Question Answering (MQA) enables a user to request information using speech concerning an object the user displays and to receive the requested information from a computational system via synthetic speech.
Figure 5 – Multimodal Question Answering
Unidirectional Speech Translation (UST) allows a user to select a language different from the one s/he uses and to get a spoken utterance translated into the desired language with a synthetic voice that optionally preserves the personal vocal traits of the spoken utterance.
Figure 6 – Unidirectional Speech Translation
Bidirectional Speech Translation (BST) allows a human to hold a dialogue with another human. Both speak their own language, and their translated speech is synthetic speech that optionally preserves their personal vocal traits.
Figure 7 – Bidirectional Speech Translation
One-to-Many Speech Translation (MST) enables a human to select a number of languages and have their speech translated into the selected languages using synthetic speech that optionally preserves their personal vocal traits.
Figure 8 – One-to-Many Speech Translation
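As an illustration of the Unidirectional Speech Translation chain, here is a sketch of its three stages with optional vocal-trait conditioning. All functions are placeholders, not the MPAI-MMC AIMs.

```python
from typing import Optional

def recognise(speech) -> str:
    return "hello"                          # speech-to-text stand-in

def translate(text: str, language: str) -> str:
    return f"{text} [{language}]"           # machine-translation stand-in

def speaker_features(speech) -> dict:
    return {"pitch": 1.0, "timbre": "x"}    # vocal-trait extraction stand-in

def synthesise(text: str, features: Optional[dict]) -> str:
    return f"speech({text}, {features})"    # text-to-speech stand-in

def unidirectional_translation(speech, language: str, preserve: bool) -> str:
    text = recognise(speech)                # 1. recognise the input speech
    translated = translate(text, language)  # 2. translate the text
    feats = speaker_features(speech) if preserve else None
    return synthesise(translated, feats)    # 3. synthesise, optionally
                                            #    keeping the vocal traits

print(unidirectional_translation("utterance.wav", "it", preserve=True))
```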

Currently, only the MPAI-MMC TS is available. Therefore:

2022 MPAI Goal #4: Multimodal Conversation (MPAI-MMC)

  1. Development of the RS of the 5 Use Cases, integration in AIF and submission to the Store
  2. Development of the CT specification of the 5 Use Cases
  3. Development of the PA specification of the 5 Use Cases
  4. Development of Version 2 that includes extension of functionality of existing AIMs and new AIWs, some coming from projects under development such as MPAI-CAV (Connected Autonomous Vehicles) and MPAI-MCS (Mixed-reality Collaborative Spaces).

The 4 use cases considered in MPAI-CAE are: Emotion Enhanced Speech, Audio Recording Preservation, Speech Restoration System and Enhanced Audioconference Experience.

The figures below show the reference models of the MPAI-CAE Use Cases. Note that an Implementation is supposed to run in the MPAI-specified AI Framework (MPAI-AIF).

Emotion-Enhanced Speech (EES) enables a user to indicate a model utterance or an Emotion to obtain an emotionally charged version of a given utterance.

In many use cases, emotional force can usefully be added to speech which by default would be neutral or emotionless.

Figure 9 – Emotion Enhanced Speech
Audio Recording Preservation (ARP) enables a user to create digital copies of the digitised audio of open-reel magnetic tapes, suitable for long-term preservation and for correct playback of the digitised recording (restored, if necessary).
Figure 10 – Audio Recording Preservation
Speech Restoration System (SRS) enables a user to restore a Damaged Segment of an Audio Segment containing only speech from a single speaker. No filtering or signal processing is involved. Instead, replacements for the damaged vocal elements are synthesised using a speech model.
Figure 11 – Speech Restoration System
Enhanced Audioconference Experience (EAE) enables a user to improve the auditory quality of the audioconference experience by processing speech signals recorded by microphone arrays to provide speech signals free from background noise and acoustics-related artefacts.
Figure 12 – Enhanced Audioconference Experience

Currently, only the MPAI-CAE TS is available. Therefore:

2022 MPAI Goal #5: Context-based Audio Enhancement (MPAI-CAE)

  1. Development of RS of the 4 Use Cases, integration in AIF and submission to the Store
  2. Development of the CT specification of the 4 Use Cases
  3. Development of the PA specification of the 4 Use Cases
  4. Development of Version 2 that will include extension of functionality of existing AIMs and new AIWs, some coming from projects under development such as MPAI-CAV (Connected Autonomous Vehicles) and MPAI-MCS (Mixed-reality Collaborative Spaces).

MPAI has 7 projects at different levels of development. For each of these a Goal is assigned.

2022 MPAI Goal #6: Server-based Predictive Multiplayer Gaming (MPAI-SPG)

  1. TS, RS, CT, PA of Server-based Predictive Multiplayer Gaming
2022 MPAI Goal #7: Connected Autonomous Vehicles (MPAI-CAV)

  1. TS, RS, CT, PA of Connected Autonomous Vehicles. This will include interactions with MPAI-MMC and MPAI-CAE.
2022 MPAI Goal #8: Mixed-reality Collaborative Spaces (MPAI-MCS)

  1. TS, RS, CT, PA of Mixed-reality Collaborative Spaces. This will include interactions with MPAI-MMC and MPAI-CAE.
2022 MPAI Goal #9: Integrative Genomic/Sensor Analysis (MPAI-GSA)

  1. TS, RS, CT, PA of Integrative Genomic/Sensor Analysis
2022 MPAI Goal #10: AI-Enhanced Video Coding (MPAI-EVC)

  1. The AI-Enhanced Video Coding (MPAI-EVC) Evidence Project will continue toward reaching the goal of 25% improvement over MPEG-5 EVC.
2022 MPAI Goal #11: AI-based End-to-End Video Coding (MPAI-EEV)

  1. AI-based End-to-End Video Coding (MPAI-EEV) will continue harnessing the potential of an unconstrained approach to AI-based Video Coding.
2022 MPAI Goal #12: Visual Object and Scene Description (MPAI-OSD)

  1. Visual Object and Scene Description (MPAI-OSD) will continue collecting use cases where visual information coding is required.

 


31 December 2021 – MPAI takes stock of the work done

One year ago today, MPAI could take stock of 3 months of work: an established organisation with the mission of developing Artificial Intelligence (AI)-based data coding standards, a first identification of a program of work, progression of several work items and a published Call for Technologies for one of them.

What can MPAI say today? That it has lived a very intense year and that it can declare itself satisfied with what it has achieved.

The first achievement is that MPAI has refined its method of work, making it solid but also capable of overcoming problems plaguing other Standards Developing Organisations (SDO). A standard project goes through 8 stages, with progression to a new stage requiring approval by the MPAI General Assembly. A Call for Technologies is issued with Functional and Commercial Requirements.

The second achievement is that it has developed 3 Technical Specifications (TS) that use AI to enable the industry to accelerate the deployment of AI-based applications. A few words about each:

Context-based Audio Enhancement (MPAI-CAE) – supports 4 use cases:

  1. Emotion-Enhanced Speech (EES) allows a user to give a machine a sentence uttered without emotion and obtain one uttered with a given emotion, say happy, sad or cheerful, or uttered with the colour of a specific model utterance.
  2. Audio Recording Preservation (ARP) allows a user to preserve old audio tapes by providing a high-quality digital version and a digital version restored using AI, together with a documented set of irregularities found in the tape.
  3. Speech Restoration System (SRS) allows a user to automatically recover damaged speech segments using a speech model obtained from the undamaged part of the speech.
  4. Enhanced Audioconference Experience (EAE) improves a participant’s audioconference experience by using a microphone array and extracting the spatial attributes of the speakers with respect to the position of the microphone array to allow spatial representation of the speech signals at the receiver.

Compression and Understanding of Industrial Data (MPAI-CUI) supports one use case: Company Performance Prediction. This gives the financial risk assessment industry new, powerful and extensible means to predict the performance of a company several years into the future in terms of company default probability, business discontinuity probability and the adequacy index of the company’s organisational model.

Multimodal Conversation (MPAI-MMC) – supports 5 use cases:

  1. Conversation with Emotion (CWE) allows a user to have a full conversation with a machine impersonated by a synthetic speech and an animated face. The machine understands the emotional state of the user and its speech and face are congruent with that emotional state.
  2. Multimodal Question Answering (MQA) allows a user to ask a machine, via speech, for information about an object held in their hand and obtain a verbal response from the machine.
  3. Unidirectional Speech Translation (UST) allows a user to express a verbal sentence in one language and obtain a verbal translation into another language that preserves the user’s vocal features.
  4. Bidirectional Speech Translation (BST) allows two users to have a dialogue each using their own language and hearing the other user’s translated voice with that user’s native speech features.
  5. One-to-Many Speech Translation (MST) allows a user to select a set of languages and have their speech translated into the selected languages, with the possibility to decide whether or not to preserve their speech features in the translations.

The first MPAI Call for Technologies was issued on 16 December 2020 and concerned the AI Framework (MPAI-AIF) for the creation and execution of AI Workflows (AIW) composed of AI Modules (AIM). These may have been developed in any environment, using any proprietary framework, for any operating system; be AI-based or not; be implemented in hardware, in software or in a hybrid hardware-software combination; and execute on anything from MCUs up to HPCs, in local and distributed environments, in proximity with other AIFs, irrespective of the AIM provider. The three TSs mentioned above rely on the MPAI-AIF TS for their implementations.

Finally, in 2021 MPAI has developed the Governance of the MPAI Ecosystem (MPAI-GME) TS. This lays down the rules governing the MPAI Ecosystem, composed of:

  1. MPAI developing standards.
  2. Implementers developing implementations.
  3. The MPAI-established and controlled not-for-profit MPAI Store, where implementations are uploaded, checked for security, and tested for conformance.
  4. MPAI-appointed performance assessors who assess that implementations are reliable and trustworthy.
  5. Users who can access secure MPAI standard implementations guaranteed for Conformance and Performance.

In 2021, MPAI has developed the four MPAI components – Technical Specifications, Reference Software, Conformance Testing and Performance Assessment – for Compression and Understanding of Industrial Data (MPAI-CUI). These, together with the other TSs, are published on the MPAI web site.

These are firm results for standards that industry can take up, but MPAI has carried out substantially more work preparing for the future:

MPAI-CAV: Connected Autonomous Vehicles

MPAI-EEV: AI-based End-to-End Video Coding

MPAI-EVC: AI-Enhanced Video Coding

MPAI-MCS: Mixed-reality Collaborative Spaces

MPAI-SPG: Server-based Predictive Multiplayer Gaming

This huge work has been carried out by a network of technical groups that MPAI thanks for their efforts and results.

Want to know more? Read “Towards Pervasive and Trustworthy Artificial Intelligence”, the book that illustrates the results achieved by MPAI in its 15 months of operation and the plans for the next 12 months.

The work has just begun. Become an MPAI member. Join the fun – build the future!