Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI springs forward to an intense 2022

Established on 30 September 2020, MPAI spent its first 3 months giving itself a structure to execute its mission of developing Artificial Intelligence (AI)-based data coding standards.

Its first full year of operation – 2021 – has been demanding but rewarding:

  • 5 Technical Specifications (TS) have been approved and released in the following domains:
    • Finance.
    • Human-machine communication.
    • Audio enhancement.
    • AI Framework.
    • Ecosystem Governance.
  • The Company Performance Prediction TS was complemented by 3 additional specifications:
    • Reference Software (RS), a conforming implementation of the TS.
    • Conformance Testing (CT), to test that an implementation is technically correct and provides an adequate user experience.
    • Performance Assessment (PA), to assess implementation reliability and trustworthiness.

A goal can be declared as reached only if the next goal is known, and the purpose of this post is to disclose exactly that.

The AI Framework (AIF), depicted in Figure 1, is a cornerstone of the MPAI architecture.

Figure 1 – The AI Framework (AIF) Reference Model and its Components

  • The AIF
    • Is Operating System-independent.
    • Has a local and distributed component-based Zero-Trust architecture.
    • Can create AI Workflows (AIW) made of elementary units called AI Modules (AIM).
    • Can access validated AIWs and AIMs by interfacing to the MPAI Store.
    • Can execute in a range of computing environments: from MCUs to HPCs.
    • Can interact with other AIFs operating in proximity.
    • Supports Machine Learning functionalities.
  • Its AIMs
    • Encapsulate components to abstract them from the development environment.
    • Call the Controller via standard interfaces.
    • Can be AI-based or data processing-based.
    • Can be in software or in hardware.
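The relationship between AIMs and AIWs can be illustrated with a minimal sketch. It is not the MPAI-AIF API: the class names `AIModule` and `AIWorkflow`, the `process` callable and the two toy modules are assumptions made purely for illustration, and a real AIW is a connected topology of AIMs managed by the Controller rather than the simple chain shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class AIModule:
    """Hypothetical stand-in for an AI Module (AIM): a named unit with a
    uniform call interface that hides how it is implemented."""
    name: str
    process: Callable[[Any], Any]  # may wrap AI-based or plain data-processing code


@dataclass
class AIWorkflow:
    """Hypothetical stand-in for an AI Workflow (AIW): an ordered chain of AIMs."""
    name: str
    modules: List[AIModule] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        # Pass the output of each AIM to the next one in the chain.
        for module in self.modules:
            data = module.process(data)
        return data


# Two toy AIMs: a data-processing step and a placeholder decision step.
normalise = AIModule("Normalise", lambda xs: [x / max(xs) for x in xs])
threshold = AIModule("Threshold", lambda xs: [x >= 0.5 for x in xs])

workflow = AIWorkflow("ToyWorkflow", [normalise, threshold])
result = workflow.run([1.0, 2.0, 4.0])
print(result)  # [False, True, True]
```

Because each module exposes the same call interface, either toy AIM could be swapped for another implementation without touching the workflow, which is the point of encapsulating components behind standard interfaces.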

2022 MPAI Goal #1: AI Framework (MPAI-AIF)

  1. Development of the Reference Software (RS).
  2. Development of the Conformance Testing.

MPAI has already developed 3 application-oriented Technical Specifications: MPAI-CAE (Enhanced audio), MPAI-CUI (Company Performance Prediction) and MPAI-MMC (Multimodal human-machine conversation). In total there are 10 AIWs and some 20 AIMs (several of which are used in more than one AIW).

An active MPAI generates an ecosystem with the following actors:

  1. MPAI develops standards.
  2. Implementers develop MPAI standard implementations.
  3. Users access such implementations.

MPAI is all about facilitating a market of AI applications. Releasing standards enables a market but does not ensure that the market is functional. How can a user be sure that an implementation is secure, technically correct and unbiased? Note that by “user” we do not necessarily mean an end user, but also an app (i.e., AIW) developer who may need an AIM and does not have the resources or the competence to answer these 3 questions.

In its Governance of the MPAI Ecosystem TS, MPAI has envisaged two more players:

  1. Performance Assessors who assess that implementations are reliable and trustworthy.
  2. The MPAI Store where uploaded implementations are:
    1. Checked for security
    2. Tested for conformance
    3. Posted to the Store with a clear indication of level of performance.
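The three Store steps listed above can be sketched as a simple gate. This is only an illustration under stated assumptions: `Implementation`, `StoreListing`, `submit_to_store` and the check callables are hypothetical names, not part of MPAI-GME, and the performance label is an invented placeholder for whatever a Performance Assessor would report.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Implementation:
    name: str
    payload: bytes


@dataclass
class StoreListing:
    name: str
    performance_level: Optional[str]  # None if no assessment is available


def submit_to_store(
    impl: Implementation,
    is_secure: Callable[[Implementation], bool],
    is_conformant: Callable[[Implementation], bool],
    performance_level: Optional[str] = None,
) -> StoreListing:
    # Step 1: the upload is checked for security.
    if not is_secure(impl):
        raise ValueError(f"{impl.name}: failed security check")
    # Step 2: the upload is tested for conformance.
    if not is_conformant(impl):
        raise ValueError(f"{impl.name}: failed conformance testing")
    # Step 3: the implementation is posted with its level of performance.
    return StoreListing(impl.name, performance_level)


listing = submit_to_store(
    Implementation("toy-aim", b"\x00\x01"),
    is_secure=lambda impl: len(impl.payload) > 0,          # toy check
    is_conformant=lambda impl: impl.name.endswith("-aim"),  # toy check
    performance_level="hypothetical grade A",
)
print(listing)
```

The checks are passed in as callables so that the sketch makes no claim about how the Store actually performs them.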

Note that MPAI appoints Performance Assessors, and establishes and controls the MPAI Store, a not-for-profit commercial entity.

Figure 2 depicts the operation of the MPAI Ecosystem.

Figure 2 – The MPAI Ecosystem and its Governance

2022 MPAI Goal #2: Governance of the MPAI Ecosystem (MPAI-GME)

  1. Design the MPAI Store corporate structure
  2. Design and operate the MPAI Store
  3. Develop and run the MPAI Store IT service
  4. Design and operate the Performance Assessor network.

In 2021 MPAI developed 3 application-oriented TSs:

Compression and Understanding of Industrial Data (MPAI-CUI) with 1 use case.

Multimodal Conversation (MPAI-MMC) with 5 use cases.

Context-based Audio Enhancement (MPAI-CAE) with 4 use cases.


Figure 3 depicts the reference model of the Company Performance Prediction Use Case.

AI-based Company Performance Prediction measures the performance of a Company by providing Default Probability, Organisational Model Index, and Business Discontinuity Probability of the Company within a given Prediction Horizon using the Company’s Governance, Financial and Risk data.
Figure 3 – The Company Performance Prediction (CUI-CPP) Reference Model

MPAI-CUI includes the Reference Software (RS), Conformance Testing (CT) and Performance Assessment (PA) Specifications of the AI-based Company Performance Prediction (CPP).

2022 MPAI Goal #3: Compression and Understanding of Industrial Data (MPAI-CUI)

  1. Integration of the RS in MPAI-AIF
  2. Submission of RS to MPAI Store
  3. Development of Version 2 (extension of functionality of existing AIMs and new AIWs to support more risks).

Multimodal Conversation (MPAI-MMC) uses AI to enable human-machine conversation emulating human-human conversation in completeness and intensity. It includes 5 Use Cases: Conversation with Emotion, Multimodal Question Answering, Unidirectional Speech Translation, Bidirectional Speech Translation and One-to-Many Unidirectional Speech Translation.

The figures below show the reference models of the MPAI-MMC Use Cases.

Conversation with Emotion (CWE) enables a human to hold an audio-visual conversation with a computational system that is impersonated by a synthetic voice and an animated face, both expressing emotion appropriate to the emotional state of the human.
Figure 4 – Conversation with Emotion
Multimodal Question Answering (MQA) enables a user to request information using speech concerning an object the user displays and to receive the requested information from a computational system via synthetic speech.
Figure 5 – Multimodal Question Answering
Unidirectional Speech Translation (UST) allows a user to select a language different from the one s/he uses and to get a spoken utterance translated into the desired language with a synthetic voice that optionally preserves the personal vocal traits of the spoken utterance.
Figure 6 – Unidirectional Speech Translation
Bidirectional Speech Translation (BST) allows a human to hold a dialogue with another human. Both speak their own language, and their translated speech is synthetic speech that optionally preserves their personal vocal traits.
Figure 7 – Bidirectional Speech Translation
One-to-Many Speech Translation (MST) enables a human to select a number of languages and have their speech translated to the selected languages using synthetic speech that optionally preserves their personal vocal traits.
Figure 8 – One-to-Many Speech Translation

Currently, only the MPAI-MMC TS is available. Therefore:

2022 MPAI Goal #4: Multimodal Conversation (MPAI-MMC)

  1. Development of the RS of the 5 Use Cases, integration in AIF and submission to the Store
  2. Development of the CT specification of the 5 Use Cases
  3. Development of the PA specification of the 5 Use Cases
  4. Development of Version 2 that includes extension of functionality of existing AIMs and new AIWs, some coming from projects under development such as MPAI-CAV (Connected Autonomous Vehicles) and MPAI-MCS (Mixed-reality Collaborative Spaces).

The 4 use cases considered are: Emotion Enhanced Speech, Audio Recording Preservation, Speech Restoration System and Enhanced Audioconference Experience.

The figures below show the reference models of the MPAI-CAE Use Cases. Note that an Implementation is supposed to run in the MPAI-specified AI Framework (MPAI-AIF).

Emotion-Enhanced Speech (EES) enables a user to indicate a model utterance or an Emotion to obtain an emotionally charged version of a given utterance.

In many use cases, emotional force can usefully be added to speech which by default would be neutral or emotionless.

Figure 9 – Emotion Enhanced Speech
The Audio Recording Preservation (ARP) Use Case enables a user to create digital copies of the digitised audio of open-reel magnetic tapes, suitable for long-term preservation and for correct playback of the digitised recording (restored, if necessary).
Figure 10 – Audio Recording Preservation
Speech Restoration System (SRS) enables a user to restore a Damaged Segment of an Audio Segment containing only speech from a single speaker. No filtering or signal processing is involved. Instead, replacements for the damaged vocal elements are synthesised using a speech model.
Figure 11 – Speech Restoration System
Enhanced Audioconference Experience (EAE) enables a user to improve the auditory quality of the audioconference experience by processing speech signals recorded by microphone arrays to provide speech signals free from background noise and acoustics-related artefacts.
Figure 12 – Enhanced Audioconference Experience

Currently, only the MPAI-CAE TS is available. Therefore:

2022 MPAI Goal #5: Context-based Audio Enhancement (MPAI-CAE)

  1. Development of RS of the 4 Use Cases, integration in AIF and submission to the Store
  2. Development of the CT specification of the 4 Use Cases
  3. Development of the PA specification of the 4 Use Cases
  4. Development of Version 2 that will include extension of functionality of existing AIMs and new AIWs, some coming from projects under development such as MPAI-CAV (Connected Autonomous Vehicles) and MPAI-MCS (Mixed-reality Collaborative Spaces).

MPAI has 7 projects at different levels of development. For each of these a Goal is assigned.

2022 MPAI Goal #6 for Server-based Predictive Multiplayer Gaming (MPAI-SPG)

  1. TS, RS, CT, PA of Server-based Predictive Multiplayer Gaming.

2022 MPAI Goal #7 for Connected Autonomous Vehicles (MPAI-CAV)

  1. TS, RS, CT, PA of Connected Autonomous Vehicles. This will include interactions with MPAI-MMC and MPAI-CAE.

2022 MPAI Goal #8 for Mixed-reality Collaborative Spaces (MPAI-MCS)

  1. TS, RS, CT, PA of Mixed-reality Collaborative Spaces. This will include interactions with MPAI-MMC and MPAI-CAE.

2022 MPAI Goal #9 for Integrative Genomic/Sensor Analysis (MPAI-GSA)

  1. TS, RS, CT, PA of Integrative Genomic/Sensor Analysis.

2022 MPAI Goal #10 for AI-Enhanced Video Coding (MPAI-EVC)

  1. The AI-Enhanced Video Coding (MPAI-EVC) Evidence Project will continue toward reaching the goal of 25% improvement over MPEG-5 EVC.

2022 MPAI Goal #11 for AI-based End-to-End Video Coding (MPAI-EEV)

  1. AI-based End-to-End Video Coding (MPAI-EEV) will continue harnessing the potential of an unconstrained approach to AI-based Video Coding.

2022 MPAI Goal #12 for Visual Object and Scene Description (MPAI-OSD)

  1. Visual Object and Scene Description (MPAI-OSD) will continue collecting use cases where visual information coding is required.

 


31 December 2021 – MPAI takes stock of the work done

One year ago today, MPAI could take stock of 3 months of work: an established organisation with the mission of developing Artificial Intelligence (AI)-based data coding standards, a first identification of a program of work, progression of several work items and a published Call for Technologies for one of them.

What can MPAI say today? That it has lived a very intense year and that it can declare itself satisfied with what it has achieved.

The first achievement is that it has refined its method of work to make it solid but also capable of overcoming problems plaguing other Standards Developing Organisations (SDO). A standard project goes through 8 stages, progression to a new stage requiring approval by the MPAI General Assembly. A Call for Technologies is issued with Functional and Commercial Requirements.

The second achievement is that it has developed 3 Technical Specifications (TS) that use AI to enable the industry to accelerate deployment of AI-based applications. A few words on each of them:

Context-based Audio Enhancement (MPAI-CAE) – supports 4 use cases:

  1. Emotion-Enhanced Speech (EES) allows a user to give a machine a sentence uttered without emotion and obtain one that is uttered with a given emotion, say, happy, or sad, or cheerful etc., or uttered with the colour of a specific model utterance.
  2. Audio Recording Preservation (ARP) allows a user to preserve old audio tapes by providing a high-quality digital version and a digital version restored using AI with a documented set of irregularities found in the tape.
  3. Speech Restoration System (SRS) allows a user to automatically recover damaged speech segments using a speech model obtained from the undamaged part of the speech.
  4. Enhanced Audioconference Experience (EAE) improves a participant’s audioconference experience by using a microphone array and extracting the spatial attributes of the speakers with respect to the position of the microphone array to allow spatial representation of the speech signals at the receiver.

Compression and Understanding of Industrial Data (MPAI-CUI) – supports one use case: Company Performance Prediction. This gives the financial risk assessment industry new, powerful and extensible means to predict the performance of a company several years into the future in terms of company default probability, business discontinuity probability and adequacy index of company organisational model.

Multimodal Conversation (MPAI-MMC) – supports 5 use cases:

  1. Conversation with Emotion (CWE) allows a user to have a full conversation with a machine impersonated by a synthetic speech and an animated face. The machine understands the emotional state of the user and its speech and face are congruent with that emotional state.
  2. Multimodal Question Answering (MQA) allows a user to ask a machine via speech information about an object held in their hand and obtain a verbal response from the machine.
  3. Unidirectional Speech Translation (UST) allows a user to express a verbal sentence in a language and obtain a verbal translation in another language that preserves the user’s vocal features.
  4. Bidirectional Speech Translation (BST) allows two users to have a dialogue each using their own language and hearing the other user’s translated voice with that user’s native speech features.
  5. One-to-Many Speech Translation (MST) allows a user to select a set of languages and have their speech translated to the selected languages, with the option of preserving their speech features in the translations.

The first MPAI Call for Technologies was issued on 16 December 2020 and concerned the AI Framework (MPAI-AIF) for creation and execution of AI Workflows (AIW) composed of AI Modules (AIM). AIMs may be developed in any environment, using any proprietary framework and for any operating system; may be AI-based or non-AI-based; may be implemented in hardware, in software, or in a hybrid combination of the two; and may execute on anything from MCUs to HPC systems, in local and distributed environments and in proximity with other AIFs, irrespective of the AIM provider. The three TSs mentioned above rely on the MPAI-AIF TS for their implementations.

Finally, in 2021 MPAI developed the Governance of the MPAI Ecosystem (MPAI-GME) TS. This lays down the rules governing the MPAI Ecosystem, composed of:

  1. MPAI developing standards.
  2. Implementers developing implementations.
  3. The not-for-profit MPAI Store, established and controlled by MPAI, where implementations are uploaded, checked for security, and tested for conformance.
  4. MPAI-appointed performance assessors who assess that implementations are reliable and trustworthy.
  5. Users who can access secure MPAI standard implementations guaranteed for Conformance and Performance.

In 2021, MPAI developed the four MPAI components – Technical Specification, Reference Software, Conformance Testing and Performance Assessment – for Compression and Understanding of Industrial Data (MPAI-CUI). These, together with the other TSs, are published on the MPAI web site.

These are firm results for standards that industry can take up, but MPAI has carried out substantially more work preparing for the future:

MPAI-CAV: Connected Autonomous Vehicles

MPAI-EEV: AI-based End-to-End Video Coding

MPAI-EVC: AI-Enhanced Video Coding

MPAI-MCS: Mixed-reality Collaborative Spaces

MPAI-SPG: Server-based Predictive Multiplayer Gaming

This huge work has been carried out by a network of technical groups that MPAI thanks for their efforts and results.

Want to know more? Read “Towards Pervasive and Trustworthy Artificial Intelligence”, the book that illustrates the results achieved by MPAI in its 15 months of operation and the plans for the next 12 months.

The work has just begun. Become an MPAI member. Join the fun – build the future!


Audio preservation saves memory

Another context of MPAI Audio Enhancement is preservation. Many audio archives urgently need to digitise their records, especially analogue magnetic tapes, because their life expectancy is short compared to that of paper records. International institutions (e.g., International Association of Sound and Audiovisual Archives, IASA; World Digital Library, WDL; Europeana) have defined guidelines, sometimes only partly compatible, but appropriate international standards are lacking.

The Audio Recording Preservation (ARP) use case of the MPAI-CAE standard (CAE-ARP) opens the way to effectively respond to methodological questions of reliability with respect to audio recordings as documentary sources, while clarifying the concept of “historical faithfulness”. The magnetic tape carrier may hold important information, such as multiple splices and annotations (by the composer or by the technicians), and/or display several types of irregularities (e.g., corruptions of the carrier, tape of different colour or chemical composition).

AI can have a significant impact on cultural heritage because it can make its safeguarding sustainable by drastically changing the way it is preserved, accessed and given added value. Audio archives, an important part of this heritage, require substantial resources in terms of people, time, and funding.

An important example of how AI can drastically reduce the resources necessary to preserve analogue recordings and make them accessible is CAE-ARP, which provides a workflow for managing open-reel tape audio recordings. It focuses on audio read from magnetic tapes, digitised and fed into a preservation system together with the data from a video camera pointed at the head reading the magnetic tape. The output of the restoration process is composed of a preservation master file that contains the high-resolution audio signal and several other types of information created by the preservation process. The goal is to cover the whole “philologically informed” archival process of an audio document, from the active preservation of sound documents to the access to digitised files.

Figure 1 depicts the CAE-ARP workflow. Its operation is concisely described below.

Figure 1 – Audio recording preservation

  1. The Audio Analyser and Video Analyser AIMs analyse the Preservation Audio File (a high-quality audio signal) and the Preservation Audio-Visual File (video of the reading head).
  2. All detected Audio and Image irregularities are sent to the Tape Irregularity Classifier AIM, which selects those most relevant for restoration and access.
  3. The Tape Audio Restoration AIM uses the irregularities to correct potential errors that occurred when the audio signal was converted from analogue to digital.
  4. The Restored Audio File, the Editing List (used to produce the Restored Audio File), the Irregularity Images, and the Irregularity File (containing information about the irregularities) are inserted in the Packager.
  5. The Packager produces the Access Copy Files, used, as the name implies, to access the audio content, and the Preservation Master Files, which contain the original inputs and the data produced during the analysis and are used for preservation.
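The five steps above can be sketched as a chain of functions. This is a toy illustration, not the CAE-ARP specification: the function names only echo the AIM names, the `Irregularity` fields, the thresholds and the "restoration" (zeroing flagged samples) are assumptions chosen to keep the example short, and real inputs would be audio and video files rather than small lists. The Irregularity Images are left out of the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Irregularity:
    source: str       # "audio" or "video"
    position: int     # toy index into the recording (sample or frame number)
    relevance: float  # hypothetical relevance score


def audio_analyser(audio: List[float]) -> List[Irregularity]:
    # Stub: flag samples whose magnitude exceeds a toy threshold.
    return [Irregularity("audio", i, abs(s)) for i, s in enumerate(audio) if abs(s) > 0.9]


def video_analyser(frames: List[str]) -> List[Irregularity]:
    # Stub: flag frames labelled as showing a splice passing the reading head.
    return [Irregularity("video", i, 0.8) for i, f in enumerate(frames) if f == "splice"]


def tape_irregularity_classifier(found: List[Irregularity],
                                 min_relevance: float = 0.5) -> List[Irregularity]:
    # Keep only the irregularities judged relevant for restoration and access.
    return [irr for irr in found if irr.relevance >= min_relevance]


def tape_audio_restoration(audio: List[float],
                           selected: List[Irregularity]) -> Tuple[List[float], List[int]]:
    # Stub restoration: silence flagged samples and record them in an editing list.
    editing_list = sorted({irr.position for irr in selected if irr.source == "audio"})
    restored = [0.0 if i in editing_list else s for i, s in enumerate(audio)]
    return restored, editing_list


def packager(audio, frames, restored, editing_list, selected):
    # Access copy: what a listener needs. Preservation master: inputs plus analysis data.
    access_copy = {"restored_audio": restored, "editing_list": editing_list}
    preservation_master = {"audio": audio, "video": frames, "irregularities": selected}
    return access_copy, preservation_master


audio = [0.1, 0.95, -0.2, -1.0]
frames = ["ok", "splice", "ok", "ok"]
selected = tape_irregularity_classifier(audio_analyser(audio) + video_analyser(frames))
restored, editing_list = tape_audio_restoration(audio, selected)
access_copy, preservation_master = packager(audio, frames, restored, editing_list, selected)
print(access_copy["restored_audio"])  # [0.1, 0.0, -0.2, 0.0]
```

Each function could be replaced by a more capable implementation without changing the others, as long as its inputs and outputs stay the same.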

The ARP workflow described above is complex and involves different audio and video competences. Therefore, the MPAI approach of subdividing complex systems into smaller components is well-suited to advance different algorithms and functionalities typically involving different professionals or companies.

Currently, ARP is limited to mono audio recordings on open-reel magnetic tape. The goal is to extend it to more complex recordings and additional analogue carriers such as audiocassettes or vinyl.


A standard for “better” audio

Standards for audio exist: MPEG-1 Audio layer II and layer III (so-called MP3) and a slate of AAC standards serving all tastes offer efficient ways to store and transmit different types of mono, stereo and multichannel audio. MPEG-H offers ways to transmit and present 3D audio experiences.

Never before, however, except at the level of company products, was there a standard whose goal is not to preserve audio quality at low bitrates but to improve it or, as the name of the standard (“Context-based Audio Enhancement”, acronym MPAI-CAE) says, to enhance it.

Of course there are probably as many ways to enhance audio as there are target users, so what does audio enhancement mean and how can a standard be produced for such a goal?

The magic word that changes the perspective is the word “context”. The MPAI-CAE standard identifies contexts in which audio can be enhanced. The next clarification comes from the fact that the standard is not monolithic, in other words, it identifies several contexts to which the standard can be applied.

Context #1: imagine that you have a sentence that you would like to be able to pronounce with a particular emotional charge: say, happy, or sad, or cheerful etc. or as if it were pronounced with the colour of a specific model utterance. If we were in a traditional encoder-decoder setting, there would be little to standardise. If you have the know-how, you do it. If you don’t, you ask someone who has that know-how to do it for you.

So, why should there be a standard for context #1?

To answer the question, I need to go back to a definition that I found years ago in the Encyclopaedia Britannica:

Standardisation, in industry: setting of guidelines that permit large production runs of component parts that are readily fitted to other parts without adjustment.

In practice the definition means that if there is a standard for nuts and bolts, and you have a standard nut, you can find someone who has the bolt to which your nut fits.

MPAI-CAE Context #1 is a straightforward application of the Encyclopaedia Britannica definition because it defines the components that can be assembled to make a system that lets you do one of the following:

  1. It receives your vocal utterance without colour and pronounces it using the speech features of the model utterance
  2. It receives your vocal utterance without colour, the indication of one or more emotions, the indication of a language and pronounces it with the particular emotion(s) and the “intonation” of the specified language.

There is one point that I must make clear. I said that the standard “defines the components” of the system, but I should have said that it “defines the interfaces of the components”. This is no different from the “nuts and bolts standard”. That standard defines neither the nuts nor the bolts. It defines the threading, i.e., the “interface” between the nut and the bolt.

Let’s now turn to a block diagram.

 Figure 1 – Reference Model of Emotion Enhanced Speech

Here we see how the MPAI standardisation model works.

  1. Speech Feature Analyser2 is a very sophisticated technology component that must be able to extract your speech features, which are very specific to you and embedded deeply in your vocal utterances.
  2. Emotion Feature Inserter is an even more sophisticated technology component because it must be able to take the Features of your Emotionless Speech, the Emotion, say, “cheerful” (whose semantics is defined by MPAI-CAE standard), and the Language, and generate Speech Features that convey your personal speech features, the cheerful Emotion, and the specifics of the selected language.
  3. The Emotion Inserter, another very sophisticated component, receives the Speech Features from the Emotion Feature Inserter together with your Emotionless Speech and produces an emotionally charged vocal utterance according to your wishes.
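The idea that the standard fixes the interfaces rather than the components can be sketched with Python `Protocol` types. Everything here is an assumption made for illustration: the `Speech` and `Features` aliases, the call signatures and the toy implementations are not the data formats or interfaces defined by MPAI-CAE; only the three component names and the order in which they are connected come from the description above.

```python
from typing import Dict, List, Protocol

Speech = List[float]         # hypothetical stand-in for a speech signal
Features = Dict[str, float]  # hypothetical stand-in for Speech Features


class SpeechFeatureAnalyser(Protocol):
    def __call__(self, emotionless_speech: Speech) -> Features: ...


class EmotionFeatureInserter(Protocol):
    def __call__(self, features: Features, emotion: str, language: str) -> Features: ...


class EmotionInserter(Protocol):
    def __call__(self, emotionless_speech: Speech, features: Features) -> Speech: ...


def emotion_enhanced_speech(speech: Speech, emotion: str, language: str,
                            analyse: SpeechFeatureAnalyser,
                            insert_features: EmotionFeatureInserter,
                            insert_emotion: EmotionInserter) -> Speech:
    # The wiring depends only on the interfaces, never on who built each part.
    features = analyse(speech)
    charged = insert_features(features, emotion, language)
    return insert_emotion(speech, charged)


# Toy implementations; any others with the same signatures would fit equally well.
def toy_analyser(emotionless_speech: Speech) -> Features:
    return {"peak": max(abs(s) for s in emotionless_speech)}


def toy_feature_inserter(features: Features, emotion: str, language: str) -> Features:
    gain = {"cheerful": 1.5, "sad": 0.5}.get(emotion, 1.0)
    return {**features, "gain": gain}


def toy_emotion_inserter(emotionless_speech: Speech, features: Features) -> Speech:
    return [s * features["gain"] for s in emotionless_speech]


out = emotion_enhanced_speech([0.5, -1.0], "cheerful", "en",
                              toy_analyser, toy_feature_inserter, toy_emotion_inserter)
print(out)  # [0.75, -1.5]
```

As with the nut and the bolt, a supplier of a better Emotion Feature Inserter only needs to respect the agreed signature for its component to fit into the system.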

A similar process unfolds for the upper branch of the diagram, where a model utterance is used.

In principle, each of the identified components – that MPAI calls AI Modules (AIM) – can be re-used in other contexts. We will see how that is done because this is just the first MPAI-CAE context. There will soon be opportunities to introduce other contexts.


The why of the MPAI mission

In research, a technology that attracted the interest of researchers decades ago and stayed at that level for a long time may suddenly come into focus. This is the case of the collection of different technologies called Artificial Intelligence (AI). Although this moniker might suggest that machines are able to replicate the main human trait, in practice such techniques boil down to algorithmically sophisticated pattern matching enabled by training on large collections of input data. Embedded today in a range of applications, AI has started affecting the life of millions of people and is expected to do so even more in the future.

AI provides tools to “get inside” the meaning of data to an extent not reached by previous technologies. The word “data” is used to indicate anything that represents information in digital form ranging from the US Library of Congress to a sequenced DNA, to the output of a video camera or an array of microphones, to the data generated by a company. Through AI, the number of bits required to represent information can be reduced, “anomalies” in the data discovered, and a machine can spot patterns that might not be immediately evident to humans.

AI is already among us doing useful things. There is keen commercial interest in implementing more AI-centric processes unleashing its full potential. Unfortunately, the way a technology leaves the initial narrow scientific scope to become mainstream and pervasive for products, services and applications is usually neither linear nor fast. However, exceptions exist. Looking back at the history of MPEG, we can see that digital media standards not only accelerated the mass availability of products enabled by new technologies, but also generated new products never thought of before.

In fact, the MPEG phenomenon was revolutionary because its standards were conceived to be industry neutral, and the process unfolded successfully because it had been designed around this feature. The revolution, however, was kind of “limited” because MPEG was confined to “media” (even though it tried to escape from that walled garden).

Here we talk about AI-centric data coding standards, which do not have such limitations. AI tools are flexible and can reasonably be adapted to any type of data. Therefore, as digital media standards have positively influenced industry and billions of people, so AI-based data coding standards are expected to have a similar, if not stronger impact. Research shows that AI-based data coding is generally more efficient than existing technologies for, e.g., data compression and description.

These considerations have led a group of companies and institutions to establish the Moving Picture, Audio and Data Coding by AI – MPAI – as an international, unaffiliated not-for-profit Standards Developing Organisation (SDO).

However, standards are useful to people and industry if they enable open markets. Still, the industry might invest hundreds of millions into the development of a standard, only to find that it is not practically usable or it is only accessible to a lucky few. In this case rather than enabling markets, the standard itself causes market distortion. This is a rather new situation for official standards, caused by the industry’s recent inability to cope with tectonic changes induced by technology and market. As a result, developing a standard today may appear like a laudable goal, but the current process can actually turn into a disappointment for industry. A standards development paradigm more attuned to the current situation is needed.

Therefore, to compensate for some standards organisations’ shortcomings in their handling of patents, the MPAI scope extends beyond the development of standards for a technology area to include Intellectual Property Rights guidelines.

Let’s briefly compare how the incumbent Data Processing (DP) technology and AI work. When they apply DP, humans study the nature of the data and design a priori methods to process it. When they apply AI, prior understanding of the data is not paramount – a suitably “prepared” machine is subjected to many possible inputs so that it can “learn” from the actual data what the data “means”.
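The contrast can be made concrete with a deliberately tiny sketch. It is an illustration only: the fixed threshold, the `learn_threshold` helper and the sample data are all invented, and picking the midpoint between class means is a stand-in for real training, not a technique MPAI prescribes.

```python
from typing import List


def dp_classifier(x: float) -> bool:
    # Data Processing: a human studied the data and fixed the rule a priori.
    return x > 10.0


def learn_threshold(samples: List[float], labels: List[bool]) -> float:
    # "AI" in miniature: derive the decision rule from labelled examples
    # by taking the midpoint between the means of the two classes.
    positives = [x for x, y in zip(samples, labels) if y]
    negatives = [x for x, y in zip(samples, labels) if not y]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


threshold = learn_threshold([1.0, 2.0, 8.0, 9.0], [False, False, True, True])


def ai_classifier(x: float) -> bool:
    return x > threshold


print(threshold)           # 5.0
print(dp_classifier(7.0))  # False: the hand-written rule does not fit this data
print(ai_classifier(7.0))  # True: the learned rule reflects the examples it saw
```

The learned rule is only as good as the examples it was given, which is why the quality of the training data matters so much.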

In a sense, the results of bad training are similar in humans and machines. As an education with “bad” examples can make “bad” humans, a “bad”, i.e., insufficient, sectorial, biased etc. education makes machines do a “bad” job. The conclusion is that, when designing a standard for an AI-based application, the technical specification is not sufficient. So, MPAI’s stated goal to make AI applications interoperable and hence pervasive through standards is laudable, but the result is possibly perverse if ungoverned “bad” AI applications pollute a society relying on them.

For these reasons, MPAI has been designed to operate beyond the typical remit of a standards-developing organisation – albeit it fulfills this mission quite effectively, with five full-fledged standards developed in 15 months of operation. An essential part of the MPAI mission consists of providing the users with quantitative means to make informed decisions about which implementations should be preferred for a given task.

Thanks to MPAI, implementers have available standards that can be used to provide trustworthy products, applications and services, and users can make informed decisions as to which one is best suited to their needs. This will result in a more widespread acceptance of AI-based technology, paving the way for its benefits to be fully reaped by society.

To know more you should read the book “Towards Pervasive and Trustworthy Artificial Intelligence” available from Amazon https://www.amazon.com/dp/B09NS4T6WN/


MPAI concludes 2021 approving new Context-based Audio Enhancement standard

Geneva, Switzerland – 22 December 2021. Today the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded the year 2021, its first full year of operation, approving its fifth standard for publication.

The standards developed and published by MPAI so far are:

  1. Context-based Audio Enhancement (MPAI-CAE) – approved today – supports 4 identified use cases: adding a desired emotion to an emotion-less speech segment, preserving old audio tapes, restoring audio segments and improving the audio conference experience.
  2. AI Framework (MPAI-AIF) enables creation and automation of mixed Machine Learning, Artificial Intelligence, Data Processing and inference workflows. The Framework can be implemented as software, hardware, or hybrid software and hardware.
  3. Compression and Understanding of Industrial Data (MPAI-CUI) gives the financial risk assessment industry new, powerful and extensible means to predict the performance of a company several years into the future.
  4. Multimodal Conversation (MPAI-MMC) enables advanced human-machine conversation forms such as: holding an audio-visual conversation with a machine impersonated by a synthetic voice and an animated face; requesting and receiving information via speech about a displayed object; interpreting speech to one, two or many languages using a synthetic voice that preserves the features of the human speech.
  5. Governance of the MPAI Ecosystem (MPAI-GME) lays down the rules governing an ecosystem of implementers and users of secure MPAI standard implementations guaranteed for Conformance and Performance, and accessible through the not-for-profit MPAI Store.

The book “Towards Pervasive and Trustworthy Artificial Intelligence” illustrates the results achieved by MPAI in its 15 months of operation and the plans for the next 12 months.

MPAI is currently working on several other standards, some of which are:

  1. Server-based Predictive Multiplayer Gaming (MPAI-SPG) uses AI to train a network that compensates data losses and detects false data in online multiplayer gaming.
  2. AI-Enhanced Video Coding (MPAI-EVC), a candidate MPAI standard improving existing video coding tools with AI and targeting short-to-medium term applications.
  3. End-to-End Video Coding (MPAI-EEV) is a recently launched MPAI exploration promising a fuller exploitation of the AI potential in a longer-term time frame than MPAI-EVC.
  4. Connected Autonomous Vehicles (MPAI-CAV) uses AI in key features: Human-CAV Interaction, Environment Sensing, Autonomous Motion, CAV to Everything and Motion Actuation.
  5. Mixed Reality Collaborative Spaces (MPAI-MCS) creates AI-enabled mixed-reality spaces populated by streamed objects such as avatars, other objects and sensor data, and their descriptors for use in meetings, education, biomedicine, science, gaming and manufacturing.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI if able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.


Towards Pervasive and Trustworthy Artificial Intelligence

Fifteen months after coming to life, and while celebrating the successful development of 5 major AI-based data coding standards, MPAI has published the book “Towards Pervasive and Trustworthy Artificial Intelligence: How standards can put a great technology at the service of humankind”.

For the authors, who represent the many facets of the MPAI world, it has been quite an effort, but one rewarded by the sight of the published book, now available online.

In its 110 B5 pages the book offers:

  1. A summary of the reasons that have led to the creation of MPAI
  2. An analysis of the promises but also the potential dangers of AI
  3. An overview of the main Machine Learning and Neural Network technologies
  4. An analysis of the state of the art of data coding in some of the fields addressed by MPAI:
    1. Speaking humans and machines
    2. Visual humans and machines
    3. Humans conversing with machines
    4. Audio for humans
    5. Video for humans and machines
    6. Data for machines
  5. An introduction to the measures Europe, USA and China are taking to regulate AI
  6. A description of why we need the AI Framework, how we implement it and what its benefits are
  7. A brief introduction to some of the key applications supported by the first MPAI standards
    1. Conversation with emotion
    2. Conversation about an object
    3. Feature-preserving speech translation
    4. Emotion enhanced speech
    5. Speech restoration system
    6. Audio recording preservation
    7. Enhanced audioconference experience
    8. Company performance prediction
  8. An explanation of what an MPAI standard is, namely a collection of
    1. Technical Specification
    2. Reference Software
    3. Conformance Testing
    4. Performance Assessment
  9. A presentation of some of the technologies already standardised by MPAI
    1. Emotion
    2. Intention
    3. Meaning
    4. Speech features
    5. Microphone array geometry
    6. Audio scene geometry
  10. Some words about the “fuel” and the “machine” that drives MPAI standardisation
  11. The plan adopted by MPAI to govern its sophisticated ecosystem
  12. MPAI’s views of the vital role of patents and how it can be preserved
  13. An anticipation of the coming MPAI standards
    1. AI-enhanced video coding
    2. End-to-end video coding
    3. Server-based predictive multiplayer gaming
    4. Connected autonomous vehicles
    5. Conversation about a scene
    6. Mixed-reality collaborative spaces
    7. Audio on the go
  14. The 7 impacts MPAI standardisation is expected to have on industry, innovation and users.

What should be expected from this book?

  • For you, an opportunity to benefit from the hard work of some 20 people distilling the most important information generated in the last 15 months by MPAI, and also to share the opportunity with friends.
  • For MPAI the opportunity to welcome more of you on board this exciting initiative.
  • For me the unique experience of working with outstanding people in the editing of the book.
  • For all, best wishes for the coming holidays!

 


Connected Autonomous Vehicles in MPAI

For about a year, MPAI has been developing Use Cases and Functional Requirements for the Connected Autonomous Vehicles (MPAI-CAV) project, and the document has reached good maturity. MPAI is now publishing the results achieved so far on this and other web sites, on social networks and in newsletters.

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international, unaffiliated, non-profit organisation whose mission is to develop Artificial Intelligence (AI) enabled data coding specifications, with clear Intellectual Property Rights (IPR) licensing frameworks. The scope of data coding includes any instance in which digital data needs to be converted to a different format to suit a specific application. Notable examples are compression and feature extraction.

MPAI-CAV is an MPAI project seeking to standardise the components required to implement a Connected Autonomous Vehicle (CAV). MPAI defines a CAV as a mechanical system capable of moving autonomously – save for the exceptional intervention of a human – based on the analysis of the data produced by a range of sensors exploring the environment and the information transmitted by other sources in range, e.g., CAVs and roadside units (RSU).

This is the first of several articles reporting on the progress achieved and the developments planned in the MPAI-CAV project.

The current focus is the development of Use Cases and Functional Requirements (MPAI-CAV UCFR), the first step towards publishing a Call for Technologies. Responses to the Call will then enable MPAI to actually develop the MPAI-CAV standard.

There are 3 ways for you to get involved in the MPAI-CAV project:

  1. Monitor the progress of the MPAI-CAV project.
  2. Participate in MPAI-CAV activities (conference call meetings) as a non-member.
  3. Join MPAI as a member.

MPAI invites professionals and researchers to contribute to further developing the MPAI-CAV UCFR.

MPAI-CAV has identified 5 main subsystems of a Connected Autonomous Vehicle, as depicted in Figure 1.

Figure 1 – The 5 MPAI-CAV subsystems

The functions of the individual subsystems can be described as follows:

  1. Human-CAV interaction (HCI) recognises the human CAV rights holder, responds to humans’ commands and queries, provides an extended environment representation (Full World Representation) for humans to use, senses human activities during travel and may activate other Subsystems as required by humans.
  2. Environment Sensing Subsystem (ESS) acquires information from the environment via a variety of sensors and produces a representation of the environment (Basic World Representation), its best guess given the available sensory data.
  3. Autonomous Motion Subsystem (AMS) computes the Route to destination, uses different sources of information – CAV sensors, other CAVs and transmitting units – to produce a Full World Representation and gives commands that drive the CAV to the intended destination.
  4. CAV to Everything Subsystem (V2X) sends/receives information to/from external sources, including other CAVs, other CAV-capable vehicles and Roadside Units (RSU).
  5. Motion Actuation Subsystem (MAS) provides non-electromagnetic information about the environment, and receives and actuates motion commands in the environment.
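
The division of labour among the five subsystems can be illustrated with a minimal Python sketch. It is not taken from the MPAI-CAV Use Cases and Functional Requirements: every class, method and string below is a hypothetical name chosen for illustration, the "sensors" are plain lists of labels, and several functions named above (recognising the rights holder, showing the Full World Representation to humans, the environment information supplied by the MAS) are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class BasicWorldRepresentation:
    """ESS output: best guess of the environment from the CAV's own sensors."""
    obstacles: list = field(default_factory=list)


@dataclass
class FullWorldRepresentation:
    """AMS output: own-sensor view merged with information from other sources."""
    obstacles: list = field(default_factory=list)


class HumanCavInteraction:
    def request_destination(self, spoken_command):
        # Toy parser (assumption): commands look like "go to <place>".
        return spoken_command.split("go to ", 1)[-1].strip()


class EnvironmentSensingSubsystem:
    def sense(self, sensor_readings):
        # Drop duplicate detections while keeping first-seen order.
        return BasicWorldRepresentation(list(dict.fromkeys(sensor_readings)))


class CavToEverythingSubsystem:
    def __init__(self, remote_reports):
        self._remote_reports = remote_reports

    def receive(self):
        # Stand-in for messages from other CAVs and Roadside Units.
        return list(self._remote_reports)


class AutonomousMotionSubsystem:
    def fuse(self, basic, remote_reports):
        merged = list(dict.fromkeys(basic.obstacles + remote_reports))
        return FullWorldRepresentation(merged)

    def plan(self, full, destination):
        # Deliberately trivial policy: slow down if anything is reported.
        if full.obstacles:
            return "slow_down"
        return f"proceed_to:{destination}"


class MotionActuationSubsystem:
    def __init__(self):
        self.executed = []

    def actuate(self, command):
        # Record the command instead of driving real actuators.
        self.executed.append(command)


def run_cycle(spoken_command, sensor_readings, remote_reports):
    """One pass through HCI -> ESS/V2X -> AMS -> MAS."""
    hci = HumanCavInteraction()
    ess = EnvironmentSensingSubsystem()
    v2x = CavToEverythingSubsystem(remote_reports)
    ams = AutonomousMotionSubsystem()
    mas = MotionActuationSubsystem()

    destination = hci.request_destination(spoken_command)
    basic = ess.sense(sensor_readings)
    full = ams.fuse(basic, v2x.receive())
    mas.actuate(ams.plan(full, destination))
    return full, mas.executed
```

Calling `run_cycle("go to station", ["cyclist"], ["roadworks"])` walks one command through the chain. The point of the sketch is only the separation of roles: the ESS works from the CAV's own sensors, the V2X subsystem brings in external information, the AMS is where the two are merged and turned into commands, and the MAS carries those commands out.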

The next publications will deal with:

  1. Why an MPAI-CAV standard?
  2. Introduction to MPAI-CAV Subsystems
  3. Human-CAV interaction
  4. Environment Sensing Subsystem
  5. CAV to Everything
  6. Autonomous Motion Subsystem
  7. Motion Actuation Subsystem

For any communication or intention to join MPAI-CAV activities, or any other MPAI standards development activities, send an email to Secretariat (secretariat@mpai.community).


Ten good reasons to join MPAI

  1. MPAI develops standards, and standards accelerate technology exploitation into products.
  2. MPAI standards are Artificial Intelligence-enabled and AI has the highest potential to yield high-performance solutions to data coding.
  3. MPEG has shown that standards should be developed by applying a given technology across the board and MPAI is the only standards organisation doing so for AI-based data coding.
  4. MPEG has shown that industry is well served by increasing accessibility to standards and MPAI makes available Framework Licences to accelerate access to its standards.
  5. MPAI has solid foundations: it has experience and a rigorous standards-development process.
  6. MPAI addresses high-profile data coding areas: AI framework, audio enhancement, human-machine conversation, prediction of company performance, video coding, online gaming, autonomous vehicles and more.
  7. MPAI is productive: in 15 months it has developed 4 standards (governance, human-machine conversation, company performance prediction and AI Framework), by the end of the year it will complete another standard (audio enhancement), and it has 7 more standards in the pipeline.
  8. MPAI standards are viral: products conforming to MPAI standards are already present on the market.
  9. MPAI develops methods to assess the level of conformance and reliability of standard implementations, including methods ensuring that implementations are bias-free.
  10. MPAI plans to establish the MPAI Store, a not-for-profit commercial organisation tasked with testing implementations for security and conformance and verifying that they are bias-free.

Join the fun, build the future!

