Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI-MMC to be adopted as IEEE standard

On the day MPAI Multimodal Conversation (MPAI-MMC) reached the six-month mark since its approval, the IEEE hosted the kick-off meeting of the P3300 working group tasked with the adoption of the MPAI technical specification as an IEEE standard. Earlier, MPAI and IEEE had signed an agreement whereby MPAI grants IEEE the right to publish MPAI-MMC as an IEEE standard.

At its first meeting, the WG approved the working draft of IEEE 3300 and requested IEEE to ballot the WD. Within a couple of months, MPAI-MMC is expected to become IEEE 3300.

The creation of the WG and the development of the IEEE 3300 standard are the natural steps following the issuance of the Call for Patent Pool Administrator by the MPAI-MMC patent holders. The next step will be the development of the Use Cases and Functional Requirements for MPAI-MMC Version 2 that MMC-DC and other groups are busy preparing.

The IEEE 3300 WD reproduces MPAI-MMC verbatim, so this article is a good opportunity to recall the MPAI document and its structure. If you want to follow this description with the actual text, please download it.

Chapter 1 is an informative introduction to MPAI, the AI Framework (MPAI-AIF) approach to AI data coding standards, including the notion of AI Modules (AIM) organised in an AI Workflow (AIW) executed in the AI Framework (AIF), and the governance of the MPAI ecosystem.

Chapter 2 is a functional specification of the 5 use cases:

“Conversation with Emotion” (CWE):  a human is holding an audio-visual conversation with a machine impersonated by a synthetic voice and an animated face. Both the human and the machine express emotion.
“Multimodal Question Answering” (MQA): a human is holding an audio-visual conversation with a machine impersonated by a synthetic voice. The human asks a question about an object held in their hand.
Three Use Cases supporting conversational translation applications. In each Use Case, users can specify whether speech or text is used as input and, if it is speech, whether their speech features are preserved in the interpreted speech:

– “Unidirectional Speech Translation” (UST).
– “Bidirectional Speech Translation” (BST).
– “One-to-Many Speech Translation” (MST).

Chapter 3 contains definitions of terms that are specific to MPAI-MMC.

Chapter 4 contains normative and informative references.

Chapter 5 contains the specification of the 5 use cases. For each of them, the following is specified:

  1. The Scope of the Use Case
  2. The syntax and semantics of the data entering and leaving the AIW
  3. The Architecture of AIMs composing the AIW implementing the Use Case
  4. The functions of the AIMs
  5. The JSON Metadata describing the AIW (a hypothetical sketch follows this list)
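
To give the flavour of item 5, here is a minimal, purely illustrative sketch of AIW metadata, built as a Python dictionary and printed as JSON. All field names and values are hypothetical, not the normative MPAI-AIF schema; the actual JSON syntax is the one in the Technical Specification.

```python
import json

# Purely illustrative AIW metadata for "Conversation with Emotion" (CWE).
# All field names and values are hypothetical, NOT the normative MPAI-AIF schema.
aiw_metadata = {
    "AIW": "ConversationWithEmotion",
    "Inputs": ["InputSpeech", "InputText", "InputVideo"],
    "Outputs": ["OutputSpeech", "OutputVideo"],
    "AIMs": ["SpeechRecognition", "VideoAnalysis", "DialogProcessing", "SpeechSynthesis"],
    "Topology": [
        # Each entry connects an output port of one AIM to an input port of another.
        {"From": "SpeechRecognition.RecognisedText", "To": "DialogProcessing.InputText"},
        {"From": "DialogProcessing.ReplyText", "To": "SpeechSynthesis.InputText"},
    ],
}

print(json.dumps(aiw_metadata, indent=2))
```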

Chapter 6 contains the specification of all the AIMs of all the Use Cases:

  1. A note about the meaning of AIM interoperability
  2. The syntax and semantics of the data entering and leaving all the AIMs of the 5 AIWs
  3. The formats of all the AIM data

Annex 1 defines the terms not specific to MPAI-MMC.

Annex 2 contains notices and disclaimers concerning MPAI standards (informative).

Annex 3 provides a brief introduction to the Governance of the MPAI Ecosystem (informative).

Annex 4 and the following annexes provide the AIW and AIM metadata of all MPAI-MMC Use Cases.

MPAI-MMC is just the initial step. Two more MPAI Technical Specifications have been submitted for adoption: AI Framework (MPAI-AIF) and Context-based Audio Enhancement (MPAI-CAE).

MPAI is looking forward to a mutually beneficial collaboration with IEEE.


MPAI issues a Call for Patent Pool Administrator on behalf of the MPAI-CAE and MPAI-MMC patent holders

Geneva, Switzerland – 23 March 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 18th General Assembly. Among the outcomes is the publication of a Call for Patent Pool Administrator covering two of its approved Technical Specifications.

The MPAI process of standard development prescribes that Active Principal Members, i.e., those intending to participate in the development of a Technical Specification, adopt a Framework Licence before initiating the development. All those contributing to the work are requested to accept the Framework Licence. If they are not Members, they are requested to join MPAI. Once a Technical Specification is approved, MPAI identifies patent holders and facilitates the creation of a patent pool.

Patent holders of Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) have agreed to issue a Call for Patent Pool Administrator and have asked MPAI to publish the call on its website. The Patent Holders expect to work with the selected Entity to facilitate a licensing program that responds to the requirements of the licensees while ensuring the commercial viability of the program. In the future, the coverage of the patent pool may be extended to new versions of MPAI-CAE and MPAI-MMC, and/or other MPAI standards.

Parties interested in being selected as Entity are requested to communicate, no later than 1 May 2022, their interest and provide appropriate material as a qualification to the MPAI Secretariat. The Secretariat will forward the received material to the Patent Holders.

While Version 1 of MPAI-CAE and MPAI-MMC progresses toward practical deployment, work is ongoing to develop the Use Cases and Functional Requirements of MPAI-CAE and MPAI-MMC V2. These will extend the V1 technologies to support new use cases, i.e.:

  1. Conversation about a Scene (CAS), enabling a human to hold a conversation with a machine about the objects in a scene.
  2. Human to Connected Autonomous Vehicle Interaction (HCI), enabling humans to have rich interaction, including question answering and conversation, with a Connected Autonomous Vehicle (CAV).
  3. Mixed-reality Collaborative Spaces (MCS), enabling humans to develop collaborative activities in a Mixed-Reality space via their avatars.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

MPAI is currently engaged in extending some of the already approved standards and in developing 9 other standards (those in italics in the list below).

Name of standard | Acronym | Brief description
AI Framework | MPAI-AIF | Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement | MPAI-CAE | Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data | MPAI-CUI | Predicts the company performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem | MPAI-GME | Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation | MPAI-MMC | Enables human-machine conversation emulating human-human conversation.
Server-based Predictive Multiplayer Gaming | MPAI-SPG | Trains a network to compensate data losses and detects false data in online multiplayer gaming.
AI-Enhanced Video Coding | MPAI-EVC | Improves existing video coding with AI tools for short-to-medium term applications.
End-to-End Video Coding | MPAI-EEV | Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
Connected Autonomous Vehicles | MPAI-CAV | Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
Avatar Representation and Animation | MPAI-ARA | Specifies descriptors of avatars impersonating real humans.
Neural Network Watermarking | MPAI-NNW | Measures the impact of adding ownership and licensing information in models and inferences.
Integrative Genomic/Sensor Analysis | MPAI-GSA | Compresses data from high-throughput experiments combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces | MPAI-MCS | Supports collaboration of humans represented by avatars in virtual-reality spaces called Ambients.
Visual Object and Scene Description | MPAI-OSD | Describes objects and their attributes in a scene and the semantic description of the objects.

Visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: join MPAI, share the fun, build the future.



What is the probability a company defaults?

A definite answer to such a question is not going to come anytime soon, and many have attempted to develop algorithms providing an answer.

MPAI could not miss the opportunity to give its own answer and has published the MPAI-CUI standard, or Compression and Understanding of Industrial Data. This contains one use case: Company Performance Prediction (CPP).

What does CUI-CPP offer? Imagine that you have a company and you would like to know the probability that your company defaults in the next, say, 5 years. The future is not written, but it certainly depends on how your company has performed in the last few years and on the solidity of your company’s governance.

You would also like to know the probability that your company suspends its operations because an unexpected event, such as a cyber attack or an earthquake, has happened. Finally, you would probably also want to know how adequate the organisation of your company is (but many don’t want to be told ;-).

CUI-CPP needs financial statements, governance data, and risk data as input. The standard first computes the financial and governance features from the financial statements and governance data. These features are fed to a neural network that has been trained with data from many companies and provides the default probability and the organisational model adequacy index (0=inadequate, 1=adequate).

Then CUI-CPP computes a risk matrix based on cyber and seismic risks and uses this to perturb the default probability and obtain the business discontinuity probability.
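
Schematically, the flow might be sketched as follows. Every function body, field name and number below is a placeholder: the normative feature definitions and the trained network are those of the standard.

```python
# Minimal sketch of the CUI-CPP flow described above. All function bodies,
# field names and numbers are placeholders, not the normative definitions.

def compute_features(financial_statements, governance_data):
    """Derive financial and governance features (placeholder computation)."""
    return {
        "liquidity": financial_statements["current_assets"]
                     / financial_statements["current_liabilities"],
        "board_independence": governance_data["independent_directors"]
                              / governance_data["board_size"],
    }

def prediction_aim(features):
    """Stand-in for the trained Prediction neural network: returns the default
    probability and the organisational model adequacy index (0 or 1)."""
    return 0.08, 1  # placeholder outputs

def perturb_with_risks(default_probability, cyber_risk, seismic_risk):
    """Perturb the default probability with the risk matrix to obtain the
    business discontinuity probability (illustrative formula only)."""
    return min(1.0, default_probability * (1 + cyber_risk + seismic_risk))

features = compute_features(
    {"current_assets": 1_200_000, "current_liabilities": 800_000},
    {"independent_directors": 3, "board_size": 7},
)
p_default, adequacy = prediction_aim(features)
p_discontinuity = perturb_with_risks(p_default, cyber_risk=0.10, seismic_risk=0.05)
print(p_default, adequacy, p_discontinuity)
```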

The full process is described in Figure 1.

Figure 1 – Reference Model of Company Performance Prediction

Many think that once the technical specification is developed, the work is over. Well, that would be like saying that, once you have produced a law, society can develop. That is not the case, because society needs tribunals to assess whether a deed performed by an individual conforms with the law.

Therefore, MPAI standards do not just contain the technical specification, i.e., “the minimum you must know to make an implementation of the standard”, but 4 components in total, of which the technical specification is the first.

The second is the reference software specification, a text describing how to use a software implementation that lets you understand “how the standard works”. Reference software is an ingredient of many standards and many require that the reference software be open source.

MPAI is in an unusual position in the sense that it does not specify the internals of the AI Modules (AIM), but only their function, interfaces, and connections with the other AIMs that make up an AI Workflow (AIW) with a specified function. Therefore, MPAI does not oblige the developers of an implementation of an MPAI standard to provide the source code of an AIM; they may provide a compiled AIM (of course, they are welcome to provide source code, more so if the AIM has a high performance).

In the case of CUI-CPP, all AIMs are provided as open-source software, including the neural network called Prediction. Of course, you should expect that the reference software “demonstrates” how the standard works, not that it makes very accurate predictions about a particular company.

The third component is conformance testing. Many standards do not distinguish between “how to make a thing” (the technical specification, i.e., the law) and “how to test that a thing is correctly implemented” (the conformance testing specification, i.e., the tribunal). MPAI provides a Conformance Testing Specification that specifies the “Means”, i.e.:

  1. The Conformance Testing Datasets and/or the methods to generate them,
  2. The Tools, and
  3. The Procedures

to verify that the AIMs and/or the AIW of a Use Case of a Technical Specification:

  1. Produce data whose semantics and format conform with the Normative clauses of the selected Use Case of the Technical Specification, and
  2. Provide a user experience level equal to or greater than the level specified by the Conformance Testing Specification.

In the case of CUI-CPP, MPAI provides test vectors, and the Technical Specification specifies the tolerance of the output vectors produced by an implementation of the AIMs or AIW when fed with the test vectors.
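
In code, such a check might look like the sketch below. The tolerance value and the data layout are hypothetical; the actual ones are given in the Technical Specification.

```python
# Sketch of a tolerance-based conformance check on test vectors. The
# tolerance value and the data layout are hypothetical.

TOLERANCE = 0.01  # hypothetical; the actual tolerance is in the specification

def conforms(implementation_output, reference_output, tolerance=TOLERANCE):
    """True if every output value is within tolerance of the reference."""
    return all(
        abs(out - ref) <= tolerance
        for out, ref in zip(implementation_output, reference_output)
    )

reference = [0.08, 1.00]        # expected outputs for a given test vector
implementation = [0.081, 1.00]  # outputs produced by the implementation
print("conformant" if conforms(implementation, reference) else "non-conformant")
```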

The fourth component is the performance assessment to verify that an implementation is not just “technically correct”, but also “reliable, robust, fair and replicable”. Essentially, this is about assessing that an implementation has been correctly trained, i.e., it is not biased.

The performance assessment specification provides the Means, i.e.:

  1. The methods to generate the Performance Testing Datasets,
  2. The Tools, and
  3. The Procedures

to verify that the training of the Prediction AIM (the only one that it makes sense to implement with a neural network) is not biased against some geographic locations and industry types (service, public, commerce, and manufacturing).

The CUI-CPP performance assessment specification assumes that there are two performance assessment datasets:

  1. Dataset #1 not containing geographic location and industry type information.
  2. Dataset #2 containing geographic location and industry type information.

The performance of an implementation is assessed by applying the following procedure:

  1. For each company compute:
    1. The Default Probabilities of all records in Dataset #1 and in Dataset #2.
    2. The Organisational Model Index of all records in Dataset #1 and in Dataset #2.
  2. Verify that the average of the differences of all
    1. Default Probabilities in 1.a is < 2%.
    2. Organisational Model Indices in 1.b is < 2%.

If both 2.a and 2.b are verified, the implementation passes the performance assessment.
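
A minimal sketch of the procedure, assuming a predict() function wrapping the implementation under test and records for the same companies paired across the two datasets:

```python
# Sketch of the CUI-CPP performance assessment procedure. predict() and the
# dataset records are placeholders; the 2% threshold is the one stated above.

def passes_assessment(predict, dataset1, dataset2):
    """dataset1: records without geographic location and industry type.
    dataset2: records with that information, paired per company.
    predict(record) returns (default_probability, organisational_model_index)."""
    dp_diffs, om_diffs = [], []
    for rec1, rec2 in zip(dataset1, dataset2):
        dp1, om1 = predict(rec1)
        dp2, om2 = predict(rec2)
        dp_diffs.append(abs(dp1 - dp2))   # step 1.a
        om_diffs.append(abs(om1 - om2))   # step 1.b
    avg_dp = sum(dp_diffs) / len(dp_diffs)
    avg_om = sum(om_diffs) / len(om_diffs)
    return avg_dp < 0.02 and avg_om < 0.02   # steps 2.a and 2.b
```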


Watermarking, Intellectual Property and Neural Networks

Watermarking has been used for a long time. One of its uses in the physical world is paper money, where a hard-to-imitate watermark assures users that a banknote is authentic.

In the digital domain, watermarking can be used to carry information about ownership in a file or stream. The Secure Digital Music Initiative (SDMI) selected a strong (i.e., hard to remove) digital watermark to identify an MP3 soundtrack that had been released “after” its adoption, and attempted to define a weak (i.e., easy to remove) watermark.

Neural networks are a high-priority topic in MPAI. Is there a reason why MPAI should be concerned with watermarking? The answer is yes, and the reason is that developing a neural network may be a very costly undertaking, e.g., several tens of thousands of USD, and developers may indeed want to be able to establish that a neural network is theirs.

MPAI has begun to investigate two related but distinct issues: watermarking for neural networks and watermarking for the data produced by a neural network when it is fed with data and generates inferences.

By using a specific watermarking technology, the neural network creator can claim that a particular neural network instance:

  1. Has been produced by them.
  2. Is a derivative of their network.
  3. Has been modified in a particular part of the network.

A related story applies to the inferences. The inference of a neural network can also be watermarked. The purpose is not necessarily that of protecting the creator or a licensee of a neural network. The end user of a neural network may need to be assured that an inference has been produced by the intended network.

So, what is MPAI actually doing in this field? The MPAI Neural Network Watermarking (NNW) project is developing requirements for a future MPAI standard with the goal of measuring, for a given size of the watermarking payload (an illustrative sketch follows the list):

  1. The impact on the performance of the neural network caused by adding a watermark to a neural network.
  2. The resistance of the watermark to modifications, e.g., caused by transfer learning, pruning of the weights, etc.
  3. The cost of watermark injection, because a neural network may be very large and adding a watermark costs time and processing.
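
By way of illustration only, the three measurements could be operationalised along the lines below; the model, watermarking, pruning and accuracy functions are trivial stubs, not an MPAI-defined API.

```python
import time

# Illustrative-only skeleton of the three NNW measurements. All functions
# other than measure() are trivial stubs, not an MPAI-defined API.

def accuracy(model, test_set):                # stub quality metric
    return sum(model(x) == y for x, y in test_set) / len(test_set)

def embed_watermark(model, payload):          # stub watermark injection
    return model

def detect_watermark(model, payload):         # stub watermark detection
    return True

def prune(model, fraction):                   # stub modification (e.g., pruning)
    return model

def measure(model, test_set, payload):
    baseline = accuracy(model, test_set)

    start = time.perf_counter()
    wm_model = embed_watermark(model, payload)            # 3. injection cost
    injection_seconds = time.perf_counter() - start

    impact = baseline - accuracy(wm_model, test_set)      # 1. impact on performance

    survives = detect_watermark(prune(wm_model, 0.3), payload)  # 2. resistance

    return impact, survives, injection_seconds
```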

Read The MPAI Neural Network Watermarking (NNW) project for more details.

If you wish to participate in this work you have the following options:

  1. Join MPAI
  2. Participate, by sending an email to the MPAI Secretariat, until the MPAI-NNW Functional Requirements are approved (after that, only MPAI members may participate).

MPAI talks to industry

Those who have been following this blog should already know quite a few things about MPAI. It is fair to say, however, that one can never really claim to know everything about any topic, especially one as vast and articulated as MPAI. That is why I recommend you watch a series of videos where you can see some of the major MPAI players introducing some of the areas MPAI is engaged in. The daunting side is that there are some 3 hours of videos in total :-(, but the relaxing side is that the videos are organised as a playlist :-).

To save you a click, here is the list of videos with a guide to each entry.

# | Title | Speaker | Country
1 | Introduction to MPAI | Leonardo | CH
2 | MPAI-AIF – AI Framework | Andrea | IT
3 | MPAI-CAE – Context-based Audio Enhancement | Marina | US
4 | MPAI-MMC – Multimodal Conversation | Miran | KR
5 | MPAI-CUI – Compression and Understanding of Industrial Data | Guido | IT
6 | Reference Software, Conformance and Performance | Panos | UK
7 | MPAI-GME – Governance of the MPAI Ecosystem | Paolo | UK
8 | MPAI-SPG – Server-based Predictive Multiplayer Gaming | Marco | IT
9 | MPAI-EVC – AI-Enhanced Video Coding | Roberto | IT
10 | MPAI-EEV – AI-based End-to-End Video Coding | Chuanmin | CN
11 | MPAI-CAV – Connected Autonomous Vehicles | Gianluca | IT
12 | Conclusions | Leonardo | CH

Introduction to MPAI, made by Leonardo, introduces the 4 pillars on which MPAI works: the standards development process, the setting of IPR Guidelines before developing a standard, component-based AI-based data coding standards, and the Governance of the MPAI Ecosystem. A very short introduction to the 5 standards and to some of the current projects follows.

Andrea introduces MPAI-AIF – AI Framework, the standard that sets the context of all MPAI application standards: AI Modules (AIM), AI Workflows (AIW) and, indeed, the AI Framework (AIF), and how an AIF implementation can execute AI applications.

Marina introduces MPAI-CAE – Context-based Audio Enhancement, the standard collecting 4 use cases where AI enhances audio to offer an improved user experience.

Miran introduces MPAI-MMC – Multimodal Conversation, the standard collecting 5 use cases where conversation between human and machine is enhanced beyond traditional speech recognition/synthesis by adding emotion and by representing an avatar whose speech and face convey emotion.

MPAI-CUI – Compression and Understanding of Industrial Data, presented by Guido, introduces the Company Performance Prediction Use Case, where a machine containing a critically important neural network receives financial, organisational and risk data of a company, predicts the company’s probability of default and of business discontinuity, and assigns a governance adequacy index to the company.

Panos presents the foundational notions of Reference Software, Conformance and Performance. Reference Software and Conformance are established notions, with specific variations made by MPAI. Performance is an entirely new notion in standards, introduced because of the particular nature of artificial intelligence.

Paolo introduces MPAI-GME – Governance of the MPAI Ecosystem, a foundational standard strictly connected with the preceding presentation because it lays down the rules that govern submission of and access to MPAI standard implementations with attributes of Reliability, Robustness, Replicability and Fairness.

Marco presents MPAI-SPG – Server-based Predictive Multiplayer Gaming, a project seeking to mitigate discontinuities caused by high latency or packet losses in online gaming and to detect fake data sent by game players to get an unfair advantage.

Roberto presents MPAI-EVC – AI-Enhanced Video Coding, a video compression project for a standard that substantially enhances the performance of a traditional video codec by improving or replacing traditional tools with AI-based tools.

Chuanmin describes MPAI-EEV – AI-based End-to-End Video Coding, a project seeking to compress video by exploiting AI-based data coding technologies (so-called end-to-end coding) without the constraints, present in MPAI-EVC, deriving from how data processing technologies have been traditionally applied to video coding.

Gianluca walks you through MPAI-CAV – Connected Autonomous Vehicles, a project aiming at standardising all IT components required to implement a Connected Autonomous Vehicle (CAV), i.e., a system capable of moving autonomously based on the data from a range of sensors exploring the environment and the information transmitted by other sources in range, e.g., other CAVs.

Finally, presenting the Conclusions, Leonardo summarises the gist of the presentations and invites you to join MPAI, share the fun, build the future.



MPAI outlines plans for the MPAI Store Foundation

Geneva, Switzerland – 23 February 2022. Today the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 17th General Assembly. Among the outcomes are: progress towards the establishment of a patent pool for its published standards and a roadmap to establish the MPAI Store Foundation.

The MPAI Statutes define a standard development process whereby holders of standard essential patents (SEP) select their preferred patent pool administrator. The General Assembly was informed that SEP holders in approved MPAI standards are currently engaged in this activity.

The Governance of the MPAI Ecosystem (MPAI-GME) standard envisions an “MPAI Store” tasked with receiving submissions of implementations, verifying their security and conformance, and making them available to other implementers and consumers. Because of the specific characteristics of AI technologies, the MPAI Store coordinates with MPAI-appointed performance assessors that guarantee that implementations are reliable, robust, replicable and fair. The MPAI Store will be a not-for-profit commercial entity where both MPAI members and associations representing the society at large are present.

The General Assembly approved a set of documents guiding the development of use cases and functional requirements for Neural Network Watermarking and the publication of a series of short videos with the title “MPAI talks to industry” illustrating the various aspects of MPAI activities.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

MPAI is currently engaged in extending some of the already approved standards and in developing 9 other standards (those in italics in the list below).

Name of standard | Acronym | Brief description
AI Framework | MPAI-AIF | Specifies an infrastructure enabling execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement | MPAI-CAE | Improves the user experience of audio-related applications in a variety of contexts.
Multimodal Conversation | MPAI-MMC | Enables human-machine conversation emulating human-human conversation.
Compression and Understanding of Industrial Data | MPAI-CUI | Predicts the company performance from governance, financial and risk data.
Governance of the MPAI Ecosystem | MPAI-GME | Establishes the rules governing submission of and access to interoperable implementations.
Server-based Predictive Multiplayer Gaming | MPAI-SPG | Trains a network to compensate data losses and detects false data in online multiplayer gaming.
AI-Enhanced Video Coding | MPAI-EVC | Improves existing video coding with AI tools for short-to-medium term applications.
End-to-End Video Coding | MPAI-EEV | Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
Connected Autonomous Vehicles | MPAI-CAV | Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
Avatar Representation and Animation | MPAI-ARA | Specifies descriptors of avatars impersonating real humans.
Neural Network Watermarking | MPAI-NNW | Measures the impact of adding ownership and licensing information in models and inferences.
Integrative Genomic/Sensor Analysis | MPAI-GSA | Compresses data from high-throughput experiments combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces | MPAI-MCS | Supports collaboration of humans represented by avatars in virtual-reality spaces called Ambients.
Visual Object and Scene Description | MPAI-OSD | Describes objects and their attributes in a visual scene and the semantic description of the objects.

Visit the MPAI web site, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram and YouTube.

Most important: join MPAI, share the fun, build the future.


Conclusions

This is a small book talking about a big adventure: standards for the most sensitive objects of all – data – using the most prominent technology of all – AI – for pervasive and trustworthy use by billions of people. At the end of this book, it is thus appropriate to assess what the authors think will be the likely impact of MPAI on industry and society.

The first impact will be the availability of standards for a technology that is best used to transform data but has not seen any standards so far. Standards that are driven by the same principles that guided another great adventure – MPEG – which replaced standards meant to be exclusively used by certain countries or industries with standards serving humankind.

The second impact will be the restoration of a right that used to be taken for granted by implementers but ceased to be a right some time ago: an implementer wishing to use a published standard should be allowed to do so, of course after remunerating those who invested money and talent to produce the technology enabling the standard.

The third impact is a direct consequence of the preceding two. In the mid-18th century, trade did not develop as it could because feudal traditions allowed petty lords to erect barriers to trade for the sake of a few livres or pounds. Today we do not live in a feudal age, but we still see petty lords here and there obstructing progress for the sake of a few dollars.

The fourth impact is the mirror of the third. An industry freed from shackles, with access to global AI-based data coding standards and operating in an open competitive market will be able to churn out interoperable AI-based products, services, and applications in response to consumer needs which are known today and the many more which are not yet known.

The fifth impact is a direct consequence of the fourth. An industry using sophisticated technologies such as AI and forced to be maximally competitive will need to foster accelerated progress of those technologies. We can confidently look forward to a new spring of research and advancement of science in a field on which it is difficult today to place boundaries.

The sixth impact will be caused by MPAI’s practical, Performance-Assessor-based solution to the concerns of many: AI technologies are as potentially harmful to humankind as they are powerful. The ability of AI technologies to hold vast knowledge without simple means for users to check how representative of the world they are – when they are used to handle information and possibly make decisions – opens our minds to apocalyptic scenarios.

The seventh impact is speculative, but no less important. The idea of intelligent machines able to deal with humans has always attracted the intellectual interest of writers. Machines dealing with humans are no longer speculation but fact. As objects embedding AI – physical and virtual – increase their ramification into our lives, more issues than “Performance” will come to the surface and will have to be addressed. MPAI, with its holistic view of AI as the technology enabling a universal data representation, proposes itself as the body where the issues raised by the progress of technology can be addressed and ways forward found.

Figure 36 – The expected MPAI impacts

The results achieved by MPAI in 15 months of activity and the plans laid down for the future demonstrate that the seven impacts identified above are not just wishful thinking. MPAI invites people of good will to join and make the potential real.


With 5 standards approved, MPAI enters a new phase

Geneva, Switzerland – 26 January 2022. Today the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 16th General Assembly, the first of 2022, approving its 2022 work program.

The work program includes the development of reference software, conformance testing and performance assessment for 2 application standards (Context-based Audio Enhancement and Multimodal Conversation), reference software and conformance testing for 1 infrastructure standard (AI Framework), and the establishment of the MPAI Store, a non-profit foundation with the mission to distribute verified implementations of MPAI standards, as specified in another MPAI infrastructure standard (Governance of the MPAI Ecosystem).

An important part of the work program addresses the development of performance assessment specifications for the 2 application standards. The purpose of performance assessment is to enable MPAI-appointed entities to assess the grade of reliability, robustness, replicability and fairness of implementations. While performance will not be mandatory for an implementation to be posted to the MPAI Store, users downloading an implementation will be informed of its status.

Another section of the work program concerns the development of extensions of existing standards. Company Performance Prediction (part of Compression and Understanding of Industrial Data) will include more risks in addition to seismic and cyber; Multimodal Conversation will enhance the features of some of its use cases, e.g., by applying them to the interaction of a human with a connected autonomous vehicle; and Context-based Audio Enhancement will enter the domain of separation of useful sounds from the environment.

An important part of the work program is assigned to developing new standards for the areas that have been explored in the last few months, such as:

  1. Server-based Predictive Multiplayer Gaming (MPAI-SPG) using AI to train a network that compensates data losses and detects false data in online multiplayer gaming.
  2. AI-Enhanced Video Coding (MPAI-EVC) improving existing video coding with AI tools for short-to-medium term applications.
  3. End-to-End Video Coding (MPAI-EEV) exploring the promising area of AI-based “end-to-end” video coding for longer-term applications.
  4. Connected Autonomous Vehicles (MPAI-CAV) using AI for such features as Environment Sensing, Autonomous Motion, and Motion Actuation.

Finally, MPAI welcomes new activities proposed by its members to its work program:

  1. Avatar Representation and Animation (MPAI-ARA) targeting the specification of avatar descriptors.
  2. Neural Network Watermarking (MPAI-NNW) developing measures of the impact of adding ownership and licensing information inside a neural network.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram and YouTube.

Most important: join MPAI, share the fun, build the future.


Digital humans and MPAI

“Digital human” has recently become a trendy expression, and different meanings can be attached to it. MPAI says that it is “a digital object able to receive text/audio/video/commands (“Information”) and generate Information that is congruent with the received Information”.

MPAI has been developing several standards for “digital humans” and plans on extending them and developing more.

Let’s have an overview.

In Conversation with Emotion a digital human perceives text or speech from a human, and video of the human. It then generates text or speech that is congruent with the content and emotion of the perceived media, and displays itself as an avatar whose lips move in sync with its speech and according to the emotion embedded in the synthetic speech.

In Multimodal Question Answering a digital human perceives text or speech from a human asking a question about an object held by the human, and the video of the human holding the object. In response, it generates text or speech that answers the human’s question and is congruent with the perceived media data, including the emotional state of the human.

Adding an avatar whose lips move in sync with the generated speech could provide a more satisfactory rendering of the speech generated by the digital human.

In Automatic Speech Translation a digital human is told to translate speech or text generated by the human into a specified language and, in case the input is speech, whether to preserve the speech features of the input speech. The digital human then generates translated text if the input is text, and translated speech, with the input speech features preserved or not, if the input is speech.

Adding an avatar whose lips move in sync with the generated speech and according to its embedded emotion could provide a more satisfactory rendering of the digital human’s speech.

In Emotion Enhanced Speech a digital human is told to add an emotion to an emotion-less speech segment by giving it:

  1. A model utterance: the digital human extracts and adds the speech features of the model utterance to the emotion-less speech segment.
  2. An emotion taken from the MPAI standard list of emotions: the digital human combines the speech features proper of the selected emotion with the speech features of the emotion-less speech and applies the result to the emotion-less speech.

In both cases an avatar can be animated by the emotion-enhanced speech.
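
A minimal sketch of the two modes, with all speech-feature processing reduced to placeholder functions and a hypothetical feature set:

```python
# Sketch of the two Emotion Enhanced Speech modes described above. All
# functions and feature names are placeholders for the actual processing.

EMOTION_FEATURES = {"happy": {"pitch": 1.2, "energy": 1.3}}  # hypothetical list entry

def extract_features(utterance):              # stub feature extraction
    return {"pitch": 1.0, "energy": 1.0}

def combine(emotion_features, speech_features):
    return {k: emotion_features[k] * speech_features[k] for k in speech_features}

def apply_features(speech, features):         # stub: re-synthesise with features
    return speech

def add_emotion(emotionless_speech, model_utterance=None, emotion=None):
    if model_utterance is not None:           # mode 1: a model utterance is given
        features = extract_features(model_utterance)
    else:                                     # mode 2: an emotion from the standard list
        features = combine(EMOTION_FEATURES[emotion],
                           extract_features(emotionless_speech))
    return apply_features(emotionless_speech, features)
```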

MPAI has more digital human use cases under development:

  1. Human-CAV Interaction: A digital human (a face) speaks to a group of humans, gazing at the human it is responding to.

  2. Mixed-reality Collaborative Spaces: A digital human (a torso) utters the speech of a participant in a virtual videoconference while its torso moves in sync with the participant’s torso.

  3. Conversation about a Scene: A digital human (a face) converses with a human about the objects of a scene the human is part of, gazing at the human or at an object.


What does MPAI do in a week?

The year 2021 was very productive for MPAI. It started in January with a Call for Technologies on AI Framework and ended in December with 5 standards approved and 7 projects in the pipeline.

How was that possible? Simple: intense collaborative work.

OK, but exactly how?

So far MPAI has not held physical meetings. MPAI does all current work online in 1, 2 or 3 one-hour sessions a day in the 13-18 UTC time frame. The purpose of this post is to describe how an MPAI work week unfolds. All times are UTC. All meetings last 1 hour.

Monday @15 – Mixed-reality Collaborative Spaces (MCS) is a project finalising the MCS Use Cases and Functional Requirements document. The use cases considered are “Avatar-Based Videoconference” and “Virtual eLearning”, where avatars with levels of similarity to the person they represent hold meetings or attend lectures. Extension of avatars to volumetric video is being considered. To know more, visit mcs.mpai.community.

Monday @16 – Governance of the MPAI Ecosystem (GME) is the name of an MPAI standard developed, approved and published by MPAI. It envisages the establishment of the MPAI Store, a non-profit commercial organisation whose role is to receive implementations of MPAI standards, verify that they are secure, test that they conform with MPAI specifications, possibly collect the results proving that the implementation is reliable, robust, replicable and fair – attributes that MPAI labels as Performance – and post the implementation on the MPAI Store web site for users to download. MPAI is moving to the implementation of the MPAI Store. To know more, visit gme.mpai.community.

Monday @17 – AI Framework (AIF) is the name of an MPAI standard developed, approved and published by MPAI. MPAI is now developing a software implementation of the standard. To know more, visit aif.mpai.community.

Tuesday @14 – Multimodal Conversation (MMC) is the name of an MPAI standard developed, approved, and published by MPAI. MPAI is now developing software implementations of the MPAI-MMC use cases, conformance testing and performance assessment, and extending the current V1 specification to support more use cases. To know more, visit mmc.mpai.community.

Tuesday @16 – Context-based Audio Enhancement (CAE) is the name of an MPAI standard developed, approved, and published by MPAI. MPAI is now developing software implementations of the four MPAI-CAE use cases, conformance testing and performance assessment, and extending the current V1 specification to support more use cases. To know more, visit cae.mpai.community.

Wednesday @13 – Connected Autonomous Vehicles (CAV) is a project finalising the CAV Use Cases and Functional Requirements document. The MPAI CAV is composed of 4 technology-laden subsystems. To know more, visit cav.mpai.community.

Wednesday @15 – AI-Enhanced Video Coding (EVC) is a project investigating the enhancement or replacement of existing video coding tools in the MPEG-5 EVC codec with AI tools. When MPAI reaches an improvement of 25%, a Call for Technologies will be issued. To know more, visit evc.mpai.community. On alternate weeks, the session is taken by End-to-End Video Coding (EEV), a project addressing video coding without the constraints of traditional video coding architectures. To know more, visit eev.mpai.community.

Wednesday @16 – Compression and Understanding of Industrial Data (CUI) is the name of an MPAI standard developed, approved, and published by MPAI. As all 4 components of an MPAI standard – technical specification, reference software, conformance testing and performance assessment – have been published, the group is preparing a new version of the standard that includes risks additional to seismic and cyber.

Thursday @14 – Server-based Predictive Multiplayer Gaming (SPG) is a project completing the validation of the SPG model. Once the validation is completed, it will finalise the SPG Use Cases and Functional Requirements. To know more, visit spg.mpai.community.

In addition to these technical meeting sessions, MPAI holds a General Assembly to discuss and ratify any proposed results from the technical groups.

The Board of Directors typically meets twice between two General Assemblies. Two advisory groups hold meetings:

Thursday @15 – Communication Advisory Committee manages the manifold MPAI communication activities, such as press releases, the newsletter, online presentations and social media.

Friday @15 – Industry and Standards Advisory Committee manages the relationships of MPAI with all external entities that are relevant to MPAI activities.

A lot has been happening in MPAI, and a lot is now happening so that a lot can happen next outside MPAI.