Moving Picture, Audio and Data Coding
by Artificial Intelligence

All posts by Leonardo Chiariglione

Making sure that AI is “good” AI

AI has generated easy enthusiasm but also fears. The narrative that has developed sees in AI a potentially dystopian machine-ruled future more than a tool capable of improving the well-being of humanity.

Indeed, some AI technologies hold the potential to transform our society in a disruptive way. That possibility must be kept in check if we want to avoid potentially serious problems. Just think of video deepfakes, or of the possibility that advanced language models such as GPT-3 generate ethically questionable output.

One problem is that AI is a new technology whose limitations and problems are sometimes difficult to understand and evaluate. This is illustrated by the frequent discovery of training bias or vulnerabilities, which could have disastrous impacts in systems that are mission-critical or make sensitive decisions.

The MPAI Technical Specification called “Governance of the MPAI Ecosystem (MPAI-GME)” (see here) deals with these issues. To address a problem, however, you first need to identify it. MPAI does that by defining the Performance of an Implementation of an MPAI standard as a collection of 4 attributes:

  1. Reliability: the Implementation performs as specified by the standard, profile and version the Implementation refers to, e.g., within the application scope, stated limitations, and for the period of time specified by the Implementer.
  2. Robustness: the ability of the Implementation to cope with data outside of the stated application scope with an estimated degree of confidence.
  3. Replicability: the assessment made by an entity can be replicated, within an agreed level, by another entity.
  4. Fairness: the training set and/or network is open to testing for bias and unanticipated results so that the extent of system applicability can be assessed.

MPAI defines the figure of “Performance Assessors”, who are mandated to assess to what extent an Implementation possesses the Performance attributes.

Who can be a Performance Assessor? A testing laboratory, a qualified company and even an Implementer. In the last case, however, the Implementer may not assess the Performance of its own Implementations. A Performance Assessor is appointed for a particular domain and for an indefinite duration and may charge Implementers for its services. However, MPAI can revoke the appointment.

In making its assessments, an MPAI Assessor is guided by the Performance Assessment Specification (PAS), the fourth component of an MPAI Standard. A PAS specifies the characteristics of the procedure, the tools and the datasets used by an Assessor when assessing the Performance of an Implementation.

MPAI has developed the PAS of the Compression and Understanding of Industrial Data Standard (MPAI-CUI). MPAI-CUI can predict the default and business discontinuity probability, and the adequacy of the organisational model of a company in a given prediction horizon using governance and financial statement data, and the assessment of cyber and seismic risk. Of course, the outlook of a company depends on more risks than cyber and seismic, but the standard in its current form takes only these risks into account.

The figure below gives the reference model of the standard.

Let’s see what the MPAI-CUI PAS actually says.

A Performance Assessor shall assess the Performance of an Implementation using a dataset satisfying the following requirements:

  1. The turnover of the companies used to create the dataset shall be between 1 M$ and 50 M$.
  2. The Financial Statements used shall cover 5 consecutive years.
  3. The last year of the Financial Statements and Governance data shall be the year the Performance is assessed.
  4. No Financial Statements, Governance data and no risk data shall be missing.

and the assessment process shall be carried out in 3 steps, as follows:

  1. Compute the Default Probability for each company in the dataset that
    1. Includes geographic location and industry types.
    2. Does not include geographic location and industry types.
  2. Compute the Organisational Model Index for each company in the dataset that
    1. Includes geographic location and industry types.
    2. Does not include geographic location and industry types.
  3. Verify that the average
    1. Default Probabilities for 1.a. and 1.b. do not differ by more than 2%.
    2. Organisational Model Index for 2.a. and 2.b. does not differ by more than 2%.
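The dataset requirements and the three assessment steps above can be sketched in Python. This is only an illustration and not part of the PAS: the `Company` record, the `predict` callable standing in for an Implementation, and the reading of "2%" as an absolute difference of 0.02 between averages on a 0 to 1 scale are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Company:
    """Hypothetical record for one company in the assessment dataset."""
    turnover_musd: float              # turnover in millions of dollars
    statement_years: Sequence[int]    # years covered by the Financial Statements
    governance_year: int              # year of the Governance data
    has_missing_data: bool            # True if any statement, governance or risk data is missing
    location: Optional[str] = None
    industry: Optional[str] = None


def satisfies_requirements(c: Company, assessment_year: int) -> bool:
    """Check the four dataset requirements for a single company."""
    years = sorted(c.statement_years)
    five_consecutive = len(years) == 5 and all(
        later - earlier == 1 for earlier, later in zip(years, years[1:])
    )
    return (
        1.0 <= c.turnover_musd <= 50.0
        and five_consecutive
        and years[-1] == assessment_year
        and c.governance_year == assessment_year
        and not c.has_missing_data
    )


# Hypothetical Implementation interface: given a company and a flag saying
# whether location and industry are included, return
# (default probability, organisational model index), both assumed in [0, 1].
Predictor = Callable[[Company, bool], Tuple[float, float]]


def assess(dataset: Sequence[Company], predict: Predictor,
           tolerance: float = 0.02) -> bool:
    """Steps 1-3: compare averages with and without location/industry."""
    with_ctx = [predict(c, True) for c in dataset]
    without_ctx = [predict(c, False) for c in dataset]
    dp_gap = abs(mean([p for p, _ in with_ctx]) - mean([p for p, _ in without_ctx]))
    omi_gap = abs(mean([i for _, i in with_ctx]) - mean([i for _, i in without_ctx]))
    return dp_gap <= tolerance and omi_gap <= tolerance
```

An Assessor following this sketch would first filter the dataset with `satisfies_requirements` and then call `assess` on the companies that remain.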

The MPAI Store will use the result of the Performance Assessment to label an Implementation.

Although very specific to an application, the example provided in this article gives a sufficient indication that the governance of the MPAI ecosystem has been designed to provide a practical solution to a difficult problem that risks depriving humankind of a potentially good technology.

If only we can separate the wheat from the chaff.


Patent pool being formed for four MPAI standards

Geneva, Switzerland – 18 May 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 20th General Assembly. Among the outcomes is the communication that a patent pool will soon be established for four of its standards: AI Framework (MPAI-AIF), Context-Based Audio Enhancement (MPAI-CAE), Compression and Understanding of Industrial Data (MPAI-CUI) and Multimodal Conversation (MPAI-MMC).

The four standards have been developed based on the MPAI process:

  1. Between December 2020 and March 2021, MPAI issued 4 Calls for Technologies, each referring to two documents – Functional Requirements and Framework Licence.
  2. In the September to December 2021 time frame, the 4 standards were approved.
  3. In December 2021 and January 2022, the MPAI Secretariat requested its members to declare whether they believed they held patents essential to the four standards.
  4. In March 2022, the Secretariat issued a Call for Patent Pool Administrators on behalf of the identified patent holders.
  5. In May 2022, the result of the Call was communicated to patent holders.

According to the MPAI process, the patent holders will select the patent pool administrator with a majority of 2/3 of the patent holders’ votes. The licences shall have a total cost comparable with the total cost of similar technologies and shall be released no later than the time products are on the market.
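As a small illustration of the 2/3 rule, the following sketch tallies votes and reports a winner only if the threshold is reached. It assumes one vote per patent holder and reads "a majority of 2/3" as "at least two thirds of the votes cast"; the function name and both assumptions are hypothetical and not taken from the MPAI Statutes.

```python
from collections import Counter
from typing import Optional, Sequence


def select_administrator(votes: Sequence[str]) -> Optional[str]:
    """Return the candidate holding at least 2/3 of the votes cast, else None."""
    if not votes:
        return None
    candidate, count = Counter(votes).most_common(1)[0]
    # Integer arithmetic avoids rounding problems with the fraction 2/3.
    return candidate if 3 * count >= 2 * len(votes) else None
```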

MPAI is developing the Calls for Technologies with associated Functional Requirements and Framework Licences for Version 2 of the MPAI-AIF and MPAI-MMC standards, planning to publish the Calls on 19 July 2022. The definition of the terms of the Framework Licences, a licence without critical data such as cost, dates, rates etc., is a prerogative of the MPAI Principal Members.

Version 2 will substantially extend the capabilities of Version 1 of the 3 standards by supporting three new use cases:

  1. Conversation About a Scene: a human holds a conversation with a machine about objects in a scene of which the human is part. While conversing, the human points their fingers to indicate their interest in a particular object.
  2. Human-Connected Autonomous Vehicle Interaction: a group of humans converse with a Connected Autonomous Vehicle (CAV) on a domain-specific subject (travel by car). The machine understands the utterances, the emotion in the speech and the expression in the faces and in the gestures of the humans it is conversing with, and manifests itself as head and shoulder of an avatar whose face and head convey emotions congruent with the uttered speech.
  3. Avatar Videoconference. Avatars participate in a videoconference reproducing the upper part of the human bodies they represent with a high degree of accuracy. A virtual secretary with a human-like appearance creates an online summary of the meeting. The quality of the summary is enhanced by the virtual secretary’s ability to detect the avatars’ emotions and expressions and to interact with avatars requesting changes to the summary. The quality of the interaction is enhanced by the virtual secretary’s ability to show emotions and expressions.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

So far, MPAI has developed 5 standards (normal font in the list below), is currently engaged in extending two approved standards (underlined) and is developing 9 other standards (italic).

Name of standard Acronym Brief description
AI Framework MPAI-AIF Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store. MPAI-AIF V2 is being prepared.
Context-based Audio Enhancement MPAI-CAE Improves the user experience of audio-related applications in a variety of contexts. MPAI-CAE V2 is being prepared.
Compression and Understanding of Industrial Data MPAI-CUI Predicts the company performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem MPAI-GME Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation MPAI-MMC Enables human-machine conversation emulating human-human conversation. MPAI-MMC V2 is being prepared.
Server-based Predictive Multiplayer Gaming MPAI-SPG Trains a network to compensate data losses and detects false data in online multiplayer gaming.
AI-Enhanced Video Coding MPAI-EVC Improves existing video coding with AI tools for short-to-medium term applications.
End-to-End Video Coding MPAI-EEV Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
Connected Autonomous Vehicles MPAI-CAV Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
Avatar Representation and Animation MPAI-ARA Specifies descriptors of avatars impersonating real humans.
Neural Network Watermarking MPAI-NNW Measures the impact of adding ownership and licensing information in models and inferences.
Integrative Genomic/Sensor Analysis MPAI-GSA Compresses high-throughput experiments data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces MPAI-MCS Supports collaboration of humans represented by avatars in virtual-reality spaces called Ambients.
Visual Object and Scene Description MPAI-OSD Describes objects and their attributes in a scene and the semantic description of the objects.

Visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: join MPAI, share the fun, build the future.


The MPAI Framework Licence approach to Standard Essential Patent (SEP) licensing

In the business world, goods are delivered based on technical and commercial specifications. In the standards world, there are good reasons why the goods (the standards) of a Standards Developing Organisation (SDO) are not delivered according to commercial requirements normally accepted in the business world. However, this is not a good reason for an SDO to stay with commercial requirements called “patent declarations” that simply bind the originators to license their SEPs at so-called Fair, Reasonable and Non-Discriminatory (FRAND) terms. This simply would not make sense in business and this is the reason why FRAND has been and continues to be causing problems.

The Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) SDO was established to develop data coding standards mostly using the Artificial Intelligence (AI) technology while offering, for each of its standards, a clear licensing framework to implementers.

This is how MPAI implements its process:

  1. A new standard may be proposed by anybody.
  2. Anybody may participate in the development of the Use Cases and Functional Requirements of a standard.
  3. MPAI Principal Members intending to participate in the development of a standard develop and approve, with 2/3 majority, the Framework Licence (FWL) for that standard. The FWL is a licence without values (dollars, percentages, rates, dates, etc.) containing a declaration that:
    1. The total cost of the licence will be in line with the total cost of the licences for similar data coding technologies and will consider the market value of the specific standardised technology.
    2. The licence will be issued not after commercial implementations of the standard are made available on the market.
  4. During the development of the standard, MPAI members making technical contributions to the committee developing the standard declare they will make available their licences according to the FWL. Non-members may participate in the development of the standard by becoming members.
  5. After the standard has been approved by the MPAI General Assembly:
    1. MPAI members who believe they are SEP holders express their preference on the patent pool administrator of the standard with a 2/3 majority.
    2. All Members declare they will get a licence for other members’ SEPs, if used, within one year after publication of the licensing terms by SEP holders.

The MPAI process ensures that

  1. The use cases and functional requirements of a standard are developed with participation of the eventual users, not just by MPAI members (i.e., the technology developers).
  2. Information about the eventual licence of a standard includes time (not after products are on the market) and cost (total cost of the licence in line with the total cost of the licences for similar technologies).

Sure, this is not the same as a hard delivery date and a price tag in dollars – standards are a special type of goods that is closely watched by antitrust authorities. But it is a long way from a promise that cost will be “fair”, “reasonable” and “non-discriminatory”, and time at heaven’s will.

You can find more information on the MPAI process and the FWL.


Virtual Secretary for Videoconference

As reported in a previous post, MPAI is busy finalising the “Use Cases and Functional Requirements” document of MPAI-MMC V2. One use case is Avatar-Based Videoconference (ABV), part of the Mixed-reality Collaborative Space (MCS) project supporting scenarios where geographically separated humans represented by avatars collaborate in virtual-reality spaces.

ABV refers to a virtual videoconference room equipped with a table and an appropriate number of chairs to be occupied by:

  1. Speaking virtual twins representing human participants displayed as the upper part of avatars resembling their real twins.
  2. Speaking human-like avatars not representing humans, e.g., a secretary taking notes of the meeting, answering questions, etc.

In line with the MPAI approach to standardisation, this article reports the currently defined functions, input/output data, AIM topology of the AI Workflow (AIW) of the Virtual Secretary, and the AI Modules (AIM) and their input/output data. The information in this article is expected to change by the time it is published as an annex to the upcoming Call for Technologies.

The functions of the Virtual Secretary are:

  1. To collect and summarise the statements made by participating avatars.
  2. To display the summary for participants to see, read and comment on.
  3. To receive sentences/questions about its summary via Speech and Text.
  4. To monitor the avatars’ emotions in their speech and face, and expression in their gesture.
  5. To change the summary based on avatars’ text from speech, emotion from speech and face, and expression from gesture.
  6. To respond via speech and text, and display emotion in text, speech, and face.

The Virtual Secretary workflow in the AI Framework is depicted in Figure 1.

Figure 1 – Reference Model of Virtual Secretary

The operation of the workflow can be described as follows:

  1. The Virtual Secretary recognises the speech of the avatars.
  2. Speech Recognition and Face Analysis extract the emotions from the avatars’ speech and face.
  3. Emotion Fusion provides a single emotion based on the two emotions.
  4. Gesture Analysis extracts the gesture expression.
  5. Language Understanding uses the recognised text and the emotion in speech to provide the final version of the input text (LangUnd-Text) and the meaning of the sentence uttered by an avatar.
  6. Question Analysis uses the meaning to extract the intention of the sentence uttered by an avatar.
  7. Question and Dialogue Processing (QDP) receives LangUnd-Text and the text provided by a participant via chat and generates:
    1. The text to be used in the summary or to interact with other avatars.
    2. The emotion contained in the speech to be synthesised.
    3. The emotion to be displayed by the Virtual Secretary avatar’s face.
    4. The expression to be displayed by the Virtual Secretary’s avatar.
  8. Speech Synthesis (Emotion) uses QDP’s text and emotion and generates the Virtual Secretary’s synthetic speech with the appropriate embedded emotion.
  9. Face Synthesis (Emotion) uses the avatar’s synthetic speech and QDP’s face emotion to animate the face of the Virtual Secretary’s avatar.
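The data flow described in the nine steps above can be sketched as a chain of function calls, one per AI Module. This is a minimal sketch under stated assumptions: every function body is a placeholder, all names and return values are hypothetical, and only the connections named in the steps are wired (the figure may show further connections that are not reproduced here).

```python
from typing import Dict, Tuple


def speech_recognition(speech: str) -> Tuple[str, str]:
    """Steps 1-2: return (recognised text, emotion in speech)."""
    return speech.strip(), "neutral"


def face_analysis(face: str) -> str:
    """Step 2: return the emotion in the face."""
    return "neutral"


def emotion_fusion(speech_emotion: str, face_emotion: str) -> str:
    """Step 3: merge the two emotions into one (placeholder rule)."""
    return face_emotion if speech_emotion == "neutral" else speech_emotion


def gesture_analysis(gesture: str) -> str:
    """Step 4: return the gesture expression."""
    return "none"


def language_understanding(text: str, speech_emotion: str) -> Tuple[str, str]:
    """Step 5: return (LangUnd-Text, meaning)."""
    return text, "question" if text.endswith("?") else "statement"


def question_analysis(meaning: str) -> str:
    """Step 6: return the intention."""
    return "request-information" if meaning == "question" else "inform"


def question_dialogue_processing(langund_text: str, chat_text: str) -> Dict[str, str]:
    """Step 7: produce the four QDP outputs."""
    text = " ".join(part for part in (langund_text, chat_text) if part)
    return {"text": text, "speech_emotion": "neutral",
            "face_emotion": "neutral", "expression": "none"}


def speech_synthesis(text: str, emotion: str) -> str:
    """Step 8: stand-in for synthetic speech with embedded emotion."""
    return f"[{emotion}] {text}"


def face_synthesis(synthetic_speech: str, face_emotion: str) -> str:
    """Step 9: stand-in for the face animation data."""
    return f"face({face_emotion}) animated over {len(synthetic_speech)} characters"


def virtual_secretary(speech: str, face: str, gesture: str,
                      chat_text: str = "") -> Dict[str, str]:
    text, speech_emotion = speech_recognition(speech)
    fused_emotion = emotion_fusion(speech_emotion, face_analysis(face))
    gesture_expression = gesture_analysis(gesture)
    langund_text, meaning = language_understanding(text, speech_emotion)
    intention = question_analysis(meaning)
    qdp = question_dialogue_processing(langund_text, chat_text)
    synthetic_speech = speech_synthesis(qdp["text"], qdp["speech_emotion"])
    face_animation = face_synthesis(synthetic_speech, qdp["face_emotion"])
    return {"fused_emotion": fused_emotion, "gesture_expression": gesture_expression,
            "intention": intention, "text": qdp["text"], "speech": synthetic_speech,
            "face": face_animation, "expression": qdp["expression"]}
```

Keeping each module behind its own function mirrors the AIM idea: any placeholder can be swapped for a real component as long as its inputs and outputs keep the same shape.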

The data types processed by the Virtual Secretary are:

Avatar Descriptors allow the animation of an Avatar Model based on the description of the movement of:

  1. Muscles of the face (e.g., eyes, lips).
  2. Head, arms, hands, and fingers.

Avatar Model allows the use of avatar descriptors related to the model without the lower part (from the waist down) to:

  1. Express one of the MPAI standardised emotions on the face of the avatar.
  2. Animate the lips of an avatar in a way that is congruent with the speech it utters, its associated emotion and the emotion it expresses on the face.
  3. Animate head, arms, hands, and fingers to express one of the Gestures to be standardised by MPAI, e.g., to indicate a particular person or object or the movements required by a sign language.
  4. Rotate the upper part of the avatar’s body, e.g., as needed if the avatar turns to watch the avatar next to itself.

Emotion of a Face is represented by the MPAI standardised basic set of 59 static emotions and their semantics. To support the Virtual Secretary use case, MPAI needs new technology to represent a sequence of emotions each having a duration and a transition time. The dynamic emotion representation should allow for two different emotions to happen at the same time, possibly with different durations.
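One way to picture such a dynamic representation is a track of timed segments that are allowed to overlap. The sketch below is purely illustrative: the `EmotionSegment` fields, the linear ramp during the transition time, and the emotion labels are assumptions, not the technology MPAI is calling for.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EmotionSegment:
    """Hypothetical timed emotion: ramps up, then holds at full intensity."""
    emotion: str       # label taken from the standardised emotion set
    start: float       # seconds from the beginning of the track
    duration: float    # seconds held at full intensity
    transition: float  # seconds taken to ramp from 0 to full intensity

    @property
    def end(self) -> float:
        return self.start + self.transition + self.duration

    def intensity(self, t: float) -> float:
        """Intensity in [0, 1] at time t, assuming a linear ramp."""
        if t < self.start or t > self.end:
            return 0.0
        if self.transition > 0 and t < self.start + self.transition:
            return (t - self.start) / self.transition
        return 1.0


def active_emotions(track: List[EmotionSegment], t: float) -> Dict[str, float]:
    """Emotions with non-zero intensity at time t; several may coexist."""
    return {s.emotion: s.intensity(t) for s in track if s.intensity(t) > 0.0}
```

Because segments are independent, two emotions with different durations can be active at the same instant, which is the property the paragraph above asks for.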

Face Descriptors allow the animation of a face expressing emotion, including at least eyes (to gaze at a particular avatar) and lips (animated in sync with the speech).

Intention is the result of analysis of the goal of an input question standardised in MPAI-MMC V1.

Meaning is information extracted from an input text and physical gesture expression such as question, statement, exclamation, expression of doubt, request, invitation.

Physical Gesture Descriptors represent the movement of head, arms, hands, and fingers suitable for:

  1. Recognition of sign language.
  2. Recognition of coded hand signs, e.g., to indicate a particular object in a scene.
  3. Representation of arbitrary head, arm, hand, and finger motion.
  4. Culture-dependent signs (e.g., mudra sign).

Spatial coordinates allow the representation of the position of an avatar, so that another avatar can gaze at its face when it has a conversation with it.
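To illustrate how spatial coordinates make gazing possible, the sketch below derives the two angles by which an avatar would turn its gaze toward another avatar's face. The coordinate convention (y pointing up, z pointing forward) and the function name are assumptions made for the example; the source does not define a coordinate system.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (x, y, z), with y up and z forward (assumed)


def gaze_angles(eye: Point, target: Point) -> Tuple[float, float]:
    """Return (yaw, pitch) in degrees that point a gaze from eye to target."""
    dx, dy, dz = (t - e for e, t in zip(eye, target))
    yaw = math.degrees(math.atan2(dx, dz))                     # turn about the vertical axis
    pitch = math.degrees(math.atan2(dy, math.hypot(dx, dz)))   # elevation above the horizon
    return yaw, pitch
```

For two avatars seated at the same height, the pitch is zero and only the yaw changes as the speaker moves around the table.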

Speech Features allow a user to select a Virtual Secretary with a particular speech model.

Visual Scene Descriptors allow the representation of a visual scene in a virtual environment.

In July MPAI plans on publishing a Call for Technologies for MPAI-MMC V2. The Call will have two attachments. The first is the already referenced Use Cases and Functional Requirements document, the second is the Framework Licence that those responding to the Call shall accept in order to have their response considered.


Watermarking and AI

The term watermarking comprises a family of methodological and application tools used to insert data into a content item in a way that is as imperceptible and persistent as possible. Watermarking is used for different purposes, such as enabling an entity to claim ownership of a content item or enabling a device to use it.

As a neural network is a type of content – and one that may be quite expensive to develop – does it make sense to apply to neural networks the watermarking approach used for content?

MPAI thinks it does and is working to develop requirements for a Neural Network Watermarking (NNW) standard called MPAI-NNW that will enable a watermarking technology provider to validate their products’ claims. The standard will provide the means to measure, for a given size of the watermarking payload, the ability of:

  • The watermark inserter to inject a payload without affecting the performance of the neural network. This item requires, for a given application domain:
    • A testing dataset to be used for the watermarked and unwatermarked neural network.
    • An evaluation methodology to assess any change of the performance induced by the watermark.
  • The watermark detector to recognise the presence of the inserted watermark when applied to a watermarked network that has been modified (e.g., by transfer learning or pruning) or to any of the inferences of the modified model. This item requires, for a given application domain:
    • A list of potential modification types expected to be applied to the watermarked neural network as well as of their ranges (e.g., random pruning at 25%).
    • Performance criteria for the watermark detector (e.g., relative numbers of missed detections and false alarms).
  • The watermark decoder to successfully retrieve the payload when applied to a watermarked network that has been modified (e.g., by transfer learning or pruning) or to any of the inferences of the modified model. This item requires, for a given application domain:
    • A list of potential modification types expected to be applied to the watermarked neural network as well as of their ranges (e.g., random pruning at 25%).
    • Performance criteria for the watermark decoder (e.g., 100% or (100-α)% recovery).
  • The watermark inserter to inject a payload at a low computational cost, e.g., execution time on a given processing environment.
  • The watermark detector/decoder to detect/decode a payload from a watermarked model or from any of its inferences, at a low computational cost, e.g., execution time on a given processing environment.
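The kinds of measurements listed above can be made concrete with a few helper functions. This is a hedged sketch, not the MPAI-NNW methodology: the flat list of weights, the bit-string payload, and all function names are assumptions chosen to keep the example self-contained.

```python
import random
from typing import List, Sequence, Tuple


def random_prune(weights: Sequence[float], fraction: float, seed: int = 0) -> List[float]:
    """Modification example: zero out a random fraction of the weights."""
    rng = random.Random(seed)
    n_pruned = round(len(weights) * fraction)
    pruned = set(rng.sample(range(len(weights)), n_pruned))
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def detection_rates(trials: Sequence[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Detector criteria from (is_watermarked, detector_said_yes) pairs.

    Returns (missed detection rate, false alarm rate).
    """
    marked = [detected for is_marked, detected in trials if is_marked]
    clean = [detected for is_marked, detected in trials if not is_marked]
    missed = sum(not d for d in marked) / len(marked) if marked else 0.0
    false_alarm = sum(clean) / len(clean) if clean else 0.0
    return missed, false_alarm


def payload_recovery(sent: str, decoded: str) -> float:
    """Decoder criterion: percentage of payload bits recovered."""
    if not sent or len(sent) != len(decoded):
        raise ValueError("payloads must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(sent, decoded))
    return 100.0 * matches / len(sent)
```

In an evaluation along these lines, a watermarked network would be modified with something like `random_prune(weights, 0.25)` before the detector and decoder are run, and the resulting rates compared against the agreed performance criteria.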

You can read the MPAI-NNW Use cases & functional requirements WD 0.2.

The work of developing requirements for the MPAI-NNW standard is ongoing. In this phase of the work, participation is open to non members. Contact the MPAI Secretariat if you wish to join the MPAI-NNW online meetings.


MPAI publishes Working Draft of Use Cases and Functional Requirements of Multimodal Conversation (MPAI-MMC) Version 2

Geneva, Switzerland – 20 April 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 19th General Assembly. Among the outcomes is the publication of the working draft of the Use Cases and Functional Requirements of the planned Version 2 of the Multimodal Conversation (MPAI-MMC) standard.

The MPAI process envisages that a standard be developed based on a Call for Technologies referring to two documents: Functional Requirements and Framework Licence. While the MPAI-MMC V2 documents are still being finalised, MPAI offers an initial working draft of the Functional Requirements to alert the industry of its intention to initiate the development of the standard. This will happen when the Call for Technologies is published (planned for the 13th of July 2022). Responses are expected to be submitted by the 10th of October 2022 and the standard to be published in the first few months of 2023.

Version 2 will substantially extend the capabilities of Version 1 of the MPAI-MMC standard by supporting three new use cases:

  1. Conversation About a Scene: a human holds a conversation with a machine about objects in a scene of which the human is part. While conversing, the human points their fingers to indicate their interest in a particular object.
  2. Human-Connected Autonomous Vehicle Interaction: a group of humans has a conversation on a domain-specific subject (travel by car) with a Connected Autonomous Vehicle. The machine understands the utterances, the emotion in the speech and in the faces, and the expression in their gestures. The machine manifests itself as the torso of an avatar whose face and head convey emotions congruent with the speech it utters.
  3. Avatar Videoconference. In this instance of Mixed-reality Collaborative Space (MCS), avatars represent humans participating in a videoconference. Avatars reproduce the movements of the torsos of human participants with a high degree of accuracy.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

So far, MPAI has developed 5 standards (normal font in the list below), is currently engaged in extending two approved standards (underlined) and is developing 9 other standards (italic).

Name of standard Acronym Brief description
AI Framework MPAI-AIF Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store. MPAI-AIF V2 is being prepared.
Context-based Audio Enhancement MPAI-CAE Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data MPAI-CUI Predicts the company performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem MPAI-GME Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation MPAI-MMC Enables human-machine conversation emulating human-human conversation. MPAI-MMC V2 is being prepared.
Server-based Predictive Multiplayer Gaming MPAI-SPG Trains a network to compensate data losses and detects false data in online multiplayer gaming.
AI-Enhanced Video Coding MPAI-EVC Improves existing video coding with AI tools for short-to-medium term applications.
End-to-End Video Coding MPAI-EEV Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
Connected Autonomous Vehicles MPAI-CAV Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
Avatar Representation and Animation MPAI-ARA Specifies descriptors of avatars impersonating real humans.
Neural Network Watermarking MPAI-NNW Measures the impact of adding ownership and licensing information in models and inferences.
Integrative Genomic/Sensor Analysis MPAI-GSA Compresses high-throughput experiments data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces MPAI-MCS Supports collaboration of humans represented by avatars in virtual-reality spaces called Ambients.
Visual Object and Scene Description MPAI-OSD Describes objects and their attributes in a scene and the semantic description of the objects.

Visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: join MPAI, share the fun, build the future.


Video coding in MPAI

Video coding has been one of the first standardisation areas addressed by MPAI (the first was Context-based Audio Enhancement, or MPAI-CAE). One MPAI Video Coding area is focused on the use of AI tools to improve the classic block-based hybrid coding framework and offer more efficient video compression solutions. This was motivated by the rough estimate that MPAI made from its survey of published AI-based video coding results, which pointed to a potential improvement of up to 30%.

Figure 1 identifies 10 coding tools that can potentially be improved by AI tools.

Figure 1 – A classic block-based hybrid coding framework

MPAI decided to start from a high-performance “clean-sheet” traditional (i.e., data-processing-based) coding scheme and add AI-enabled improvements to it, rather than starting from a scheme overloaded by IP-laden compression technologies bringing dubious improvements. It selected the MPEG-5 Essential Video Coding (EVC) standard and is modifying it by enhancing/replacing existing video coding tools with AI tools. The EVC Baseline Profile has been selected because it employs technologies that are more than 20 years old and has a compression performance close to that of HEVC.

MPAI calls this project MPAI AI-Enhanced Video Coding (MPAI-EVC). However, MPAI has made the deliberate decision not to initiate a standard project at this time, because it first wants to set up a unified platform and conduct experiments with the goal of ascertaining that a 25% coding performance improvement can be achieved. Therefore the name of the current activity within the project is “MPAI-EVC Evidence Project”.

Significant results have already been achieved by a growing group of participants that includes both MPAI and non-MPAI members. Note: MPAI has the policy of allowing non-members to participate in its preliminary pre-standardisation activities.

As soon as the Evidence Project demonstrates that AI tools can improve the MPEG-5 EVC efficiency by at least 25%, MPAI will be in a position to initiate work on the MPAI-EVC standard. The functional requirements have already been developed and only need to be revised. Then, the framework licence will be developed by active principal members and the Call for Technologies issued to acquire the technology needed to develop the MPAI-EVC standard.
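The article does not say how the 25% figure is measured. As a minimal sketch, assume it means the average bitrate reduction at equal quality over a set of test sequences. The function names and all bitrate figures below are hypothetical and purely illustrative; they are not Evidence Project results.

```python
def bitrate_saving(anchor_kbps: float, test_kbps: float) -> float:
    """Fraction of bitrate saved by the test codec versus the anchor at equal quality."""
    if anchor_kbps <= 0:
        raise ValueError("anchor bitrate must be positive")
    return (anchor_kbps - test_kbps) / anchor_kbps


def meets_target(savings: list[float], target: float = 0.25) -> bool:
    """True if the mean saving over all sequences reaches the target (assumed criterion)."""
    return sum(savings) / len(savings) >= target


# Hypothetical (anchor, AI-enhanced) bitrates in kbit/s for three test sequences.
pairs = [(1000.0, 740.0), (2000.0, 1500.0), (4000.0, 3080.0)]
savings = [bitrate_saving(anchor, test) for anchor, test in pairs]
average_saving = sum(savings) / len(savings)

print([round(s, 2) for s in savings])   # per-sequence savings
print(round(average_saving, 4))         # mean saving
print(meets_target(savings))            # whether the assumed 25% threshold is reached
```

With these made-up numbers the individual savings are 26%, 25% and 23%, so the mean falls just short of the threshold. A per-sequence gain is therefore not enough to clear an average-based target.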

MPAI-EVC covers the short-to-medium term video coding needs. If you want to know more, visit the MPAI-EVC web page or contact the MPAI Secretariat to participate in MPAI-EVC meetings.

MPAI has a second video coding project motivated by the consensus in the video coding research community that the so-called End-to-End (E2E) video coding schemes can yield significantly higher performance in the longer term.

MPAI is conducting the MPAI End-to-End Video Coding (MPAI-EEV) project in its role as a technical body whose mission is the provision of efficient and usable data coding standards, unconstrained by legacy IP. The notion of Framework Licence is at the basis of this work. The Basic Framework Licence for Collaborative Explorations is the interim MPAI-EEV Framework Licence. This means that contributions submitted to MPAI-EEV (formally, to the Requirements (EEV) group) shall be accompanied by this licence.

MPAI has selected OpenDVC as a starting point and is investigating the addition of novel motion compensation networks.

Figure 2 – The OpenDVC reference model with a motion compensation network

Clearly, MPAI-EEV targets the longer-term video coding needs. If you want to know more, visit the MPAI-EEV web page or contact the MPAI Secretariat to participate in MPAI-EEV meetings.


MPAI-MMC to be adopted as IEEE standard

On the day MPAI Multimodal Conversation (MPAI-MMC) reached its full 6 months since its approval, the IEEE hosted the kick-off meeting of the P3300 working group tasked with the adoption of the MPAI technical specification as an IEEE standard. Earlier, MPAI and IEEE had signed an agreement whereby MPAI grants IEEE the right to publish MPAI-MMC as an IEEE standard.

At its first meeting, the WG approved the working draft (WD) of IEEE 3300 and requested IEEE to ballot it. In a couple of months, MPAI-MMC is expected to become IEEE 3300.

The creation of the WG and the development of the IEEE 3300 standard are the natural steps following the issuance of the Call for Patent Pool Administrator by the MPAI-MMC patent holders. The next step will be the development of the Use Cases and Functional Requirements for MPAI-MMC Version 2 that MMC-DC and other groups are busy preparing.

The IEEE 3300 WD is verbatim MPAI-MMC, so this article is a good opportunity to recall the MPAI document and its structure. If you want to follow this description with the actual text, please download it.

Chapter 1 is an informative introduction to MPAI, the AI Framework (MPAI-AIF) approach to AI data coding standards, including the notion of AI Modules (AIM) organised in an AI Workflow (AIW) executed in the AI Framework (AIF), and the governance of the MPAI ecosystem.

Chapter 2 is a functional specification of the 5 use cases:

“Conversation with Emotion” (CWE): a human is holding an audio-visual conversation with a machine impersonated by a synthetic voice and an animated face. Both the human and the machine express emotion.
“Multimodal Question Answering” (MQA): a human is holding an audio-visual conversation with a machine impersonated by a synthetic voice. The human asks a question about an object held in their hand.
Three Use Cases supporting conversational translation applications. In each Use Case, users can specify whether speech or text is used as input and, if it is speech, whether their speech features are preserved in the interpreted speech:

– “Unidirectional Speech Translation” (UST).
– “Bidirectional Speech Translation” (BST).
– “One-to-Many Speech Translation” (MST).

Chapter 3 contains definitions of terms that are specific to MPAI-MMC.

Chapter 4 contains normative and informative references.

Chapter 5 contains the specification of the 5 use cases. For each of them, the following is specified:

  1. The Scope of the Use Case
  2. The syntax and semantics of the data entering and leaving the AIW
  3. The Architecture of AIMs composing the AIW implementing the Use Case
  4. The functions of the AIMs
  5. The JSON Metadata describing the AIW
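The relationship between an AIW, its AIMs and the JSON metadata can be pictured with a small sketch. It assumes only what the list above states: a workflow has input and output data, is composed of AIMs, and is described by JSON. The class layout, field names and the AIM and data names are hypothetical illustrations and do not reproduce the normative MPAI-MMC or MPAI-AIF metadata schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AIM:
    """An AI Module with named input and output data (illustrative fields only)."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class AIW:
    """An AI Workflow: external inputs/outputs plus the AIMs composing it."""
    name: str
    inputs: list[str]
    outputs: list[str]
    aims: list[AIM] = field(default_factory=list)

    def unresolved_inputs(self) -> list[str]:
        """Data that some AIM consumes but neither the workflow nor any AIM provides."""
        available = set(self.inputs)
        for aim in self.aims:
            available.update(aim.outputs)
        missing: list[str] = []
        for aim in self.aims:
            for item in aim.inputs:
                if item not in available and item not in missing:
                    missing.append(item)
        return missing

    def to_json(self) -> str:
        """Serialise the workflow description to JSON (hypothetical layout)."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical workflow; the AIM and data names are made up for illustration.
workflow = AIW(
    name="ExampleConversation",
    inputs=["InputSpeech"],
    outputs=["OutputSpeech"],
    aims=[
        AIM("SpeechRecognition", ["InputSpeech"], ["RecognisedText"]),
        AIM("DialogProcessing", ["RecognisedText", "Emotion"], ["ReplyText"]),
        AIM("SpeechSynthesis", ["ReplyText"], ["OutputSpeech"]),
    ],
)

print(workflow.unresolved_inputs())  # data no module or workflow input supplies
print(workflow.to_json())
```

In this sketch the dialogue module expects an "Emotion" input that nothing produces, so the check reports it. Stating the data entering and leaving every AIM makes this kind of mismatch easy to catch.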

Chapter 6 contains the specification of all the AIMs of all the Use Cases:

  1. A note about the meaning of AIM interoperability
  2. The syntax and semantics of the data entering and leaving all the AIMs of the 5 AIWs
  3. The formats of all the AIM data

Annex 1 defines the terms not specific to MPAI-MMC

Annex 2 contains notices and disclaimers concerning MPAI standards (informative)

Annex 3 provides a brief introduction to the Governance of the MPAI Ecosystem (informative)

Annex 4 and the following annexes provide the AIW and AIM metadata of all MPAI-MMC Use Cases.

MPAI-MMC is just the initial step. Two more MPAI Technical Specifications have been submitted for adoption: AI Framework (MPAI-AIF) and Context-based Audio Enhancement (MPAI-CAE).

MPAI is looking forward to a mutually beneficial collaboration with IEEE.


MPAI issues a Call for Patent Pool Administrator on behalf of the MPAI-CAE and MPAI-MMC patent holders

Geneva, Switzerland – 23 March 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation concluded its 18th General Assembly. Among the outcomes is the publication of a Call for Patent Pool Administrator for two of its approved Technical Specifications.

The MPAI process of standard development prescribes that Active Principal Members, i.e., those intending to participate in the development of a Technical Specification, adopt a Framework Licence before initiating the development. All those contributing to the work are requested to accept the Framework Licence. If they are not Members, they are requested to join MPAI. Once a Technical Specification is approved, MPAI identifies patent holders and facilitates the creation of a patent pool.

Patent holders of Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) have agreed to issue a Call for Patent Pool Administrator and have asked MPAI to publish the call on its website. The Patent Holders expect to work with the selected Entity to facilitate a licensing program that responds to the requirements of the licensees while ensuring the commercial viability of the program. In the future, the coverage of the patent pool may be extended to new versions of MPAI-CAE and MPAI-MMC, and/or other MPAI standards.

Parties interested in being selected as Entity are requested to communicate, no later than 1 May 2022, their interest and provide appropriate material as a qualification to the MPAI Secretariat. The Secretariat will forward the received material to the Patent Holders.

While Version 1 of MPAI-CAE and MPAI-MMC are progressing toward practical deployment, work is ongoing to develop Use Cases and Functional Requirements of MPAI-CAE and MPAI-MMC V2. These will extend the V1 technologies to support new use cases, i.e.,

  1. Conversation about a Scene (CAS), enabling a human to hold a conversation with a machine about the objects in a scene.
  2. Human to Connected Autonomous Vehicle Interaction (HCI), enabling humans to have rich interaction, including question answering and conversation with a Connected Autonomous Vehicle (CAV).
  3. Mixed-reality Collaborative Spaces (MCS), enabling humans to develop collaborative activities in a Mixed-Reality space via their avatars.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

MPAI is currently engaged in extending some of the already approved standards and developing 9 other standards (those in italic in the list below).

Name of standard | Acronym | Brief description
AI Framework | MPAI-AIF | Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement | MPAI-CAE | Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data | MPAI-CUI | Predicts the company performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem | MPAI-GME | Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation | MPAI-MMC | Enables human-machine conversation emulating human-human conversation.
Server-based Predictive Multiplayer Gaming | MPAI-SPG | Trains a network to compensate data losses and detects false data in online multiplayer gaming.
AI-Enhanced Video Coding | MPAI-EVC | Improves existing video coding with AI tools for short-to-medium term applications.
End-to-End Video Coding | MPAI-EEV | Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
Connected Autonomous Vehicles | MPAI-CAV | Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
Avatar Representation and Animation | MPAI-ARA | Specifies descriptors of avatars impersonating real humans.
Neural Network Watermarking | MPAI-NNW | Measures the impact of adding ownership and licensing information in models and inferences.
Integrative Genomic/Sensor Analysis | MPAI-GSA | Compresses high-throughput experiments data combining genomic/proteomic and other.
Mixed-reality Collaborative Spaces | MPAI-MCS | Supports collaboration of humans represented by avatars in virtual-reality spaces called Ambients.
Visual Object and Scene Description | MPAI-OSD | Describes objects and their attributes in a scene and the semantic description of the objects.
