Moving Picture, Audio and Data Coding
by Artificial Intelligence

Seven good reasons to join MPAI

MPAI, the international, unaffiliated, non-profit organisation developing AI-based data coding standards with clear Intellectual Property Rights licensing frameworks, now offers prospective members the opportunity to start their 2023 membership two months in advance, from the 1st of November 2022.

Here are six more good reasons why you should join MPAI now.

  1. Within months of its establishment in September 2020, MPAI developed 5 standards. It is now working to extend 3 of them (AI Framework, Context-based Audio Enhancement, and Multimodal Conversation) and to develop 2 new standards (Neural Network Watermarking and Avatar Representation and Animation). More in the latest press release.
  2. MPAI enforces a rigorous standards development process and offers an open route to convert – without modification – its specifications to IEEE standards. Four MPAI standards – AI Framework (P3301), Context-based Audio Enhancement (P3302), Compression and Understanding of Industrial Data (P3303), and Multimodal Conversation (P3304) – are expected to become IEEE standards in a matter of weeks.
  3. MPAI has proved that AI-based standards in disparate technology areas – execution of AI applications, audio, speech, natural language processing, and financial data – can be developed in a timely manner. It is currently developing standards for avatar representation and animation, and neural network watermarking. More projects are in the pipeline in health, connected autonomous vehicles, short-, medium-, and long-term video coding, online gaming, extended reality venues, and the metaverse.
  4. MPAI's role extends beyond providing an environment to develop standards: it is also a stepping stone to making its standards practically usable in a timely way. Within months of standard approval, patent holders have already selected a patent pool administrator for some MPAI standards.
  5. MPAI is the root of trust of an ecosystem specified by its Governance of the MPAI Ecosystem standard and grafted onto its standards development process. The ecosystem includes a Registration Authority where implementers can get identifiers for their implementations, and the MPAI Store, a not-for-profit entity with the mission to test the security and conformance of implementations, make them available for download, and publish their performance as reported by MPAI-appointed Performance Assessors.
  6. MPAI works on leading-edge technologies, and its members have already been given many opportunities to publish the results of their research and standards development at conferences and in journals.

Joining MPAI is easy. Send to the MPAI Secretariat the application form, the signed Statutes and a copy of the bank transfer of 480/2400 EUR for associate/principal membership.

Join the fun – Build the future!


MPAI extends 3 and develops 2 new standards

Geneva, Switzerland – 26 October 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 25th General Assembly (MPAI-25). Among the outcomes is the decision, based on substantial inputs received in response to its Calls for Technologies, to extend three of its existing standards and to initiate the development of two new standards.

The three standards being extended are:

  1. AI Framework (MPAI-AIF). AIF is an MPAI-standardised environment where AI Workflows (AIW) composed of AI Modules (AIM) can be executed. Based on substantial industry input, MPAI is in a position to extend the MPAI-AIF specification with a set of APIs that allow a developer to configure the security solution adequate for the intended application.
  2. Context-based Audio Enhancement (MPAI-CAE). Currently, MPAI-CAE specifies four use cases: Emotion-Enhanced Speech, Audio Recording Preservation, Speech Restoration Systems, and Enhanced Audioconference Experience. The last use case includes technology to describe the audio scene of an audio/video conference room in a standard way. MPAI-CAE is being extended to support more challenging environments such as human interaction with autonomous vehicles and metaverse applications.
  3. Multimodal Conversation (MPAI-MMC). MPAI-MMC V1 has specified a robust and extensible emotion description system. In the V2 currently under development, MPAI is generalising the notion of Emotion to cover two more internal statuses, Cognitive State and Social Attitude, and is specifying a new data format, called Personal Status, that covers the three internal statuses.

The two new standards under development are:

  1. Avatar Representation and Animation (MPAI-ARA). The standard intends to provide technology to enable:
    1. A user to generate an avatar model and descriptors to animate it, and an independent user to animate the model using those descriptors.
    2. A machine to animate a speaking avatar model expressing the Personal Status that the machine has generated during the conversation with a human (or another avatar).
  2. Neural Network Watermarking (MPAI-NNW). The standard specifies methodologies to evaluate neural network watermarking solutions:
    1. The impact on the performance of a watermarked neural network (and its inference).
    2. The ability of the detector/decoder to detect/decode a payload when the watermarked neural network has been modified.
    3. The computational cost of injecting a payload into, detecting it in, or decoding it from a watermarked neural network.

Development of these standards is planned to be completed in the early months of 2023.

MPAI-25 has also confirmed its intention to develop a Technical Report (TR) called MPAI Metaverse Model (MPAI-MMM). The TR will cover all aspects underpinning the design, deployment, and operation of a Metaverse Instance, especially interoperability between Metaverse Instances.

So far, MPAI has developed five standards for applications that have AI as the core enabling technology. It is now extending three of those standards, developing two new standards and one technical report, and drafting functional requirements for nine future standards. This is thus a good time for legal entities that support the MPAI mission and can contribute to the development of standards for the efficient use of data to join MPAI, also considering that memberships starting on or after the 1st of November 2022 are immediately active and last until 2023/12/31.

Developed standard Acronym Brief description
Compression and Understanding of Industrial Data MPAI-CUI Predicts the company’s performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem MPAI-GME Establishes the rules governing the submission of and access to interoperable implementations.
Standard being extended Acronym Brief description
AI Framework MPAI-AIF Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement MPAI-CAE Improves the user experience of audio-related applications in a variety of contexts.
Multimodal Conversation MPAI-MMC Enables human-machine conversation emulating human-human conversation.
Standard being developed Acronym Brief description
Avatar Representation and Animation MPAI-ARA Specifies descriptors of avatars impersonating real humans.
MPAI Metaverse Model MPAI-MMM Development of a technical report guiding creation and operation of Interoperable Metaverses.
Neural Network Watermarking MPAI-NNW Measures the impact of adding ownership and licensing information to models and inferences.
Standard for which functional requirements are being drafted Acronym Brief description
AI Health MPAI-AIH Specifies components to securely collect, AI-based process, and access health data.
Connected Autonomous Vehicles MPAI-CAV Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
End-to-End Video Coding MPAI-EEV Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
AI-Enhanced Video Coding MPAI-EVC Improves existing video coding with AI tools for short-to-medium term applications.
Integrative Genomic/Sensor Analysis MPAI-GSA Compresses high-throughput experiments’ data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces MPAI-MCS Supports collaboration of humans represented by avatars in virtual-reality spaces.
Visual Object and Scene Description MPAI-OSD Describes objects and their attributes in a scene.
Server-based Predictive Multiplayer Gaming MPAI-SPG Trains a network to compensate data losses and detects false data in online multiplayer gaming.
XR Venues MPAI-XRV XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: please join MPAI, share the fun, build the future.


MPAI, third year and counting

Moving Picture, Audio, and Data Coding by Artificial Intelligence – MPAI – was established two years ago, on 30 September 2020. It is a good time to review what it has done, what it plans on doing, and how the machine driving the organisation works.

These are the main results of the 2nd year of MPAI activity (the activities of the first year can be found here).

  1. Approved the MPAI-AIF (AI Framework) and MPAI-CAE (Context-based Audio Enhancement) Technical Specifications.
  2. Approved several revisions of all five Technical Specifications.
  3. Promoted the establishment of a patent pool for MPAI standards (as mandated by the MPAI Statutes).
  4. Submitted four Technical Specifications to IEEE for adoption without modification as IEEE standards. Three – AIF, CAE, and MMC (Multimodal Conversation) – should be approved as IEEE Standards by the end of 2022.
  5. Kickstarted the MPAI Ecosystem by
    1. Promoting the establishment of the MPAI Store.
    2. Adopting the MPAI-MPAI Store Agreement.
    3. Contributing to the definition of the IEEE-MPAI Store agreement on ImplementerID Registration Authority.
    4. Revising the MPAI-GME (Governance of the MPAI Ecosystem) standard to accommodate the developments of the Ecosystem.
  6. Developed the MPAI-AIF Reference Software to enable implementation of other MPAI Technical Specifications.
  7. Developed drafts of the MPAI-AIF Conformance Testing.
  8. Developed MPAI-CAE and MPAI-MMC Reference Software and Conformance Testing drafts.
  9. Continued the development of Use Cases and Functional Requirements for Connected Autonomous Vehicles (MPAI-CAV), AI-Enhanced Video Coding (MPAI-EVC), and Server-based Predictive Multiplayer Gaming (MPAI-SPG).
  10. Opened new activities in AI Health (MPAI-AIH), Avatar Representation and Animation (MPAI-ARA), End-to-End Video Coding (EEV), MPAI Metaverse Model (MMM), Neural Network Watermarking (NNW), and XR Venues (XRV).
  11. Developed 3 Calls for Technologies for MPAI-AIF V2, MPAI-MMC V2, and MPAI-NNW. Responses are expected by MPAI-25.
  12. Published Towards Pervasive and Trustworthy Artificial Intelligence.
  13. Published papers at conferences and in journals.

The organisation of the machine that has produced these results is depicted in Figure 1.

Figure 1 – The MPAI organisation (September 2022)

The General Assembly is the supreme body of MPAI. It establishes Developing Committees (DC) tasked with the development of standards: AI Framework, Context-based Audio Enhancement, Compression and Understanding of Industrial Data, Governance of the MPAI Ecosystem, Multimodal Conversation, and Neural Network Watermarking. It also directs Standing Committees, currently the Requirements SC, under which the following groups operate: AI Health, Avatar Representation and Animation, Connected Autonomous Vehicles, End-to-End Video Coding, AI-Enhanced Video Coding, MPAI Metaverse Model, Server-based Predictive Multiplayer Gaming, and XR Venues.

The Board of Directors is in charge of day-to-day activities and oversees five Advisory Committees: Membership and Nominating, Finance and Audit, IPR Support, Industry and Standards, and Communication.

The Secretariat performs such activities as keeping the list of members, organising meetings, communicating, etc.

In its third year of activity, MPAI plans to:

  1. Make the MPAI Ecosystem fully operational.
  2. Complete the specification sets of MPAI-AIF, MPAI-CAE, and MPAI-MMC.
  3. Promote the development of the MPAI implementations market.
  4. Develop extensions of MPAI-AIF and MPAI-MMC.
  5. Develop the new MPAI-NNW Technical Specification.
  6. Achieve publication of MPAI-AIF, MPAI-CAE, MPAI-CUI, and MPAI-MMC as IEEE standards.
  7. Submit MPAI-AIF V2, MPAI-CAE V2, MPAI-MMC V2, MPAI-NNW for adoption without modification by IEEE.
  8. Initiate the standard development process for MPAI-AIH, MPAI-ARA, MPAI-CAV, MPAI-EVC, MPAI-SPG, and MPAI-XRV.
  9. Promote the MPAI Metaverse Model and make it the compass for MPAI standard development.
  10. Be ready to exploit new standardisation opportunities.
  11. Continue active publication of papers on MPAI activities and results.
  12. Strengthen relationship with other bodies.

At its 24th General Assembly the Board announced the new membership policy: those who join MPAI on or after the 1st of November will have a 14-month membership until the 31st of December 2023. A good opportunity to join the fun, build the future!


MPAI appoints the MPAI Store, incorporated as a Company Limited by Guarantee, as the MPAI Store of the MPAI Ecosystem

Geneva, Switzerland – 30 September 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 24th General Assembly (MPAI-24). Among the outcomes is the appointment of MPAI Store, a company limited by guarantee incorporated in Scotland, as the “MPAI Store” referenced by the Governance of the MPAI Ecosystem standard (MPAI-GME).

The tasks of the MPAI Store are critical for the operation of the MPAI Ecosystem. Some of these are:

  1. Operation on a cost-recovery basis.
  2. Registration Authority function, i.e., assignment of an ID to implementers.
  3. Testing of implementations of MPAI technical specifications submitted by implementers.
  4. Labelling of implementations based on the verified interoperability level.
  5. Distribution of implementations via high-availability ICT infrastructure.

MPAI-24 has reiterated the extension of the deadline for submitting responses to the Calls for Technologies on AI Framework, Multimodal Conversation, and Neural Network Watermarking until the 24th of October. The link to all documents relevant to the Calls can be found on the MPAI website.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

So far, MPAI has developed 5 standards, is currently engaged in extending 2 approved standards, and is developing another 10 standards, all listed in the table below.

Name of standard Acronym Brief description
AI Framework MPAI-AIF Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement MPAI-CAE Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data MPAI-CUI Predicts the company’s performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem MPAI-GME Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation MPAI-MMC Enables human-machine conversation emulating human-human conversation.
Avatar Representation and Animation MPAI-ARA Specifies descriptors of avatars impersonating real humans.
Connected Autonomous Vehicles MPAI-CAV Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
End-to-End Video Coding MPAI-EEV Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
AI-Enhanced Video Coding MPAI-EVC Improves existing video coding with AI tools for short-to-medium term applications.
Integrative Genomic/Sensor Analysis MPAI-GSA Compresses high-throughput experiments’ data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces MPAI-MCS Supports collaboration of humans represented by avatars in virtual-reality spaces.
MPAI Metaverse Model MPAI-MMM Development of a reference model to guide the creation of Interoperable Metaverse Instances.
Neural Network Watermarking MPAI-NNW Measures the impact of adding ownership and licensing information to models and inferences.
Visual Object and Scene Description MPAI-OSD Describes objects and their attributes in a scene.
Server-based Predictive Multiplayer Gaming MPAI-SPG Trains a network to compensate data losses and detects false data in online multiplayer gaming.
XR Venues MPAI-XRV XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: please join MPAI, share the fun, build the future.


Imperceptibility, Robustness, and Computational Cost in Neural Network Watermarking

Introduction

Research efforts, specific skills, training, and processing can cumulatively bring the development costs of a neural network anywhere from a few thousand to a few hundred thousand dollars. Therefore, the AI industry needs a technology to ensure traceability and integrity not only of a neural network but also of the content generated by it (so-called inference).

Faced with a similar problem, the digital content production and distribution industry has considered watermarking as a tool to insert a payload carrying data such as timestamping or owner ID information. If the inserted payload is imperceptible and persistent, it can be used to signal the ownership of a content item or to detect semantic modifications of its content.

A role for MPAI?

MPAI has assessed that watermarking can also be used by the AI industry and intends to develop a standard to assess the performance of neural network watermarking technologies. Users with different applications in mind can be interested in neural network watermarking. For instance, the owner, i.e., the developer of a neural network, is interested in having their neural network protected by the “best” watermarking solution. The watermarking provider, i.e., the developer of the watermarking technology, is interested in evaluating the performance of their watermarking technology. In turn, the customer, i.e., the provider of an end product, needs the owner’s and watermarking provider’s solutions to offer a product or a service. Finally, the end-user buys or rents the product and uses it.

All these users are mainly interested in three neural network watermarking properties: imperceptibility, persistence, and computational cost.

Neural network watermarking imperceptibility

One of the features that a user of a watermarking technology may be interested in is assessing the impact that the embedding of a watermark in a neural network has on the quality of the inference that the neural network provides.

MPAI has identified the following process to test imperceptibility (a minimal code sketch follows the list):

  1. Select a pair of training and testing datasets and a set of M unwatermarked neural networks.
  2. Insert a watermark in each neural network with D different data payloads, yielding M x (D + 1) neural networks: M x D watermarked neural networks and M unwatermarked neural networks.
  3. Feed the M x (D + 1) neural networks with the testing dataset and measure the quality of the produced inference.
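
The process above can be summarised in code. The following sketch is purely illustrative and assumes two hypothetical helpers, insert_watermark(model, payload) returning a watermarked copy of a model, and inference_quality(model, test_set) returning a task-specific quality score (e.g., accuracy); neither name comes from MPAI-NNW.

```python
def imperceptibility_test(models, payloads, test_set, insert_watermark, inference_quality):
    """Compare inference quality across M unwatermarked and M x D watermarked networks."""
    results = []
    for m_idx, model in enumerate(models):                 # M unwatermarked networks
        baseline = inference_quality(model, test_set)      # quality without a watermark
        results.append({"model": m_idx, "payload": None, "quality": baseline})
        for d_idx, payload in enumerate(payloads):         # D different payloads
            wm_model = insert_watermark(model, payload)    # watermarked copy
            quality = inference_quality(wm_model, test_set)
            results.append({"model": m_idx, "payload": d_idx,
                            "quality": quality,
                            "quality_drop": baseline - quality})
    return results                                         # M x (D + 1) measurements
```

Reporting the quality drop per payload makes it possible to compare watermarking solutions at equal payload size.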

Neural network watermarking persistence

One of the features that a user of a watermarking technology may be interested in is assessing the capability of the detector to ascertain the presence of the watermark and the capability of the decoder to retrieve the payload from a modified version of the neural network.

MPAI has identified the following process to test the capability of the detector to find the watermark in the neural network:

  1. Repeat step 1 above.
  2. Repeat step 2 above.
  3. Repeat step 3 above.
  4. Apply one of the modifications (to be specified by the standard) with the goal of altering the watermark. Each modification must be characterised by a set of parameters that will challenge the robustness of the watermark.
  5. Feed the M x (D + 1) neural networks to the detector and record the decision: “watermark present” or “watermark absent”.
  6. Mark the results as true positive, true negative, false positive (false alarm), or false negative (missed detection).

The process to test the capability of the decoder to retrieve the payload from the neural network follows similar steps, with “presence or absence” replaced by the distance between the retrieved payload and the original payload.
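
As an illustration only, the detector side of this test could be organised as below. The helpers modify(model, params) and detect(model), and the payload distance (a Symbol Error Rate), are hypothetical names, not part of the standard.

```python
def persistence_test(watermarked, unwatermarked, modify, params, detect):
    """Classify detector decisions on modified watermarked and unwatermarked networks."""
    outcomes = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for model in watermarked:
        present = detect(modify(model, params))
        outcomes["TP" if present else "FN"] += 1   # FN: missed detection
    for model in unwatermarked:
        present = detect(modify(model, params))
        outcomes["FP" if present else "TN"] += 1   # FP: false alarm
    return outcomes

def symbol_error_rate(original, retrieved):
    """One possible distance between original and retrieved payloads."""
    errors = sum(a != b for a, b in zip(original, retrieved))
    errors += abs(len(original) - len(retrieved))  # missing or extra symbols
    return errors / max(len(original), 1)
```

For the decoder, the same loop would call a decode(model) helper and accumulate symbol_error_rate values instead of binary decisions.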

The computational cost

One of the features that a user of a watermarking technology may be interested in is evaluating the processing cost of a watermarking solution (in terms of computing resources and/or time).
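
One simple way to obtain such figures, shown here only as an example, is to measure wall-clock time around the three operations with Python's standard library; memory or energy consumption could be reported in the same way.

```python
import time

def timed(operation, *args):
    """Return the operation's result and its elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = operation(*args)
    return result, time.perf_counter() - start

# Usage with the hypothetical helpers of the previous sketches:
# wm_model, t_insert = timed(insert_watermark, model, payload)
# _, t_detect = timed(detect, wm_model)
# _, t_decode = timed(decode, wm_model)
```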

The MPAI Call for Technologies

The MPAI process is to develop Use Cases and Functional Requirements, issue Calls for Technologies, receive and assess responses to the Call, and develop a standard for assessing the performance of a neural network watermarking technology. The published document can be found here. The MPAI secretariat should receive responses by 2022/10/24.


Avatars and the MPAI-MMC V2 Call for Technologies

The goal of the MPAI Multimodal Conversation (MPAI-MMC) standard is to enable forms of human-machine conversation that emulate the human-human one in completeness and intensity. While this is clearly a long-term goal, MPAI is focusing on standards providing frameworks which break down – where possible – complex AI functions to facilitate the formation of a component market where solution aggregators can find AI Modules (called AIM) to build AI Workflows (called AIW) corresponding to standard use cases. The AI Framework standard (MPAI-AIF) is a key enabler of this plan.

In September 2021, MPAI approved Multimodal Conversation V1 with 5 use cases. The first one – Conversation with Emotion – assumes that a human converses with a machine that understands what the human says, extracts the human’s emotion from their speech and face, articulates a textual response with an attached emotion, and converts it into synthetic speech containing emotion and a video containing a face expressing the machine’s emotion whose lips are properly animated.

The second MPAI-MMC V1 use case was Multimodal Question Answering. Here a human asks a question to a machine about an object. The machine understands the question and the nature of the object and generates a text answer which is converted to synthetic speech.

The other use cases are about automatic speech translation and they are not relevant for this article.

In July 2022, MPAI issued a Call for Technologies with the goal of acquiring the technologies needed to implement three more Multimodal Conversation use cases. One concerns the extension of the notion of “emotion” to “Personal Status”, an element of the internal state of a person which also contains cognitive status (what a human or a machine has understood about the context) and attitude (the stance the human or the machine intends to adopt in the context). Personal status is conveyed by text, speech, face, and gesture. See here for more details. Gesture is the second ambition of MPAI-MMC V2.

A use case of MPAI-MMC V2 is “Conversation about a Scene” and can be described as follows:

A human converses with a machine indicating the object of their interest. The machine sees the scene and hears the human; extracts and understands the text from the human’s speech and the personal status in their speech, face, and gesture; understands the object intended by the human; produces a response (text) with its own personal status; and manifests itself as a speaking avatar.

Figure 1 depicts a subset of the technologies that MPAI needs in order to implement this use case.

Figure 1 – The audio-visual front end

These are the functions of the modules and the data provided:

  1. The Visual Scene Description module analyses the video signal, describes, and makes available the Gesture and the Physical Objects in the scene.
  2. The Object Description module provides the Physical Object Descriptors.
  3. The Gesture Description module provides the Gesture Descriptors.
  4. The Object Identification module uses both the Physical Object Descriptors and the Visual Scene-related Descriptors to understand which object in the scene the human points their finger at, selects the appropriate set of Physical Object Descriptors, and provides the Object ID.
  5. The Gesture Descriptor Interpretation module uses the Gesture Descriptors to extract the Personal Status of Gesture.
  6. The Face Description and Face Descriptor Interpretation chain produces the Personal Status of Face.
  7. The Audio Scene Description module analyses the audio signal, describes, and makes available the Speech Object.
  8. The Speech Description and Speech Descriptor Interpretation chain produces the Personal Status of Speech.

After the “front end” part we have a “conversation and manifestation” part involving another set of technologies as described in Figure 2; a minimal end-to-end sketch in code follows the list below.

Figure 2 – Conversation and Manifestation

  1. The Text and Meaning Extraction module produces Text and Meaning.
  2. The Personal Status Fusion module integrates the three sources of Personal Status into the Personal Status.
  3. The Question and Dialogue Processing module processes Input Text, Meaning, Personal Status and Object ID and provides the Machine Output Text and Personal Status.
  4. The Personal Status Display module processes Machine Output Text and Personal Status and produces a speaking avatar uttering Machine Speech and showing an animated Machine Face and Machine Gesture.
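
To make the data flow of the two figures concrete, here is a purely illustrative sketch that chains the modules above into one workflow. The module names follow the article; the function signatures, the modules dictionary, and the data types are assumptions for illustration, not the MPAI-AIF APIs.

```python
def conversation_about_a_scene(video, audio, modules):
    # Audio-visual front end (Figure 1)
    gesture, physical_objects = modules["visual_scene_description"](video)
    object_descriptors = [modules["object_description"](obj) for obj in physical_objects]
    gesture_descriptors = modules["gesture_description"](gesture)
    object_id = modules["object_identification"](object_descriptors, gesture_descriptors)
    ps_gesture = modules["gesture_descriptor_interpretation"](gesture_descriptors)
    ps_face = modules["face_descriptor_interpretation"](modules["face_description"](video))
    speech = modules["audio_scene_description"](audio)
    ps_speech = modules["speech_descriptor_interpretation"](modules["speech_description"](speech))

    # Conversation and manifestation (Figure 2)
    text, meaning = modules["text_and_meaning_extraction"](speech)
    personal_status = modules["personal_status_fusion"](ps_speech, ps_face, ps_gesture)
    output_text, output_status = modules["question_and_dialogue_processing"](
        text, meaning, personal_status, object_id)
    return modules["personal_status_display"](output_text, output_status)  # speaking avatar
```

Replacing any entry of modules with another implementation that conforms to the same interfaces leaves the workflow unchanged, which is the point of standardising the data formats.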

The MPAI-MMC V2 Call considers another use case – Avatar-Based Videoconference – that uses avatars in a different way.

Avatars representing geographically separated humans participate in a virtual conference. Each participant receives the other participants’ avatars, locates them around a table, and takes part in the videoconference embodied in their own avatar.

The system is composed of:

  1. Transmitter client: Extracts speech and face descriptors for authentication, creates avatar descriptors using Face & Gesture Descriptors and Meaning, and sends the participant’s Avatar Model & Descriptors and Speech to the Server.
  2. Server: Authenticates participants; distributes Avatar Models & Descriptors and Speech of each participant.
  3. Virtual Secretary: Makes and displays a summary of the avatars’ utterances using their speech and Personal Status.
  4. Receiver client: Creates virtual videoconference scene, attaches speech to each avatar and lets participant view and/or navigate the virtual videoconference room.

Figure 3 gives a simplified one-figure description of the use case.

Figure 3 – The avatar-based videoconference use case

This is the sequence of operations (a code sketch of the client-side flow follows the list):

  1. The Speaker Identification and Face Identification modules produce Speech and Face Descriptors that the Authentication module in the server uses to identify the participant.
  2. The Personal Status Extraction module produces the Personal Status.
  3. The Speech Recognition and Meaning extraction modules produce the Meaning.
  4. The Face Description and Gesture Description modules produce the Face and Gesture Descriptors (for feature and motion).
  5. The Participant Description module uses Personal Status, Meaning, and Face and Gesture Descriptors to produce the Avatar Descriptors.
  6. The Avatar Animation module animates the individual participant’s Avatar Model using the Avatar Descriptors.
  7. The AV Scene Composition module places the participants’ avatars in their assigned places, attaches to each avatar its own speech and produces the Audio-Visual Scene that the participant can view and navigate.
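
The sketch below illustrates, under the same caveats as before (hypothetical function names and signatures, not MPAI-specified APIs), how a transmitter client and a receiver client could package and consume the data exchanged through the server.

```python
def transmitter_client(audio, video, avatar_model, modules):
    """Produce the data a participant sends to the server."""
    speech_descriptors = modules["speaker_identification"](audio)
    face_id_descriptors = modules["face_identification"](video)
    personal_status = modules["personal_status_extraction"](audio, video)
    meaning = modules["speech_recognition_and_meaning"](audio)
    face_descriptors = modules["face_description"](video)
    gesture_descriptors = modules["gesture_description"](video)
    avatar_descriptors = modules["participant_description"](
        personal_status, meaning, face_descriptors, gesture_descriptors)
    return {"authentication": (speech_descriptors, face_id_descriptors),
            "avatar_model": avatar_model,
            "avatar_descriptors": avatar_descriptors,
            "speech": audio}

def receiver_client(participants, modules):
    """Compose the audio-visual scene from all participants' data."""
    avatars = [modules["avatar_animation"](p["avatar_model"], p["avatar_descriptors"])
               for p in participants]
    speeches = [p["speech"] for p in participants]
    return modules["av_scene_composition"](avatars, speeches)  # navigable AV scene
```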

The MPAI-MMC V2 use cases require the following technologies:

  1. Audio Scene Description.
  2. Visual Scene Description.
  3. Speech Descriptors for:
    1. Speaker identification.
    2. Personal status extraction.
  4. Human Object Descriptors.
  5. Face Descriptors for:
    1. Face identification.
    2. Personal status extraction.
    3. Feature extraction (e.g., for avatar model).
    4. Motion extraction (e.g., to animate an avatar).
  6. Gesture Descriptors for:
    1. Personal Status extraction.
    2. Features (e.g., for avatar model).
    3. Motion (e.g., to animate an avatar).
  7. Personal Status.
  8. Avatar Model.
  9. Environment Model.
  10. Human’s virtual twin animation.
  11. Animated avatar manifesting a machine producing text and personal status.

The MPAI-MMC V2 standard is an opportunity for the industry to agree on a set of data formats so that a market of modules able to handle those formats can be created. The standard should be extensible, in the sense that, as new and better-performing technologies mature, they can be incorporated into it.

Please see:

  1. The 2 min video (YouTube and non-YouTube) illustrating MPAI-MMC V2.
  2. The slides presented at the online meeting on 2022/07/12.
  3. The video recording of the online presentation (YouTube, non-YouTube) made at that 12 July presentation.
  4. The Call for Technologies, Use Cases and Functional Requirements, Framework Licence, and Template for responses.

The MPAI 2022 Calls for Technologies – Part 3 (Neural Network Watermarking)

Research, personnel, training, and processing can bring the development costs of a neural network anywhere from a few thousand to a few hundred thousand dollars. Therefore, the AI industry needs a technology to ensure traceability and integrity not only of a neural network, but also of the content generated by it (so-called inference). The content industry, facing a similar problem, has used watermarking to imperceptibly and persistently insert a payload carrying, e.g., owner ID, timestamp, etc. to signal the ownership of a content item. Watermarking can also be used by the AI industry.

The general requirements for using watermarking in neural networks are:

  • The techniques shall not affect the performance of the neural network.
  • The payload shall be recoverable even if the content was modified.

MPAI has classified the cases of watermarking use as follows:

  • Identification of actors (i.e., neural network owner, customer, and end-user).
  • Identification of the neural network model.
  • Detecting the modification of a neural network.

This classification, depicted in Figure 1, concerns the use of watermarking technologies in neural networks and is independent of the intended application.

Figure 1 – Classification of neural network watermarking uses

MPAI has identified the need for a standard – code name MPAI-NNW – enabling users to measure the performance of the following components of a watermarking technology:

  • The ability of a watermark inserter to inject a payload without deteriorating the performance of the Neural Network.
  • The ability of a watermark detector to ascertain the presence of the inserted watermark, and of a watermark decoder to retrieve its payload, when applied to:
    • A modified watermarked network (e.g., by transfer learning or pruning).
    • An inference of the modified model.
  • The computational cost (e.g., execution time) of a watermark inserter injecting a payload, and of a watermark detector/decoder detecting/decoding a payload from a watermarked model or from any of its inferences.

Figure 2 depicts the three watermarking components covered by MPAI-NNW.

Figure 2 – The three areas to be covered by MPAI-NNW

MPAI has issued a Call to acquire the technologies for use in the standard. The list below is a subset of the requests contained in the call:

  • Use cases
    • Comments on use cases.
  • Impact of the watermark on the performance
    • List of Tasks to be performed by the Neural Network (e.g., classification, speech generation, video encoding, …).
    • Methods to measure the quality of the inference produced (e.g., precision, recall, subjective quality evaluation, PSNR, …).
  • Detection/Decoding capability
    • List of potential modifications that a watermark shall be robust against (e.g., pruning, fine-tuning, …).
    • Parameters and ranges of proposed modifications.
    • Methods to evaluate the differences between the original and retrieved watermarks (e.g., Symbol Error Rate).
  • Processing cost
    • Specification of the testing environments.
    • Specification of the values characterizing the processing of Neural Networks.

Below are a few useful links for those wishing to know more about the MPAI-NNW Call for Technologies and how to respond to it:

The MPAI Secretariat should receive responses to the MPAI-NNW Call for Technologies by 2022/10/24.


Answering a few basic questions about MPAI

Q: What is the main objective of MPAI?

A: There are many languages in the world, but if we want to reach all interested people we have to use a universally recognised language. The same thing happens with data: the more people understand a format, the more value the data coded in it has.

The definition of a universal language for music – the MP3 standard – led to a revolution whose extent many, because they did not know the world before it, cannot appreciate. The same goes for video, of which there were, up to the mid-90s, perhaps as many formats or sub-formats as there were countries in the world. Of course, today’s world of media would not exist if the crazy idea that each country is entitled to define its own format still prevailed.

MPAI aims to bring the same revolution to the world of data by specifying standard data formats that rely on AI. Data is increasingly processed with artificial intelligence techniques; therefore standards should take the peculiarity of artificial intelligence into account. As with the media, billions of citizens of the world will enjoy the benefits.

Q: Who can participate and contribute to the MPAI project?

A: MPAI is open to all legal entities who can contribute to the development of standards for data coding achieved, mainly but not exclusively, using artificial intelligence. An internal committee at MPAI evaluates membership applications based on a few simple criteria. MPAI, however, is also open to the participation of physical persons in the phases concerning the presentation of proposals and the definition of the functional requirements of a standard.

Q: How does the standardisation process in MPAI work, starting from the market needs to arrive at standards?

A: As I said, MPAI gives anyone the opportunity to submit proposals for standards and to contribute to the definition of their functional requirements. However, a different argument must be made for the definition of another important parameter of a standard – the commercial requirements – that is, when and under what conditions will I be able to use the standard? The answer to this question is not obvious because there are MPEG standards whose usage licences are not well defined or have become available many years after the standard was approved. Defining commercial requirements is the responsibility of MPAI’s core members, but anyone can propose technologies in response to MPAI’s Calls for Technologies. If a non-member’s proposal is accepted, the proposer must become a member. The development of a standard is therefore open to all MPAI members.

Q: The recently published Multimodal Conversation (MPAI-MMC) V2 Call for Technologies seems particularly interesting in the field of machine learning applications for conversation analytics, and especially sentiment analysis, because it introduces Emotion, Cognitive State and Attitude (Personal Status) to enhance the conversation between humans and machines represented by avatars displaying Personal Status. Could you briefly describe some possible use cases and objectives of this call?

A: Back in September 2021, one year after the founding of MPAI, three standards were published, one of which concerned human-machine conversation (Multimodal Conversation, MPAI-MMC). One use case concerns a machine that “understands” not only the speech but also the emotion contained in the voice and the face of the human the machine is talking to, and responds with a sentence and an avatar face both displaying a pertinent emotion.

In this year’s call, the ambition is significantly greater because the concept of “emotion” has been extended to “personal status”, which includes, in addition to emotion, the cognitive state and attitude of an entity that can be a human or a machine. Personal status can be extracted through a module called “personal status extraction” and provides an answer to the question “how much do my text, my voice, my face and my gestures transmit my degree of knowledge of the subject I am talking about and my attitude towards the interlocutor?”. The personal status of a machine can be manifested using a module that we call “personal status display”, able to synthesise a speaking avatar that utters a sentence expressing a given personal status.

With these two modules and a series of other technologies, MPAI intends to support three use cases: a human talking to a machine about the objects contained in a room, a group of humans talking with an autonomous vehicle, and a virtual video conference in which the participants are represented by avatars that are seated around a table and reproduce the participants’ characteristics and movements. In the last use case, a virtual secretary understands what the avatars say along with their personal status and converts everything into a summary.

Q: Where are we regarding the definition of data coding standards for AI? Do you think we will reach a sufficient level to allow the construction of an ecosystem of AI applications that can really exploit standardised data coding protocols?

A: So far, we have approved 5 standards. Four are “technical”: context-based audio enhancement, multimodal conversation, prediction of a company’s probability of failure from its data, and a standard environment for running AI applications. This last standard, MPAI-AIF, is at the basis of the other three because MPAI standards do not define monolithic applications, but component-based ones. An AI application is a workflow consisting of modules of which an MPAI application standard defines the functions and interfaces, i.e., the data that passes through them. A user of the standard can build their own application workflow by putting together modules from different sources that can work together because they conform with the “standard”. It is the Lego approach adapted to component-based applications.

For this to be practically possible, governance of the ecosystem generated by MPAI standards is required. This is the goal of the fifth MPAI standard, MPAI-GME, which sets the rules that turn the idea of a user who builds a workflow and makes it work from a good dream into reality. A fundamental governance element is the MPAI Store, recently established as a non-profit organisation, from which a user can download not only complete applications – as they can do today from app stores – but also the components of the application they need, coming from diverse sources.

If the scenario of composing an application from components ceases to be a dream, then those who develop MPAI modules do not necessarily have to be economic giants able to bear the development costs of monolithic AI applications; they may very well be small and medium-sized companies specialising in potentially much narrower areas, possibly better suited to innovation. In addition, the MPAI Store deals with module distribution.

This is the new world that is emerging, made possible by MPAI standards.

Join MPAI – Join the fun – Build the future!


The MPAI 2022 Calls for Technologies – Part 2 (Multimodal Conversation)

Processing and generation of natural language is an area where Artificial Intelligence is expected to make a difference compared to traditional technologies. Version 1 of the MPAI Multimodal Conversation standard (MPAI-MMC V1), specifically the Conversation with Emotion use case, has addressed this and related challenges: processing and generation not only of speech but also of the corresponding human face when both convey emotion.

The audio and video produced by a human conversing with the machine represented by the blue box in Figure 1 are perceived by the machine, which then generates human-like speech and video in response.

Figure 1 – Multimodal conversation in MPAI-MMC V1

The system depicted in Figure 1 operates as follows (a minimal code sketch follows the list):

  1. Speech Recognition (Emotion) produces Recognised Text from Input Speech and the Emotion embedded in Recognised Text and in Input Speech.
  2. Video Analysis extracts the Emotion expressed in Input Video (human’s face).
  3. Emotion Fusion fuses the three Emotions into one (Fused Emotion).
  4. Language Understanding produces Text (Language Understanding) from Recognised Text and Fused Emotion.
  5. Dialogue Processing generates pertinent Output Text and Output Emotion using Text, Meaning, and Fused Emotion.
  6. Speech Synthesis (Emotion) synthesises Output Speech from Output Text.
  7. Lips Animation generates Output Video displaying the Output Emotion with lips animated by Output Speech using Output Speech, Output Emotion and a Video drawn from the Video of Faces Knowledge Base.
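
The same flow can be written down as a short, purely illustrative sketch; the module names follow the figure, while the function signatures and the modules dictionary are invented for the example.

```python
def conversation_with_emotion(input_speech, input_video, modules):
    text, emotion_text, emotion_speech = modules["speech_recognition_emotion"](input_speech)
    emotion_face = modules["video_analysis"](input_video)
    fused_emotion = modules["emotion_fusion"](emotion_text, emotion_speech, emotion_face)
    understood_text, meaning = modules["language_understanding"](text, fused_emotion)
    output_text, output_emotion = modules["dialogue_processing"](
        understood_text, meaning, fused_emotion)
    output_speech = modules["speech_synthesis_emotion"](output_text, output_emotion)
    output_video = modules["lips_animation"](output_speech, output_emotion)  # face from the KB
    return output_speech, output_video
```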

The MPAI-MMC V2 Call for Technologies, issued in July 2022, seeks four major classes of technologies enabling a significant extension of the scope of its use cases:

  1. The extension of the internal status of a human beyond Emotion, defined as the typically non-rational internal status of a human resulting from their interaction with the Environment (such as “Angry”, “Sad”, “Determined”), to two more internal statuses: Cognitive State, defined as the typically rational internal status of a human reflecting the way they understand the Environment (such as “Confused”, “Dubious”, “Convinced”), and Attitude, defined as the internal status of a human or avatar related to the way they intend to position themselves vis-à-vis the Environment (e.g., “Respectful”, “Confrontational”, “Soothing”). Personal Status is the combination of Emotion, Cognitive State, and Attitude. These can be extracted not only from speech and face but also from text and gesture (intended as the combination of head, arms, hands, and fingers).
  2. The extension of the direction suggested by the Conversation with Emotion use case, where the machine generates an Emotion pertinent not only to what it has heard (Input Speech) and seen (the human face in Input Video) but also to what the machine is going to say (Output Text). Therefore, Personal Status is not just extracted from a human but can also be generated by a machine.
  3. Enabling solutions no longer targeted to a controlled environment but facing the challenges of the real world: a machine able to create the digital representation of an audio-visual scene composed of speaking humans in a real environment.
  4. Enabling one party to animate an avatar model using a standard model and descriptors generated by another party.

More information about Personal Status and its applications in Personal Status Extraction and Personal Status Display can be found in Personal Status in human-machine conversation.
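
By way of illustration only, Personal Status can be thought of as a small structured record combining the three internal statuses per modality; the field names and values below are examples, not the normative MPAI-MMC V2 format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PersonalStatus:
    emotion: Optional[str] = None          # e.g., "Angry", "Sad", "Determined"
    cognitive_state: Optional[str] = None  # e.g., "Confused", "Dubious", "Convinced"
    attitude: Optional[str] = None         # e.g., "Respectful", "Confrontational", "Soothing"

# One Personal Status can be estimated per modality (text, speech, face, gesture)
# and the per-modality estimates combined by a Personal Status Fusion module.
ps_speech = PersonalStatus(emotion="Sad", cognitive_state="Confused")
```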

With these technologies MPAI-MMC V2 will support three new use cases:

  1. Conversation about a scene. A human has a conversation with a machine about the objects in a room. The human uses gestures to indicate the objects of their interest. The machine uses a personal status extraction module to better understand the internal status of the human and produces responses that include text and personal status. The machine manifests itself via a personal status display module (see more here).
  2. Human-Connected Autonomous Vehicle (CAV) Interaction. A group of humans interact with a CAV to get on board, request to be taken to a particular venue and have a conversation with the CAV while travelling. The CAV uses a personal status extraction module to better understand the personal status of the humans and produces responses that include Text and Personal Status. The CAV manifests itself via a personal status display module (again, see more here).
  3. Avatar-Based Videoconference. (Groups of) humans from different geographical locations participate in a virtual conference in which they are represented by avatars animated by descriptors produced by their clients using face and gesture descriptors supported by speech analysis and personal status extraction. The server performs speech translation and distributes avatar models and descriptors. Each participant places the individual avatars, animated by their descriptors, around a virtual table with their speech. A virtual secretary creates an editable summary by recognising the speech and extracting the personal status of each avatar.

Figure 2 represents the reference diagram of Conversation about a Scene.

Figure 2 – Conversation about a Scene in MPAI-MMC V2

The system depicted in Figure 2 operates as follows:

  1. Visual Scene Description creates a digital representation of the visual scene.
  2. Speech Recognition recognises the text uttered by the human.
  3. Object Description, Gesture Description and Object Identification provide the ObjectID.
  4. Personal Status Extraction provides the human’s current Personal Status.
  5. Language Understanding provides Text (Language Understanding) and Meaning.
  6. Question and Dialogue Processing generates Output Text and the Personal Status of each of Speech, Face, and Gesture.
  7. Personal Status Display produces a speaking avatar animated by Output Text and the Personal Status of each of Speech, Face, and Gesture.

The internal architecture of the Personal Status Display module is depicted in Figure 3.

Figure 3 – Architecture of the Personal Status Display Module

Those wishing to know more about the MPAI-MMC V2 Call for Technologies should review:

  1. The 2 min video (YouTube, non-YouTube) illustrating MPAI-MMC V2.
  2. The slides presented at the online meeting on 2022/07/12.
  3. The video recording of the online presentation (YouTube, non-YouTube) made at that 12 July presentation.
  4. The Call for Technologies, Use Cases and Functional Requirements, Clarifications about MPAI-MMC V2 Call for Technologies data formats, Framework Licence, and Template for responses.

The MPAI Secretariat should receive the responses to the Call by 2022/10/10T23:59 UTC. Partial responses are welcome.


Making a standard is just the first step

A definition of a standard is “the documented agreement by a group of people that certain things should be done in a certain way”. In its still short but intense life, MPAI has documented several such agreements about “certain things”. One of these things is emotion. MPAI has specified 59 words, with attached semantics, that identify types of emotion, and a mechanism to extend the set of words representing emotions. It has also defined the format of the financial and organisational data that feed a machine predicting the probability of default and the organisational adequacy of the company the financial data refer to.

However, the path from a documented agreement to real products, services and applications is not easy and not short. In the domain of software standards, a reference implementation is often required to assure those who were not part of the agreement that the agreement is sound. In all cases, you need an agreed procedure that allows a party to test an implementation and verify that it does indeed conform with the standard. In other words, if the standard can be compared to the law, conformance testing can be compared to the code of procedure.

MPAI application standards, such as Conversation with Emotion and Company Performance Prediction, are not monoliths. They are implemented as sets of modules (AI Modules – AIMs) connected in workflows (AI Workflows – AIWs) executed in an environment (AI Framework – AIF). MPAI specifies the functions and data formats of the AIMs and the function, data format and connections of their AIW. MPAI also specifies the APIs that an AIM and other AIF components call to execute an AIW. The goal is to enable a user to buy an AIF from one party, an AIW from another, and the AIMs also from different parties.
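
A purely conceptual sketch of this component-based idea is shown below: any AIM that exposes the agreed interface can be slotted into an AIW, regardless of who supplies it. The interface here is invented for illustration; the actual APIs are those specified by MPAI-AIF.

```python
from typing import Any, Dict, List, Protocol

class AIM(Protocol):
    """Anything exposing run() with the agreed data formats can act as an AI Module."""
    def run(self, data: Dict[str, Any]) -> Dict[str, Any]: ...

def run_workflow(aims: List[AIM], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the AI Workflow: each AIM consumes and enriches the shared data."""
    data = dict(inputs)
    for aim in aims:
        data.update(aim.run(data))  # an AIM from one vendor can replace another's
    return data
```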

Another facet is that MPAI standards deal with a particular technology – Artificial Intelligence – that sets it apart from other technologies. In general, the value of a neural network resides in the “completeness” and “reliability” of the data used for the task that the neural network claims to be able to carry out. Users buy a network together with the assurance that the data used are OK. But are they? Can users be safe if they rely on a neural network?

MPAI uses the word performance (of an implemented standard) to indicate all these issues and defines it as a set of attributes – Reliability (e.g., quality), Robustness (e.g., ability to operate outside its original domain), Replicability (e.g., tests done by one party can be replicated by another) and Fairness (e.g., dataset or the network are open to being tested for bias).

Since its early days, MPAI has recognised that the players of the complex ecosystem composed of MPAI, its specifications, implementers of specifications, conformance testers, and performance assessors could hardly get their act together in a smoothly running ecosystem without dedicated rules. Without them, MPAI would be unable to honour its promise of an open, competitive market of components, implicit in the notion of interoperable AIMs nicely fitting in an AIW and running in an AIF.

MPAI has specified its solution in the Governance of the MPAI Ecosystem (MPAI-GME) standard, which defines the roles of the MPAI Ecosystem players:

  1. MPAI:
    1. Develops and publishes standards, i.e., the set of Technical Specification, Reference Software Specification, Conformance Testing Specification, and Performance Assessment Specification.
    2. Establishes the MPAI Store (see point 2.).
    3. Appoints Performance Assessors (see point 3.).
  2. The MPAI Store:
    1. Assigns identifiers to implementers of MPAI standards (required to identify an implementation).
    2. Receives implementations of MPAI standards submitted by implementers.
    3. Verifies that implementations are secure.
    4. Tests the conformance of implementations to the appropriate MPAI standard or to one of its use cases.
    5. Receives reports from Performance Assessors about the grade of performance of implementations.
    6. Labels implementations with their interoperability level and makes them available for download.
    7. Publishes reviews of implementations communicated by Users.
  3. Implementers of MPAI Technical Specifications:
    1. Obtain an implementer ID from the MPAI Store.
    2. Submit implementations to the MPAI Store for security verification and conformance testing prior to distribution.
    3. May submit implementations to Performance Assessors.
  4. Performance Assessors:
    1. Are appointed by MPAI for a particular domain of expertise.
    2. Assess implementations for performance, typically for a fee.
    3. Make the MPAI Store and implementer aware of their findings about an implementation.
  5. Users:
    1. Download implementations.
    2. May communicate reviews of downloaded implementations to the MPAI Store.

Figure 1 describes the operation of the MPAI Ecosystem.

Figure 1 – The MPAI Ecosystem

Another piece of the Governance of the MPAI Ecosystem was put in place on the 9th of August, when a group of individuals active in MPAI established the MPAI Store in Scotland as a Company Limited by Guarantee.