Moving Picture, Audio and Data Coding
by Artificial Intelligence

Archives: 2022-09-21

MPAI calls for new members to support its standard development plans

Geneva, Switzerland – 23 November 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 26th General Assembly (MPAI-26). MPAI is calling for new members to support the development of its work program.

Planned for approval in the first months of 2023 are 5 standards and 1 technical report:

  1. AI Framework (MPAI-AIF). Standard for a secure AIF environment executing AI Workflows (AIW) composed of AI Modules (AIM).
  2. Avatar Representation and Animation (MPAI-ARA). Standard for generation and animation of interoperable avatar models reproducing humans and expressing a Personal Status.
  3. Context-based Audio Enhancement (MPAI-CAE). Standard to describe an audio scene to support human interaction with autonomous vehicles and metaverse applications.
  4. Multimodal Conversation (MPAI-MMC). Standard for Personal Status generalising the notion of Emotion including Cognitive State and Social Attitude.
  5. MPAI Metaverse Model (MPAI-MMM). Technical Report covering the design, deployment, operation, and interoperability of Metaverse Instances.
  6. Neural Network Watermarking (MPAI-NNW). Standard specifying methodologies to evaluate neural network-based watermarking solutions.

The MPAI work plan also includes exploratory activities, some of which are close to becoming standard or technical report projects:

  1. AI Health (MPAI-AIH). Targets an architecture in which smartphones store users’ health data, process it using AI, and update their AI Models via Federated Learning.
  2. Connected Autonomous Vehicles (MPAI-CAV). Targets the Human-CAV Interaction, Environment Sensing, Autonomous Motion, and Motion Actuation subsystems implemented as AI Workflows.
  3. End-to-End Video Coding (MPAI-EEV). Extends the video coding frontiers using AI-based End-to-End Video coding.
  4. AI-Enhanced Video Coding (MPAI-EVC). Improves existing video coding with AI tools for short-to-medium term applications.
  5. Server-based Predictive Multiplayer Gaming (MPAI-SPG). Uses AI to train neural networks that help an online gaming server compensate for data losses and detect false data.
  6. XR Venues (MPAI-XRV). Identifies common AI Modules used across various XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Now is a good opportunity for legal entities that support the MPAI mission and can contribute to the development of standards for the efficient use of data to join MPAI: membership is immediately active and lasts until 2023/12/31.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: please join MPAI, share the fun, build the future.


End-to-End Video Coding in MPAI

Introduction

During the past decade, Unmanned Aerial Vehicles (UAVs) have attracted increasing attention due to their flexible, extensive, and dynamic space-sensing capabilities. The volume of video captured by UAVs is growing exponentially, along with the bitrate generated by the ever more capable sensors mounted on UAVs, bringing new challenges for on-device UAV storage and air-ground data transmission. Most existing video compression schemes were designed for natural scenes, without consideration of the specific texture and view characteristics of UAV videos. In the MPAI-EEV project, we have contributed a detailed analysis of the current state of the field of UAV video coding. EEV then establishes a novel task for learned UAV video coding, constructs a comprehensive and systematic benchmark for it, presents a thorough review of high-quality UAV video datasets and benchmarks, and contributes an extensive rate-distortion efficiency comparison of learned and conventional codecs. Finally, we discuss the challenges of encoding UAV videos. It is expected that the benchmark will accelerate research and development in video coding on drone platforms.

UAV Video Sequences

We collected a set of video sequences to build the UAV video coding benchmark. The contents are diverse in many aspects: recording device (various models of drone-mounted cameras), location (indoor and outdoor), environment (traffic workload, urban and rural regions), objects (e.g., pedestrians and vehicles), and scene object density (sparse and crowded scenes). Table 1 provides a comprehensive summary of the prepared learned drone video coding benchmark for a better understanding of those videos.

Table 1: Video sequence characteristics of the proposed learned UAV video coding benchmark

Source Sequence        Name              Spatial Resolution  Frame Count  Frame Rate  Bit Depth  Scene Feature
Class A: VisDrone-SOT  BasketballGround  960×528             100          24          8          Outdoor
                       GrassLand         1344×752            100          24          8          Outdoor
                       Intersection      1360×752            100          24          8          Outdoor
                       NightMall         1920×1072           100          30          8          Outdoor
                       SoccerGround      1904×1056           100          30          8          Outdoor
Class B: VisDrone-MOT  Circle            1360×752            100          24          8          Outdoor
                       CrossBridge       2720×1520           100          30          8          Outdoor
                       Highway           1344×752            100          24          8          Outdoor
Class C: Corridor      Classroom         640×352             100          24          8          Indoor
                       Elevator          640×352             100          24          8          Indoor
                       Hall              640×352             100          24          8          Indoor
Class D: UAVDT S       Campus            1024×528            100          24          8          Outdoor
                       RoadByTheSea      1024×528            100          24          8          Outdoor
                       Theater           1024×528            100          24          8          Outdoor

The corresponding thumbnail of each video clip is depicted in Fig. 1 as supplementary information. There are 14 video clips from several UAV video dataset sources [1, 2, 3]. Their resolutions range from 2720×1520 down to 640×352, and their frame rates from 24 to 30 fps.

To comprehensively reveal the R-D efficiency of UAV video coding with both conventional and learned codecs, we encode the collected drone video sequences using the HEVC reference software with the screen content coding (SCC) extension (HM-16.20-SCM-8.8) and the emerging learned video coding framework OpenDVC [4]. Moreover, the reference model of MPAI End-to-End Video (EEV) is also employed to compress the UAV videos. The baseline coding results are thus based on three different codecs, whose schematic diagrams are shown in Fig. 1: the left panel represents the classical hybrid codec, and the remaining two are the learned codecs OpenDVC and EEV. The EEV software is an enhanced version of the OpenDVC codec that incorporates more advanced modules such as improved motion-compensated prediction, two-stage residual modelling, and an in-loop restoration network.

Figure 1: Block diagram of different codecs. (a) Conventional hybrid codec HEVC. (b) OpenDVC. (c) MPAI EEV. Zoom in for better visualization.

Another important factor for learned codecs is train-and-test data consistency. It is widely accepted in the machine learning community that training and test data should be independent and identically distributed. However, both OpenDVC and EEV are trained on the natural video dataset Vimeo-90K with mean squared error (MSE) as the distortion metric. We employ the pre-trained weights of the learned codecs without fine-tuning them on drone video data, so that the benchmark remains general.

Evaluation

Since all drone videos in the proposed benchmark use the RGB color space, quality assessment is also applied to the reconstructions in the RGB domain. For each frame, the peak signal-to-noise ratio (PSNR) is calculated for each component channel, and the average over the three channels indicates the picture quality. Regarding the bitrate, we calculate bit-per-pixel (BPP) from the binary files produced by the codecs. We report the coding efficiency of the different codecs using the Bjøntegaard delta bit rate (BD-rate) measurement.
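As a rough illustration, the per-frame RGB-PSNR and BPP computations described above can be sketched as follows. This is a minimal sketch assuming NumPy; the function names are illustrative, not part of any MPAI reference software:

```python
import numpy as np

def rgb_psnr(ref, rec, max_val=255.0):
    # Per-channel PSNR, then the average over R, G, B indicates picture quality.
    # Assumes ref and rec are HxWx3 arrays that differ in at least one pixel.
    psnrs = []
    for c in range(3):
        diff = ref[..., c].astype(np.float64) - rec[..., c].astype(np.float64)
        mse = np.mean(diff ** 2)
        psnrs.append(10.0 * np.log10(max_val ** 2 / mse))
    return sum(psnrs) / 3.0

def bits_per_pixel(bitstream_size_bytes, width, height, frame_count):
    # Bit-per-pixel (BPP) from the size of the binary file produced by the codec.
    return bitstream_size_bytes * 8.0 / (width * height * frame_count)
```

For a 100-frame 1344×752 sequence, for example, the BPP is simply the bitstream size in bits divided by 1344 × 752 × 100.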

Table 2: BD-rate performance of different codecs (OpenDVC, EEV, and HM-16.20-SCM-8.8) on drone video compression. The distortion metric is RGB-PSNR.

Category               Sequence Name     BD-Rate (EEV vs OpenDVC)  BD-Rate (EEV vs HEVC)
Class A: VisDrone-SOT  BasketballGround  -23.84%                   9.57%
                       GrassLand         -16.42%                   -38.64%
                       Intersection      -18.62%                   -28.52%
                       NightMall         -21.94%                   -6.51%
                       SoccerGround      -21.61%                   -10.76%
Class B: VisDrone-MOT  Circle            -20.17%                   -25.67%
                       CrossBridge       -23.96%                   26.66%
                       Highway           -20.30%                   -12.57%
Class C: Corridor      Classroom         -8.39%                    178.49%
                       Elevator          -19.47%                   109.54%
                       Hall              -15.37%                   58.66%
Class D: UAVDT S       Campus            -26.94%                   -25.68%
                       RoadByTheSea      -20.98%                   -24.40%
                       Theater           -19.79%                   2.98%
Class A average                          -20.49%                   -14.97%
Class B average                          -21.48%                   3.86%
Class C average                          -14.41%                   115.56%
Class D average                          -22.57%                   -15.70%
Overall average                          -19.84%                   15.23%

The corresponding PSNR-based R-D performance of the three codecs is shown in the table above. The simulation results show that a bit-rate reduction of around 20% is achieved when comparing EEV with the OpenDVC codec. This is a promising performance for learned codecs and demonstrates the improvement made by the EEV software.
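The BD-rate numbers above are computed with the standard Bjøntegaard method, which fits log-rate as a cubic polynomial of PSNR for each codec and integrates the gap over the overlapping quality range. A minimal sketch, assuming NumPy and four rate-distortion points per codec:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    # Bjøntegaard delta bit rate (%) of the test codec vs the anchor.
    # Negative values mean the test codec saves bits at equal quality.
    lr_a = np.log10(rate_anchor)
    lr_t = np.log10(rate_test)
    # Cubic fits of log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_rate_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_rate_diff - 1.0) * 100.0
```

For instance, a test codec that matches the anchor's PSNR at exactly half the bitrate at every point yields a BD-rate of -50%.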

When we directly compare the coding performance of EEV and HEVC, an obvious performance gap between the indoor and outdoor sequences can be observed. Overall, the HEVC SCC codec outperforms the learned codec by 15.23% across all videos. In Class C, EEV is inferior to HEVC by a clear margin, especially for the Classroom and Elevator sequences. These R-D statistics reveal that learned codecs are more sensitive to content variations than conventional hybrid codecs when a codec trained on natural video is applied directly to UAV video coding. In future research, this could be modelled as an out-of-distribution problem, borrowing extensively from the machine learning community.

To dive further into the R-D efficiency of the different codecs, we plot their R-D curves in Fig. 2, selecting Campus and Highway for illustration. The blue-violet, peach-puff, and steel-blue curves denote the EEV, HEVC, and OpenDVC codecs respectively. The content characteristics of UAV videos and their distance from natural videos should be modelled and investigated in future research.

MPAI-EEV Working Mechanism

This work was accomplished in MPAI-EEV, an MPAI standard project seeking to compress video by exploiting AI-based data coding technologies. Within this workgroup, experts from around the globe gather every two weeks to review progress and plan new efforts. In its current phase, attendance at MPAI-EEV meetings is open to interested experts. Since its formal establishment in November 2021, MPAI-EEV has released three major versions of its reference model. MPAI-EEV plans to be an asset for AI-based end-to-end video coding by continuing to contribute new developments in the field.

This work, contributed by MPAI-EEV, has constructed a solid baseline for compressing UAV videos and facilitates future research on related topics.

References

[1] Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, Qinghua Hu, and Haibin Ling, “Detection and Tracking Meet Drones Challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 7380–7399, 2021.

[2] A. Kouris and C.S. Bouganis, “Learning to Fly by MySelf: A Self-Supervised CNN-based Approach for Autonomous Navigation,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018, pp. 5216–5223.

[3] Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, and Qi Tian, “The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking,” in European Conference on Computer Vision, 2018, pp. 370–386.

[4] Ren Yang, Luc Van Gool, and Radu Timofte, “OpenDVC: An Open Source Implementation of the DVC Video Compression Method,” arXiv preprint arXiv:2006.15862, 2020.


And now, whither MPAI?

Twenty-five months after its establishment, with 5 standards in its game bag and 4 of them submitted to IEEE for adoption, what is the next MPAI challenge? The answer is that MPAI has more than one challenge in its viewfinder; this post reports on the first of them, namely the next standards MPAI is working on.

AI Framework (MPAI-AIF). Version 1 (V1) of this standard specifies an environment where non-monolithic component-based AI applications are executed. The new (V2) standard is adding a set of APIs that enable an application developer to select a security level or implement a particular security solution.

Context-based Audio Enhancement (MPAI-CAE). One MPAI-CAE V1 use case is Enhanced Audioconference Experience, where the remote end can correctly recreate the sound sources by using the Scene Description of the Audio at the transmitting side. The new (V2) standard targets more challenging environments than a room, such as a human outdoors talking to a vehicle, whose speech must be as clean as possible. Therefore, a more powerful audio scene description needs to be developed.

Multimodal Conversation (MPAI-MMC). One MPAI-MMC V1 use case is a machine talking to a human and extracting the human’s emotional state from their text, speech, and face to improve the quality of the conversation. The new (V2) standard is augmenting the scope of the understanding of the human internal state by introducing Personal Status combining emotion, cognitive state and social attitude. MPAI-MMC applies it to three new use cases: Conversation about a Scene, Virtual Secretary, and Human-Connected Autonomous Vehicles Interaction (which uses the MPAI-CAE V2 technology).

Avatar Representation and Animation (MPAI-ARA). The new (V1) MPAI-ARA standard addresses several areas where the appearance of a human is mapped to an avatar model. One example is the Avatar-Based Videoconference use case, where the appearance of an avatar is expected to faithfully reproduce a human participant, or where a machine conversing with humans displays itself as an avatar showing a Personal Status consistent with the conversation.

Neural Network Watermarking. The new (V1) MPAI-NNW standard specifies methodologies to evaluate neural network watermarking technologies in the following areas:

  • The impact on the performance of a watermarked neural network (and its inference).
  • The ability of the detector/decoder to detect/decode a payload when the watermarked neural network has been modified.
  • The computational cost of injecting, detecting, or decoding a payload in the watermark.
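As a toy illustration of the second bullet (robustness of payload decoding after the network is modified), the sketch below embeds a payload in a weight vector, perturbs the weights to simulate fine-tuning or pruning noise, and measures the payload bit error rate. The sign-based embedding scheme is deliberately naive and hypothetical, for illustration only; it is not an MPAI-NNW technology:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(weights, payload, strength=0.5):
    # Toy embedding: push one selected weight per payload bit toward the
    # sign of that bit (hypothetical scheme, for illustration only).
    w = weights.copy()
    idx = np.arange(len(payload))
    w[idx] += strength * (2 * payload - 1)
    return w

def decode(weights, n_bits):
    # Read payload bits back from the sign of the selected weights.
    return (weights[:n_bits] > 0).astype(int)

def bit_error_rate(weights, payload, noise_std):
    # Robustness probe: perturb the watermarked weights (simulating a
    # modification of the network) and count payload bit errors.
    noisy = weights + rng.normal(0.0, noise_std, size=weights.shape)
    return np.mean(decode(noisy, len(payload)) != payload)
```

An evaluation along MPAI-NNW lines would sweep the modification strength (and the kind of modification) and report how the bit error rate degrades, alongside the impact on inference quality and the computational cost of embedding and decoding.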

MPAI Metaverse Model. The new (V1) MPAI-MMM Technical Report identifies, organises, defines, and exemplifies functionalities generally considered useful to the metaverse, without assuming that a specific metaverse implementation supports any of them.

All these are ambitious targets but the work is supported by the submissions received in response to the relevant Calls for Technologies and MPAI’s internal expertise.

This is the first of the current MPAI objectives. It should be good enough to convince you to join MPAI now. Read all seven good reasons in the MPAI blog.


Seven good reasons to join MPAI

MPAI, the international, unaffiliated, non-profit organisation developing AI-based data coding standards with clear Intellectual Property Rights licensing frameworks, is now offering those wishing to join MPAI the opportunity to start their 2023 membership two months in advance, from the 1st of November 2022.

Here are six more good reasons why you should join MPAI now.

  1. In a matter of months after its establishment in September 2020, MPAI has developed 5 standards. Now it is working to extend 3 of them (AI Framework, Context-based Audio Enhancement, and Multimodal Conversation), and to develop 2 new standards (Neural Network Watermarking and Avatar Representation and Animation). More in the latest press release.
  2. MPAI enforces a rigorous standards development process and offers an open route to convert – without modification – its specifications to IEEE standards. Four MPAI standards – AI Framework (P3301), Context-based Audio Enhancement (P3302), Compression and Understanding of Industrial Data (P3303), and Multimodal Conversation (P3304) – are expected to become IEEE standards in a matter of weeks.
  3. MPAI has proved that AI-based standards in disparate technology areas – execution of AI applications, audio, speech, natural language processing, and financial data – can be developed in a timely manner. It is currently developing standards for avatar representation and animation, and neural network watermarking. More projects are in the pipeline in health, connected autonomous vehicles, short-medium and long-term video coding, online gaming, extended reality venues, and the metaverse.
  4. MPAI’s role extends from an environment to develop standards to a stepping stone that makes its standards usable practically and in a timely manner. In a matter of months after standard approval, patent holders have already selected a patent pool administrator for some MPAI standards.
  5. MPAI is the root of trust of an ecosystem specified by its Governance of the MPAI Ecosystem grafted on its standards development process. The ecosystem includes a Registration Authority where implementers can get identifiers for their implementations, and the MPAI Store, a not-for-profit entity with the mission to test the security and conformance of implementations, make them available for download and publish their performance as reported by MPAI-appointed Performance Assessors.
  6. MPAI works on leading-edge technologies, and its members have already been given many opportunities to publish the results of their research and standards development at conferences and in journals.

Joining MPAI is easy. Send to the MPAI Secretariat the application form, the signed Statutes and a copy of the bank transfer of 480/2400 EUR for associate/principal membership.

Join the fun – Build the future!


MPAI extends 3 and develops 2 new standards

Geneva, Switzerland – 26 October 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 25th General Assembly (MPAI-25). Among the outcomes is the decision, based on substantial inputs received in response to its Calls for Technologies, to extend three of its existing standards and to initiate the development of two new standards.

The three standards being extended are:

  1. AI Framework (MPAI-AIF). AIF is an MPAI-standardised environment where AI Workflows (AIW) composed of AI Modules (AIM) can be executed. Based on substantial industry input, MPAI is in a position to extend the MPAI-AIF specification with a set of APIs that allow a developer to configure the security solution adequate for the intended application.
  2. Context-based Audio Enhancement (MPAI-CAE). Currently, MPAI-CAE specifies four use cases: Emotion-Enhanced Speech, Audio Recording Preservation, Speech Restoration Systems, and Enhanced Audioconference Experience. The last use case includes technology to describe the audio scene of an audio/video conference room in a standard way. MPAI-CAE is being extended to support more challenging environments such as human interaction with autonomous vehicles and metaverse applications.
  3. Multimodal Conversation (MPAI-MMC). MPAI-MMC V1 has specified a robust and extensible emotion description system. In the currently developed V2, MPAI is generalising the notion of Emotion to cover two more internal statuses, Cognitive State and Social Attitude, and is specifying a new data format, called Personal Status, that covers the three internal statuses.

The two new standards under development are:

  1. Avatar Representation and Animation (MPAI-ARA). The standard intends to provide technology to enable:
    1. A user to generate an avatar model and then descriptors to animate the model, and an independent user to animate the model using the model and the descriptors.
    2. A machine to animate a speaking avatar model expressing the Personal Status that the machine has generated during the conversation with a human (or another avatar).
  2. Neural Network Watermarking (MPAI-NNW). The standard specifies methodologies to evaluate neural network watermarking solutions:
    1. The impact on the performance of a watermarked neural network (and its inference).
    2. The ability of the detector/decoder to detect/decode a payload when the watermarked neural network has been modified.
    3. The computational cost of injecting, detecting in or decoding a payload from the watermark.

Development of these standards is planned to be completed in the early months of 2023.

MPAI-25 has also confirmed its intention to develop a Technical Report (TR) called MPAI Metaverse Model (MPAI-MMM). The TR will cover all aspects underpinning the design, deployment, and operation of a Metaverse Instance, especially interoperability between Metaverse Instances.

So far, MPAI has developed five standards for applications that have AI as the core enabling technology. It is now extending three of those standards, developing two new standards and one technical report, and engaged in drafting functional requirements for nine future standards. It is thus a good opportunity for legal entities supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data to join MPAI, also considering that, by joining on or after the 1st of November 2022, membership is immediately active and lasts until 2023/12/31.

Developed standard Acronym Brief description
Compression and Understanding of Industrial Data MPAI-CUI Predicts the company’s performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem MPAI-GME Establishes the rules governing the submission of and access to interoperable implementations.
Standard being extended Acronym Brief description
AI Framework MPAI-AIF Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement MPAI-CAE Improves the user experience of audio-related applications in a variety of contexts.
Multimodal Conversation MPAI-MMC Enables human-machine conversation emulating human-human conversation.
Standard being developed Acronym Brief description
Avatar Representation and Animation MPAI-ARA Specifies descriptors of avatars impersonating real humans.
MPAI Metaverse Model MPAI-MMM Development of a technical report guiding creation and operation of Interoperable Metaverses.
Neural Network Watermarking MPAI-NNW Measures the impact of adding ownership and licensing information to models and inferences.
Standard being explored Acronym Brief description
AI Health MPAI-AIH Specifies components to securely collect, AI-based process, and access health data.
Connected Autonomous Vehicles MPAI-CAV Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
End-to-End Video Coding MPAI-EEV Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
AI-Enhanced Video Coding MPAI-EVC Improves existing video coding with AI tools for short-to-medium term applications.
Integrative Genomic/Sensor Analysis MPAI-GSA Compresses high-throughput experiments’ data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces MPAI-MCS Supports collaboration of humans represented by avatars in virtual-reality spaces.
Visual Object and Scene Description MPAI-OSD Describes objects and their attributes in a scene.
Server-based Predictive Multiplayer Gaming MPAI-SPG Trains a network to compensate data losses and detects false data in online multiplayer gaming.
XR Venues MPAI-XRV XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: please join MPAI, share the fun, build the future.


MPAI, third year and counting

Moving Picture, Audio, and Data Coding by Artificial Intelligence – MPAI – was established two years ago, on 30 September 2020. It is a good time to review what it has done, what it plans on doing, and how the machine driving the organisation works.

These are the main results of the 2nd year of MPAI activity (the activities of the first year can be found here).

  1. Approved the MPAI-AIF (AI Framework) and MPAI-CAE (Context-based Audio Enhancement) Technical Specifications.
  2. Approved several revisions of all five Technical Specifications.
  3. Promoted the establishment of a patent pool for MPAI standards (as mandated by the MPAI Statutes).
  4. Submitted four Technical Specifications to IEEE for adoption without modification as IEEE standards. Three – AIF, CAE, and MMC (Multimodal Conversation) – should be approved as IEEE Standards by the end of 2022.
  5. Kickstarted the MPAI Ecosystem by
    1. Promoting the establishment of the MPAI Store.
    2. Adopting the MPAI-MPAI Store Agreement.
    3. Contributing to the definition of the IEEE-MPAI Store agreement on ImplementerID Registration Authority.
    4. Revising the MPAI-GME (Governance of the MPAI Ecosystem) standard to accommodate the developments of the Ecosystem.
  6. Developed the MPAI-AIF Reference Software to enable implementation of other MPAI Technical Specifications.
  7. Developed drafts of the MPAI-AIF Conformance Testing,
  8. Developed MPAI-CAE and MPAI-MMC Reference Software and Conformance Testing drafts.
  9. Continued the development of Use Cases and Functional Requirements for Connected Autonomous Vehicles (MPAI-CAV), AI-Enhanced Video Coding (MPAI-EVC), and Server-based Predictive Multiplayer Gaming (MPAI-SPG).
  10. Opened new activities in AI Health (MPAI-AIH), Avatar Representation and Animation (MPAI-ARA), End-to-End Video Coding (EEV), MPAI Metaverse Model (MMM), Neural Network Watermarking (NNW), and XR Venues (XRV).
  11. Developed 3 Calls for Technologies for MPAI-AIF V2, MPAI-MMC V2, and MPAI-NNW. Responses are expected by MPAI-25.
  12. Published Towards Pervasive and Trustworthy Artificial Intelligence.
  13. Published papers at conferences and in journals.

The organisation of the machine that has produced these results is depicted in Figure 1.

Figure 1 – The MPAI organisation (September 2022)

The General Assembly is the supreme body of MPAI. It establishes Developing Committees (DC) tasked with the development of standards: AI Framework, Context-based Audio Enhancement, Compression and Understanding of Industrial Data, Governance of the MPAI Ecosystem, Multimodal Conversation, and Neural Network Watermarking. It also directs Standing Committees, currently the Requirements SC, under which the following groups operate: AI Health, Avatar Representation and Animation, Connected Autonomous Vehicles, End-to-End Video Coding, AI-Enhanced Video Coding, MPAI Metaverse Model, Server-based Predictive Multiplayer Gaming, and XR Venues.

The Board of Directors is in charge of day-to-day activities and oversees five Advisory Committees: Membership and Nominating, Finance and Audit, IPR Support, Industry and Standards, and Communication.

The Secretariat performs such activities as keeping the list of members, organising meetings, communicating, etc.

In its third year of activities MPAI plans to:

  1. Make the MPAI Ecosystem fully operational.
  2. Complete the specification sets of MPAI-AIF, MPAI-CAE, and MPAI-MMC.
  3. Promote the development of the MPAI implementations market.
  4. Develop extensions of MPAI-AIF and MPAI-MMC.
  5. Develop the new MPAI-NNW Technical Specification.
  6. Achieve publication of MPAI-AIF, MPAI-CAE, MPAI-CUI, and MPAI-MMC as IEEE standards.
  7. Submit MPAI-AIF V2, MPAI-CAE V2, MPAI-MMC V2, MPAI-NNW for adoption without modification by IEEE.
  8. Initiate the standard development process for MPAI-AIH, MPAI-ARA, MPAI-CAV, MPAI-EVC, MPAI-SPG, and MPAI-XRV.
  9. Promote the MPAI Metaverse Model and make it the compass for MPAI standard development.
  10. Be ready to exploit new standardisation opportunities.
  11. Continue active publication of papers on MPAI activities and results.
  12. Strengthen relationship with other bodies.

At its 24th General Assembly the Board announced the new membership policy: those who join MPAI after the 1st of November will have a 14-month membership lasting until the 31st of December 2023. A good opportunity to join the fun and build the future!


MPAI appoints MPAI Store, incorporated as Company Limited by Guarantee, as the MPAI Store in the MPAI Ecosystem

Geneva, Switzerland – 30 September 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 24th General Assembly (MPAI-24). Among the outcomes is the appointment of MPAI Store, a company limited by guarantee incorporated in Scotland, as the “MPAI Store” referenced to by the Governance of the MPAI Ecosystem standard (MPAI-GME).

The tasks of the MPAI Store are critical for the operation of the MPAI Ecosystem. Some of these are:

  1. Operation on a cost-recovery basis.
  2. Registration Authority function, i.e., assignment of an ID to implementers.
  3. Testing of implementations of MPAI technical specifications submitted by implementers.
  4. Labelling of implementations based on the verified interoperability level.
  5. Distribution of implementations via high-availability ICT infrastructure.

MPAI-24 has reiterated the deadline extension for submitting responses to the Calls for Technologies on AI Framework, Multimodal Conversation, and Neural Network Watermarking until the 24th of October. The links to all documents relevant to the Calls can be found on the MPAI website.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.


So far, MPAI has developed 5 standards, is currently engaged in extending 2 approved standards, and is developing another 10, as listed in the table below.


Name of standard | Acronym | Brief description
AI Framework | MPAI-AIF | Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement | MPAI-CAE | Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data | MPAI-CUI | Predicts a company’s performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem | MPAI-GME | Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation | MPAI-MMC | Enables human-machine conversation emulating human-human conversation.
Avatar Representation and Animation | MPAI-ARA | Specifies descriptors of avatars impersonating real humans.
Connected Autonomous Vehicles | MPAI-CAV | Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
End-to-End Video Coding | MPAI-EEV | Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
AI-Enhanced Video Coding | MPAI-EVC | Improves existing video coding with AI tools for short-to-medium term applications.
Integrative Genomic/Sensor Analysis | MPAI-GSA | Compresses the data of high-throughput experiments combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces | MPAI-MCS | Supports collaboration of humans represented by avatars in virtual-reality spaces.
MPAI Metaverse Model | MPAI-MMM | Develops a reference model to guide the creation of Interoperable Metaverse Instances.
Neural Network Watermarking | MPAI-NNW | Measures the impact of adding ownership and licensing information to models and inferences.
Visual Object and Scene Description | MPAI-OSD | Describes objects and their attributes in a scene.
Server-based Predictive Multiplayer Gaming | MPAI-SPG | Trains a network to compensate for data losses and detect false data in online multiplayer gaming.
XR Venues | MPAI-XRV | Addresses XR-enabled and AI-enhanced use cases where venues may be both real and virtual.


Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: please join MPAI, share the fun, build the future.


Imperceptibility, Robustness, and Computational Cost in Neural Network Watermarking

Introduction

Research efforts, specific skills, training, and processing can cumulatively bring the development cost of a neural network to anywhere from a few thousand to a few hundred thousand dollars. Therefore, the AI industry needs a technology to ensure traceability and integrity not only of a neural network but also of the content generated by it (the so-called inference).

Faced with a similar problem, the digital content production and distribution industry has considered watermarking as a tool to insert a payload carrying data such as a timestamp or an owner ID. If the inserted payload is imperceptible and persistent, it can be used to signal the ownership of a content item or a semantic modification of its content.

A role for MPAI?

MPAI has assessed that watermarking can also be used by the AI industry and intends to develop a standard to assess the performance of neural network watermarking technologies. Users with different applications in mind can be interested in neural network watermarking. For instance, the owner, i.e., the developer of a neural network, is interested in having their neural network protected by the “best” watermarking solution. The watermarking provider, i.e., the developer of the watermarking technology, is interested in evaluating the performance of their watermarking technology. In turn, the customer, i.e., the provider of an end product, needs the owner’s and the watermarking provider’s solutions to offer a product or a service. Finally, the end-user buys or rents the product and uses it.

All these users are mainly interested in three neural network watermarking properties: imperceptibility, persistence, and computational cost.

Neural network watermarking imperceptibility

One of the features that a user of a watermarking technology may be interested in is assessing the impact that embedding a watermark in a neural network has on the quality of the inferences the network provides.

MPAI has identified the following process to test imperceptibility:

  1. Select a pair of training and testing datasets and a set of M unwatermarked neural networks.
  2. Insert a watermark in each neural network with D different data payloads, yielding M x (D + 1) neural networks: M x D watermarked neural networks and M unwatermarked neural networks.
  3. Feed the M x (D + 1) neural networks with the testing dataset and measure the quality of the produced inference.
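The three steps above can be sketched in code. The sketch below is illustrative only: `embed_watermark` and `inference_quality` are placeholder callables (assumptions, not MPAI-defined APIs), since the standard leaves the concrete embedding and quality metrics to the tester.

```python
# Illustrative sketch of the MPAI imperceptibility test procedure.
# `embed_watermark(model, payload)` and `inference_quality(model, test_set)`
# are hypothetical placeholders supplied by the tester.

def imperceptibility_test(models, payloads, test_set, embed_watermark, inference_quality):
    """Measure inference quality of M unwatermarked and M x D watermarked networks."""
    results = {}
    for m_idx, model in enumerate(models):              # M unwatermarked networks
        results[(m_idx, None)] = inference_quality(model, test_set)
        for d_idx, payload in enumerate(payloads):      # D payloads per network
            wm_model = embed_watermark(model, payload)
            results[(m_idx, d_idx)] = inference_quality(wm_model, test_set)
    return results                                      # M x (D + 1) quality scores
```

Comparing the scores of each watermarked network against its unwatermarked counterpart quantifies the quality impact of the embedding.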

Neural network watermarking persistence

One of the features that a user of a watermarking technology may be interested in is assessing the capability of the detector to ascertain the presence of the watermark and the capability of the decoder to retrieve the payload from a modified version of the neural network.

MPAI has identified the following process to test the capability of the detector to find the watermark in the neural network:

  1. Repeat step 1 above.
  2. Repeat step 2 above.
  3. Repeat step 3 above.
  4. Apply one of the modifications (to be specified by the standard) with the goal of altering the watermark. Each modification must be characterised by a set of parameters that will challenge the robustness of the watermark.
  5. Feed the M x (D + 1) neural networks to the detector and record the decision – “watermark present” or “watermark absent”.
  6. Mark the results as true positive, true negative, false positive (false alarm) and false negative (missed detection).
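Steps 5 and 6 can be sketched as a simple tallying loop. `modify` and `detect` below are hypothetical placeholders standing in for the standard-specified modification and the detector under test.

```python
# Illustrative scoring of a watermark detector: every network is modified
# (e.g. pruned) and fed to the detector; decisions are tallied as true/false
# positives and negatives. `modify` and `detect` are placeholder callables.

def score_detector(watermarked, unwatermarked, modify, detect):
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for net in watermarked:              # watermark truly present
        if detect(modify(net)):
            counts["TP"] += 1            # correct detection
        else:
            counts["FN"] += 1            # missed detection
    for net in unwatermarked:            # watermark truly absent
        if detect(modify(net)):
            counts["FP"] += 1            # false alarm
        else:
            counts["TN"] += 1            # correct rejection
    return counts
```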

The process to test the capability of the decoder to retrieve the payload from the neural network follows similar steps, with “presence or absence” replaced by “distance between the retrieved payload and the original payload”.

The computational cost

One of the features that a user of a watermarking technology may be interested in is evaluating the processing cost of a watermarking solution (in terms of computing resources and/or time).
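As a minimal illustration, the processing cost of any watermarking step (insertion, detection, or decoding) could be estimated by averaged wall-clock timing; a real evaluation would also fix the hardware and software testing environment. `fn` is any placeholder callable.

```python
import time

# Minimal sketch of measuring the processing cost of a watermarking step
# (inserter, detector, or decoder) by averaged wall-clock timing.

def processing_cost(fn, *args, repetitions=5):
    """Return the average wall-clock time of `fn(*args)` over several runs."""
    start = time.perf_counter()
    for _ in range(repetitions):
        fn(*args)
    return (time.perf_counter() - start) / repetitions
```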

The MPAI Call for Technologies

The MPAI process is to develop Use Cases and Functional Requirements, issue Calls for Technologies, receive and assess responses to the Call, and develop a standard for assessing the performance of a neural network watermarking technology. The published document can be found here. The MPAI secretariat should receive responses by 2022/10/24.



Avatars and the MPAI-MMC V2 Call for Technologies

The goal of the MPAI Multimodal Conversation (MPAI-MMC) standard is to enable forms of human-machine conversation that emulate the human-human one in completeness and intensity. While this is clearly a long-term goal, MPAI is focusing on standards providing frameworks which break down – where possible – complex AI functions to facilitate the formation of a component market where solution aggregators can find AI Modules (called AIM) to build AI Workflows (called AIW) corresponding to standard use cases. The AI Framework standard (MPAI-AIF) is a key enabler of this plan.

In September 2021, MPAI approved Multimodal Conversation V1 with 5 use cases. The first – Conversation with Emotion – assumes that a human converses with a machine that understands what the human says, extracts the human’s emotion from their speech and face, articulates a textual response with an attached emotion, and converts it into synthetic speech carrying that emotion and a video of a face that expresses the machine’s emotion with properly animated lips.

The second MPAI-MMC V1 use case was Multimodal Question Answering. Here a human asks a question to a machine about an object. The machine understands the question and the nature of the object and generates a text answer which is converted to synthetic speech.

The other use cases are about automatic speech translation and they are not relevant for this article.

In July 2022, MPAI issued a Call for Technologies with the goal of acquiring the technologies needed to implement three more Multimodal Conversation use cases. One concerns the extension of the notion of “emotion” to “Personal Status”, an element of the internal state of a person which also contains the cognitive state (what a human or a machine has understood about the context) and the attitude (the stance the human or the machine intends to adopt in the context). Personal Status is conveyed by text, speech, face, and gesture. See here for more details. Gesture is the second ambition of MPAI-MMC V2.

A use case of MPAI-MMC V2 is “Conversation about a Scene” and can be described as follows:

A human converses with a machine indicating the object of their interest. The machine sees the scene and hears the human; extracts and understands the text from the human’s speech and the personal status in their speech, face, and gesture; understands the object intended by the human; produces a response (text) with its own personal status; and manifests itself as a speaking avatar.

Figure 1 depicts a subset of the technologies that MPAI needs in order to implement this use case.

Figure 1 – The audio-visual front end

These are the functions of the modules and the data provided:

  1. The Visual Scene Description module analyses the video signal, describes, and makes available the Gesture and the Physical Objects in the scene.
  2. The Object Description module provides the Physical Object Descriptors.
  3. The Gesture Description module provides the Gesture Descriptors.
  4. The Object Identification module uses both the Physical Object Descriptors and the Visual Scene-related Descriptors to understand which object in the scene the human is pointing at, select the appropriate set of Physical Object Descriptors, and provide the Object ID.
  5. The Gesture Descriptor Interpretation module uses the Gesture Descriptors to extract the Personal Status of Gesture.
  6. The Face Description – Face Descriptor Interpretation chain produces the Personal Status of Face.
  7. The Audio Scene Description module analyses the audio signal, describes, and makes available the Speech Object.
  8. The Speech Description – Speech Descriptor Interpretation chain produces the Personal Status of Speech.
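The data flow among the front-end modules can be sketched as a pipeline of placeholder callables. All function names below are illustrative assumptions (the Face and Speech chains are collapsed into single callables, and Object Identification is simplified to take only the object descriptors), intended only to make the AIM-to-AIM connections explicit.

```python
# Illustrative data-flow sketch of the audio-visual front end.
# `modules` maps module names to placeholder callables; none of these
# names or signatures are normative.

def front_end(video, audio, modules):
    # Visual Scene Description yields the Gesture and the Physical Object
    gesture, physical_object = modules["visual_scene_description"](video)
    object_descr = modules["object_description"](physical_object)
    gesture_descr = modules["gesture_description"](gesture)
    # Object Identification selects the object the human points at
    object_id = modules["object_identification"](object_descr)
    # Descriptor interpretation yields a Personal Status per modality
    ps_gesture = modules["gesture_descriptor_interpretation"](gesture_descr)
    ps_face = modules["face_chain"](video)        # Description + Interpretation
    speech = modules["audio_scene_description"](audio)
    ps_speech = modules["speech_chain"](speech)   # Description + Interpretation
    return object_id, ps_gesture, ps_face, ps_speech
```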

After the “front end” part we have a “conversation and manifestation” part involving another set of technologies as described in Figure 2.

Figure 2 – Conversation and Manifestation

  1. The Text and Meaning Extraction module produces Text and Meaning.
  2. The Personal Status Fusion module integrates the three sources of Personal Status into the Personal Status.
  3. The Question and Dialogue Processing module processes Input Text, Meaning, Personal Status and Object ID and provides the Machine Output Text and Personal Status.
  4. The Personal Status Display module processes Machine Output Text and Personal Status and produces a speaking avatar uttering Machine Speech and showing an animated Machine Face and Machine Gesture.
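The four steps above can likewise be sketched as a pipeline; again, all module names and signatures are illustrative placeholders rather than normative interfaces.

```python
# Illustrative sketch of the "conversation and manifestation" part.

def conversation_part(speech, ps_speech, ps_face, ps_gesture, object_id, modules):
    # Text and Meaning Extraction works on the recognised speech
    text, meaning = modules["text_and_meaning_extraction"](speech)
    # Personal Status Fusion integrates the three per-modality statuses
    personal_status = modules["personal_status_fusion"](ps_speech, ps_face, ps_gesture)
    # Question and Dialogue Processing yields the machine's text and status
    out_text, out_ps = modules["question_and_dialogue_processing"](
        text, meaning, personal_status, object_id)
    # Personal Status Display manifests the machine as a speaking avatar
    return modules["personal_status_display"](out_text, out_ps)
```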

The MPAI-MMC V2 Call considers another use case – Avatar-Based Videoconference – that uses avatars in a different way.

Avatars representing geographically separated humans participate in a virtual conference. Each participant receives the other participants’ avatars, locates them around a table, and participates in the videoconference embodied in their own avatar.

The system is composed of:

  1. Transmitter client: Extracts speech and face descriptors for authentication, creates avatar descriptors using Face & Gesture Descriptors, and Meaning, and sends the participant’s Avatar Model & Descriptors and Speech to the Server.
  2. Server: Authenticates participants; distributes Avatar Models & Descriptors and Speech of each participant.
  3. Virtual Secretary: Makes and displays a summary of the avatars’ utterances using their speech and Personal Status.
  4. Receiver client: Creates the virtual videoconference scene, attaches speech to each avatar, and lets the participant view and/or navigate the virtual videoconference room.

Figure 3 gives a simplified one-figure description of the use case.

Figure 3 – The avatar-based videoconference use case

This is the sequence of operations:

  1. The Speaker Identification and Face Identification modules produce Speech and Face Descriptors that the Authentication module in the server uses to identify the participant.
  2. The Personal Status Extraction module produces the Personal Status.
  3. The Speech Recognition and Meaning Extraction modules produce the Meaning.
  4. The Face Description and Gesture Description modules produce the Face and Gesture Descriptors (for feature and motion).
  5. The Participant Description module uses Personal Status, Meaning, and Face and Gesture Descriptors to produce the Avatar Descriptors.
  6. The Avatar Animation module animates the individual participant’s Avatar Model using the Avatar Descriptors.
  7. The AV Scene Composition module places the participants’ avatars in their assigned places, attaches to each avatar its own speech and produces the Audio-Visual Scene that the participant can view and navigate.
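The server's role in the sequence above can be sketched as follows: it authenticates each participant from their descriptors and fans each participant's Avatar Descriptors and Speech out to every other participant. The data layout and function names are illustrative assumptions.

```python
# Hedged sketch of the server side of the avatar-based videoconference:
# authenticate participants, then redistribute each participant's avatar
# descriptors and speech to all the others. Field names are illustrative.

def server_round(authenticate, participants):
    """Authenticate each participant and fan their data out to the others."""
    admitted = [p for p in participants
                if authenticate(p["face_and_speech_descriptors"])]
    # Each receiver gets the avatar descriptors and speech of every other participant.
    return {
        p["name"]: [
            {"avatar": q["avatar_descriptors"], "speech": q["speech"]}
            for q in admitted if q["name"] != p["name"]
        ]
        for p in admitted
    }
```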

The MPAI-MMC V2 use cases require the following technologies:

  1. Audio Scene Description.
  2. Visual Scene Description.
  3. Speech Descriptors for:
    1. Speaker identification.
    2. Personal status extraction.
  4. Human Object Descriptors.
  5. Face Descriptors for:
    1. Face identification.
    2. Personal status extraction.
    3. Feature extraction (e.g., for avatar model).
    4. Motion extraction (e.g., to animate an avatar).
  6. Gesture Descriptors for:
    1. Personal Status extraction.
    2. Features (e.g., for avatar model).
    3. Motion (e.g., to animate an avatar).
  7. Personal Status.
  8. Avatar Model.
  9. Environment Model.
  10. Human’s virtual twin animation.
  11. Animated avatar manifesting a machine producing text and personal status.

The MPAI-MMC V2 standard is an opportunity for the industry to agree on a set of data formats so that a market of modules able to handle those formats can be created. The standard should be extensible, in the sense that, as new and better-performing technologies mature, they can be incorporated into the standard.

Please see:

  1. The 2 min video (YouTube and non-YouTube) illustrating MPAI-MMC V2.
  2. The slides presented at the online meeting on 2022/07/12.
  3. The video recording of the online presentation (YouTube, non-YouTube) made at the 12 July meeting.
  4. The Call for Technologies, Use Cases and Functional Requirements, Framework Licence, and Template for responses.

The MPAI 2022 Calls for Technologies – Part 3 (Neural Network Watermarking)

Research, personnel, training, and processing can bring the development cost of a neural network to anywhere from a few thousand to a few hundred thousand dollars. Therefore, the AI industry needs a technology to ensure traceability and integrity not only of a neural network, but also of the content generated by it (the so-called inference). The content industry, facing a similar problem, has used watermarking to imperceptibly and persistently insert a payload carrying, e.g., an owner ID or a timestamp, to signal the ownership of a content item. Watermarking can also be used by the AI industry.

The general requirements for using watermarking in neural networks are:

  • The techniques shall not affect the performance of the neural network.
  • The payload shall be recoverable even if the content was modified.

MPAI has classified the cases of watermarking use as follows:

  • Identification of actors (i.e., neural network owner, customer, and end-user).
  • Identification of the neural network model.
  • Detecting the modification of a neural network.

This classification, depicted in Figure 1, concerns the use of watermarking technologies in neural networks and is independent of the intended use.

Figure 1 – Classification of neural network watermarking uses

MPAI has identified the need for a standard – code name MPAI-NNW – enabling users to measure the performance of the following components of a watermarking technology:

  • The ability of a watermark inserter to inject a payload without deteriorating the performance of the Neural Network.
  • The ability of a watermark detector to ascertain the presence of a watermark, and of a watermark decoder to retrieve the payload of the inserted watermark, when applied to:
    • A modified watermarked network (e.g., by transfer learning or pruning).
    • An inference of the modified model.
  • The computational cost (e.g., execution time) of a watermark inserter to inject a payload, a watermark detector/decoder to detect/decode a payload from a watermarked model or from any of its inferences.

Figure 2 depicts the three watermarking components covered by MPAI-NNW.

Figure 2 – The three areas to be covered by MPAI-NNW

MPAI has issued a Call to acquire the technologies for use in the standard. The list below is a subset of the requests contained in the call:

  • Use cases
    • Comments on use cases.
  • Impact of the watermark on the performance
    • List of Tasks to be performed by the Neural Network (e.g., classification task, speech generation, video encoding, …).
    • Methods to measure the quality of the inference produced (e.g., precision, recall, subjective quality evaluation, PSNR, …).
  • Detection/Decoding capability
    • List of potential modifications that a watermark shall be robust against (e.g., pruning, fine-tuning, …).
    • Parameters and ranges of proposed modifications.
    • Methods to evaluate the differences between the original and retrieved watermarks (e.g., Symbol Error Rate).
  • Processing cost
    • Specification of the testing environments.
    • Specification of the values characterizing the processing of Neural Networks.
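As an illustration of one evaluation method named above, a simple Symbol Error Rate between the original and the retrieved payloads could be computed as follows. This position-by-position definition is an assumption for illustration, not the method the standard will specify.

```python
# Illustrative Symbol Error Rate between an original and a retrieved
# watermark payload, compared symbol by symbol.

def symbol_error_rate(original, retrieved):
    """Fraction of payload symbols that differ between the two sequences."""
    if len(original) != len(retrieved):
        raise ValueError("payloads must have equal length")
    errors = sum(1 for a, b in zip(original, retrieved) if a != b)
    return errors / len(original)
```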

Below are a few useful links for those wishing to know more about the MPAI-NNW Call for Technologies and how to respond to it:

The MPAI secretariat shall receive the responses to the MPAI-NNW Call for Technologies by 24 October 2022.