Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI calls for technologies supporting metaverse-based Agentic AI

Geneva, Switzerland – 29th October 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 61st General Assembly (MPAI-61) approving the publication of a Call for Autonomous User Architecture Technologies.

With this Call for Technologies, formally “Pursuing Goals in metaverse (MPAI-PGM) – Autonomous User Architecture (PGM-AUA)”, MPAI is aiming at a standard enabling Autonomous Users to perform activities such as moving around and conversing with other Users. Users are processes representing humans in a metaverse conforming to the MPAI Metaverse Model Technologies standard (MMM-TEC); they can either operate with a high degree of autonomy (A-Users) or be directly controlled by humans (H-Users).

PGM-AUA will rely on the friendly MMM-TEC environment and many relevant technologies already available in the 16 approved MPAI standards. However, the ambitious PGM-AUA goal requires many new technologies that the Call is designed to secure.

The text of the Call and its associated documents is available. Responses are due to the MPAI Secretariat by 2026/01/21T23:59.

Register to attend the online presentation of the Call and its associated documents on 17 November at 9 UTC and 16 UTC.

MPAI-61 has also approved the new versions of standards previously posted for Community Comments:

MPAI is continuing the development of its work plan that involves the following activities:

  1. AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
  2. AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
  3. Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
  4. Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
  6. End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
  7. AI-Enhanced Video Coding (MPAI-EVC): finalising the Up-sampling Filter for Video applications (EVC-UFV) standard.
  8. Governance of the MPAI Ecosystem (MPAI-GME): operating the MPAI Ecosystem per the MPAI-GME Specification.
  9. Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
  10. Multimodal Conversation (MPAI-MMC): discussing the conversational part of the PGM-AUA Call for Technologies.
  11. MPAI Metaverse Model (MPAI-MMM): developing support for security in the MMM-TEC specs.
  12. Neural Network Watermarking (MPAI-NNW): Reviewing the responses to the Call on Neural Network Traceability Technologies.
  13. Object and Scene Description (MPAI-OSD): discussing the spatial part of the PGM-AUA Call for Technologies.
  14. Portable Avatar Format (MPAI-PAF): discussing the rendering part of the PGM-AUA Call for Technologies.
  15. AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
  16. Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
  17. Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
  18. XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Celebrating the first five years of MPAI

While there are organisations counting their years of existence in decades or centuries, there should not be much to celebrate for an organisation that has reached as few as five years of existence. But there are years and years – even days and days – as in the saying “better one day as a lion than a hundred years as a sheep”.

The last five were not the years of a sheep: each was lived like a day of a lion.

We started with the idea of an organisation dedicated to standards for AI-based data coding because we thought that standards would bring benefits to a domain largely alien to them. Not standards that look more like legal tools designed to oppress users, but standards offering fair opportunities to all parties in the chain extending from innovators to end users.

An ambitious organisation like MPAI could not operate like four friends in a bar. The MPAI operation rules were developed and are now enshrined in the MPAI Patent Policy. The ambitions of MPAI were further enhanced by the definition of the MPAI Ecosystem, extending from MPAI to implementers, integrators, and end users, with the introduction of a new actor called the MPAI Store, now incorporated in Scotland as a company limited by guarantee. There is a standard – Governance of the MPAI Ecosystem (MPAI-GME) – setting the rules of operation of the Ecosystem.

The idea of a mission was there, but what about implementing it? We acted as lions and posited that opaque monolithic AI should become component-based AI. Now a large share of our standards are based on the AI Framework (MPAI-AIF) standard, specifying an environment where AI Workflows composed of AI Modules can be initialised, dynamically configured, and controlled. MPAI-AIF also provided a stimulus to the adoption of JSON Schema as a “language” to represent data types, AI Modules, and AI Workflows in MPAI standards. Today there is virtually no MPAI standard that does not use that language.
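
As an illustration of that “language”, the fragment below expresses a drastically simplified data type – loosely inspired by the Spatial Attitude (Position and Orientation) used in several MPAI standards – as a JSON Schema and validates an instance against it in Python. The schema itself is an assumption made for this example, not text from any MPAI specification.

```python
# Minimal illustration of JSON Schema as a "language" for data types.
# The schema is a deliberately simplified, hypothetical example, not text
# from any MPAI specification.
import jsonschema

SPATIAL_ATTITUDE_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "SpatialAttitude",
    "type": "object",
    "properties": {
        "Position": {"type": "array", "items": {"type": "number"},
                     "minItems": 3, "maxItems": 3},
        "Orientation": {"type": "array", "items": {"type": "number"},
                        "minItems": 3, "maxItems": 3},
    },
    "required": ["Position", "Orientation"],
}

instance = {"Position": [1.0, 0.0, 2.5], "Orientation": [0.0, 90.0, 0.0]}
jsonschema.validate(instance, SPATIAL_ATTITUDE_SCHEMA)  # raises ValidationError if malformed
print("instance conforms to the schema")
```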

Having laid down the technical foundations, we started erecting the buildings. One was designed to host the quite representative area of human and machine conversation, extending beyond the “word” to cover other, sometimes ethereal but information-carrying, sensations and feelings. The standard called Multimodal Conversation (MPAI-MMC) is the first attempt at digitally representing this ethereal information with the Personal Status data type, and Human and Machine Communication (MPAI-HMC) is an excellent example of its application.

Another investigation stream active since the early MPAI days is audio, sitting at the MPAI table as “Context-based Audio Enhancement” and leading to the Context-based Audio Enhancement – Use Cases (MPAI-CAE) standard. Finally, with Compression and Understanding of Industrial Data (MPAI-CUI), MPAI demonstrated that data from so far unexplored domains like finance could benefit from standards.

Just one year after its establishment, MPAI could claim success by publishing its first three standards: MPAI-CUI, MPAI-GME, and MPAI-MMC and, by the end of 2021, another two: MPAI-AIF and MPAI-CAE.

Since its early days, MPAI has been convinced that standards should have as much visibility as possible. For this reason, it established a successful cooperation with the Institute of Electrical and Electronics Engineers (IEEE) Standards Association (SA). Today, starting from three standards in 2022, nine MPAI standards have been adopted by IEEE without modifications and three more are in the pipeline.

The creation of MPAI Development Committees and Working Groups and their activity continued unrelenting. The use of watermarking and then fingerprinting to trace the use of neural networks led to the development of Neural Network Watermarking – Traceability (NNW-NNT). The Connected Autonomous Vehicle project was started in late 2020 and is now a standard with the name Connected Autonomous Vehicle – Technologies (CAV-TEC). MPAI was probably the first to engage in activities leading to a metaverse standard and can now claim to have a solid candidate to lead the move to interoperable metaverses with MPAI Metaverse Model – Technologies (MMM-TEC). Since its early days, MPAI has worked on online gaming, producing the Server-based Predictive Multiplayer Gaming – Mitigation of Data Loss Effects (SPG-MDL) standard, where a set of AI Modules predicts the game state of an online multiplayer game.

MPAI abhors the attitude of other standards bodies that develop unnecessarily “siloed” standards, where technologies are treated exclusively from the point of view of that standard's domain without considering similar technologies in other domains. Object and Scene Description (MPAI-OSD) and Portable Avatar Format (MPAI-PAF) do specify AI Workflows specific to their domains, but their AI Modules and Data Types were specified for wide reuse in many other MPAI standards. This attitude is not confined to these two standards, as the same can be said of MPAI-CAE and MPAI-MMC.

Atypical – but no less important – standards are AI Module Profiles (MPAI-PRF), establishing a machine-readable description to identify the Profiles of an AI Module, and Data Types, Formats, and Attributes (MPAI-TFA), providing a standard way to add information about data for processing by a machine.

Last comes a standard that embodies probably the very first MPAI activity – AI for video. AI-Enhanced Video Coding – Up-sampling Filter for Video applications (EVC-UFV) offers an AI super-resolution filter vastly superior to the filters currently in use.

Five years ago, MPAI was very bold in targeting standards for AI, then just a nice technology to talk about. Five years later, AI is everywhere and much talked about. What will the future offer MPAI?

Some answers are clear:

  • With its impressive portfolio of 15 standards, there will be much maintenance and enhancement work to do.
  • Two new standards are being developed and should be completed in a short time: AI for Health – Health Secure Platform and XR Venues – Live Theatrical Performance.
  • One project – End-to-End Video Coding – has still to go through the Call for Technologies phase.
  • A Call for Technologies is open, and responses are expected: Neural Network Watermarking – Technologies.
  • A new Call for Technologies on Pursuing Goals in the metaverse is being prepared. This will require the development of a significant number of “behaviours” on top of a “baseline” Small Language Model.
  • Development of reference implementations to enhance the value and attractiveness of existing standards.

AI continues its lightning speed of development and MPAI will continue watching and identifying standardisation opportunities in different domains.

Long live MPAI!

 


MPAI celebrates five years of pioneering AI standards

Geneva, Switzerland – 30th September 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has celebrated its fifth anniversary at its 60th General Assembly (MPAI-60).

Established five years ago, on 30 September 2020, MPAI has created the organisation, given itself rigorous working procedures, developed 15 standards and two technical reports, obtained the adoption of eight of its standards without modification by the IEEE Standards Association, and is setting its sights on the next challenges, targeting both extensions and new standards.

In line with its mission of AI-based data coding, MPAI standards cover execution of AI applications, audio enhancement, connected autonomous vehicles, finance, human and machine conversation, metaverse, objects and scenes, avatars, and many others.

MPAI-60 has approved final publication of new versions of existing standards:

and is publishing the following standards for Community Comments

MPAI is continuing the development of its work plan that involves the following activities:

  1. AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
  2. AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
  3. Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
  4. Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
  6. End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
  7. AI-Enhanced Video Coding (MPAI-EVC): refining the Up-sampling Filter for Video applications (EVC-UFV) standard.
  8. Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
  9. Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
  10. Multimodal Conversation (MPAI-MMC): Developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
  11. MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
  12. Neural Network Watermarking (MPAI-NNW): Issuing a Call on Neural Network Traceability Technologies.
  13. Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
  14. Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
  15. AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
  16. Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
  17. Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
  18. XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.



Exploring the innovations of the MMM-TEC V2.1 standard

The MPAI Metaverse Model (MPAI-MMM) – Technologies (MMM-TEC) specification is based on an innovative approach. As in the real world (Universe) we have animate and inanimate things, in an MPAI Metaverse (M-Instance) we have Processes and Items. Processes can animate Items (things) in the metaverse but can also act as a bridge between metaverse and universe. For convenience, MMM-TEC defines four classes of Processes: Apps, Devices, Services, and Users.

Probably the most interesting one is the User, defined as the “representative” of a human, where representation means that the human is responsible for what their Users do in the metaverse. The representation function can be very strict, because the human drives everything one of their Users does, or very loose, because the User is a fully autonomous agent (still under the human's responsibility). As the User is a Process, it cannot be “perceived” except through what it does, but it can render itself in a perceptible form, called Persona, that may visually appear as a humanoid. A human can have more than one User, and a User can be rendered with more than one Persona.

Humans can do interesting things in the world, but what interesting things can they do in the metaverse? MMM-TEC answers this question by offering a range of 28 basic Actions, called Process Actions. An important one is Register. By Registering, a human gets the Rights to import (via the UM-Send Action) and deploy (via the Execute Action) Users and to render (e.g., by MM-Adding) Personae. UM-Send means sending things from the universe to the metaverse; MM-Add means placing an Avatar that can then possibly be animated (MM-Animate) with a stream or rendered (MU-Actuate) in the universe.

Universe and metaverse are connected, but they should be mutually “protected”. One example of what this means: data from the universe cannot simply be imported into the metaverse, but is first captured (UM-Capture), then identified (Identify) – i.e., converted into an Item – and finally acted upon, e.g., used to animate an avatar. Also, a User is not entitled to do just anything anywhere in the metaverse, because its operation is governed by three basic notions: Rights, expressing the fact that a User (in general, a Process) may perform a certain Process Action; Rules, expressing the fact that a Process may, may not, or must perform a Process Action; and P-Capabilities, expressing the fact that the Process can perform certain Process Actions.

What if a Process wants to perform a Process Action, has the Rights to perform it, and its performance complies with the Rules, but it cannot, i.e., it does not know how to perform it? MMM-TEC makes use of a notion from the philosophy of language called Speech Act, an utterance that carries both information and action. For instance, User MU-Actuates Persona At M-Location At U-Location With Spatial Attitude means that the User renders, at a U-Location in the universe and with a certain Position and Orientation, the Persona that is placed at an M-Location in the metaverse. If the User can MU-Actuate the Persona – i.e., it has the P-Capabilities, for instance because it is connected to the universe via an appropriate device – may do so – i.e., it has the Rights to MU-Actuate – and the planned Process Action complies with the Rules, then the Process Action is performed. However, if the User does not have the necessary P-Capabilities or Rights to MU-Actuate the Persona, it can ask an Import-Export Service to do this on its behalf. Possibly, the Service will request that a Transaction be made in order to perform the requested Process Action.

As a last point, we should describe how MMM-TEC represents Rights and Rules. MMM-TEC states that Rights are, in general, a collection of Process Actions that the Process can perform. Each of them is preceded by Internal, Acquired, or Granted to indicate whether the Rights were obtained at the time of Registration, were acquired (e.g., by a Transaction), or were Granted (and may then possibly be withdrawn) by another Process. Similarly, Rules are expressed by Process Actions, each of which is preceded by May, May not, or Must.
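
To make these notions concrete, here is a minimal, purely illustrative Python sketch – not part of MMM-TEC – of how a Process Action request could be checked against P-Capabilities, Rights, and Rules. All names and data structures are assumptions made for explanation only.

```python
# Illustrative sketch (not MMM-TEC normative) of the three checks that gate a
# Process Action such as "User MU-Actuates Persona": can (P-Capabilities),
# may (Rights), and compliance with the Rules of the M-Instance.
from dataclasses import dataclass, field

@dataclass
class Process:
    p_capabilities: set = field(default_factory=set)    # Process Actions it can perform
    rights: dict = field(default_factory=dict)          # action -> "Internal" | "Acquired" | "Granted"

@dataclass
class MInstance:
    rules: dict = field(default_factory=dict)           # action -> "May" | "May not" | "Must"

def can_perform(process: Process, m_instance: MInstance, action: str) -> bool:
    capable = action in process.p_capabilities                  # it can do it
    entitled = action in process.rights                         # it may do it
    allowed = m_instance.rules.get(action, "May") != "May not"  # the Rules do not forbid it
    return capable and entitled and allowed

user = Process(p_capabilities={"MU-Actuate"}, rights={"MU-Actuate": "Granted"})
m_instance = MInstance(rules={"MU-Actuate": "May"})
# True here; if False, the User could ask an Import-Export Service to act on its behalf.
print(can_perform(user, m_instance, "MU-Actuate"))
```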

We could add many more details to give a complete description of the MMM-TEC potential. You can directly access the standard here, but now we want to address some of the innovations introduced by MMM-TEC V2.1.

The first is the set of new capabilities provided by the Property Change Process Action. We said that we can MM-Add a Persona and then MM-Animate it. But what if we are preparing a theatre performance and we do not want “to be seen” while rehearsing? Property Change can set the Perceptibility Status of an Item but can also change the following (an illustrative request is sketched after the list):

  • The properties of a visual Item in terms of its size, mass, material (i.e., to signal that the object is material or immaterial), gravity (is subject to gravity or not), and texture map.
  • The audio characteristics of an object: Reflectivity, Reverberation, Diffusion, and Absorption.
  • The properties of a light source: Type (Point, Directional, Spotlight, Area), colour, and intensity of the light source.
  • The properties of an audio source: Diffuseness, Directional Patterns, Shape, and Size.
  • The Personal Status (i.e., emotion) of an avatar.

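By way of illustration, a hypothetical Property Change request could look like the fragment below; the field names and structure are invented for explanation and are not the normative MMM-TEC syntax.

```python
# Hypothetical, illustrative payload for a Property Change request.
# Field names and structure are invented for explanation and are not
# the normative MMM-TEC syntax.
property_change_request = {
    "ProcessAction": "Property Change",
    "Item": "Persona:rehearsal-avatar-01",
    "Changes": {
        "PerceptibilityStatus": "Imperceptible",  # e.g., stay unseen while rehearsing
        "Visual": {"Size": 1.0, "Mass": 70.0, "Material": "Immaterial",
                   "Gravity": False, "TextureMap": "rehearsal.png"},
        "Audio": {"Reflectivity": 0.2, "Reverberation": 0.4,
                  "Diffusion": 0.5, "Absorption": 0.3},
        "PersonalStatus": {"Emotion": "calm"},
    },
}
```
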
Another important set of functionalities is provided by significant extensions of how a Process in the metaverse can affect the universe. MMM-TEC V2.1 allows a User to MU-Actuate at a U-Location an Item MM-Added at an M-Location. How can this Process Action be performed? We assume that the M-Instance is connected to a special Device that can perform the following in the universe:

  • Pick an existing object.
  • Drive a 3D printer that produces the analogue version of the Item.
  • Render a 2D or a 3D media object.

MMM-TEC V2.1 calls R-Item any physical object in the universe, including the object produced by a 3D printer and the 2D or 3D media object produced. It also defines the following additional Process Actions:

  • MU-Add an R-Item: to place an R-Item (a physical object) somewhere in the universe with a Spatial Attitude.
  • MU-Animate an R-Item: to animate, e.g., a robot, with a stream.
  • MU-Move an R-Item from a U-Location to another U-Location along a Trajectory.

MMM-TEC is rigorous in defining how Process Actions can be performed in an M-Instance, but what about the universe? Do we want Processes to perform actions in the universe in an uncontrolled way?

The answer is clear: the M-Instance does not control the Universe through some supernatural force but through Devices whose operation is conditional on the Rights and P-Capabilities held by the Device to perform the desired Process Actions in the universe. The Process Actions beginning with “MU-” include the Rights of a Device to act on the universe.

V2.1 adds several new use cases to the long list of V2.0. One of these is called “Emergency in Industrial Metaverse”:

  1. An M-Location includes the Digital Twin of a real factory (R-Factory) where the regular operation is separated from emergency operation described by the use case.
  2. An “emergency” User in the Digital Twin (V-Factory):
    1. Has the Rights to actuate and animate an “emergency” robot in the R-Factory.
    2. Can be rendered as a Persona having the appearance of the corresponding robot.
  3. In case of an emergency, the User:
    1. Activates an alarm in the R-Factory.
    2. Actuates its “emergency” robot (Analogue Twin) in the R-Factory.
    3. Animates the robot to solve the problem.
    4. Renders its Persona so that humans can see what is happening in the R-Factory.
  4. When the emergency is resolved, the robot is moved to its repository.

You are invited to register to attend the online presentation on 12 September at 15 UTC and provide your comments to the MPAI Secretariat by 2025/09/28T23:59 UTC.


MPAI publishes MPAI Metaverse Model – Technologies V2.1 standard with extended functionalities

Geneva, Switzerland – 20th August 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 59th General Assembly (MPAI-59) approving the publication of the MPAI Metaverse Model – Technologies V2.1 with a request for Community Comments.

The earlier 2.0 Version of Technical Specification: MPAI Metaverse Model (MMM) – Technologies (MMM-TEC) already supported digital twinning of real-world environments and their blending with MMM-TEC-specified virtual environments. The new MMM-TEC V2.1 supports “analogue twinning” of virtual- with real-world environments opening attractive industrial metaverse applications. This is achieved by introducing new “Process Actions” (speech acts of an MMM-TEC process sent to another process) and the notion of R-Item (real object) that can be MU-Added (placed at a U-Location, a location in the real world), MU-Moved (moved from a U-Location to another U-Location along a Trajectory), and MU-Animated (animated) in sync with a Persona (the rendering of a Process as an avatar) in the metaverse.

Among the several other innovations included in MMM-TEC V2.1, we mention Change Property, a Process Action whereby a Process changes – if it holds the Rights – the place where an object is located; its properties, such as perceptibility, size, mass, gravity, and texture; its audio properties, such as reflectivity, reverberation, diffusion, and absorption; the characteristics of an audio or light source; and the emotional state of an avatar.

MPAI standards are best described as a web of interconnected specifications. The new technologies needed by MMM-TEC are partly specified by Object and Scene Descriptors (MPAI-OSD), Portable Avatar Format (MPAI-PAF), and Data Types, Formats and Attributes (MPAI-TFA). They are now at versions V1.4, V1.5, and V1.4, respectively.

Register to attend the presentations of the many innovations added to

  • The MMM-TEC V2.1 standard on 12 September at 15 UTC (link).
  • The MPAI-OSD V1.4 and MPAI-PAF V1.5 standards on 12 September at 10 UTC (link).
  • The MPAI-TFA V1.4 standard on Wednesday 17 September at 15 UTC (link)
  • The MPAI-GME V2.0 standard on Friday 26 September at 14 UTC (link).

MPAI is continuing the development of its work plan that involves the following activities:

  1. AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
  2. AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
  3. Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
  4. Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
  6. End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
  7. AI-Enhanced Video Coding (MPAI-EVC): refining the Up-sampling Filter for Video applications (EVC-UFV) standard.
  8. Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
  9. Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
  10. Multimodal Conversation (MPAI-MMC): Developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
  11. MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
  12. Neural Network Watermarking (MPAI-NNW): Issuing a Call on Neural Network Traceability Technologies.
  13. Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
  14. Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
  15. AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
  16. Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
  17. Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
  18. XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Exploring the Up-sampling Filter for Video applications (EVC-UFV) standard

MPAI has approved Technical Specification: AI-Enhanced Video Coding (MPAI-EVC) – Up-sampling Filter for Video applications (EVC-UFV).

The standard includes a general procedure to design video up-sampling filters based on super resolution techniques and a method to reduce the complexity of the designed filters without significant performance loss. The standard also provides the parameters of specific filters for standard definition to high definition and high definition to ultra-high definition, for the complexity-reduced and original cases.

The standard will be presented online on 23 July at 13 UTC. Register here to attend the presentation.

The standard is not in final form. It is published with a request for Community Comments according to MPAI procedures. Comments should be sent to the MPAI Secretariat by 2025/08/18T23:59 UTC.

A method typically used in video coding is to down-sample the input video frame to half its resolution before encoding. This reduces the computational cost but requires an up-sampling filter to recover the original resolution in the decoded video, losing as little visual quality as possible. Currently used filters are bicubic and Lanczos.

Figure 1 – Up-sampling Filters for Video application (EVC-UFV)

 
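A minimal sketch of this half-resolution pipeline is shown below; `encode` and `decode` are placeholders for any external codec (AVC, HEVC, VVC), and bicubic interpolation stands in for the traditional up-sampling step that an AI filter such as EVC-UFV is designed to replace.

```python
# Sketch of the half-resolution coding pipeline described above.
# "encode" and "decode" are placeholders for an external codec; bicubic
# interpolation stands in for the traditional up-sampling filter.
import torch
import torch.nn.functional as F

def half_resolution_pipeline(frame: torch.Tensor, encode, decode) -> torch.Tensor:
    """frame: (1, C, H, W) tensor with values in [0, 1]."""
    half = F.interpolate(frame, scale_factor=0.5, mode="bicubic", align_corners=False)
    bitstream = encode(half)          # placeholder: external encoder
    decoded_half = decode(bitstream)  # placeholder: external decoder
    restored = F.interpolate(decoded_half, scale_factor=2.0, mode="bicubic",
                             align_corners=False)
    return restored.clamp(0.0, 1.0)
```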

In the last few years, Artificial Intelligence (AI), Machine Learning (ML), and especially Deep Learning (DL) techniques, have demonstrated their capability to enhance the performance of various image and video processing tasks.  MPAI has performed an investigation to assess how video coding performance could be improved by replacing traditional coding blocks with deep-learning ones. The outcome of this study has shown that deep-learning based up-sampling filters significantly improve the performance of existing video codecs.

MPAI issued a Call for Technologies for up-sampling filters for video applications in October 2024. This was followed by an intense phase of development that enabled MPAI to approve Technical Specification: AI-Enhanced Video Coding (MPAI-EVC) – Up-sampling Filter for Video application (EVC-UFV) V1.0 with a request for Community Comments at its 58th General Assembly (MPAI-58).

The EVC-UFV standard enables efficient, low-complexity up-sampling filters applied to video with bit depths of 8 and 10 bits per component, in the standard YCbCr colour space with 4:2:0 sub-sampling, encoded with a variety of encoding technologies using different encoding features such as random access and low delay.

As depicted in Figure 2, the filter is a Densely Residual Laplacian Super-Resolution network (DRLN), offering a novel deep-learning approach.

Figure 2 – Densely Residual Laplacian Super-Resolution network (DRLN).

The complexity of the filter is reduced in two steps. First, a drastic simplification of the deep-learning structure that reduces the number of blocks provides a much lighter network while keeping performance similar to the baseline DRLN. This is achieved by identifying the DRLN's principal components and understanding the impact of each component on output video frame quality, memory size, and computational cost.

As shown in Figure 2, the main component of the DRLN architecture is a Residual Block, which is composed of Densely Residual Laplacian Modules (DRLM) and a convolutional layer. Each DRLM contains three Residual Units, as well as one compression unit and one Laplacian attention unit (a set of convolutional layers with a square filter size and dilation greater than or equal to the filter size). Each Residual Unit consists of two convolutional layers and two ReLU layers. All DRLM modules in each Residual Block and all Residual Units in each DRLM are densely connected. The Laplacian attention unit consists of three convolutional layers with filter size 3×3 and dilation (a technique for expanding a convolutional kernel by inserting holes or gaps between its elements) equal to 3, 5, and 7. All convolutional layers in the network, except the Laplacian ones, have filter size 3×3 with dilation equal to 1. Throughout the network, the number of feature maps (the outputs of convolutional layers) is 64.
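
The following PyTorch sketch translates this description into code for two of the components – a Residual Unit and a Laplacian attention unit with dilations 3, 5, and 7. It is a simplified illustration, not the normative EVC-UFV network definition; the 1×1 fusion convolution and the sigmoid gating are assumptions made to keep the example self-contained.

```python
# Simplified PyTorch sketch of two DRLN components described above;
# not the normative EVC-UFV network definition.
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Two 3x3 convolutions with ReLU activations and a skip connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class LaplacianAttention(nn.Module):
    """Three 3x3 convolutions with dilations 3, 5, 7 whose fused output gates
    the input features (fusion and gating are assumptions of this sketch)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (3, 5, 7)]
        )
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x * torch.sigmoid(self.fuse(multi_scale))
```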

Based on this structural analysis, reducing the number of main Residual Blocks, adding more DRLMs, and reducing the complexity of the Residual Units and the number of hidden convolutional layers and feature maps drastically accelerates execution and reduces memory requirements, without substantially affecting the network's visual quality.

Figure 3 depicts the resulting EVC-UFV Up-sampling Filter.

Figure 3 – Structure of the EVC-UFV Up-sampling Filter

 

The parameters of the original and complexity-reduced network are given in Table 1.

 

Table 1 – Parameters of the original and the complexity-reduced network

Parameter                                       Original   Final
Residual Blocks                                     6          2
DRLMs per Residual Block                            3          6
Residual Units per DRLM                             3          3
Hidden Convolutional Layers per Residual Unit       2          1
Input Feature Maps                                 64         32

 

Further, by pruning the parameters and weights of the network, its complexity is reduced by 40%, with a loss in performance of less than 1% in BD-rate. This is achieved by first using the well-known DeepGraph technique, modified to work with a deep-learning-based up-sampling filter, to understand the dependencies among the layers of the different components of the simplified network. This facilitates grouping components that share a common pruning approach, which can then be applied without introducing dimensional inconsistencies among the inputs and outputs of the layers.
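
As a rough illustration of structured pruning – and only of the channel-ranking step, not of the dependency-graph grouping that the actual procedure relies on – the following sketch ranks the output channels of a convolutional layer by the L1 norm of their kernels and selects the least important 40% as pruning candidates.

```python
# Simplified sketch of magnitude-based channel ranking for structured pruning.
# The actual EVC-UFV procedure relies on a dependency-graph analysis to group
# layers that must be pruned together; only the importance-ranking step is shown.
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d) -> torch.Tensor:
    """L1 norm of each output channel's kernel, a common proxy for importance."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def channels_to_prune(conv: nn.Conv2d, ratio: float = 0.4) -> list:
    """Indices of the least important output channels (here, 40% of them)."""
    scores = channel_importance(conv)
    k = int(ratio * scores.numel())
    return torch.argsort(scores)[:k].tolist()

conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)
print(channels_to_prune(conv, ratio=0.4))
```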

Verification tests of the technology have been performed under the following conditions:

 

Standard sequences        CatRobot, FoodMarket4, ParkRunning3
Bits/sample               8 and 10 bit-depth per component
Colour space              YCbCr with 4:2:0 sub-sampling
Encoding technologies     AVC, HEVC, and VVC
Encoding settings         Random Access and Low Delay at QPs 22, 27, 32, 37, 42, 47
Up-sampling               SD to HD and HD to UHD
Metrics                   BD-Rate, BD-PSNR, and BD-VMAF
Deep-learning structure   Same for all QPs

 

Results show an impressive improvement over the currently used traditional bicubic interpolation for all coding technologies and encoding options, and for all three objective metrics. The results of Table 2 have been obtained for the low-delay coding mode.

 

Table 2 – Performance of the EVC-UFV Up-sampling Filter

                                       AVC     HEVC    VVC
SD to HD (using own trained filter)    14.4%   12.2%   13.8%
HD to UHD (using own trained filter)    5.6%    6.0%    6.5%
SD to HD (using HD to UHD filter)      14.0%   11.6%   11.4%

 

All results are obtained with the 40% pruned network.
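
For readers who want to reproduce the metric behind Table 2, the sketch below implements the standard Bjøntegaard-delta rate (BD-rate) calculation: a cubic fit of quality versus log-rate, integrated over the overlapping quality range. It is a generic reference implementation, not MPAI test software, and sign conventions (gain versus loss) vary across tools.

```python
# Generic reference implementation of the Bjoentegaard-delta rate (BD-rate):
# cubic fit of quality versus log10(bitrate), integrated over the overlapping
# quality range. Not MPAI test software; sign conventions vary across tools.
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test) -> float:
    """Average bitrate difference (%) of the test curve versus the reference
    at equal quality; negative values mean the test curve needs less bitrate."""
    log_ref, log_test = np.log10(rates_ref), np.log10(rates_test)
    p_ref = np.polyfit(psnr_ref, log_ref, 3)
    p_test = np.polyfit(psnr_test, log_test, 3)
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping quality interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100.0
```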


MPAI publishes the Up-sampling Filter for Video applications with a request for Community Comments

Geneva, Switzerland – 9th July 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 58th General Assembly (MPAI-58) approving the publication of the Up-sampling Filter for Video applications standard with a request for Community Comments.

The approved Technical Specification: AI-Enhanced Video Coding (MPAI-EVC) – Up-sampling Filter for Video applications (EVC-UFV) specifies a general procedure to design video up-sampling filters based on super-resolution techniques. Additionally, it specifies a method to reduce the complexity of the designed filters without significant loss in performance. The standard also provides the parameters of specific filters for standard definition to high definition and high definition to ultra-high definition, for the complexity-reduced and original cases. The standard is not in final form. It is published with a request for Community Comments according to MPAI procedures. Comments should be sent to the MPAI Secretariat by 2025/08/18T23:59 UTC.

The standard will be presented online on 23 July at 13 UTC. Register here to attend the presentation.

MPAI is continuing the development of its work plan that involves the following activities:

  1. AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
  2. AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
  3. Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
  4. Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
  6. End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
  7. AI-Enhanced Video Coding (MPAI-EVC): refining the Up-sampling Filter for Video applications (EVC-UFV) standard.
  8. Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
  9. Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
  10. Multimodal Conversation (MPAI-MMC): Developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
  11. MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
  12. Neural Network Watermarking (MPAI-NNW): Issuing a Call on Neural Network Traceability Technologies.
  13. Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
  14. Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
  15. AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
  16. Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
  17. Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
  18. XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


A new approach to addressing the challenges of advanced AI systems? A Workshop

10 July 2025, 09:00 – 13:00 – Campus Biotech Innovation Park, Av. de Sécheron 15, 1202 Genève, Switzerland (CH)

To join online: Link to Workshop

Context, Scope and Objectives

This workshop has been conceived and organized by a group of people from the University of Geneva, the University of Zurich, and MPAI (https://mpai.community/), in collaboration with members of the Institute for AI International Governance of Tsinghua University (I-AIIG). The workshop is open to invited scholars and experts in the fields of AI-related technical, social, and legal research.

During the past few years, the theme of Global AI Governance has gained broad public and policy attention, favoring a wide review of the impacts of the current AI development path. AI's underlying technologies and AI systems' performance continue to develop at an exponential pace. This provides significant opportunities as well as substantial risks. Here we focus on risks, and particularly on extreme or “existential” risks.

Following the arguments and calls made by the world's most eminent AI experts, we share key Concerns regarding the development of Advanced AI, namely: Safety, Alignment, Transparency, Trustworthiness, and Fairness.

Many efforts have already been made to address some of these Concerns, and we are fully supportive of them. However, there is a substantial gap between the efforts made to improve AI systems’ performance (the “AI race”) and efforts made to strengthen AI safety and to address the other Concerns. This is due to a variety of reasons such as the limited focus and investment by companies, IPR issues, underestimation of the risks by Governments, difficulty of managing Generative AI through regulatory frameworks, and the geopolitical context.

In addition, to the best of our knowledge, there is insufficient attention to integrating standardization in the AI safety field with the AI systems' research and development cycle.

While we have some initial ideas on the necessary starting point to address this gap, the main objective of the workshop is to openly discuss those ideas, verify whether there is consensus on them and, if so, find ways to address the risks that the rapid development of AI poses to humanity. The objective is in no way dismissive of the efforts undertaken by a plurality of actors in this domain: efforts that, in general, we fully support. We believe that our proposal is complementary and should help strengthen and expand the global action required to address the Concerns.

Based on the discussions from the workshop, we hope to draft an outcome document/paper to be published. This explains the format of the workshop below.

Workshop Agenda

09:00 – 09:15 Welcome and registration of participants
09:15 – 09:30 Introduction by the organizers

Major Concerns and existing AI Governance Initiatives:

  • Quick review of some of the most important initiatives underway,
  • Observations about gaps.
09:30 – 10:30 Open discussion session I

How to address the Concerns in an effective, timely and proactive way?

  • Is it possible to launch a process to establish a global “entity” to deal with the Concerns, with the goals of:
    • Promoting the (mostly) science & technology-based development of AI systems that operate in ways that address the Concerns.
    • Producing standards and conformity assessment tools – through an innovative process, integrated with the R&D cycle – helping in the implementation of AI systems that address the Concerns.
  • Is it possible to start this global initiative, by drawing from the best expertise of research labs and academia and with European and Chinese entities as initial actors? We emphasize that this initiative is open to all parties sharing the goals and willing to contribute.
  • Are there other directions that should be pursued?
10:30 – 11:00 Coffee Break
11:00 – 12:00 Open discussion session II

Continued: participants are invited to express their views on the points outlined above.

12:00 – 12:30 Review of the discussion outcome

To be proposed by the Moderator and Secretary, for review and discussion by participants:

  • Key points of consensus regarding the proposed ideas and possible additional lines of action.
  • List of main disagreements and of matters to be described more precisely.
  • Observations on the challenges and on the feasibility of the initiative.
12:30 – 13:00 To be presented by the Moderator and Secretary for approval

Conclusions and indication on the way forward.

13:00 – 14:00 Lunch



An introduction to the Neural Network Watermarking Call for Technologies

Introduction

During the last decade, Neural Networks have been deployed in an increasing variety of domains, and the production of Neural Networks has become costly in terms of both resources (GPUs, CPUs, memory) and time. Moreover, users of Neural Network-based services increasingly express the need for a certified service quality.

NN Traceability offers solutions to satisfy these needs, ensuring that a deployed Neural Network is traceable and that any tampering is detected.

Inherited from the multimedia realm, watermarking assembles a family of methodological and application tools allowing metadata (a payload) to be imperceptibly and persistently inserted into an original NN model. Subsequently, detecting/decoding this metadata from the model itself or from any of its inferences provides the means to trace the source and verify authenticity.

An additional traceability technology is fingerprinting, which relates to a family of methodological and applicative tools allowing salient information (a fingerprint) to be extracted from the original NN model and the model to be subsequently identified based on that extracted information.
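
To make the two families more concrete, here is a small, purely illustrative Python sketch: a white-box watermark read by projecting the model weights with a secret matrix (one classic approach from the literature) and a naive fingerprint computed by hashing coarsely quantised weights. Neither is a technology specified or requested by MPAI, and all names are assumptions.

```python
# Purely illustrative examples of the two traceability families; neither is a
# technology specified or requested by MPAI, and all names are assumptions.
import hashlib
import torch

def extract_watermark(weights: torch.Tensor, secret_key: torch.Tensor) -> torch.Tensor:
    """White-box watermark reading: project the flattened weights with a secret
    matrix and threshold the result. The embedding side would have trained the
    model with a regularizer pushing these projections toward the payload bits."""
    projections = secret_key @ weights.flatten()        # shape: (number of payload bits,)
    return (torch.sigmoid(projections) > 0.5).int()

def fingerprint(weights: torch.Tensor, decimals: int = 3) -> str:
    """Naive passive fingerprint: hash of coarsely quantised weights."""
    quantised = torch.round(weights * 10**decimals) / 10**decimals
    return hashlib.sha256(quantised.numpy().tobytes()).hexdigest()

weights = torch.randn(64, 3, 3, 3)              # stand-in for one layer's weights
secret_key = torch.randn(32, weights.numel())   # secret projection matrix, 32-bit payload
print(extract_watermark(weights, secret_key))
print(fingerprint(weights)[:16])
```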

Therefore, MPAI has found the application area called “Neural Network Watermarking” to be relevant for MPAI standardization, as there is a need both for Neural Network Traceability technologies and for assessing the performance of such technologies.

MPAI available standards

In response to these needs, MPAI has established the Neural Network Watermarking Development Committee (NNW-DC). The DC has developed Technical Specification: Neural Network Watermarking (MPAI-NNW) – Traceability (NNW-NNT) V1.0, which specifies methods to evaluate the following aspects of Active (Watermarking) and Passive (Fingerprinting) Neural Network Traceability Methods (a minimal robustness-check sketch follows the list):

  • The ability of a Neural Network Traceability Detector/Decoder to detect/decode/match Traceability Data when the traced Neural Network has been modified,
  • The computational cost of injecting, extracting, detecting, decoding, or matching Traceability Data,
  • Specifically for active tracing methods, the impact of inserted Traceability Data on the performance of a neural network and on its inference.
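
As an illustration of the first aspect – a hedged sketch under assumed interfaces, not the NNW-NNT evaluation procedure itself – the following fragment measures the bit error rate of a recovered payload after the traced network has been modified. `decode_payload` and `perturb` are hypothetical placeholders for a proposer's decoder and for a modification such as fine-tuning, pruning, or quantisation.

```python
# Minimal sketch of the robustness aspect listed above: modify the traced
# network and measure how much of the payload the decoder still recovers.
import torch

def bit_error_rate(original_bits: torch.Tensor, recovered_bits: torch.Tensor) -> float:
    return (original_bits != recovered_bits).float().mean().item()

def robustness_check(weights: torch.Tensor, payload_bits: torch.Tensor,
                     decode_payload, perturb) -> float:
    modified = perturb(weights)           # e.g., lambda w: w + 0.01 * torch.randn_like(w)
    recovered = decode_payload(modified)  # the proposer's detector/decoder
    return bit_error_rate(payload_bits, recovered)
```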

MPAI-NNW Future Standards

During its 57th GA held on June 11th, MPAI released a Call for Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC). This call requests Neural Network Traceability Technologies that make it possible to:

  • verify that the data provided by an Actor, and received by another Actor is not compromised, i.e. it can be used for the intended scope,
  • identify the Actors providing and receiving the data, and
  • evaluate the quality of solutions supporting the previous two items.

An Actor is a process producing, providing, processing, or consuming information.

MPAI NNT Actors

Four types of Actors are identified as playing a traceability-related role in the use cases.

  • NN owner – the developer of the NN, who needs to ensure that ownership of the NN can be claimed.
  • NN traceability provider – the developer of the traceability technology able to carry a payload in a neural network or in an inference.
  • NN customer – the user who needs the NN owner’s NN to make a product or offer a service.
  • NN end-user – the user who buys an NN-based product or subscribes to an NN-based service.

Examples of Actors are:

  • Edge-devices and software
  • Application devices and software
  • Network devices and software
  • Network services

MPAI NNT Use cases

MPAI use cases relate to both the NN per se (i.e., to the data representation of the model) and to the inference of that NN (i.e., to the result produced by the network when fed with some input data), as illustrated in Figure 1.

Figure 1: Synopsis of NNT generic use cases: Identify the ownership of an NN, Identify an NN (e.g. DOI) and Verify integrity of an NN

The NNW-TEC use cases document is available; it includes sequence diagrams describing the positions and actions of the four main Actors in the workflow.

MPAI NNT Service and application scenarios

MPAI NNT is relevant for services and applications benefitting from one or several conventional NN tasks such as:

  1. Video/image/audio/speech/text classification
  2. Video/image/audio/speech/text segmentation
  3. Video/image/audio/speech/text generation
  4. Video/image/audio/speech decoding

Figures 2, 3 and 4 present three typologies of services and applications aggregating the generic use cases presented above.

The first example (Traceable newsletter service – Figure 2) covers the case where an end-user subscribes to a newsletter produced by a Generative AI service (provided by an NN customer) according to the end-user's profile. In such a use case, a malicious user might try to tamper with the very production of the personalized content or to modify it during its transmission.

The second example (Autonomous vehicle services – Figure 3) deals with the traceability and authenticity of the multimodal content that is exchanged in various ways: (1) car A (acting as an NN end-user) sends acquired signals to a server (acting as an NN customer or owner) for data processing, (2) an embedded AI transmits instructions such as braking, turning, or accelerating to the car (NN owner and end-user), (3) another vehicle B in the environment transmits environmental information to vehicle A. Various types of malicious attacks with critical consequences can be envisaged: AI interception and corruption (e.g., adversarial learning), or corruption of the data in its transmission from and/or to the autonomous vehicle (forced connection interruption or data modification).

The third example (AI generated or processed information services – Figure 4) shows how NNT can be beneficial when real images are modified by a deepfake process. A user capturing a video sequence with a connected camera would like to appear as the archetypal secret agent (say, James Bond) by interacting with a generative AI service remotely accessible in the network. This module synthesizes novel audiovisual content, which is then rendered on a large display for the user to enjoy. Such services are not immune from security threats: an attacker can intercept the encoded stream prior to its arrival at the trusted AI server and process it through a malicious edge-deployed generative AI, or it can compromise the trusted generative AI service itself (e.g., by employing adversarial training techniques).

Figure 2: Traceable newsletter service

Figure 3: Autonomous vehicle services

Figure 4: AI generated or processed information services



MPAI calls for Neural Network Traceability Technologies

Geneva, Switzerland – 11th June 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 57th General Assembly (MPAI-57) with the publication of the Call for Neural Network Traceability Technologies and three supporting documents.

The Call for Technologies: Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC) requests Neural Network Traceability Technologies that make it possible:

  1. To verify that the data provided by an Actor, and received by another Actor is not compromised, i.e. it can be used for the intended scope.
  2. To identify the Actors providing and receiving the data.
  3. To evaluate the quality of solutions supporting points 1 and 2 above implemented with the proposed Neural Network Traceability Technologies.

An Actor is a process producing, providing, processing, or consuming information.

An online presentation of the Call will be made on 2025/07/01T15 UTC. Please register at https://bit.ly/4mW6AWX to attend.

Responses to the Call are due to the MPAI Secretariat on 2025/09/27 T23:59 UTC.

MPAI is continuing its work plan that involves the following activities:

  1. AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
  2. AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
  3. Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
  4. Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
  6. End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
  7. AI-Enhanced Video Coding (MPAI-EVC): developing an optimised Up-sampling Filter for Video applications (EVC-UFV) standard.
  8. Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
  9. Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
  10. Multimodal Conversation (MPAI-MMC): Developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
  11. MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
  12. Neural Network Watermarking (MPAI-NNW): Issuing a Call on Neural Network Traceability Technologies.
  13. Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
  14. Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
  15. AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
  16. Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
  17. Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
  18. XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.