Leonardo Chiariglione
2025-08-20

MPAI publishes MPAI Metaverse Model – Technologies V2.1 standard with extended functionalities

Geneva, Switzerland – 20th August 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 59th General Assembly (MPAI-59) approving the publication of the MPAI Metaverse Model – Technologies V2.1 with a request for Community Comments.

The earlier 2.0 Version of Technical Specification: MPAI Metaverse Model (MMM) – Technologies (MMM-TEC) already supported digital twinning of real-world environments and their blending with MMM-TEC-specified virtual environments. The new MMM-TEC V2.1 supports “analogue twinning” of virtual- with real-world environments opening attractive industrial metaverse applications. This is achieved by introducing new “Process Actions” (speech acts of an MMM-TEC process sent to another process) and the notion of R-Item (real object) that can be MU-Added (placed at a U-Location, a location in the real world), MU-Moved (moved from a U-Location to another U-Location along a Trajectory), and MU-Animated (animated) in sync with a Persona (the rendering of a Process as an avatar) in the metaverse.

Among the several other innovations included in MMM-TEC V2.1, we mention Change Property, a Process Action whereby a Process changes – if it holds the Rights – the place where an object is located; its properties such as perceptibility, size, mass, gravity, and texture; audio properties such as reflectivity, reverberation, diffusion and absorption; an audio or light source; and the emotional state of an avatar.

MPAI standards are best described as a web of interconnected specifications. The new technologies needed by MMM-TEC are partly specified by Object and Scene Descriptors (MPAI-OSD), Portable Avatar Format (MPAI-PAF), and Data Types, Formats and Attributes (MPAI-TFA). They are now at versions V1.4, V1.5, and V1.4, respectively.

The MMM-TEC V2.1 standard on 12 September at 15 UTC (link).
The MPAI-OSD V1.4 and MPAI-PAF V1.5 standards on 12 September at 10 UTC (link).
The MPAI-TFA V1.4 standard on Wednesday 17 September at 15 UTC (link).
The MPAI-GME V2.0 standard on Friday 26 September at 14 UTC (link).

MPAI is continuing the development of its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC): refining the Up-sampling Filter for Video applications (EVC-UFV) standard.
Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
Multimodal Conversation (MPAI-MMC): developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
Neural Network Watermarking (MPAI-NNW): issuing a Call on Neural Network Traceability Technologies.
Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Leonardo Chiariglione
2025-07-14

Exploring the Up-sampling Filter for Video applications (EVC-UFV) standard

MPAI has approved Technical Specification: AI-Enhanced Video Coding (MPAI-EVC) – Up-sampling Filter for Video applications (EVC-UFV).

The standard includes a general procedure to design video up-sampling filters based on super resolution techniques and a method to reduce the complexity of the designed filters without significant performance loss. The standard also provides the parameters of specific filters for standard definition to high definition and high definition to ultra-high definition, for the complexity-reduced and original cases.

The standard will be presented online on 23 July at 13 UTC. Register here to attend the presentation.

The standard is not in final form. It is published with a request for Community Comments according to MPAI procedures. Comments should be sent to the MPAI Secretariat by 2025/08/18 T23:59 UTC.

A method typically used in video coding is to down-sample the input video frame to half its size before encoding. This reduces the computational cost but requires an up-sampling filter to recover the original video resolution in the decoded video while reducing the loss in visual quality as much as possible. Currently used filters are bicubic and Lanczos.

Figure 1 – Up-sampling Filters for Video application (EVC-UFV)
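The down-sample and up-sample chain described above can be illustrated with a minimal sketch. It is not part of EVC-UFV: the encoder and decoder are left out, the 2×2 averaging down-sampler and the nearest-neighbour up-sampler are simple stand-ins chosen for brevity (not the bicubic, Lanczos, or learned filters discussed here), and all function names are hypothetical.

```python
import math

def downsample_2x(frame):
    """Halve width and height by averaging each 2x2 block of samples."""
    return [
        [(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
         for x in range(0, len(frame[0]), 2)]
        for y in range(0, len(frame), 2)
    ]

def upsample_2x_nearest(frame):
    """Double width and height by repeating every sample (nearest neighbour)."""
    out = []
    for row in frame:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    count = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for ref_row, test_row in zip(reference, test)
              for a, b in zip(ref_row, test_row)) / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Synthetic 8x8 luma ramp standing in for a video frame (assumption).
frame = [[16 * x + 8 * y for x in range(8)] for y in range(8)]
restored = upsample_2x_nearest(downsample_2x(frame))
quality_db = psnr(frame, restored)
```

A better up-sampling filter is one that yields a higher quality figure for the restored frame at the same down-sampled resolution, which is the gap the learned filters aim to close.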

In the last few years, Artificial Intelligence (AI), Machine Learning (ML), and especially Deep Learning (DL) techniques, have demonstrated their capability to enhance the performance of various image and video processing tasks. MPAI has performed an investigation to assess how video coding performance could be improved by replacing traditional coding blocks with deep-learning ones. The outcome of this study has shown that deep-learning based up-sampling filters significantly improve the performance of existing video codecs.

MPAI issued a Call for Technologies for up-sampling filters for video applications in October 2024. This was followed by an intense phase of development that enabled MPAI to approve Technical Specification: AI-Enhanced Video Coding (MPAI-EVC) – Up-sampling Filter for Video application (EVC-UFV) V1.0 with a request for Community Comments at its 58th General Assembly (MPAI-58).

The EVC-UFV standard enables efficient, low-complexity up-sampling filters applied to video with bit depths of 8 and 10 bits per component, in the standard YCbCr colour space with 4:2:0 sub-sampling, encoded with a variety of encoding technologies using different encoding features such as random access and low delay.

As depicted in Figure 2, the filter is a Densely Residual Laplacian Super-Resolution network (DRLN), offering a novel deep-learning approach.

Figure 2 – Densely Residual Laplacian Super-Resolution network (DRLN).

The complexity of the filter is reduced in two steps. First, a drastic simplification of the deep-learning structure, which reduces the number of blocks, provides a much lighter network while keeping performance similar to that of the baseline DRLN. This is achieved by identifying the DRLN’s principal components and understanding the impact of each component on the output video frame quality, memory size, and computational costs.

As shown in Figure 2, the main component of the DRLN architecture is a Residual Block, which is composed of Densely Residual Laplacian Modules (DRLM) and a convolutional layer. Each DRLM contains three Residual Units, as well as one compression unit and one Laplacian attention unit (a set of Convolutional Layers with a square filter size and Dilation that is greater than or equal to the filter size). Each Residual Unit consists of two convolutional layers and two ReLU Layers. All DRLM modules in each Residual Block and all Residual Units in each DRLM are densely connected. The Laplacian attention unit consists of three convolutional layers with filter size 3×3 and dilation (a technique for expanding a convolutional kernel by inserting holes or gaps between its elements) equal to 3, 5, 7. All convolutional layers in the network, except the Laplacian one, have filter size 3×3 with dilation equal to 1. Throughout the network, the number of feature maps (the outputs of convolutional layers) is 64.
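To make the effect of dilation concrete, the short sketch below computes the span covered by a dilated kernel using the standard relation span = dilation × (kernel size − 1) + 1. The function name is hypothetical; the kernel size and dilation values are the ones quoted above.

```python
def effective_kernel_span(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned, per axis, by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1

# Laplacian attention unit: 3x3 filters with dilation 3, 5 and 7.
laplacian_spans = [effective_kernel_span(3, d) for d in (3, 5, 7)]

# All other convolutional layers: 3x3 filters with dilation 1.
regular_span = effective_kernel_span(3, 1)
```

The three attention layers therefore look at 7×7, 11×11 and 15×15 neighbourhoods while each still uses only nine weights per input/output channel pair, the same as an ordinary 3×3 layer.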

Based on this structural analysis, reducing the number of main Residual Blocks, adding more DRLMs, and reducing the complexity of the Residual Unit and the number of hidden convolutional layers and feature maps drastically accelerates execution and reduces memory usage without substantially affecting the network’s visual quality performance.

Figure 3 depicts the resulting EVC-UFV Up-sampling Filter.

Figure 3 – Structure of the EVC-UFV Up-sampling Filter

The parameters of the original and complexity-reduced network are given in Table 1.

Table 1 – Parameters of the original and the complexity-reduced network

	Original	Final
Residual Blocks	6	2
DRLMs per Residual Block	3	6
Residual Units per DRLM	3	3
Hidden Convolutional Layers per Residual Unit	2	1
Input Feature Maps	64	32
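The figures in Table 1 can be turned into rough structural counts. The sketch below is only an illustration under stated assumptions: the class and field names are hypothetical, the third row of the table is read as the number of Residual Units per DRLM (three, as in the architecture description), and only the hidden convolutional layers inside Residual Units are counted, ignoring compression units, attention units and all other layers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkConfig:
    residual_blocks: int
    drlms_per_block: int
    residual_units_per_drlm: int
    hidden_convs_per_unit: int
    feature_maps: int

    def total_drlms(self) -> int:
        return self.residual_blocks * self.drlms_per_block

    def total_residual_units(self) -> int:
        return self.total_drlms() * self.residual_units_per_drlm

    def total_hidden_convs(self) -> int:
        """Hidden convolutional layers inside Residual Units only."""
        return self.total_residual_units() * self.hidden_convs_per_unit

# Values taken from Table 1.
original = NetworkConfig(6, 3, 3, 2, 64)
final = NetworkConfig(2, 6, 3, 1, 32)

summary = {
    name: (cfg.total_drlms(), cfg.total_residual_units(), cfg.total_hidden_convs())
    for name, cfg in (("original", original), ("final", final))
}
```

Under these assumptions the simplified network has fewer Residual Units and a third of the hidden convolutional layers, each operating on half as many feature maps, which is consistent with the lighter network described in the text.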

Second, pruning the parameters and weights of the network reduces its complexity by 40%, with a loss in performance of less than 1% in BD-rate. This is achieved by first using the well-known DeepGraph technique, modified to work with deep-learning based up-sampling filters, to understand the dependencies among the layers of the different components of the simplified deep-learning network. This facilitates grouping components sharing a common pruning approach that can be applied without introducing dimensional inconsistencies among the inputs and outputs of the layers.
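The idea of grouping layers so that pruning does not break dimensions can be sketched as follows. This is not the DeepGraph technique or the method used in the standard; it is a toy illustration in which layers whose outputs are added together (for example through a skip connection) are placed in one group and must keep the same channel indices. The layer names, the coupling list and the importance scores are all hypothetical.

```python
def group_coupled_layers(layers, couplings):
    """Return groups of layers whose output channels must be pruned together."""
    parent = {name: name for name in layers}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in couplings:
        parent[find(a)] = find(b)

    groups = {}
    for name in layers:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

def channels_to_keep(importance, keep_ratio):
    """Indices of the most important channels, shared by a whole group."""
    keep = max(1, round(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])

layers = ["head", "unit1_conv", "unit2_conv", "tail"]
# Outputs joined by element-wise addition must keep identical channel sets.
couplings = [("head", "unit1_conv"), ("unit1_conv", "unit2_conv")]
groups = group_coupled_layers(layers, couplings)

# One importance score per channel, aggregated over the group (assumed values).
kept = channels_to_keep([0.9, 0.1, 0.5, 0.7, 0.2], keep_ratio=0.6)
```

Applying the same `kept` indices to every layer in a group keeps the tensors that are added together the same shape, which is the dimensional consistency the paragraph above refers to.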

Verification Tests of the technology have been performed on:

Standard sequences	CatRobot, FoodMarket4, ParkRunning3.
Bits/sample	8 and 10 bit-depth per component.
Colour space	YCbCr with 4:2:0 sub-sampling.
Encoding technologies	AVC, HEVC, and VVC.
Encoding settings	Random Access and Low Delay at QPs 22, 27, 32, 37, 42, 47.
Up-sampling	SD to HD and HD to UHD.
Metrics	BD-Rate, BD-PSNR and BD-VMAF
Deep-learning structure	Same for all QPs

Results show an impressive improvement for all coding technologies and encoding options, for all three objective metrics, when compared with the currently used traditional bicubic interpolation. The results of Table 2 have been obtained for the low-delay coding mode.

Table 2 – Performance of the EVC-UFV Up-sampling Filter

	AVC	HEVC	VVC
SD to HD (using own trained filter)	14.4%	12.2%	13.8%
HD to UHD (using own trained filter)	5.6%	6%	6.5%
SD to HD (using HD to UHD filter)	14%	11.6%	11.4%

All results are obtained with the 40% pruned network.


Leonardo Chiariglione
2025-07-09

MPAI publishes the Up-sampling Filter for Video applications with a request for Community Comments

Geneva, Switzerland – 9th July 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 58th General Assembly (MPAI-58) approving the publication of the Up-sampling Filter for Video applications standard with a request for Community Comments.

The approved Technical Specification: AI-Enhanced Video Coding (MPAI-EVC) – Up-sampling Filter for Video applications (EVC-UFV) specifies a general procedure to design video up-sampling filters based on super resolution techniques. Additionally, it specifies a method to reduce the complexity of the designed filters without significant loss in performance. The standard also provides the parameters of specific filters for standard definition to high definition and high definition to ultra-high definition, for the complexity-reduced and original cases. The standard is not in final form. It is published with a request for Community Comments according to MPAI procedures. Comments should be sent to the MPAI Secretariat by 2025/08/18 T23:59 UTC.

The standard will be presented online on 23 July at 13 UTC. Register here to attend the presentation.

MPAI is continuing the development of its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC): refining the Up-sampling Filter for Video applications (EVC-UFV) standard.
Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
Multimodal Conversation (MPAI-MMC): developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
Neural Network Watermarking (MPAI-NNW): issuing a Call on Neural Network Traceability Technologies.
Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.


Leonardo Chiariglione
2025-07-07

A new approach to address advanced AI systems challenges? A Workshop

2025 July 10 T09-13 – Campus Biotech Innovation Park, Av. de Sécheron 15, 1202 Genève, Switzerland (CH)

To join online: Link to Workshop

Context, Scope and Objectives

This workshop has been conceived and organized by a group of people from the University of Geneva, the University of Zurich and MPAI (https://mpai.community/), in collaboration with members of the Institute for AI International Governance of Tsinghua University (I-AIIG). The workshop is open to invited scholars and experts in the fields of AI-related technical, social and legal research.

During the past few years, the theme of Global AI Governance has gained broad public and policy attention, favoring a wide review of the impacts of the current AI development path. We understand that AI’s underlying technologies and AI systems’ performance continue to develop at an exponential pace. This provides significant opportunities as well as substantial risks. Here we focus on risks, and particularly extreme or “existential” risks.

Following the arguments and calls made by the world’s most eminent AI experts, we share key Concerns regarding the development of Advanced AI, i.e.: Safety, Alignment, Transparency, Trustworthiness, Fairness.

Many efforts have already been made to address some of these Concerns, and we are fully supportive of them. However, there is a substantial gap between the efforts made to improve AI systems’ performance (the “AI race”) and efforts made to strengthen AI safety and to address the other Concerns. This is due to a variety of reasons such as the limited focus and investment by companies, IPR issues, underestimation of the risks by Governments, difficulty of managing Generative AI through regulatory frameworks, and the geopolitical context.

In addition, to the best of our knowledge, insufficient attention is paid to integrating standardization in the AI safety field with the AI systems’ research and development cycle.

While we have some initial ideas on the necessary starting point to address this gap, the main objective of the workshop is to openly discuss those ideas, verify whether there is consensus on them and, if so, find ways to address the risks that the rapid development of AI poses to humanity. The objective is in no way dismissive of the efforts undertaken by a plurality of actors in this domain: efforts that, in general, we fully support. We believe that our proposal is complementary and should help strengthen and expand the global action required to address the Concerns.

Based on the discussions from the workshop, we hope to draft an outcome document/paper to be published. This explains the format of the workshop below.

Workshop Agenda

09:00 – 09:15	Welcome and registration of participants
09:15 – 09:30	Introduction by the organizers. Major Concerns and existing AI Governance Initiatives: quick review of some of the most important initiatives underway; observations about gaps.
09:30 – 10:30	Open discussion session I. How to address the Concerns in an effective, timely and proactive way?
	Is it possible to launch a process to establish a global “entity” to deal with the Concerns, with the goals of:
	Promoting the (mostly) science & technology-based development of AI systems that operate addressing the Concerns.
	Producing standards and conformity assessment tools – through an innovative process, integrated with the R&D cycle – helping in the implementation of AI systems that address the Concerns.
	Is it possible to start this global initiative, by drawing from the best expertise of research labs and academia and with European and Chinese entities as initial actors? We emphasize that this initiative is open to all parties sharing the goals and willing to contribute.
	Are there other directions that should be pursued?
10:30 – 11:00	Coffee Break
11:00 – 12:00	Open discussion session II. Continued: participants are invited to express their views on the points outlined above.
12:00 – 12:30	Review of the discussion outcome. To be proposed by the Moderator and Secretary, for review and discussion by participants:
	Key points of consensus regarding the proposed ideas and possible additional lines of action.
	List of main disagreements and of matters to be described more precisely.
	Observations on the challenges and on the feasibility of the initiative.
12:30 – 13:00	To be presented by the Moderator and Secretary for approval: conclusions and indication on the way forward.
13:00 – 14:00	Lunch


Leonardo Chiariglione
2025-06-16

An introduction to the Neural Network Watermarking Call for Technologies

Introduction

During the last decade, Neural Networks have been deployed in an increasing variety of domains, and their production has become costly in terms of both resources (GPUs, CPUs, memory) and time. Moreover, users of Neural Network based services increasingly express the need for a certified service quality.

NN Traceability offers solutions to satisfy both needs, ensuring that a deployed Neural Network is traceable and that any tampering is detected.

Inherited from the multimedia realm, watermarking brings together a family of methodological and application tools that make it possible to imperceptibly and persistently insert some metadata (payload) into an original NN model. Subsequently, detecting/decoding this metadata from the model itself or from any of its inferences provides the means to trace the source and to verify the authenticity.
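As a toy illustration of inserting a payload into model parameters and decoding it again, the sketch below uses quantization index modulation: each marked weight is moved to the nearest multiple of a small step whose index parity equals the payload bit. This is an assumption chosen for brevity, not a method specified by MPAI-NNW; the function names, the step size and the sample weights are hypothetical, and a practical scheme would also have to preserve model accuracy and survive modifications.

```python
import math

def embed_bits(weights, bits, step=0.01):
    """Return a copy of weights carrying one payload bit per leading weight."""
    if len(bits) > len(weights):
        raise ValueError("payload is longer than the list of weights")
    marked = list(weights)
    for i, bit in enumerate(bits):
        scaled = marked[i] / step
        index = math.floor(scaled + 0.5)  # nearest quantization index
        if index % 2 != bit:
            # Move to the adjacent index, on the side where the weight lies.
            index += 1 if scaled >= index else -1
        marked[i] = index * step
    return marked

def extract_bits(weights, count, step=0.01):
    """Decode the payload from the parity of each quantization index."""
    return [math.floor(w / step + 0.5) % 2 for w in weights[:count]]

weights = [0.123, -0.456, 0.789, 0.0, 0.321]
payload = [1, 0, 1, 1]
marked = embed_bits(weights, payload)
recovered = extract_bits(marked, len(payload))
```

Each marked weight changes by at most one step, which is what keeps the insertion small; choosing the step trades imperceptibility against resistance to later changes of the weights.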

An additional traceability technology is fingerprinting, a family of methodological and application tools that make it possible to extract some salient information from the original NN model (a fingerprint) and to subsequently identify that model based on the extracted information.
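A minimal sketch of the fingerprinting idea follows, assuming that the salient information is the model's response to a fixed set of probe inputs. The tiny linear "models", the probes, the tolerance and the function names are all hypothetical and serve only to show extraction followed by matching; they do not reflect any method specified by MPAI.

```python
def fingerprint(model, probes, digits=3):
    """Fingerprint a model by its rounded outputs on fixed probe inputs."""
    return tuple(round(model(x), digits) for x in probes)

def matches(fp_a, fp_b, tolerance=0.01):
    """Two fingerprints match if every probe response is within tolerance."""
    return len(fp_a) == len(fp_b) and all(
        abs(a - b) <= tolerance for a, b in zip(fp_a, fp_b)
    )

def make_linear(weights, bias):
    """Stand-in for a neural network: a single linear unit."""
    return lambda x: sum(w * v for w, v in zip(weights, x)) + bias

probes = [(1.0, 0.0), (0.0, 1.0), (0.5, -0.5)]
original = make_linear([0.8, -0.3], 0.1)
fine_tuned = make_linear([0.802, -0.299], 0.101)  # slightly modified copy
unrelated = make_linear([-0.4, 0.9], 0.0)

reference = fingerprint(original, probes)
same_model = matches(reference, fingerprint(fine_tuned, probes))
other_model = matches(reference, fingerprint(unrelated, probes))
```

Unlike watermarking, nothing is inserted into the model here; identification relies entirely on information the model already exhibits.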

Therefore, MPAI has found the application area called “Neural Network Watermarking” to be relevant for MPAI standardization, as there is a need both for Neural Network Traceability technologies and for assessing the performance of such technologies.

MPAI available standards

In response to these needs, MPAI has established the Neural Network Watermarking Development Committee (NNW-DC). The DC has developed Technical Specification: Neural Network Watermarking (MPAI-NNW) – Traceability (NNW-NNT) V1.0. This specifies methods to evaluate the following aspects of Active (Watermarking) and Passive (Fingerprinting) Neural Network Traceability Methods:

The ability of a Neural Network Traceability Detector/Decoder to detect/decode/match Traceability Data when the traced Neural Network has been modified,
The computational cost of injecting, extracting, detecting, decoding, or matching Traceability Data,
Specifically for active tracing methods, the impact of inserted Traceability Data on the performance of a neural network and on its inference.
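The first two aspects lend themselves to simple measurements. The sketch below shows one common way to quantify them, assuming a bit-string payload: the fraction of wrongly decoded bits after the network has been modified, and the wall-clock time of an operation. These helper names and the choice of measures are assumptions for illustration, not the procedures defined in NNW-NNT.

```python
import time

def bit_error_rate(embedded, decoded):
    """Fraction of payload bits that differ between embedding and decoding."""
    if not embedded or len(embedded) != len(decoded):
        raise ValueError("payloads must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(embedded, decoded))
    return errors / len(embedded)

def timed(operation, *args):
    """Run an operation and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = operation(*args)
    return result, time.perf_counter() - start

# Payload decoded from a hypothetically modified network: two bits flipped.
rate, seconds = timed(bit_error_rate, [1, 0, 1, 1], [1, 1, 1, 0])
```

A rate of zero means the payload survived the modification intact, while a rate near one half means the decoder does no better than guessing.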

MPAI-NNW Future Standards

During its 57th GA held on June 11th, MPAI released a Call for Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC). This call requests Neural Network Traceability Technologies that make it possible to:

verify that the data provided by an Actor, and received by another Actor is not compromised, i.e. it can be used for the intended scope,
identify the Actors providing and receiving the data, and
evaluate the quality of solutions supporting the previous two items.

An Actor is a process producing, providing, processing, or consuming information.

MPAI NNT Actors

Four types of Actors are identified as playing a traceability-related role in the use cases.

NN owner – the developer of the NN, who needs to ensure that ownership of NN can be claimed.
NN traceability provider – the developer of the traceability technology able to carry a payload in a neural network or in an inference.
NN customer – the user who needs the NN owner’s NN to make a product or offer a service.
NN end-user – the user who buys an NN-based product or subscribes to an NN-based service.

Examples of Actors are:

Edge-devices and software
Application devices and software
Network devices and software
Network services

MPAI NNT Use cases

MPAI use cases relate to both the NN per se (i.e., to the data representation of the model) and to the inference of that NN (i.e., to the result produced by the network when fed with some input data), as illustrated in Figure 1.

Figure 1: Synopsis of NNT generic use cases: Identify the ownership of an NN, Identify an NN (e.g. DOI) and Verify integrity of an NN

The NNW-TEC use cases document is available; it includes sequence diagrams describing the positions and actions of the four main Actors in the workflow.

MPAI NNT Service and application scenarios

MPAI NNT is relevant for services and applications benefitting from one or several conventional NN tasks such as:

Video/image/audio/speech/text classification
Video/image/audio/speech/text segmentation
Video/image/audio/speech/text generation
Video/image/audio/speech decoding

Figures 2, 3 and 4 present three typologies of services and applications aggregating the generic use cases presented above.

The first example (Traceable newsletter service – Figure 2) covers the case where an end-user subscribes to a newsletter that is produced by a Generative AI service (provided by an NN customer), according to the end-user profile. In such a use case, a malicious user might try to tamper with the very production of the personalized content or to modify it during its transmission.

The second example (Autonomous vehicle services – Figure 3) deals with the traceability and authenticity of the multimodal content that is exchanged in various ways: (1) car A (acting as an NN end-user) sends acquired signals to a server (acting as an NN customer or owner) for data processing, (2) an embedded AI transmits instructions such as braking, turning, or accelerating to the car (NN owner and end-user), (3) another vehicle B in the environment transmits environmental information to vehicle A. Various types of malicious attacks with critical consequences can be envisaged: the AI can be intercepted and corrupted (e.g. adversarial learning), and the data can be corrupted in their transmission from and/or to the autonomous vehicle (forced connection interruption or data modification).

The third example (AI generated or processed information services – Figure 4) shows how NNT can be beneficial when real images are modified by a deepfake process. A user capturing a video sequence with a connected camera would like to appear as the archetypal secret agent (say, James Bond) by interacting with a generative AI service remotely accessible in the network. This module synthesizes novel audiovisual content, which is then rendered on a large display for the user to enjoy. Such services are not immune from security threats: the attacker can intercept the encoded stream prior to its arrival at the trusted AI server and process it through a malicious edge-deployed generative AI, or can affect the trusted generative AI service itself (e.g. by employing some adversarial training techniques).

Figure 2: AI-generated newsletter example

Figure 3: Autonomous vehicle services

Figure 4: AI generated or processed information services


Leonardo Chiariglione
2025-06-11

MPAI calls for Neural Network Traceability Technologies

Geneva, Switzerland – 11th June 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 57th General Assembly (MPAI-57) with the publication of the Call for Neural Network Traceability Technologies and three supporting documents.

The Call for Technologies: Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC) requests Neural Network Traceability Technologies that make it possible:

To verify that the data provided by an Actor, and received by another Actor is not compromised, i.e. it can be used for the intended scope.
To identify the Actors providing and receiving the data.
To evaluate the quality of solutions supporting points 1 and 2 above implemented with the proposed Neural Network Traceability Technologies.

An Actor is a process producing, providing, processing, or consuming information.

An online presentation of the Call will be made on 2025/07/01T15 UTC. Please register at https://bit.ly/4mW6AWX to attend.

Responses to the Call are due to the MPAI Secretariat on 2025/09/27 T23:59 UTC.

MPAI is continuing its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing a new MPAI-AIF specification that facilitates the creation of new workflows using available AIMs.
AI for Health (MPAI-AIH): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) and Audio Object Scene Rendering (CAE-AOR) specifications.
Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC specification.
Compression and Understanding of Industrial Data (MPAI-CUI): developing the Company Performance Prediction V2.0 specification.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC): developing an optimised Up-sampling Filter for Video applications (EVC-UFV) standard.
Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
Multimodal Conversation (MPAI-MMC): developing the notion of Perceptive and Agentive AI (PAAI) capable of handling more complex questions.
MPAI Metaverse Model (MPAI-MMM): extending the capabilities of the MMM-TEC specs to support more applications.
Neural Network Watermarking (MPAI-NNW): issuing a Call on Neural Network Traceability Technologies.
Object and Scene Description (MPAI-OSD): extending the capabilities of the MPAI-OSD V1.3 to support more applications.
Portable Avatar Format (MPAI-PAF): extending the capabilities of the MPAI-PAF V1.4 to support more applications.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive, health, and metaverse).
XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.


Leonardo Chiariglione
2025-05-26

Soon MPAI will be five

Abstract

In five years, MPAI – Moving Picture, Audio, and Data Coding by Artificial Intelligence, the international, unaffiliated, not-for-profit association developing standards for AI-based data coding – has been carrying out its mission, making its results widely known and its status recognised. However, it is not clear to everyone why MPAI was established, what its distinctive features are, and why it stands out among the organisations developing standards.

This is the first of a series of articles that will revisit the driving force that allowed MPAI to develop 15 standards in a variety of fields where AI can make the difference and have half a score under development. The articles will pave the way for the next five years of MPAI.

1 Standards have a divine nature

MPAI was established as a Swiss not-for-profit organisation on 30 September 2020 with the mission of developing data coding standards based on Artificial Intelligence. The first element that has driven MPAI into existence is the very word STANDARD. If you ask people “what is a standard?” you are likely to obtain a wide range of responses. The simplest and most effective, although rather obscure, answer is instead “a tool that connects minds”. A standard establishes a paradigm that allows a mind to interpret what another mind expresses in words or by other means of communication.

The most immediate example of a standard, yet one with a deep meaning, is language itself. A language assigns labels – words – to physical and intellectual objects. Apparently, the language-defined labels are what make humans different from animals, even though there are innumerable forms of language used by animals that are less rich than the human one.

Words have a divine nature. So, it is no surprise that St. John’s Gospel starts with the sentence “In the beginning was the Word, and the Word was with God, and God was the Word”.

Is this divine nature only applicable to language? Well, no. If we say that 20 mm of rain fell in 4 hours, we are using language to convey information about rain that would be void of a quantitative value if there were not a standard for length called meter and a standard for time called hour.

Next to these lofty goals, standards have other practical purposes. A technical standard enacted by a country may be used as a tool to limit and sometimes even outlaw products not conforming to that standard, in that country. The existence of standards used for this purpose was the main driver toward the establishment of the Moving Picture Experts Group (MPEG): a single digital video standard that eventually uprooted scores of analogue television standards, with myriads of sub-standards. The intention was to nullify this malignant use of standards.

The goal of MPEG was to serve humans by enabling them to hear and see, thanks to machines that were able to “understand” the bits generated by other machines. In other words, it was necessary that machines be made able to understand the “words” of other machines.

Come 2020, Artificial Intelligence was not yet in the headlines, but the direction was clear. AI systems would be endowed with more and more “intelligence”, communicate with each other, and let humans communicate with them. What forms the “word” would take was not clear, but that the forms would be manifold was.

While MPEG had ushered in a new spirit in standardisation, it had also fostered a transformation in the way the need for standards would take shape and how standards would be exploited. In the early MPEG days, most industries still had research laboratories where the advancement of technology was monitored and fostered with a view of exploiting technology for new products and services. New findings would be patented and new patent-based products launched. The market would bless the winner among different products doing more or less the same thing with different technologies.

MPEG rendered useless the costly step of battling on patents in the market, bringing to the standards committee a battle on standards. As a patent in a successful standard was highly remunerative, it did not take much time for the industry to invest in any patentable research. By industry, we mean “any” industry, not even necessarily one that had a business in products and services. The idea was that a day would come when a patent would be needed in “a” standard. That single patent would repay the tens-hundreds-thousands of unused patents.

This was significant progress in terms of optimising efforts to generate innovation, and letting more actors join the fray. That progress, however, came at a cost: the disconnect between generating innovation and exploiting it. An industry that had made an investment to achieve an innovation had every interest in seeing that innovation deployed. A company that has a portfolio of 10,000 patents may decide not to license a patent (in practice, by dragging its feet in releasing a patent) if that helps it license a more remunerative patent instead.

Certainly, this epochal evolution has resulted in a more efficient generation of innovation but is actually hindering exploitation of valuable innovations.

2 Inside MPAI

MPAI has been established to uphold the social value of standards, to adapt the notion of standard to the new Artificial Intelligence domain, and to rescue the right of society to exploit innovation. Let’s review how this has been implemented.

Laws pay much attention to standards because the patents that enable them give monopoly rights to exploit the patents to holders for a significant time span. This is the reason why a standards body requests that a proposal of a standard be accompanied by a declaration that, in simplified form, boils down to answering one of the three possibilities: if there are patent rights to the enabling technologies, does the submitter agree to allow the use of the patents for free, or are royalties applied, or is the use of patents not allowed? Setting aside the 1^st and 3^rd cases, in the second case the submitter is asked to declare that it will license the patents “to an unrestricted number of applicants on a worldwide, non-discriminatory basis and on reasonable terms and conditions to make, use and sell implementations of the above document” (i.e., the standard).

In past ages this used to be good enough, but is this still OK in an age when technology moves fast? Is it OK if it takes ten years for the declaration to be implemented and if, ten years later, a user discovers that the licensing terms are not acceptable?

A reasonable answer is that, no, this handling of patents in standards is not acceptable because society is deprived of technologies that may bring significant benefits to it. It may also not be acceptable to some patent holders because they are deprived of their ability to monetise the fruits of their efforts.

The MPAI handling of patents is exactly the opposite on this point. MPAI Members engaged in the development of a standard agree on a set of statements called “Framework Licence” that remove some uncertainty on the time and terms of the eventual licence.

Why only “some” and not all uncertainty? Because there are so-called antitrust laws that forbid this full clarification, as this could be maliciously used to distort the market. As Monsieur de Montesquieu used to say, “Better is the enemy of good”. MPAI wants to make things that are good, not perfect things that create problems.

MPAI has developed a model process to develop standards which is based on a combination of openness – when the purpose of a standard is developed – and restriction – when the technologies making up the standard are developed. Figure 1 depicts the eight steps of a standard.

Figure 1 – The MPAI standard life cycle

A standard may be proposed by anybody (Stage 0). To define the scope of the standard, use cases are developed (Stage 1). To define what the standard should exactly do, Functional Requirements are developed (Stage 2). Participation in these stages is open to anybody.

To define the conditions of use of the standard, Commercial Requirements are developed (Stage 3). Of course, MPAI is not engaged in any sort of commerce. “Commercial Requirements” are what above have been called Framework Licence. When the two types of requirements are ready, a Call for Technologies is issued (Stage 4).

Anybody can answer the call (Stage 5), but submitted technologies must satisfy the requirements. Proposed technologies are used to develop the exclusive standard with the participation of MPAI Members and the respondents to the call if they have joined MPAI. If they do not, their submission is discarded.

When the standard has reached a stability plateau, MPAI may publish the current draft before final publication (Stage 6). Anybody is entitled to send their comments to the MPAI Secretariat. Comments will be considered for inclusion in the published standard (Stage 7).
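
The stages described above can be summarised in a small Python sketch. The member names of `Stage` and the helper `may_participate` are labels invented for this illustration, not MPAI terminology. The text only states who may take part in Stages 0, 1, 2, 5 and 6; treating the remaining stages as members-only is an assumption of the sketch.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages 0-7 of the life cycle; the names are hypothetical labels."""
    PROPOSAL = 0
    USE_CASES = 1
    FUNCTIONAL_REQUIREMENTS = 2
    COMMERCIAL_REQUIREMENTS = 3   # the Framework Licence
    CALL_FOR_TECHNOLOGIES = 4
    STANDARD_DEVELOPMENT = 5
    COMMUNITY_COMMENTS = 6
    PUBLICATION = 7


# Stages in which the text says anybody may take part or send comments.
OPEN_STAGES = {
    Stage.PROPOSAL,
    Stage.USE_CASES,
    Stage.FUNCTIONAL_REQUIREMENTS,
    Stage.COMMUNITY_COMMENTS,
}


def may_participate(stage: Stage, is_member: bool) -> bool:
    """Members take part everywhere; others only in the open stages.

    Simplification: anybody may answer the Call at Stage 5, but only
    respondents who join MPAI help develop the standard, so Stage 5 is
    modelled here as members-only.
    """
    return is_member or stage in OPEN_STAGES
```

With this model, a non-member can contribute to use cases and comment on a draft, but has to join MPAI to take part in the development of the standard itself.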

Stage 7 is divided into four Steps. The first Step concerns the publication of the standard, but a standard usually includes other Steps. Step #2 is the reference software, a conforming implementation of the standard that is included in the standard itself with reference to the MPAI Git from where the software – published with a BSD 3-Clause licence – can be downloaded. The third Step is Conformance Testing. For example, an implementation of an AI Workflow conforms with MPAI-MMC if it accepts as input and produces as output Data and/or Data Objects (the combination of Data of a Data Type and its Qualifier) conforming with those specified by MPAI-MMC (see later for more about Qualifiers).

The last Step is Performance Assessment. Performance is a multidimensional entity because it can have various connotations, and the Performance Assessment Specification should provide methods to measure how well an AIW performs its function, using a metric that depends on the nature of the function, such as:

Quality: the Performance of an AI System that Answers Questions can measure how well it answers a question related to an image.
Bias: Performance of the same AI System can measure the quality of responses depending on the type of images, i.e., the ability of the System to provide balanced unbiased answers.
Legal compliance: the Performance of an AIW can measure the compliance of the AI System to a regulation, e.g., the European AI Act.
Ethical compliance: the Performance Assessment of an AI System can measure its compliance to a target ethical standard.

3 The MPAI mission

According to Article 3 of the MPAI Statutes, the mission of MPAI is to develop standards for data coding using Artificial Intelligence (AI). However, to fully understand this mission, two fundamental questions must be addressed: What exactly is a data coding standard? And what role does AI play in data coding?

In today’s digital society, it is widely recognized that we are inundated with data. One estimate predicts that by 2025, approximately 180 zettabytes (10²¹ bytes) of data will be generated – a 20% increase from the estimated 150 zettabytes produced in 2024. This raises critical questions: What does all this data represent? What are the costs associated with storing even a portion of it? And how expensive is it to transmit such vast amounts of information?

These concerns are not new. Since the advent of the digital era, it has been clear that applications often do not require all available data, or that data can be represented in more compact forms using fewer bits. This concept laid the foundation for data processing techniques that have been refined over time, ultimately enabling the widespread use of technologies like video. But new problems and needs surface. We continue to need less data with the same scope, but there is a growing need to automatically “understand” the meaning of so much data, and we are using ever larger amounts of data to transfer human knowledge to machines to equip them with new capabilities – what we have become accustomed to calling Artificial Intelligence.

What does it mean to develop AI standards, then?

It is one thing to say that we need standards for AI and quite another to say what a standard for AI should specify. The first norm is that the standard should specify what flows through the interface of an AI System – data – but not what is in the AI System that processes input data and produces output data. What an AI System is, is left undefined. It can be a large system or a small one, as long as the system retains a practical value as an individual entity.

A standard that enables the building of larger AI Systems from smaller components has advantages because it allows users to have a better idea of the operation of the system, it facilitates integration of components from disparate competence fields, enables optimisation of individual components, promotes the availability of components made available by competing developers, and more.

Component systems are thus a target for AI-based data coding standards, but what is the process that can produce candidate components for standardisation? The answer to this question is in identifying application domains and, within these, representative use cases. For each use case, an analysis is made of what components might be needed and which data enter and exit systems and components. The process could be implemented because the MPAI membership comes indeed from a variety of domains: media, finance, human-machine interaction, online gaming, entertainment, and more.

Having components – the building blocks – is a necessity but how should components interconnect with other components to make a system? And, once there is a system, how is it going to be executed?

This was one of the first questions that MPAI had to address. The solution found was called AI Framework, an environment enabling initialisation, dynamic configuration, execution, and control of components called AI Modules (AIM) assembled in systems called AI Workflows (AIW).

The reference model of the standard – called AI Framework (MPAI-AIF), now at Version 2.1 – is depicted in Figure 2. The model assumes that there is:

A User Agent so that users can act on the system.
A Controller that
1. Provides basic functionalities such as scheduling, communication between AIMs and other AIF Components such as AIM-specific and global storage.
2. Acts as a resource manager, according to instructions given by the User through the User Agent.
3. Exposes API for User Agent, AIMs, and Controller-to-Controller.
4. Downloads AIWs and AIMs from the MPAI Store.
Communication connecting an output Port of an AIM with an input Port of another AIM.

Figure 2 – Reference Model of MPAI-AIF
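
To make the roles of AIMs, Ports and the Controller more concrete, here is a minimal Python sketch. It is not the MPAI-AIF API: the classes `AIM` and `Controller`, their methods, and the two toy modules are assumptions made for illustration, and the "scheduling" is reduced to running the AIMs in the order in which they were added.

```python
from typing import Callable, Dict, List, Tuple

PortId = Tuple[str, str]  # (AIM name, Port name)


class AIM:
    """An AI Module: named input/output Ports around an opaque function."""

    def __init__(self, name: str, inputs: List[str], outputs: List[str],
                 func: Callable[..., tuple]) -> None:
        self.name = name
        self.inputs = inputs
        self.outputs = outputs
        self.func = func  # how the function is achieved is not standardised


class Controller:
    """Runs the AIMs of one AIW in insertion order (no real scheduling)."""

    def __init__(self) -> None:
        self.aims: List[AIM] = []
        # One link per output Port: source Port -> destination Port.
        self.links: Dict[PortId, PortId] = {}

    def add_aim(self, aim: AIM) -> None:
        self.aims.append(aim)

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        self.links[(src, out_port)] = (dst, in_port)

    def run(self, workflow_inputs: Dict[PortId, object]) -> Dict[PortId, object]:
        pending = dict(workflow_inputs)   # values waiting on input Ports
        results: Dict[PortId, object] = {}
        for aim in self.aims:
            args = [pending[(aim.name, port)] for port in aim.inputs]
            values = aim.func(*args)
            for port, value in zip(aim.outputs, values):
                if (aim.name, port) in self.links:
                    pending[self.links[(aim.name, port)]] = value
                else:
                    results[(aim.name, port)] = value  # unconnected: AIW output
        return results


# A two-AIM workflow: split a text into words, then count them.
controller = Controller()
controller.add_aim(AIM("Tokenise", ["text"], ["words"], lambda t: (t.split(),)))
controller.add_aim(AIM("Count", ["words"], ["count"], lambda w: (len(w),)))
controller.connect("Tokenise", "words", "Count", "words")
output = controller.run({("Tokenise", "text"): "standards connect minds"})
# output == {("Count", "count"): 3}
```

Because the Controller only sees Ports and the data flowing through them, either toy AIM could be replaced by another implementation with the same Ports without touching the rest of the workflow.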

The MPAI Store is a foundational element of the AIF architecture. Assume that you are an AIM developer and that you want to make available your latest AIM for users to download and use in their systems, e.g., to replace an existing AIM with the same functionality in an app. Who certifies that the AIM is a correct implementation of an MPAI standard, and is also secure?

This is a new problem because, in the current context, the integration of components is done by a device, service, or application developer who has both the means and the technical expertise to do a proper job. Instead, MPAI is attempting to make possible a world where components – not full-fledged applications – can be downloaded by a user without particular expertise and integrated in an application developed by a third party.

This is an issue because any standard gives rise to an ecosystem whose actors are: the entity developing the standard, the implementer of the standard, the user of the standard, and the guarantor of the correct functioning of the ecosystem. Depending on the application domain, the guarantor can be the state, a certification authority, or even the manufacturer. Making the state the guarantor is not a solution for the current reality. In principle, the “manufacturer” of MPAI solutions (AIW) may very well not exist, but only the developer of basic components distributed as AIMs. So, we need a new entity – the MPAI Store.

The MPAI Store is incorporated in Scotland as a company limited by guarantee, a not-for-profit business set up to serve social, charitable, community-based or other non-commercial objectives. MPAI has signed an agreement with the MPAI Store giving it the exclusive rights to act as Implementer ID Registration Authority (IIDRA). Any entity wishing to develop and distribute AI Systems (AIW) and components (AIM) for use in an AI Framework (AIF) shall obtain an ID from the IIDRA.

How is an AIW or AIM implementation distributed? The registered implementer submits the implementation to the MPAI Store. The implementation is tested for conformance according to the Conformance Testing specification, a document developed, approved, and published by MPAI for each MPAI standard. If the implementation passes conformance testing, it is verified for security and posted on the MPAI Store website for users to access, download, and install.
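
The submission path just described can be sketched in a few lines of Python. The `Store` class, its method names and the returned strings are hypothetical, and the two checks are passed in as plain functions because the text does not say how conformance testing or security verification are carried out.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Implementation:
    implementer_id: str  # ID obtained from the IIDRA
    name: str
    standard: str        # e.g. "MPAI-MMC"


class Store:
    """Toy model of the submission path; not the real MPAI Store interface."""

    def __init__(self,
                 conformance_test: Callable[[Implementation], bool],
                 security_check: Callable[[Implementation], bool]) -> None:
        self.registered: Set[str] = set()
        self.catalogue: List[Implementation] = []
        self._conformance_test = conformance_test
        self._security_check = security_check

    def register(self, implementer_id: str) -> None:
        self.registered.add(implementer_id)

    def submit(self, impl: Implementation) -> str:
        # The order follows the text: registration, conformance, security.
        if impl.implementer_id not in self.registered:
            return "rejected: implementer not registered"
        if not self._conformance_test(impl):
            return "rejected: failed conformance testing"
        if not self._security_check(impl):
            return "rejected: failed security verification"
        self.catalogue.append(impl)
        return "posted"
```

A caller would create a `Store` with its own test functions, register an implementer ID, and call `submit`; only implementations that pass both checks end up in `catalogue`.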

All done, then? Well, in general at least, no. Conformance simply checks that an implementation receives data with the correct format, does what the reference standard specifies, and produces data with the correct format. But there is no guarantee that the AIM or AIW does a “good job”.

This is the consequence of the MPAI standard specifying input and output data and functionality but being silent on how the functionality is achieved. The Performance Assessment specification mentioned above makes up for the need of a measure of how “good” an implementation is. As we have already mentioned, “good” has many dimensions and it would make no sense for the MPAI Store to attempt to issue statements on how good an implementation is. MPAI can appoint entities as Performance Assessors for specific domains and the MPAI Store can publish Performance Assessments of an implementation.

Figure 3 depicts the actors of the MPAI ecosystem and their functions.

Figure 3 – Management of the MPAI Ecosystem

Technical Report: Governance of the MPAI Ecosystem (MPAI-GME) V1.1 specifies the functions and responsibilities of the MPAI Ecosystem’s actors.

4 An overview of MPAI standards

A foundational element of the MPAI initiative is that data produced in application domains benefits from appropriate representations, and that AI is an excellent technology that powers such representations. Of course, the semantics of the data often depends on the application, but the ways AI is applied are similar. AI is the current technology unifying disparate applications that share data generation.

Because of this, the scope of applications targeted by MPAI standards is most diversified. The current scope of applications is depicted by Figure 4.

Figure 4 – Application areas targeted by MPAI standards

Application domains in Figure 4 are organised in alphabetical order. So, the first is AI Framework of which some details have already been provided. Context-based Audio Enhancement (MPAI-CAE) is the next. In the first phase of this project four use cases were considered: Preservation of open-reel tapes, Emotion-enhanced speech, Speech Restoration System, and Enhanced audio conference. These are the names of the four AIWs specified in MPAI-CAE Use Cases (CAE-USC).

MPAI-CAE offers an opportunity to introduce the way MPAI identifies projects, standards and subdivisions, if any. An MPAI project is identified by three characters preceded by MPAI, e.g., MPAI-AIF for AI Framework and MPAI-CAE for Context-based Audio Enhancement. A project that is expected to originate a standard is identified with the same acronym of the project. A project may originate a single document for the corresponding standard. This is the case of MPAI-AIF. A standard may be revised. Different revisions are called versions. A version is indicated with the letter V followed by two digits separated by a period. For instance, the latest version of MPAI-AIF is V2.1.
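
The naming rules in the paragraph above are regular enough to be checked mechanically. The following Python sketch is only an illustration: the function names are invented here, and the assumption that the three characters are upper-case letters or digits is ours (the text just says "three characters").

```python
import re
from typing import Optional, Tuple

# "MPAI-" followed by three characters (assumed: upper-case letters or digits).
PROJECT_RE = re.compile(r"MPAI-([A-Z0-9]{3})")
# "V" followed by two digits separated by a period.
VERSION_RE = re.compile(r"V(\d)\.(\d)")


def parse_project(identifier: str) -> Optional[str]:
    """Return the three-character project code, or None if malformed."""
    match = PROJECT_RE.fullmatch(identifier)
    return match.group(1) if match else None


def parse_version(version: str) -> Optional[Tuple[int, int]]:
    """Return the two digits of the version as a tuple, or None if malformed."""
    match = VERSION_RE.fullmatch(version)
    return (int(match.group(1)), int(match.group(2))) if match else None
```

Returning the version as a tuple makes comparisons straightforward: `parse_version("V2.1") > parse_version("V2.0")` holds because Python compares tuples element by element.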

Some areas are prolific of standards. In its first edition, MPAI-CAE first dealt with four Use Cases. Subsequently, another sub-project was started: Six degrees of freedom. Clearly, this belongs to the MPAI-CAE project but is definitely different from one collecting Use Cases. Adding the new standard to the existing Use Cases would make the standard unwieldy and possibly not attractive to implementers. For this reason, a new “part” of the MPAI-CAE standard was introduced: 6DF. Therefore, the MPAI-CAE standard has the current structure:

Standard: Context-based Audio Enhancement (MPAI-CAE)

Part 1: Use Cases (CAE-USC) V2.3

Part 2: Six degrees of Freedom (CAE-6DF).

Part 3: Audio Object Scene Rendering (CAE-AOR).

Parts 2 and 3 have not been approved yet.

The Automotive project is called Connected Autonomous Vehicle (MPAI-CAV). Its goal is to specify a Connected Autonomous Vehicle’s Architecture composed of Subsystems – implemented as AI Workflows (AIW) – and Components – implemented as AI Modules (AIM) – exchanging Data Types specified by JSON Schemas. MPAI-CAV started as a single standard, then the need for an Architecture standard was identified (CAV-ARC) and finally a new part was developed, approved, and published: Technologies (CAV-TEC), currently at V1.0.

The Finance project has an even broader name: Compression and Understanding of Industrial Data (MPAI-CUI). A first version of the standard was published as MPAI-CUI, but the new V2.0 under development benefits from the clarification explained above and is called Company Performance Prediction (CUI-CPP). Its goal is to specify the AI Workflow, the AI Modules, and the Data Types enabling a system receiving Finance and Governance Data, Primary Risk Data, (i.e., data for which an authorised AI Module is available), Secondary Risk Assessment (i.e., data for which only company-provided assessment is available), and a Predictions Horizon to produce Governance Descriptors, Primary Default and Discontinuity Descriptors, and Secondary Discontinuity Probability.

The next area is Human-Machine interaction. This is not one but two projects. Let’s start from the second (in alphabetical order) which was the first to be kicked off: Multimodal Conversation (MPAI-MMC). Currently at V2.3, MPAI-MMC provides technologies enabling a machine to interact with an Entity (a human but potentially a machine as well) by understanding not only what the Entity expresses in terms of Text and Speech but also Face and Gesture. Using this information, the machine can create a textual response and a “simulated” emotion. The second project is Human and Machine Communication that addresses a more extensive version of the same problem.

MPAI Metaverse Model (MPAI-MMM) is a project that has had an evolution similar to that of MPAI-CAV. It was first developed as a single standard, then the Architecture part was started (MMM-ARC). Currently the Technologies (MMM-TEC) V2.0 includes all the results obtained in the 3.5 years since the project was started. The standard enables the development of metaverse instances (M-Instances) that can communicate with other independently designed M-Instances and clients.

Neural Network Watermarking (MPAI-NNW) is an MPAI project developing Technical Reports and Specifications on the application of watermarking and other content management technologies to neural networks. MPAI-NNW V1.0 specifies standard settings to measure the performance of a Watermarked Neural Network. Neural Network Traceability V1.0 extends the methodology to include fingerprinting.

The Object and Scene Description project (MPAI-OSD) is an excellent example of how MPAI standards are developed within one project and then reused by other projects. MPAI-OSD specifies technologies enabling the digital representation of spatial information of Audio and Visual Objects and Scenes for consistent use across MPAI Technical Specifications. Today the definition feels slightly restrictive because the current scope has considerably widened to include a variety of object and scene types beyond audio and video: LiDAR, RADAR, Ultrasound, and even maps.

The Portable Avatar Format (MPAI-PAF) specifies the Portable Avatar (PA) and related data types that enable a receiving party to render a digital human represented by a Portable Avatar as intended by the sending party. Among the data types included in the PA are Text, Speech Model, Avatar – which includes Face and Body Descriptors, and Avatar Model – and Scene Descriptors.

AI Module Profiles (MPAI-PRF) V1.0 enables an instance of AI Module to signal its Attributes – input data, output data, or functionality – and Sub-Attributes that uniquely characterise it. The notion of AIM Profile is being extended to AI Workflows.

Server-based Predictive Multiplayer Gaming (MPAI-SPG) – Mitigation of Data Loss Effects (SPG-MDL) V1.0 provides guidelines on and an example of the design and use of Neural Networks producing reliable and accurate predictions. In case the control data of some players in a multiplayer game based on authoritative server are missing, the predictions can be used to compensate for the absence of control data.
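
The compensation idea can be illustrated with a few lines of Python. This is not taken from SPG-MDL: the function name and data layout are hypothetical, and the predictor below simply repeats the last known value as a stand-in for the Neural Network that the guidelines are about.

```python
from typing import Callable, Dict, Optional


def fill_missing_controls(received: Dict[str, Optional[float]],
                          predict: Callable[[str], float]) -> Dict[str, float]:
    """Return one control value per player, predicting the missing ones."""
    return {player: value if value is not None else predict(player)
            for player, value in received.items()}


# Stand-in predictor: repeat each player's last known control value.
last_known = {"alice": 0.4, "bob": -0.2}
controls = fill_missing_controls({"alice": 0.5, "bob": None},
                                 predict=lambda player: last_known[player])
# controls == {"alice": 0.5, "bob": -0.2}
```

The authoritative server can then advance the game state with `controls` as if every player's data had arrived.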

Data Types, Formats and Attributes (MPAI-TFA) V1.4 specifies Qualifiers, data types that facilitate or even enable the operation of an AI Module that receives a data type instance by providing information about Sub-Types (e.g., colour space), Formats (e.g., compression and transport), and Attributes (e.g., metadata) of the data type. Currently, Qualifiers are defined for Automotive, Health, Machine Learning, Media, Metaverse, and Space-Time.
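
As an illustration of how a Qualifier can help a receiving AI Module, the Python sketch below pairs data with a Qualifier and lets the module decide from the Qualifier alone whether it can handle the data. The class layout, the field names and the example values ("sRGB", "AVC", "file") are assumptions made for this sketch; the actual structures and codes are those defined by MPAI-TFA.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Qualifier:
    sub_type: str                                             # e.g. a colour space
    formats: Dict[str, str] = field(default_factory=dict)     # e.g. compression, transport
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. metadata


@dataclass
class DataObject:
    """Data of a Data Type combined with its Qualifier."""
    data_type: str
    data: bytes
    qualifier: Qualifier


def can_process(obj: DataObject, supported_compressions: Set[str]) -> bool:
    """Decide from the Qualifier alone whether an AIM can handle the data."""
    return obj.qualifier.formats.get("compression") in supported_compressions


frame = DataObject(
    data_type="Visual",
    data=b"\x00\x01",
    qualifier=Qualifier(sub_type="sRGB",
                        formats={"compression": "AVC", "transport": "file"}),
)
```

An AIM that supports only some compression formats can call `can_process(frame, {...})` before touching the payload, and reject or reroute the Data Object otherwise.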

Leonardo Chiariglione
2025-05-15

MPAI publishes AIW and AIM Implementation Guidelines for Community Comments

Geneva, Switzerland – 14^th May 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 56^th General Assembly (MPAI-56) with the publication of the Technical Report: AIW and AIM Implementation Guidelines (MPAI-WMG) for Community Comments.

A significant share of MPAI standards rely on AI Framework (MPAI-AIF) that supports componentisation of processing elements called AI Modules (AIM) organised in an AI Framework. As AIMs are agnostic of the technologies used in their implementation – traditional data processing or various types of AI technologies – the AIW and AIM Implementation Guidelines Technical Report includes an analysis of a significant number of already specified AIMs from the implementation perspective. MPAI-WMG also includes an analysis of the use of the Perceptive and Agentive AI (PAAI) Model that may provide a new basis for new forms of AI Frameworks.

MPAI is continuing its work plan that involves the following activities:

AI Framework (MPAI-AIF): building a community of MPAI-AIF-based implementers.
AI for Health (MPAI-AIH): developing the specification of a system enabling clients to improve models processing health data and federated learning to share the training.
Context-based Audio Enhancement (CAE-DC): developing the Audio Six Degrees of Freedom (CAE-6DF) standard.
Connected Autonomous Vehicle (MPAI-CAV): investigating extensions of the current CAV-TEC standard.
Compression and Understanding of Industrial Data (MPAI-CUI): developing Company Performance Prediction standard V2.0.
End-to-End Video Coding (MPAI-EEV): exploring the potential of video coding using AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC): developing the Up-sampling Filter for Video applications (EVC-UFV) standard.
Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
Human and Machine Communication (MPAI-HMC): developing reference software and performance assessment.
Multimodal Conversation (MPAI-MMC): Developing technologies for more Natural-Language-based user interfaces capable of handling more complex questions.
MPAI Metaverse Model (MPAI-MMM): extending the MMM-TEC specs to support more applications.
Neural Network Watermarking (MPAI-NNW): studying the use of fingerprinting as a technology for neural network traceability.
Object and Scene Description (MPAI-OSD): studying applications requiring more space-time handling.
Portable Avatar Format (MPAI-PAF): studying more applications using digital humans needing new technologies.
AI Module Profiles (MPAI-PRF): specifying which features AI Workflow or more AI Modules need to support.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards (e.g., automotive and health).
XR Venues (MPAI-XRV): developing the standard for improved development and execution of Live Theatrical Performances and studying the prospects of Collaborative Immersive Laboratories.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Leonardo Chiariglione
2025-05-04

An overview of the Connected Autonomous Vehicle – Technologies (CAV-TEC) standard

1. Introduction

The Connected Autonomous Vehicle (CAV) specified by CAV-TEC is a system that instructs a vehicle with at least three wheels to reach a Destination from a current Pose at the request of a human or a process. It does so by exploiting information captured and processed by the vehicle and communicated by other CAVs while respecting the local traffic law. Figure 1 represents an example of the type of environment that a CAV is requested to traverse and Figure 2 depicts the four subsystems of which a CAV is composed, although this partitioning is not a functional requirement as components of a subsystem may be located in another subsystem, provided the interfaces specified by CAV-TEC are preserved.

Figure 1 – An example of an environment traversed by a CAV

In Figure 2, a human approaches a CAV and requests the Human-CAV Interaction Subsystem (HCI) to be taken to a destination using a combination of four media – Text, Speech, Face, and Gesture. Alternatively, a remote process may make a similar request to the CAV.

Figure 2 – The subsystems of a CAV

Either request is passed to the Autonomous Motion Subsystem (AMS), which requests the Environment Sensing Subsystem (ESS) to provide the current CAV Pose. With the current Pose and the Destination, the AMS can propose one or more Routes by accessing Offline Maps. Eventually the human or process can select a Route.

With the human aboard, the AMS continues to receive environment information from the ESS – possibly complemented by information received from other CAVs in range – and instructs the Motion Actuation Subsystem to make appropriate motions.
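
The request flow between the subsystems can be sketched as follows. This Python example is an illustration under stated assumptions, not the CAV-TEC interfaces: the class and function names are invented, the Pose is reduced to a position and a heading, and the Offline Map is modelled as a plain function returning candidate Routes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    heading_deg: float


@dataclass(frozen=True)
class Route:
    name: str
    waypoints: Tuple[str, ...]


class EnvironmentSensing:
    """Stand-in for the ESS: it only reports a fixed Pose."""

    def __init__(self, pose: Pose) -> None:
        self._pose = pose

    def current_pose(self) -> Pose:
        return self._pose


class AutonomousMotion:
    """Stand-in for the AMS: asks the ESS for the Pose, then consults the map."""

    def __init__(self, ess: EnvironmentSensing,
                 offline_map: Callable[[Pose, str], List[Route]]) -> None:
        self.ess = ess
        self.offline_map = offline_map

    def propose_routes(self, destination: str) -> List[Route]:
        pose = self.ess.current_pose()
        return self.offline_map(pose, destination)


def request_trip(ams: AutonomousMotion, destination: str,
                 select: Callable[[List[Route]], Route]) -> Route:
    """The requester (human or process) selects one of the proposed Routes."""
    routes = ams.propose_routes(destination)
    if not routes:
        raise LookupError(f"no Route to {destination!r}")
    return select(routes)


def toy_map(pose: Pose, destination: str) -> List[Route]:
    # Hypothetical map data: two alternative Routes to any destination.
    return [Route("scenic", ("A", "B", "C", destination)),
            Route("direct", ("A", destination))]


ams = AutonomousMotion(EnvironmentSensing(Pose(0.0, 0.0, 90.0)), toy_map)
chosen = request_trip(ams, "Station",
                      select=lambda routes: min(routes, key=lambda r: len(r.waypoints)))
# chosen.name == "direct"
```

Passing `select` as a function reflects the point made in the text that the final choice belongs to the human or process that issued the request, not to the AMS.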

2. Human-CAV Interaction

The CAV-HCI Reference Model of Figure 3 explains the operation of the HCI in its interaction with humans.

Figure 3 – Reference Model of CAV-HCI

The Audio-Visual Scene Description (AVS) monitors the environment and produces Audio-Visual Scene Descriptors from which it extracts Speech Scene Descriptors and from these, Speech Objects corresponding to any speaking humans in the environment surrounding the CAV. Visual Scene Descriptors may also be extracted in the form of Face and Body Descriptors of all humans present.

The CAV activates Automatic Speech Recognition (ASR) to have the speech of each human recognised and converted into Recognised Text. Each Speech Object is identified according to its position in space. The CAV also activates the Visual Object Identification (VOI) that is able to produce the Instance IDs of Visual Objects as indicated by humans.

Natural Language Understanding (NLU) processes the Speech Objects, produces Refined Text, and extracts Meaning from the Text of each input Speech. This process is facilitated by the use of the IDs of the Visual Objects provided by VOI.

Speaker Identity Recognition (SIR) and Face Identity Recognition (FIR) help the CAV to reliably obtain the Identifiers of the humans the HCI is interacting with. If the Face ID(s) provided by FIR correspond to the ID(s) provided by SIR, the CAV may proceed to attend to further requests. Especially with humans aboard, Personal Status Extraction (PSE) provides useful information regarding the humans’ state of mind by extracting their Personal Status.
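
The cross-check between SIR and FIR can be pictured with a short Python sketch. The text does not define what "correspond" means in detail, so the rule used here is an assumption: the CAV proceeds only if at least one speaker was identified and every speaker ID also appears among the face IDs. The function names are hypothetical.

```python
from typing import Iterable, Set


def confirmed_identities(speaker_ids: Iterable[str],
                         face_ids: Iterable[str]) -> Set[str]:
    """IDs reported by both SIR and FIR."""
    return set(speaker_ids) & set(face_ids)


def may_proceed(speaker_ids: Iterable[str], face_ids: Iterable[str]) -> bool:
    """Assumed rule: every identified speaker must also be seen by FIR."""
    speakers = set(speaker_ids)
    return bool(speakers) and speakers <= set(face_ids)
```

Under this rule a silent passenger who is seen but not heard does not block the request, while a voice with no matching face does.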

The CAV interacts with humans through Entity Dialogue Processing (EDP). When a human requests to be taken to a Destination, the EDP interprets and communicates the request to the Autonomous Motion Subsystem (AMS). A dialogue may then ensue where the AMS may offer different choices to satisfy potentially different human needs (e.g., a long but comfortable Route or a short but less predictable one).

Then, while the CAV moves to the Destination, the HCI may have a conversation with the humans, show the Full Environment Descriptors developed by the AMS to the passengers, and may communicate information about the CAV from the Ego AMS or more generally from the HCIs of remote CAVs.

The HCI responds using the two main outputs of the EDP: Text and Personal Status. These are used by the Personal Status Display (PSD) to produce the Portable Avatar of the HCI conveying Speech, Face, and Gesture synthesised to render the HCI Text and Personal Status. Audio-Visual Scene Rendering (AVR) renders Audio, Speech, and Visual information using the HCI Portable Avatar. Alternatively, it can display the AMS’s Full Environment Descriptors from the Point of View selected by the human.

The HCI interacts with passengers in several ways:

By responding to commands/queries from one or more humans at the same time, e.g.:
1. Commands to go to a waypoint, park at a place, etc.
2. Commands with an effect in the cabin, e.g., turn off air conditioning, turn on the radio, call a person, open a window or door, search for information, etc.
By conversing with and responding to questions from one or more humans at the same time about travel-related issues, e.g.:
1. Humans request information, e.g., time to destination, route conditions, weather at destination, etc.
2. Humans ask questions about objects in the cabin.
By following the conversation on travel matters held by humans in the cabin, provided that:
1. The passengers allow the HCI to do so, and
2. The processing is carried out privately inside the CAV.

3. Environment Sensing Subsystem

The operation of the Environment Sensing Subsystem (ESS) is best explained using the Reference Model of the CAV-ESS subsystem depicted in Figure 4.

Figure 4 – Reference Model of CAV-ESS

When the CAV is activated in response to a request by a human owner or renter or by a process, Spatial Attitude Generation continuously computes the CAV’s Spatial Attitude relying on the initial Spatial Attitude provided by the Motion Actuation Subsystem and, if available, on information from the Global Navigation Satellite Systems (GNSS).

An ESS may be equipped with a variety of Environment Sensing Technologies (EST). CAV-TEC assumes these are Audio, LiDAR, RADAR, Ultrasound, and Visual, although an ESS implementation is not required to support all of them. The Offline Map is also considered an EST.

An EST-specific Scene Description receives EST-specific Data Objects and produces EST-specific Scene Descriptors. Basic Environment Description integrates these into the Basic Environment Descriptors (BED) using all available sensing technologies, Weather Data, Road State, and possibly the Full Environment Descriptors of previous instants provided by the AMS. Note that, although in Figure 4 each sensing technology is processed by an individual Scene Description AIM, an implementation may combine two or more Scene Description AIMs to handle two or more ESTs, provided the relevant interfaces are preserved. An EST-specific Scene Description may need to access the BED of previous instants and may produce Alerts that are immediately communicated to the AMS.

The Objects in the BEDs may carry Annotations specifically related to traffic signalling, e.g.: Position and Orientation of traffic signals in the environment, Traffic Policemen, Road signs (lanes, turn right/left on the road, one way, stop signs, words painted on the road), Traffic signs – vertical signalisation (signs above the road, signs on objects, poles with signs), Traffic lights, Walkways, and Traffic sounds (siren, whistle, horn).

4. Autonomous Motion Subsystem

The operation of the Autonomous Motion Subsystem (AMS) is best explained using the Reference Model of the CAV-AMS subsystem depicted in Figure 5.

Figure 5 – Reference Model of CAV-AMS

When the HCI sends the AMS a human or process request to move the CAV to a Destination, Route Planning uses the Basic Environment Descriptors from the ESS and produces a set of Waypoints starting from the current Pose up to the Destination.

When the CAV is in motion, Route Planning causes Path Selection Planning to generate a set of Poses to reach the next Waypoint. Full Environment Description may request the AMSs of Remote CAVs to send (subsets of) their Scene Descriptors and integrates all sources of Environment Descriptors into its Full Environment Descriptors (FED), and may also respond to similar requests from Remote CAVs.

Motion Selection Planning generates a Trajectory to reach the next Pose in each Path. Traffic Obstacle Avoidance receives the Trajectory and checks whether any Alert received indicates a potential collision along the current Trajectory. If a potential collision is detected, Traffic Obstacle Avoidance requests a new Trajectory from Motion Selection Planning; otherwise, Traffic Obstacle Avoidance issues an AMS-MAS Message to the Motion Actuation Subsystem (MAS).
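
The check-and-replan behaviour of Traffic Obstacle Avoidance can be sketched as a small loop. This is a minimal sketch under loud assumptions, not the CAV-TEC algorithm: Trajectories are modelled as lists of 2D points, an Alert as a circular hazard, and `replan` is a hypothetical callback standing in for a request to Motion Selection Planning. The names `Alert`, `collides`, `avoid_obstacles`, and `max_replans` are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # assumed 2D position, units arbitrary


@dataclass
class Alert:
    position: Point  # where the hazard was reported (assumption)
    radius: float    # assumed safety radius around the hazard


def collides(trajectory: List[Point], alert: Alert) -> bool:
    """True if any trajectory point lies within the alert's radius."""
    ax, ay = alert.position
    return any((x - ax) ** 2 + (y - ay) ** 2 <= alert.radius ** 2
               for x, y in trajectory)


def avoid_obstacles(trajectory: List[Point],
                    alerts: List[Alert],
                    replan: Callable[[List[Point]], List[Point]],
                    max_replans: int = 3) -> Optional[List[Point]]:
    """Return a trajectory clear of all alerts, or None if none is found.

    `replan` stands in for requesting a new Trajectory; a returned
    trajectory is the one that would be sent on to the MAS.
    """
    replans = 0
    while any(collides(trajectory, a) for a in alerts):
        if replans == max_replans:
            return None
        trajectory = replan(trajectory)
        replans += 1
    return trajectory


path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
hazard = Alert(position=(1.0, 0.0), radius=0.5)
shift = lambda t: [(x, y + 1.0) for x, y in t]  # toy replanner
print(avoid_obstacles(path, [hazard], shift))
# [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
```

Bounding the number of replans is a choice of this sketch; the source does not say what happens when no collision-free Trajectory can be found.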

The MAS sends an AMS-MAS Message to AMS informing it about the execution of the AMS-MAS Message received. The AMS, based on the received AMS-MAS Messages, may discontinue the execution of the earlier AMS-MAS Message, issue a new AMS-MAS Message, and inform Traffic Obstacle Avoidance. The decision of each element of the chain may be recorded in the AMS Memory (“black box”).

5. Motion Actuation Subsystem

The operation of the Motion Actuation Subsystem (MAS) is explained using the Reference Model of the CAV-MAS subsystem depicted in Figure 6.

Figure 6 – Reference Model of CAV-MAS

When AMS Message Interpretation receives the AMS-MAS Message from the AMS, it interprets the Message, partitions it into commands, and sends them to the Brake, Motor, and Wheel mechanical subsystems. CAV-TEC is silent on how the three mechanical subsystems process the commands but specifies the format of the commands issued to AMS Message Interpretation. The result of the interpretation is sent as an AMS-MAS Message to the AMS.
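
The partitioning step can be illustrated with a toy message. The message layout below is purely hypothetical: CAV-TEC specifies the actual command format, which is not reproduced here, so the dictionary keys and the function name `partition_message` are assumptions made for this sketch.

```python
SUBSYSTEMS = ("brake", "motor", "wheel")


def partition_message(message):
    """Split a (hypothetical) AMS-MAS Message into per-subsystem commands.

    Fields that do not target one of the three mechanical subsystems
    are ignored by this sketch.
    """
    return {name: message[name] for name in SUBSYSTEMS if name in message}


# Illustrative values only; units and ranges are not taken from CAV-TEC.
message = {"brake": 0.0, "motor": 0.4, "wheel": -5.0, "timestamp": 12.5}
print(partition_message(message))
# {'brake': 0.0, 'motor': 0.4, 'wheel': -5.0}
```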

The MAS includes two more AIMs. Spatial Attitude Generation computes the initial Ego CAV’s Spatial Attitude using the Spatial Data provided by the Odometer, Speedometer, Accelerometer, and Inclinometer. This initial Spatial Attitude is sent to the ESS to be integrated with its GNSS-based Spatial Attitude. Ice Condition Analysis augments the Weather Data by analysing the responses of the Brake, Motor, and Wheel mechanical subsystems and sends the augmented Weather Data to the ESS.

6. Conclusions

CAVs promise to bring benefits that will positively affect industry, society, and the environment, for example:

Saving lives and reducing injuries by removing human error thanks to a machine less prone to errors.
Giving humans more time for rewarding activities, such as interpersonal communication.
Optimising the use of vehicles and infrastructure.
Reducing congestion and pollution.
Supporting elderly and disabled people.

Therefore, society and individuals will be positively impacted by the transformation of today’s “niche market” into tomorrow’s vibrant “mass market” of Connected Autonomous Vehicles. A market of interchangeable components conforming with the CAV-TEC V2.0 specifications can offer affordable and safe Connected Autonomous Vehicles sooner and more efficiently than waiting for market forces to produce “monolithic” cars with progressively higher SAE Levels.

The CAV-TEC Reference Model can create a competitive market offering increasingly performing components until a new, more powerful reference model eventually replaces it, initiating a new sequence of performance improvements. The Reference Model will help:

Researchers to optimise component technologies.
Component manufacturers to bring their standard-conforming components to market once they are mature.
Car manufacturers to access an open global market of interchangeable components.
Regulators to oversee conformance testing of components following standard procedures.
Users to rely on Connected Autonomous Vehicles whose operation they can explain to a large extent.


Leonardo Chiariglione
2025-04-27

Communicating avatars in worlds

An avatar is generally intended as a representation of a real or fictitious human in a virtual space. Research has dedicated much effort to creating and animating realistic avatars. However, the scope of use is typically assumed to be a closed environment such as a proprietary video game. Therefore, the portability of avatars has seldom been a priority.

Some 30 years ago, the Humanoid Animation (H-Anim) standard was developed. It defined a human skeleton composed of joints, segments, and sites exhibiting four levels of articulation, together with the default skeleton pose. The latest versions of the H-Anim standards are ISO/IEC 19774-1:2019 and ISO/IEC 19774-2:2019.

An avatar is a basic but not the only element in an application. What if you want to convey to a third party a speaking avatar immersed in an environment with all its features so that it is displayed as you intended?

Technical Specification: Portable Avatar Format (MPAI-PAF), whose Version 1.4 has recently been approved, offers a solution to this problem for a broad range of applications: the Portable Avatar. A Portable Avatar is a package of data conveying the following information:

The ID of the virtual space (M-Instance) where the Portable Avatar is to be placed.
The space and time information of the “environment” to be placed in the M-Instance.
The Audio-Visual Scene representing the “environment”.
The space and time information of the Avatar in the scene.
The Avatar represented as a 3D Model, its Face Descriptors and Body Descriptors.
The Language Preference of the Avatar.
The Text Object the Avatar is associated with, or which will be converted into a Speech Object.
The Speech Model used to synthesise the Text Object.
The Speech Object alternative to the Text Object that the Avatar utters.
The Personal Status of the Avatar.

Here is a brief description of the Portable Avatar components.

The ID of the virtual space (M-Instance). This is the ID of a virtual space where the Portable Avatar is to be placed. It can be a metaverse (for MPAI, this would be an M-Instance of MMM-TEC).
The space and time information of the “environment”. MPAI has defined a data type called Space-Time that defines:
1. The space information as Spatial Attitude, Position, Orientation, and optionally their velocities and acceleration.
2. Time, defined either as absolute (from 1970/01/01T00:00) or as relative to an arbitrary origin of time.
The Audio-Visual Scene. MPAI has defined Scene as a data type that describes a scene as composed of scenes and objects with their Space-Time information. The MPAI scene is thus hierarchical (see Audio-Visual Scene Descriptors).
The space and time information of the Avatar, considered as a particular type of object in the scene. The Avatar data type has its own space-time information. This is overridden by the space-time information of the scene, if different.
The Avatar. The MPAI Avatar data type is a structure that includes:
1. The space-time information (that is overridden by the space-time information of the scene).
2. The 3D Model Object composed of data and Qualifier giving additional information to the data, e.g., the format.
3. The Face Descriptors. MPAI has adopted the Action Units of the Facial Action Coding System (FACS).
4. The Body Descriptors. MPAI has adopted the H-Anim standard, but the 3D Model Qualifier allows the use of other standards.
The Language Preference. MPAI supports the signalling of a large number of language codes.
The Text Object. An avatar may have a textual description represented by a Text Object (text data and Qualifier providing various types of information on the text, e.g., language and character code).
The Speech Model. The Text Object may be used to synthesise a Speech Object (speech data and Qualifier). The Portable Avatar can convey a neural network speech model to synthesise the text. In MPAI, a neural network model (more generally, a Machine Learning Model) has an associated Qualifier providing various types of information, such as the conformity of the model to a particular regulation.
The Speech Object. In some cases, the speech associated with the avatar is conveyed by the Portable Avatar. As with the Text Object, a Speech Object includes speech data and a Qualifier that may be used to provide information on the language, compression format, speaker identity, etc.
The Personal Status. The Personal Status is an MPAI data type including information on the Cognitive State, Emotion, and Social Attitude of the Text, Speech, Face, and Gesture of an Entity (in MPAI, Entity indicates either a human or the process animating an avatar).
1. Cognitive State represents the internal state of an Entity reflecting its understanding of the context, e.g., “surprised” or “interested”.
2. Emotion represents the internal state of an Entity resulting from its interaction with the context, e.g., “angry” or “sad”.
3. Social Attitude represents the internal state of an Entity related to the way it intends to position itself vis-à-vis the context, e.g., “respectful” or “soothing”.
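
The structure described in the list above can be sketched as a set of data classes. This is a simplified illustration, not the MPAI-PAF schema: all class and field names are assumptions, several components (Audio-Visual Scene, Speech Model, Speech Object, Qualifiers) are omitted, and the override rule is interpreted as "use the scene's space-time for the Avatar when one is given, otherwise the Avatar's own".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SpaceTime:
    position: Vec3
    orientation: Vec3
    time: float  # seconds from an absolute or arbitrary origin (assumption)


@dataclass
class PersonalStatus:
    cognitive_state: str   # e.g. "interested"
    emotion: str           # e.g. "sad"
    social_attitude: str   # e.g. "respectful"


@dataclass
class Avatar:
    model: bytes                          # 3D Model data; format left unspecified
    space_time: Optional[SpaceTime] = None


@dataclass
class PortableAvatar:
    m_instance_id: str
    avatar: Avatar
    avatar_space_time_in_scene: Optional[SpaceTime] = None
    language: str = "en"
    text: Optional[str] = None
    personal_status: Optional[PersonalStatus] = None

    def effective_space_time(self) -> Optional[SpaceTime]:
        """Scene-level space-time overrides the Avatar's own, when present."""
        if self.avatar_space_time_in_scene is not None:
            return self.avatar_space_time_in_scene
        return self.avatar.space_time


own = SpaceTime((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0)
in_scene = SpaceTime((1.0, 2.0, 0.0), (0.0, 0.0, 90.0), 10.0)
pa = PortableAvatar("m-instance-1", Avatar(b"", own), in_scene)
print(pa.effective_space_time().position)  # (1.0, 2.0, 0.0)
```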

The Portable Avatar is an essential component of a variety of use cases. It is typically used as input to the Audio-Visual Scene Rendering AI Module that produces Speech, Audio, and Visual Objects from Portable Avatar, Audio-Visual Scene Descriptors (in case one is not available in the Portable Avatar), and a Point of View as depicted in Figure 1.

Figure 1 – Audio-Visual Scene Rendering AI Module

The Personal Status Display (PAF-PSD) AIM produces a Portable Avatar corresponding to an Avatar Model uttering a Speech Object synthesised from a Text Object with a Speech Model and displaying a Personal Status:

Figure 2 – Personal Status Display

Here, the input is a Text Object, a Neural Network Speech Model (in case the PAF-PSD does not have one embedded), an Avatar Model, and a Personal Status:

The Text is used to synthesise speech modulated with the Speech component of the input Personal Status.
The Speech, the input Text, the Face component of the input Personal Status, and the input Avatar Model are used to synthesise the face.
The Text, the Gesture component of the input Personal Status, and the input Avatar Model are used to synthesise the body.
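
The data flow of the three steps above can be sketched as a function that wires the inputs together. This is an illustration of the dependencies only, assuming the Personal Status is a dictionary with `speech`, `face`, and `gesture` components; the function name and the three synthesiser callbacks are hypothetical stand-ins for the actual PAF-PSD processing, which this sketch does not implement.

```python
def personal_status_display(text, personal_status, avatar_model,
                            synth_speech, synth_face, synth_body):
    """Combine the inputs in the order described for the PAF-PSD AIM.

    The three `synth_*` callables are placeholders supplied by the caller.
    """
    speech = synth_speech(text, personal_status["speech"])
    face = synth_face(speech, text, personal_status["face"], avatar_model)
    body = synth_body(text, personal_status["gesture"], avatar_model)
    return {"speech": speech, "face": face, "body": body}


# Toy synthesisers that only record which inputs they received.
status = {"speech": "calm", "face": "smiling", "gesture": "waving"}
result = personal_status_display(
    "Hello", status, "model-A",
    synth_speech=lambda t, s: ("speech", t, s),
    synth_face=lambda sp, t, f, m: ("face", sp, t, f, m),
    synth_body=lambda t, g, m: ("body", t, g, m),
)
print(result["body"])  # ('body', 'Hello', 'waving', 'model-A')
```

Note that the face depends on the synthesised speech while the body does not, which mirrors the list above.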

In MPAI, the Lego approach to avatar deployment in applications is a reality.


Archives: 2024-01-27

2025 July 10 T09-13 – Campus Biotech Innovation Park, Av. de Sécheron 15, 1202 Genève, Switzerland (CH)

Abstract

1 Standards have a divine nature

2 Inside MPAI

3 The MPAI mission

1 An overview of MPAI standards

1. Introduction

2. Human-CAV Interaction

3. Environment Sensing Subsystem

4. Autonomous Motion Subsystem

5. Motion Actuation Subsystem

6. Conclusions

Notice