Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI is running at full speed

Established in September 2020, MPAI has published five standards this week, bringing the total to nine. Let’s see what they are about.

MPAI Metaverse Model (MPAI-MMM) – Architecture is the first technical metaverse standard published by a standards body. MPAI-MMM specifies technologies enabling two metaverse instances, M-InstanceA and M-InstanceB, to interoperate if they rely on the same Operation Model, use the same Profile, and either use the same technologies or use independent technologies while accessing Conversion Services that losslessly transform data of M-InstanceA to data of M-InstanceB.
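The interoperability conditions above can be read as a simple predicate. The following is a minimal sketch, not part of MPAI-MMM: the `MInstance` class, its field names, and the example Operation Model, Profile, and technology labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MInstance:
    """Hypothetical summary of the properties compared in the text."""
    name: str
    operation_model: str
    profile: str
    technologies: frozenset


def can_interoperate(a: MInstance, b: MInstance,
                     has_lossless_conversion: bool = False) -> bool:
    """Return True if the three conditions quoted in the text hold."""
    if a.operation_model != b.operation_model:
        return False
    if a.profile != b.profile:
        return False
    # Either the same technologies, or Conversion Services bridge the gap.
    return a.technologies == b.technologies or has_lossless_conversion


# Illustrative values only; none of these labels come from the standard.
inst_a = MInstance("M-InstanceA", "example-om", "example-profile", frozenset({"fmt-x"}))
inst_b = MInstance("M-InstanceB", "example-om", "example-profile", frozenset({"fmt-y"}))

print(can_interoperate(inst_a, inst_b))        # False: different technologies
print(can_interoperate(inst_a, inst_b, True))  # True: conversion is available
```

The sketch treats lossless conversion as a single flag; in practice its availability would depend on the specific data being exchanged.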

AI Framework (MPAI-AIF) V2 specifies a secure environment called AI Framework (AIF) enabling dynamic configuration, initialisation, and control of AI Workflows (AIW) composed of AI Modules (AIM). AIMs and AIWs are defined by function and interfaces; AIWs also by AIM topology.

Connected Autonomous Vehicle (MPAI-CAV) – Architecture is the first technical standard on connected autonomous vehicles published by a standards body. MPAI-CAV specifies the Architecture of a CAV based on a Reference Model comprising a CAV composed of Subsystems (AIW) with specified Functions, I/O Data, and Topology. Each Subsystem is made up of Components with specified Functions and I/O Data.

Multimodal Conversation (MPAI-MMC) V2 specifies data formats for analysis of text, speech, and other non-verbal components as used in human-machine and machine-machine conversation applications, and Multimodal Conversation-related AIWs and AIMs using data formats from MPAI-MMC and other MPAI standards.

Portable Avatar Format (MPAI-PAF) specifies the Portable Avatar and related data formats allowing a sender to enable a receiver to decode and render an Avatar as intended by the sender; the Personal Status Display Composite AI Module allowing the conversion of a Text and a Personal Status to a Portable Avatar; and the AIWs and AIMs used by the Avatar-Based Videoconference Use Case.

Now let’s look at the previously developed standards.

Context-based Audio Enhancement (MPAI-CAE) specifies data types for the improvement of the user experience in audio-related applications for a variety of contexts using context information, and Audio-related AIWs and AIMs using data formats from MPAI-CAE and other MPAI standards.

Neural Network Watermarking (MPAI-NNW) specifies methodologies to evaluate the following aspects of neural network (NN) watermarking-related technologies: the impact on the performance of a watermarked NN and its inference; the ability of an NN watermarking detector/decoder to detect/decode a payload of a modified watermarked NN; and the computational cost of injecting, detecting, or decoding a payload in the watermarked NN.

Compression and Understanding of Industrial Data (MPAI-CUI) specifies data formats, AIMs and an AIW to predict a company’s probability of default and business discontinuity, and to provide an organisational model index (Company Performance Prediction Use Case).

Governance of the MPAI Ecosystem (MPAI-GME) specifies the roles and rules of Ecosystem players: MPAI, Implementers, MPAI Store, Performance Assessors, Users.

MPAI was established to develop AI-enabled data coding standards across industry domains and is keeping its promise. Time to join MPAI!

Image by starline on Freepik


New AI-driven standards pioneer the future of immersive entertainment

Press Release

Geneva, 2023/09/29 – The founder of MPEG, Leonardo Chiariglione, inspired by the prospects of AI, is leading an initiative – MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) – to drive AI standards that will supercharge next-generation immersive entertainment venues.  They have already developed a range of AI standards for audio enhancement and more natural forms of human-machine conversation and others which have subsequently been adopted by the IEEE.

The MPAI community is now focused on developing standards for XR Venues – particularly venues supporting live theatrical performances where the user experience spans both real and virtual environments.

The purpose of the planned MPAI-XRV – Live Theatrical Stage Performance standard is to address AI functions that facilitate live multisensory immersive performances. Broadway theatres, musicals, dramas, operas, and other performing arts are increasingly using video scrims, backdrops, as well as projection mapping to create digital sets. Such shows ordinarily require extensive digital set design and on-site show control staff to operate. The use of AI will allow faster mounting of shows, more direct, precise yet spontaneous show implementation and control to achieve the show director’s vision. It will also free staff from repetitive and technical tasks allowing them to amplify their artistic and creative skills.

Ultimately, the MPAI-XRV standard will allow the entire performance stage to become an immersive digital virtual environment which, when merged with a metaverse environment, creates a “digital twin” representation of live performers within the virtual world. Major metaverse concert events can therefore originate as a live performance with an in-person audience while simultaneously being enjoyed by millions in virtual reality. Emerging immersive venues such as MSG Sphere and COSM and various immersive art galleries are already well suited to such an approach.

MPAI recently issued a Call for Technologies, inviting industry participation in the MPAI-XRV Live Theatrical Stage Performance standards effort. Participating companies are encouraged to respond to the call at https://mpai.community/standards/mpai-xrv/. Individuals wishing to participate may also join MPAI by contacting secretariat@mpai.community.

MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence is an international not-for-profit association with the mission to develop AI-based data coding standards with clear IPR frameworks.


MPAI celebrates its third anniversary by publishing five standards

Geneva, Switzerland – 29 September 2023. MPAI, Moving Picture, Audio and Data Coding by Artificial Intelligence, the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 36th General Assembly (MPAI-36) approving the publication of five standards: AI Framework V2, Connected Autonomous Vehicle Architecture, Multimodal Conversation V2, MPAI Metaverse Model, and Portable Avatar Format, and one Conformance Testing of the Context-based Audio Enhancement standard.

In three years, MPAI has been able to produce nine standards in the areas of execution of AI applications, audio enhancement, autonomous vehicles, financial data, ecosystem governance, MPAI multimodal conversation, metaverse, neural network watermarking, and portable avatars, produced a second extended version for three, and is now looking forward to receiving responses to two Calls for Technologies on AI for Health and XR Venues – Live Theatrical Stage performance. More information about standards and projects can be found here.

Fittingly, the date of the 36th General Assembly falls on the eve of the third anniversary of the MPAI foundation.

MPAI is continuing its work plan, which involves the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. Avatar Representation and Animation (MPAI-PAF): reference software, conformance testing and new areas.
  3. Context-based Audio Enhancement (CAE-DC): new projects.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of CAV architecture.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for extension of existing standard.
  6. Multimodal Conversation (MPAI-MMC): reference software, drafting conformance testing, and new areas.
  7. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  8. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  9. AI Health (MPAI-AIH): preparation for the development of the standard.
  10. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  11. AI-Enhanced Video Coding (MPAI-EVC): video coding with AI tools added to existing tools.
  12. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  13. XR Venues (MPAI-XRV): preparation for the development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

 

 


Summary of MPAI Calls for Technologies and Standards

MPAI has concluded a series of presentations illustrating 3 Calls for Technologies and 5 Technical Specifications posted for Community Comments planned for final approval on 29 September 2023.

Whether you intend to respond to a Call or make comments on a Standard, get familiar with the website (1st column), the overview (2nd column), the slides (3rd column), and the video recording (4th column) of the presentation of each Call  or Standard.

Calls for Technologies

Artificial Intelligence for Health Data | Overview | Slides | Video
Object and Scene Description | Overview | Slides | Video
XR Venues – Live Theatrical Stage Performance | Overview | Slides | Video

Standards open to Community Comments

AI Framework V2 | Overview | Slides | Video
Avatar Representation and Animation | Overview | Slides | Video
Connected Autonomous Vehicle – Architecture | Overview | Slides | Video
MPAI Metaverse Model – Architecture | Overview | Slides | Video
Multimodal Conversation V2 | Overview | Slides | Video

There are a few weeks to respond to a Call for Technologies and a few more days to comment on the Standards posted for Community Comments.


What is the Object and Scene Description (MPAI-OSD) Call for Technologies about?

Object and Scene Description (MPAI-OSD) is a project for a standard specifying technologies for object description and their localisation in space. Such technologies are used across several use cases of several MPAI standards.

Figure 1 gives two examples whose outputs are Audio Scene Descriptors and Visual Scene Descriptors.

Figure 1 – Audio and Visual Scene Description

Figure 2 shows one solution to the problem of assigning identifiers to the Objects extracted from an audio-visual scene, especially for the purpose of identifying those that are audio-visual, such as a human and their speech.

Figure 2 – Audio-Visual Alignment
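One way to picture the identifier-assignment problem is as a pairing of audio objects and visual objects that occupy roughly the same place in the scene. The sketch below is an assumption made for illustration, not the method specified by MPAI-OSD: the function name `align`, the greedy nearest-neighbour strategy, the distance threshold, and the object names are all hypothetical.

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float, float]


def align(audio_objects: Dict[str, Position],
          visual_objects: Dict[str, Position],
          max_distance: float = 0.5) -> Dict[str, str]:
    """Greedily pair each audio object with the nearest unused visual object."""
    pairs: Dict[str, str] = {}
    used = set()
    for audio_id, audio_pos in audio_objects.items():
        best_id, best_dist = None, max_distance
        for visual_id, visual_pos in visual_objects.items():
            if visual_id in used:
                continue
            dist = math.dist(audio_pos, visual_pos)
            if dist <= best_dist:
                best_id, best_dist = visual_id, dist
        if best_id is not None:
            pairs[audio_id] = best_id  # these two share one audio-visual identity
            used.add(best_id)
    return pairs


audio = {"speech-1": (1.0, 0.0, 2.0)}
visual = {"human-1": (1.1, 0.0, 2.0), "chair-1": (3.0, 0.0, 1.0)}
print(align(audio, visual))  # {'speech-1': 'human-1'}
```

Objects left unpaired remain audio-only or visual-only, which matches the distinction the text draws between audio-visual Objects and the rest.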

Another example is provided by Figure 3.

Figure 3 – Visual Spatial Object Identification

Figure 4 is an example of the Conversation with Personal Status use case that makes use of all the (Composite) AI Modules described above.

Figure 4 – Reference Model of Conversation with Personal Status (MPAI-CPS)

MPAI has sought proposals for data formats and reference models for the identified application areas.

Call for Technologies (closed) html, pdf
Use Cases and Functional Requirements html, pdf
Framework Licence html, pdf
Template for responses html, docx

See also the video recordings (YouTube, WimTV) and the slides of the presentation made on 07 September.

 


An overview of AI Framework (MPAI-AIF)

From its early days, MPAI realised that AI-based data coding standards could facilitate AI explainability if monolithic AI applications could be broken down to individual components with identified functionality processing and producing data with semantics known as far as possible. An important side effect of this approach was identified in the possibility for developers to provide components with standard interfaces and potentially better performance than that provided by other developers. Version 1 of AI Framework (MPAI-AIF) published in September 2021 was the first standard supporting the original vision. Most MPAI application standards are built on top of MPAI-AIF.

Figure 1 – Reference Model of AI Framework (MPAI-AIF) V1

The main features of MPAI-AIF V1 were:

  1. Independence of the Operating System.
  2. Components have specified interfaces encapsulating Components to abstract them from the development environment.
  3. Interface with the MPAI Store enabling access to validated Components.
  4. Components implementable as software only, hardware only, and Hybrid hardware-software.
  5. Execution in local and distributed Zero-Trust architectures.
  6. Possibility to interact with other AIF implementations operating in proximity.

The Components have the following functionalities:

  • Access: provides access to static or slowly changing data such as domain knowledge data and data models.
  • AI Module (AIM): a data processing element receiving Inputs and producing Outputs according to its Function. An AIM may be an aggregation of AIMs.
  • AI Workflow (AIW): an organised aggregation of AIMs implementing a Use Case.
  • Communication: connects the Components of an AIF.
  • Controller: runs one or more AIWs and exposes three APIs:
    • AIM API: modules can register, communicate and access the AIF environment; can start, stop, and suspend AIMs.
    • User API: User or other Controllers can perform high-level tasks (e.g., switch the Controller on and off, give inputs to the AIW through the Controller).
    • MPAI Store API: communication between the AIF and the Store.
  • Global Storage: stores data shared by AIMs.
  • AIM/AIW Storage: stores data of the individual AIMs (securely/non-securely).
  • MPAI Store: stores Implementations for users to download.
  • User Agent: the Component interfacing the user with an AIF through the Controller.
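The relationship between AIMs, AIWs, and the Controller can be sketched in a few lines. This is a toy model under stated assumptions, not the MPAI-AIF API: the class and method names (`AIM`, `AIW`, `Controller`, `register`, `start`, `stop`, `feed`) are hypothetical, and the AIW topology is reduced to a simple chain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AIM:
    """AI Module: receives inputs and produces outputs according to its function."""
    name: str
    function: Callable[[dict], dict]


@dataclass
class AIW:
    """AI Workflow: an organised aggregation of AIMs (here, a simple chain)."""
    name: str
    aims: List[AIM]

    def run(self, inputs: dict) -> dict:
        data = dict(inputs)
        for aim in self.aims:
            data.update(aim.function(data))  # each AIM adds its outputs
        return data


class Controller:
    """Toy Controller: registers AIWs, starts/stops them, and feeds them inputs."""

    def __init__(self) -> None:
        self._aiws: Dict[str, AIW] = {}
        self._running: set = set()

    def register(self, aiw: AIW) -> None:
        self._aiws[aiw.name] = aiw

    def start(self, name: str) -> None:
        if name not in self._aiws:
            raise KeyError(f"unknown AIW: {name}")
        self._running.add(name)

    def stop(self, name: str) -> None:
        self._running.discard(name)

    def feed(self, name: str, inputs: dict) -> dict:
        if name not in self._running:
            raise RuntimeError(f"AIW not running: {name}")
        return self._aiws[name].run(inputs)


# Two hypothetical AIMs chained into one hypothetical AIW.
normalise = AIM("normalise", lambda d: {"text": d["text"].strip().lower()})
count = AIM("count", lambda d: {"words": len(d["text"].split())})

controller = Controller()
controller.register(AIW("text-stats", [normalise, count]))
controller.start("text-stats")
result = controller.feed("text-stats", {"text": "  Hello MPAI World "})
print(result)  # {'text': 'hello mpai world', 'words': 3}
```

Because each AIM is defined only by its function and interface, either AIM could be swapped for another implementation with the same inputs and outputs without touching the rest of the workflow.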

Table 1 lists the APIs exposed by MPAI-AIF V1.

Table 1 – APIs of MPAI-AIF V1

# API
8.1 Store API called by Controller
8.1.1 Get and parse archive
8.2 Controller API called by User Agent
8.2.1 General
8.2.2 Start/Pause/Resume/Stop Messages to other AIWs
8.2.3 Inquire about state of AIWs and AIMs
8.2.4 Management of Shared and AIM Storage for AIWs
8.2.5 Communication management
8.2.6 Resource allocation management
8.3 Controller API called by AIMs
8.3.1 General
8.3.2 Resource allocation management
8.3.3 Register/deregister AIMs with the Controller
8.3.4 Send Start/Pause/Resume/Stop Messages to other AIMs
8.3.5 Register Connections between AIMs
8.3.6 Using Ports
8.3.7 Operations on messages
8.3.8 Functions specific to machine learning
8.3.9 Controller API called by Controller

Version 1 assumed that the environment was Zero-Trust but its implementation was left to the developer. Version 2 extends Version 1 by making V1 the Basic Profile of MPAI-AIF. The Basic Profile utilises:

  1. Non-Secure Controller.
  2. Non-Secure Storage.
  3. Secure Communication enabled by secure communication libraries.
  4. Basic API.

The Secure Profile utilises all the technologies in this Technical Specification.

V2 adds the necessary technology for a new Secure Profile offering the following functionalities:

  • The Framework provides access to the following Trusted Services:
    • A selected range of cyphering algorithms.
    • A basic attestation function.
    • Secure storage.
    • Certificate-based secure communication.
  • The AIF can execute only one AIW containing only one AIM. The sole AIM has the following features:
    • The AIM may be a Composite AIM.
    • The AIMs of the Composite AIM cannot access the Secure API.
  • The AIF Trusted Services may rely on hardware and OS security features already existing in the hardware and software of the environment in which the AIF is implemented.

Figure 2 – Reference Model of AI Framework (MPAI-AIF) V2

By virtue of incorporating the Secure Abstraction Layer, MPAI-AIF V2 adds the following features to V1:

  1. The AIMs of a Composite AIM must run on the same computing platform.
  2. The AIW
    • The AIMs in the AIW trust each other and communicate without special security concerns.
    • Communication among AIMs in the Composite AIM is non-secure.
    • The AIW/AIMs call the Secure Abstraction Layer via API.
  3. The Controller
    • Communicates securely with the MPAI-Store and the User Agent (Authentication, Attestation, and Encryption).
    • Accesses Communication, Global Storage, Access and MPAI Store via Trusted Services API.
    • Is split in two parts:
      • Secure Controller accesses Secure Communication and Secure Storage.
      • Non-Secure Controller can access the non-secure parts of the AIF.
    • Interfaces with the User Agent in the area where non-secure code is executed.
    • Interface with the Composite AIM in the area where secure code is executed,
  4. The AIM/AIW Storage
    • Secure Storage functionality is provided through key exchange.
    • Non-secure functionality is provided without reference to secure API calls.

The capabilities of the AIF, AIW, and AIM are described by a standard JSON metadata format that enables an AIF to download suitable AIW and AIMs from the MPAI Store.
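To illustrate how capability metadata could drive the selection of AIWs from the MPAI Store, here is a minimal sketch. The field names (`profile`, `os`, `ram_mb`, `min_ram_mb`), the catalogue entries, and the matching rule are all hypothetical; they are not taken from the MPAI-AIF JSON metadata schema.

```python
import json

# Hypothetical metadata: field names are illustrative, not the MPAI-AIF schema.
AIF_METADATA = json.loads("""
{"profile": "Basic", "os": "linux", "ram_mb": 2048}
""")

STORE_CATALOGUE = json.loads("""
[
  {"aiw": "example-aiw-a", "profile": "Basic",  "os": ["linux", "windows"], "min_ram_mb": 1024},
  {"aiw": "example-aiw-b", "profile": "Secure", "os": ["linux"],            "min_ram_mb": 512},
  {"aiw": "example-aiw-c", "profile": "Basic",  "os": ["linux"],            "min_ram_mb": 4096}
]
""")


def suitable_aiws(aif: dict, catalogue: list) -> list:
    """Return the names of catalogue entries this AIF is able to run."""
    return [
        entry["aiw"]
        for entry in catalogue
        if entry["profile"] == aif["profile"]
        and aif["os"] in entry["os"]
        and entry["min_ram_mb"] <= aif["ram_mb"]
    ]


print(suitable_aiws(AIF_METADATA, STORE_CATALOGUE))  # ['example-aiw-a']
```

The point of the sketch is only that a machine-readable description on both sides lets the download decision be made automatically.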

Table 2 gives the APIs exposed by MPAI-AIF V2:

Table 2 – APIs of MPAI-AIF V2

# API
9.1 Data characterization structure
9.2 API called by User Agent
9.3 API to access Secure Storage
9.3.1 User Agent initialises Secure Storage API
9.3.2 User Agent writes Secure Storage API
9.3.3 User Agent reads Secure Storage API
9.3.4 User Agent gets info from Secure Storage API
9.3.5 User Agent deletes a p_data in Secure Storage API
9.4 API to access Attestation
9.5 API to access cryptographic functions
9.5.1 Hashing
9.5.2 Key management
9.5.3 Key exchange
9.5.4 Message Authentication Code
9.5.5 Cyphers
9.5.6 Authenticated encryption with associated data (AEAD)
9.5.7 Signature
9.5.8 Asymmetric Encryption
9.6 API to enable secure communication
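For readers unfamiliar with the categories in 9.5, the sketch below shows what hashing and a Message Authentication Code do, using only the Python standard library. It does not use or reproduce the MPAI-AIF API; the choice of SHA-256 and HMAC and the example payload are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

message = b"example AIM output payload"

# Hashing: a fixed-length fingerprint of the data.
digest = hashlib.sha256(message).hexdigest()

# Message Authentication Code: a keyed tag that proves integrity and origin.
key = secrets.token_bytes(32)
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


print(len(digest))                              # 64 hexadecimal characters
print(verify(key, message, tag))                # True
print(verify(key, b"tampered payload", tag))    # False
```

In the Secure Profile such operations are offered by the Framework as Trusted Services, so an AIM does not need to bundle its own cryptographic code.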

MPAI has published a Working Draft of Version 2 (html, pdf) requesting Community Comments. Version 2 extends the capabilities of Version 1, making it easier for AI application developers to support security in their applications. Comments should be sent to the MPAI Secretariat by 2023/09/24T23:59 UTC. An online presentation of the AI Framework V2 WD will be held on September 11 at 08 and 15 UTC. Register here for the 08 UTC and here for the 15 UTC presentations.

For the future, MPAI plans on:

  • Publishing MPAI-AIF as a Technical Specification at the 36th General Assembly (29 September 2023).
  • Continuing the implementation of AIF V1 for more OSs and programming languages than currently available.
  • Implementing the Reference Software of MPAI-AIF V2.

 


What is the AI for Health Call for Technologies about?

AI for Health (MPAI-AIH) is a project addressing interfaces and data types involved in an AIH Platform where End Users acquire and process health data on their handsets equipped with an AI Framework executing AI Workflows enabled by models distributed by an AIH Back end and installed in their handsets (AIH Front ends). Figure 1 depicts the AIH Front end.

Figure 1 – The AIH Front-End

End Users upload their processed health data with associated Smart Contracts granting the AIH Back end the Rights to use the data.

AIH Back end:

  1. Stores/processes health data delivered by AIH Front ends.
  2. Collects AI Models trained by AIH Front ends with End Users’ health data, updates the common Model and distributes it to AIH Front ends (federated learning).
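The model update in step 2 can be illustrated with weighted federated averaging, a common aggregation rule. This is a sketch under assumptions: the text does not say which aggregation rule the AIH Back end uses, and the function name `federated_average` and the flat-list representation of model weights are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client model weights, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same shape")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Two Front ends: one trained on 1 sample, the other on 3 samples.
common_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(common_model)  # [2.5, 3.5]
```

Only model weights travel to the Back end in this scheme; the raw health data used for training stays on the End User's handset.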

The Back end is depicted in Figure 2.

Figure 2 – The MPAI-AIH Platform

Third-Party Users may access the Back end to process their own or End User-provided processed health data based on the rights granted by End Users via smart contracts. External Data Sources may provide subsidiary data to the AIH Back end. This data is also governed by smart contracts.

The Call for Technologies requests proposals for the following:

Templates of smart contracts between the following parties:

  1. End User and AIH Back-End
  2. AIH Back-End and Third-Party Entity
  3. Third-Party Entity and AIH Back-End

Data Types and Usage

See Table 1.

Table 1 – Data Types and Usage

Data Type | Short Description
Historical User Health Data | End User’s medical history, lab results, etc.
Time series | Vital sign measurements (such as heart rate and blood pressure)
Sensor | Data from wearable devices: smartwatches, fitness trackers, etc.
Geolocation | Geographic location of individuals/samples
Social media | Chats, posts, comments, and other related data
Text | Unstructured data, e.g., clinical notes and patient-generated data
Audio | Speech and audio recordings
Video | Data from endoscopic procedures, laparoscopic surgeries, etc.
Medical images | X-ray, CT, MRI, and ultrasound images
Genomic | DNA sequencing data and other types of genetic information
Medical imaging | 3D images, 4D images (e.g., MRI over time), and multimodal images

Aggregated Health Data Format with the following features:

  • A container to carry data from a Front-End to the Back-End.
  • Electronic Health Records (EHR) improve the efficiency and quality of healthcare by offering comprehensive, up-to-date, and accurate information about a patient’s health history to healthcare providers.
  • Fast Healthcare Interoperability Resources (FHIR): one example of a data standard used for exchanging healthcare information electronically.
  • The Aggregated Health Data Format should be wrapped in a secure envelope along with associated encryption methods and containing the user’s health data records.
  • MPAI AIH healthcare information should be exchanged electronically and wrapped in an adequate envelope.
  • The envelope format should be independent of the data it contains.
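The requirement that the envelope be independent of the data it carries can be sketched as follows. Everything here is hypothetical: the field names, the use of JSON and Base64, and the SHA-256 integrity check are illustrative assumptions rather than a proposed format, and actual encryption is left out (the payload is treated as opaque bytes that have already been encrypted).

```python
import base64
import hashlib
import json


def wrap(payload: bytes, content_type: str, encryption: str) -> str:
    """Wrap opaque payload bytes; the envelope never inspects the payload."""
    envelope = {
        "content_type": content_type,   # label only, e.g. an EHR or FHIR record
        "encryption": encryption,       # name of the method applied upstream
        "sha256": hashlib.sha256(payload).hexdigest(),
        "payload": base64.b64encode(payload).decode("ascii"),
    }
    return json.dumps(envelope)


def unwrap(text: str) -> bytes:
    """Return the payload bytes after checking their integrity."""
    envelope = json.loads(text)
    payload = base64.b64decode(envelope["payload"])
    if hashlib.sha256(payload).hexdigest() != envelope["sha256"]:
        raise ValueError("payload digest mismatch")
    return payload


record = b"opaque, already-encrypted health record bytes"
wrapped = wrap(record, "example/health-record", "example-cipher")
print(unwrap(wrapped) == record)  # True
```

Since `wrap` and `unwrap` handle only bytes and labels, the same envelope code would carry any of the data types in Table 1 without change.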

APIs

  1. AIH Back end ↔ Platform Front end
  2. AIH Back end (Federated Learning) ↔ AIH Back end
  3. AIH Back end ↔ Third-Party User
  4. AIH Back end System ↔ Blockchain

MPAI is seeking proposals of technologies that enable the implementation of standard components (AI Modules) to make real the vision described above. The deadline for submitting a response is October 19 at 23:59 UTC. Those intending to submit a response should become fully familiar with the following documents:

Call for Technologies html, pdf
Use Cases and Functional Requirements html, pdf
Framework Licence html, pdf
Template for responses html, docx

See also the video recordings (YouTube, WimTV) and the slides of the presentation made on 07 September.


An overview of Portable Avatar Format (MPAI-PAF)

“Digital humans” are computer-created digital objects that can be rendered with a human appearance and called Avatars. As Avatars have mostly been created, animated, and rendered in closed environments, it is no surprise that there has been very little need for standards.

In a communication context, say, in an interoperable metaverse, digital humans may not be constrained to be in a closed environment. Therefore, if a sender requires that a remote receiving client reproduce a digital human as intended by the sender, standards are needed.

Technical Specification: Portable Avatar Format is a first response to this need, with the following goals:

  • Objective 1: to enable a user to reproduce a virtual environment as intended.
  • Objective 2: to enable a user to reproduce a sender’s avatar and its animation as intended by the sender.
  • Objective 3: to estimate the personal status of a human or avatar.
  • Objective 4: to display an avatar with a selected personal status.

Personal Status is a data type standardised by Multimodal Conversation V2 representing the ensemble of the information internal to a person, including Emotion, Cognitive State, and Attitude. See more on Personal Status here.

The MPAI-PAF standard has been designed to provide all the standards that are required to implement the Avatar-Based Videoconference Use Case where Avatars, having the visual appearance and uttering the real voice of human participants, meet in a virtual environment (Figure 1).

Figure 1 – Avatar-Based Videoconference

MPAI-PAF assumes that the system is composed of four subsystems, as depicted in Figure 2.

Figure 2 – Avatar-Based Videoconference System

This is how the system works:

Remotely located Transmitting Clients send to the Server:

  1. At the beginning:
    1. Avatar Model(s) and Language Preferences.
    2. Speech Object and Face Object for Authentication.
  2. Continuously:
    1. Avatar Descriptors and Speech.

The Server:

  1. At the beginning:
    1. Selects an Environment, e.g., a meeting room.
    2. Equips the room with objects, i.e., meeting table and chairs.
    3. Places Avatar Models around the table.
    4. Distributes Environment, Avatars, and their positions to all receiving Clients.
    5. Authenticates Speech and Face Objects.
  2. Continuously:
    1. Translates Speech from participants according to Language Preferences.
    2. Sends Avatar Descriptors and Speech to receiving Clients.
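The Server's step of placing Avatar Models around the table amounts to computing a position and an orientation for each avatar. The sketch below assumes a round table, even spacing, and avatars facing the centre; the function name `place_around_table`, the coordinate convention, and the output fields are hypothetical and do not reproduce the Spatial Attitude format.

```python
import math
from typing import Dict, List


def place_around_table(avatar_ids: List[str], radius: float = 1.5) -> Dict[str, dict]:
    """Space avatars evenly on a circle, each one facing the table centre."""
    placements: Dict[str, dict] = {}
    count = len(avatar_ids)
    for index, avatar_id in enumerate(avatar_ids):
        angle = 2 * math.pi * index / count
        x, z = radius * math.cos(angle), radius * math.sin(angle)
        # Yaw points from the avatar back towards the origin (table centre).
        yaw_deg = math.degrees(math.atan2(-z, -x))
        placements[avatar_id] = {"position": (x, 0.0, z), "yaw_deg": yaw_deg}
    return placements


seats = place_around_table(["avatar-1", "avatar-2", "avatar-3", "avatar-4"])
for avatar_id, seat in seats.items():
    print(avatar_id, seat)
```

The resulting placements are what the Server would then distribute, together with the Environment and the Avatar Models, to all receiving Clients.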

The Virtual Secretary

  1. Receives Text, Speech, and Avatar Descriptors of conference participants.
  2. Recognises Speech streams.
  3. Refines Recognised Text and extracts Meaning.
  4. Extracts Avatars’ Personal Status.
  5. Produces a Summary.
  6. Produces Edited Summary using the comments received from participants.
  7. Produces Text and Personal Status.
  8. Creates Speech and Avatar Descriptors from Text and Personal Status.

The Receiving Clients:

  1. At the beginning, receive:
    1. Environment Model.
    2. Avatar Models.
    3. Spatial Attitudes.
  2. Continuously:
    1. Create Audio and Visual Scene Descriptors.
    2. Render the Audio-Visual Scene from the Point of View selected by the Participant.

Only the Receiving Client of Avatar-Based Videoconference is depicted in Figure 3.

Figure 3 – Receiving Client of Avatar-Based Videoconference

The data types used by the Avatar-Based Videoconference use case are given in Table 1.

Table 1 – Data Types used by PAF-ABV

Name of Data Format | Specified by
Environment | OSD
Body Model | ARA
Body Descriptors | ARA
Face Model | ARA
Face Descriptors | ARA
Avatar Model | ARA
Avatar Descriptors | ARA
Spatial Attitude | OSD
Audio Scene Descriptors | CAE
Visual Scene Descriptors | OSD
Text | MMC
Language identifier | MMC
Meaning | MMC
Personal Status | MMC
We note that MPAI-PAF only specifies Body Model and Descriptors, Face Model and Descriptors, and Avatar Model and Descriptors (the entries marked ARA in Table 1). Three other MPAI standards (MPAI-OSD, MPAI-CAE, and MPAI-MMC) provide the remaining specifications.

The MPAI-PAF Working Draft (html, pdf) is published with a request for Community Comments. See also the video recordings (YT, WimTV) and the slides of the presentation made on 07 September. Comments should be sent to the MPAI Secretariat by 2023/09/26T23:59 UTC. MPAI will use the Comments received to develop the final draft planned to be published at the 36th General Assembly (29 September 2023).

As we said, this is a first contribution to avatar interoperability. MPAI will continue the development of Reference Software, start the development of Conformance Testing and study extensions of MPAI-PAF (e.g., compression of Avatar Description).


An overview of Connected Autonomous Vehicle (MPAI-CAV) – Architecture

Connected Autonomous Vehicles (CAV) promise to replace human errors with a lower machine error rate, give more time to human brains for rewarding activities, optimise use of vehicles, infrastructure, and traffic management, reduce congestion and pollution, and help elderly and disabled people have a better life.

MPAI believes that standards can accelerate the coming of CAVs as an established reality and so the first MPAI standard for this is “Connected Autonomous Vehicles – Architecture”. It specifies a CAV Reference Model broken down into Subsystems for which it specifies the Functions and the data exchanged between subsystems. Each subsystem is further broken down into components for which it specifies the Functions, the data exchanged between components and the topology.

The Subsystem-level Reference model is represented in Figure 1.

Figure 1 – The MPAI-CAV – Architecture Reference Model

There are four subsystem-level reference models. Each subsystem is specified in terms of:

  1. The functions the subsystem performs.
  2. The Reference model designed to be compatible with the AI Framework (MPAI-AIF) Technical Specification.
  3. The input/output data exchanged by the subsystem with other subsystems and the environment.
  4. The functions of each of the subsystem components, intended to be implemented as AI Modules.
  5. The input/output data exchanged by the component with other components.

The following sections give the functions and the reference model of each subsystem. The other three elements (input/output data of the subsystem, functions of its components, and input/output data of its components) can be found in the draft Technical Specification (html, pdf).

Human-CAV Interactions (HCI)

The HCI functions are:

  1. To authenticate humans, e.g., to let them into the CAV.
  2. To converse with humans, interpreting utterances, e.g., a request to go to a destination, or during a conversation. HCI makes use of the MPAI-MMC “Personal Status” data type.
  3. To converse with the Autonomous Motion Subsystem to implement human conversation and execute commands.
  4. To enable passengers to navigate the Full Environment Representation.
  5. To appear as a speaking avatar showing a Personal Status.

The HCI Reference Model is depicted in Figure 2.

Figure 2 – HCI Reference Model

The full HCI specification is available here.

Environment Sensing Subsystem (ESS)

The ESS functions are:

  1. To acquire Environment information using Subsystem’s RADAR, LiDAR, Cameras, Ultrasound, Offline Map, Audio, GNSS, …
  2. To receive Ego CAV’s position, orientation, and environment data (temperature, humidity, etc.) from Motion Actuation Subsystem.
  3. To produce Scene Descriptors for each sensor technology in a common format.
  4. To produce the Basic Environment Representation (BER) by integrating the sensor-specific Scene Descriptors during the travel.
  5. To hand over the BERs, including Alerts, to the Autonomous Motion Subsystem.

The ESS Reference Model is depicted in Figure 3.

Figure 3 – ESS Reference Model

The full ESS specification is available here.

Autonomous Motion Subsystem (AMS)

The AMS functions are:

  1. To compute human-requested Route(s).
  2. To receive current BER from Environment Sensing Subsystem.
  3. To communicate with other CAVs’ AMSs (e.g., to exchange subsets of BER and other data).
  4. To produce the Full Environment Representation by fusing its own BER with info from other CAVs in range.
  5. To send Commands to Motion Actuation Subsystem to take the CAV to the next Pose.
  6. To receive and analyse responses from MAS.

The AMS Reference Model is depicted in Figure 4.

Figure 4 – AMS Reference Model

The full AMS specification is available here.

Motion Actuation Subsystem (MAS)

The MAS functions are:

  1. To transmit spatial/environmental information from sensors/mechanical subsystems to the Environment Sensing Subsystem.
  2. To receive Autonomous Motion Subsystem Commands.
  3. To translate Commands into specific Commands to its own mechanical subsystems, e.g., brakes, wheel directions, and wheel motors.
  4. To receive Responses from its mechanical subsystems.
  5. To send Responses to Autonomous Motion Subsystem about execution of commands.
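The data flow between the ESS, AMS, and MAS described in the lists above can be sketched as three small functions. This is a toy model under stated assumptions: the `BER` class, the function names, the dictionary-based Scene Descriptors, the fusion rule, and the command fields are all hypothetical and are not taken from the MPAI-CAV specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[float, float]


@dataclass
class BER:
    """Toy Basic Environment Representation: object positions plus alerts."""
    objects: Dict[str, Position] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)


def ess_build_ber(scene_descriptors: List[Dict[str, Position]]) -> BER:
    """ESS: integrate sensor-specific Scene Descriptors into one BER."""
    ber = BER()
    for descriptors in scene_descriptors:
        ber.objects.update(descriptors)
    return ber


def ams_build_fer(own: BER, others: List[BER]) -> Dict[str, Position]:
    """AMS: fuse the Ego CAV's BER with BER subsets from other CAVs in range."""
    fer: Dict[str, Position] = {}
    for ber in others:
        fer.update(ber.objects)
    fer.update(own.objects)  # assumption: own observations take precedence
    return fer


def mas_execute(command: Dict[str, float]) -> Dict[str, object]:
    """MAS: translate an AMS Command into actuator commands and report back."""
    actuators = {
        "wheel_direction_deg": command["heading_deg"],
        "wheel_motor_speed": command["speed_mps"],
        "brakes": command["speed_mps"] == 0.0,
    }
    return {"executed": True, "actuators": actuators}


own_ber = ess_build_ber([{"pedestrian-1": (4.0, 1.0)}, {"car-7": (20.0, -2.0)}])
fer = ams_build_fer(own_ber, [BER({"car-9": (55.0, 0.0)})])
response = mas_execute({"heading_deg": 0.0, "speed_mps": 8.0})
print(sorted(fer), response["executed"])
```

Each function corresponds to one subsystem, so the interfaces between them are exactly the data exchanged between subsystems that the Architecture identifies.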

The MAS Reference Model is depicted in Figure 5.

Figure 5 – MAS Reference Model

The full MAS specification is available here.

The MPAI-CAV – Architecture Working Draft (html, pdf) is published with a request for Community Comments. See also the video recordings (YT, WimTV) and the slides of the presentation made on 06 September. Anybody may comment on the WD, and no specific format is required. Comments should reach the MPAI Secretariat by 2023/09/26T23:59 UTC. MPAI plans on publishing MPAI-CAV – Architecture at the 36th General Assembly (29 September 2023).

The MPAI-CAV Architecture standard is the starting point for the next steps of the MPAI-CAV roadmap. The current specification does not include the Functional Requirements of the data exchanged between Subsystems and Components; defining them is the activity that will start in October 2023.

To join MPAI, visit How to join.


What is the XR Venues – Live Theatrical Stage Performance Call for Technologies about?

XR Venues is an MPAI project addressing contexts enabled by Extended Reality (XR) – any combination of Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR) technologies – and enhanced by Artificial Intelligence (AI) technologies. The word “Venue” is used as a synonym for Real and Virtual Environments.

MPAI thinks that the Live Theatrical Stage Performance use case fits well with a current trend: theatrical stage performances such as Broadway theatre, musicals, dramas, operas, and other performing arts increasingly use video scrims, backdrops, and projection mapping to create digital sets rather than constructing physical stage sets. This allows the entire stage and theatre to become a digital virtual environment and reduces the cost of mounting shows.

The use of immersion domes – especially LED volumes – can completely surround audiences with virtual environments that live performers can inhabit and interact with. In addition, Live Theatrical Stage Performance can extend into the metaverse as a digital twin. Elements of the Virtual Environment experience can be projected in the Real Environment and elements of the Real Environment experience can be rendered in the Virtual Environment (metaverse).

The purpose of the planned MPAI-XRV – Live Theatrical Stage Performance Technical Specification is to address AI Modules performing functions that facilitate live multisensory immersive performances which ordinarily require extensive on-site show control staff to operate. Use of the AI Modules organised in AI Workflows (see details here) enabled by the MPAI-XRV – LTSP Technical Specification will allow more direct, precise yet spontaneous show implementation and control to achieve the show director’s vision. It will also free staff from repetitive and technical tasks allowing them to amplify their artistic and creative skills.

Figure 1 provides the Reference Model of the Live Theatrical Stage Performance Use Case incorporating AI Modules (AIMs). In this diagram, data extracted from the Real and Virtual Environments (on the left) are processed and injected into the same Real and Virtual Environments (on the right).

Data is collected from both the Real and Virtual Environments. This includes audio, video, volumetric or motion capture (mocap) data from stage performers, audio and video from participants, signals from control surfaces (e.g., audio, lighting, show control), and more. One or more AIMs extract features from participants (i.e., the audience) and performers and output them as Participant and Scene Descriptors. These Descriptors are further interpreted by the Performance and Participant Status AIMs to determine the Cue Point in the show (according to the Script) and the Participant Status (in general, an assessment of the audience’s reactions).

Figure 1 – Live theatrical stage performance architecture (AI Modules shown in green)

Likewise, data from the Show Control computer or control surface and from the consoles for audio, DJ, VJ, lighting and FX (typically commanded by operators) are interpreted, if needed, by the Operator Command Interpreter AIM and output as Interpreted Operator Controls. The Action Generation AIM accepts Participant Status, Cue Point and Interpreted Operator Controls and uses them to direct action in both the Real and Virtual Environments via Scene and Action Descriptors. These general Descriptors are converted into actionable commands (e.g., DMX, MIDI, USD) required by the Real and Virtual Environments, according to their Venue Specifications, to enable multisensory Experience Generation in both the Real and Virtual Environments. In this manner, the desired experience can automatically be adapted to a variety of specific real and virtual venue instances.
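The separation between venue-independent Descriptors and venue-specific commands can be sketched as follows. This is only an illustration under stated assumptions: the ActionDescriptor fields, the SCRIPT table, the engagement threshold and the venue specification layout are hypothetical and are not taken from the Call for Technologies. The one fixed fact used is that a DMX channel carries an 8-bit value (0 to 255).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionDescriptor:
    """Hypothetical venue-independent description of one action."""
    target: str   # e.g. "lighting"
    action: str   # e.g. "fade_to"
    value: float  # normalised level, 0.0 .. 1.0


# Hypothetical Script: base lighting level for each Cue Point.
SCRIPT = {"act1_scene2": 0.5}


def generate_actions(cue_point: str, participant_status: dict,
                     operator_controls: dict) -> list[ActionDescriptor]:
    """Toy Action Generation: Script level, audience boost, operator override."""
    level = SCRIPT[cue_point]
    # Assumed rule: brighten the stage when audience engagement is low.
    if participant_status.get("engagement", 1.0) < 0.5:
        level = min(level + 0.25, 1.0)
    # Assumed rule: an explicit operator control takes precedence.
    if "lighting_level" in operator_controls:
        level = operator_controls["lighting_level"]
    return [ActionDescriptor("lighting", "fade_to", level)]


def to_dmx(actions: list[ActionDescriptor], venue_spec: dict) -> list[tuple[int, int]]:
    """Convert Descriptors to (channel, 8-bit value) pairs for one venue."""
    return [
        (venue_spec["dmx_channels"][a.target], round(a.value * 255))
        for a in actions
    ]


actions = generate_actions("act1_scene2", {"engagement": 0.3}, {})
venue_a = {"dmx_channels": {"lighting": 1}}   # hypothetical Venue Specification
venue_b = {"dmx_channels": {"lighting": 17}}  # same show, different venue
dmx_a = to_dmx(actions, venue_a)
dmx_b = to_dmx(actions, venue_b)
```

The same list of Action Descriptors yields a command on channel 1 in one venue and on channel 17 in the other, which is the sense in which the experience adapts to specific venue instances.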

MPAI is seeking proposals of technologies that enable the implementation of standard components (AI Modules) to make real the vision described above. The deadline for submitting a response is November 20 at 23:59 UTC.

Those intending to submit a response should familiarise themselves with the following documents:

Call for Technologies html, pdf
Use Cases and Functional Requirements html, pdf
Framework Licence html, pdf
Template for responses html, docx

See the video recordings (YouTube, WimTV) and the slides from the presentation made on 12 September. Read What is the XR Venues – Live Theatrical Stage Performance Call for Technologies about?