Moving Picture, Audio and Data Coding
by Artificial Intelligence

Archives: 2024-01-27

Standards that innovate technology and standardisation

At its 40th General Assembly (MPAI-40), MPAI approved one draft, one new, and three extension standards. For an organisation that has already nine standards in its game bag, this may not look like big news. There are two reasons, though, to consider this a remarkable moment in the MPAI short but intense life.

The first reason is that the draft standard posted for Community Comments – Human and Machine Communication (MPAI-HMC) – does not specify new technologies but leverages technologies from existing MPAI standards: Context-based Audio Enhancement (MPAI-CAE), Multimodal Conversation (MPAI-MMC), the newly approved Object and Scene Description (MPAI-OSD), and Portable Avatar Format (MPAI-PAF).

If not new technologies, what does MPAI-HMC specify then? To answer this question let’s consider Figure 1.

Figure 1 – The MPAI-HMC communications model

The human labelled as #1 is part of a scene with audio and visual attributes and communicates with the Machine by transmitting speech information and the entire audio-visual scene including him or herself. The Machine receives that information, processes it, and emits internally generated audio-visual scenes that include itself uttering vocal and displaying visual manifestations of its own internal state generated to interact more naturally with the human. The human may also communicate with the Machine when other humans are in the scene with him or her and the Machine can discern the individual human and identify (i.e., give a name to) audio and visual objects. However, only one human at a time can communicate with the Machine.

The Machine need not capture the human in a real space. His or her digital representation can be rendered in a Virtual Space as a Digitised Human. The human may not be alone but together with other Digitised Humans or with Virtual Humans, i.e., audio-visual representations of processes, such as Machines. For this reason, we will use the word Entity to indicate both a human or their avatar and a Machine rendered as an avatar.

The Machine can also act as an interpreter between the Entities and Contexts labelled as #1 or #2 and #3 or #4. By Context we mean information surrounding an Entity that provides additional insight into the information communicated by the Entity. An example of Context is language and, more generally, culture.

Communication between #1 and #3 represents the case of a human in a Context communicating with a Machine, e.g., an information service, in another Context. In this case the Machine communicates with the human by sensing and actuating audio-visual information, but the communication between the Machine and #3 may use a different communication protocol. The payload used to communicate is the “Portable Avatar” defined as a Data Type specified by the MPAI-PAF standard representing an Avatar and its Context.

Communication between the human in #1 and the Machine is based on raw audio-visual communication while communication between Machine and Entity #3 is carried out using a Portable Avatar .

Read a collection of usage scenarios.

The name of the standard is Human and Machine Communication (MPAI-HMC). It is published as a draft with a request for Community Comments, the last step before publication. Comments are due by 2024/02/19T23:59 UTC to secretariat@mpai.community.

To explain the second reason why the 40th General Assembly is a remarkable moment we have to recall that most MPAI application standards are based on the notion of AI Workflow (AIW) composed of interconnected AI Modules (AIM) executed in the AI Framework (AIF) specified by the MPAI-AIF standard. Four out of five documents are now  published in a new format where the Use Cases-AI Modules- Data Types chapters make reference to a common body of AIMs and Data Types.

Component-based software engineering aims to build software out of modular components. MPAI is implementing this notion in the world of standards.

See the links below and enjoy:

MPAI-HMC: https://mpai.community/standards/mpai-hmc/mpai-hmc-specification/

MPAI-MMC: https://mpai.community/standards/mpai-mmc/mpai-mmc-specification/

MPAI-OSD: https://mpai.community/standards/mpai-osd/mpai-osd-specification/

MPAI-PAF: https://mpai.community/standards/mpai-paf/mpai-paf-specification/


MPAI publishes 5 documents: 1 draft for community comments, 3 extensions, and 1 new standard

Geneva, Switzerland – 24 January 2024. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 40th General Assembly (MPAI-40) approving the publication of a range of standards covering disparate technologies and application domains.

Human and Machine Communication (MPAI-HMC) is a draft published for Community Comments, the last step before publication. It includes a wide range technologies available from existing MPAI standards to enable an Entity, i.e., a human or a machine, to hold a communication with Entities as humans do. Comments are due by 2024/02/19T23:59 UTC to secretariat@mpai.community.

The newly-approved Object and Scene Description  (MPAI-OSD) V1.0 standard provides important technologies enabling the digital representation of position and orientation of Audio and Visual Objects and their combinations in Scenes. The MPAI-OSD capabilities enhance usability of the new Multimodal Conversation (MPAI-MMC) V2.1 and Portable Avatar Format (MPAI-PAF) V1.1.

MPAI Metaverse Model – Architecture (MPAI-MMM) V1.1 updates the MMM- Architecture Metadata to streamline communication between the Processes of a Metaverse Instance and uses the new MPAI-MMM Scripting Language (MMM-Script) to represent a wide range of use cases.

MPAI is now offering an innovative way to access to its new standards via the web:

MPAI-HMC: https://mpai.community/standards/mpai-hmc/mpai-hmc-specification/

MPAI-MMC: https://mpai.community/standards/mpai-mmc/mpai-mmc-specification/

MPAI-OSD: https://mpai.community/standards/mpai-osd/mpai-osd-specification/

MPAI-PAF: https://mpai.community/standards/mpai-paf/mpai-paf-specification/

MPAI is continuing its work plan that involving the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. AI for Health (MPAI-AIH): reference model and technologies for a system enabling clients to improve models processing health data and federated learning to share the training.
  3. Context-based Audio Enhancement (CAE-DC): new projects are brewing.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of the data used by the MPIA-CAV – Architecture standard.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for an extension to existing standard that includes support for more corporate risks.
  6. Human and Machine Communication (MPAI-HMC): model and technologies enabling a human or a machine to communicate with a machine or a human in a different cultural environment.
  7. Multimodal Conversation (MPAI-MMC): drafting reference software and conformance testing, and exploring new areas.
  8. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  9. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  10. Portable Avatar Format (MPAI-PAF): reference software, conformance testing and new areas.
  11. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  12. AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
  13. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  14. XR Venues (MPAI-XRV): development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

 


MPAI publishes Context-based Audio Enhancement and Object and Scene Description for Community Comments

Geneva, Switzerland – 20 December 2023. MPAI, Moving Picture, Audio and Data Coding by Artificial Intelligence, the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 39th General Assembly (MPAI-39) approving the publication of the Context-based Audio Enhancement standard and Object and Scene Description standard for Community Comments.

The draft of the Context-based Audio Enhancement (MPAI-CAE) Version 2.1 standard enhances the compatibility of the Audio with the Visual and the Audio-Visual Scene Description specified by the draft Object and Scene Description (MPAI-OSD) standard. Both are published with requests for Community Comments. These are due by 2024/01/23T23:59 UTC and 17T23:58 UTC, respectively, to secretariat@mpai.community.

MPAI is continuing its work plan that involving the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. AI for Health (MPAI-AIH): reference model and technologies for a system enabling clients to improve models processing health data and federated learning to share the training.
  3. Context-based Audio Enhancement (CAE-DC): new projects are brewing.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of the data used by the MPIA-CAV – Architecture standard.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for an extension to existing standard that includes support for more corporate risks.
  6. Human and Machine Communication (MPAI-HMC): model and technologies enabling a human or a machine to communicate with a machine or a human in a different cultural environment.
  7. Multimodal Conversation (MPAI-MMC): drafting reference software and conformance testing, and exploring new areas.
  8. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  9. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  10. Portable Avatar Format (MPAI-PAF): reference software, conformance testing and new areas.
  11. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  12. AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
  13. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  14. XR Venues (MPAI-XRV): development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Connected autonomous vehicles can be connected to the metaverse

In a previous article, we have described the Architecture of the MPAI Connected Autonomous Vehicle (CAV). The CAV’s Environment Sensing Subsystem (ESS) captures data of the environment with a variety of sensors and produces the Basic Environment Representation (BER) that is passed to the Autonomous Motion Subsystem (AMS). This exchanges (subsets of) the BER with other CAVs in range and uses the received information to produce the Full Environment Representation (FER). Then, the AMS can issue commands to the Motion Actuation Subsystem (MAS) to move the CAV toward the destination.

In a previous article, we have described the Architecture of the MPAI Metaverse Model (MPAI-MMM) where a Metaverse Instance (M-Instance) is defined as a set of Processes providing some or all the following functions (terms beginning with small letters are in the Universe and terms beginning witj a large letter are in an M-Instance:

  1. To sense data from U-Locations.
  2. To process the sensed data and produce Data.
  3. To produce one or more M-Environments populated by Objects that can be either digitised or virtual, the latter with or without autonomy.
  4. To process Objects from the M-Instance or potentially from other M-Instances to affect U-Locations (in the Universe) and/or M-Locations (in this or other M-Instances) using Object in ways that are:
    • Consistent with the goals set for the M-Instance.
    • Effected within the capabilities of the M-Instance.
    • Complying with the Rules set for the M-Instance and applicable laws.

At a first glance, it looks like the way a CAV’s BER and FER bear a lot of similarities with the M-Instance of the MMM Architecture as we can see from the comparative .

 

Table 1 – Comparison between M-Instance and CAV

M-Instance

CAV

An M-Instance is a set of Processes providing some or all the following functions: A CAV is a set of Processes (Subsystems and AI Modules) providing the following functions:
1.   To sense data from U-Locations. 1.  To sense data from the environment.
2.  To process the sensed data and produce Data. 2.  To process the sensed data and produce Data processable by the CAV, in particular BERs.
3.  To produce one or more M-Environments populated by Objects that can be either digitised or virtual, the latter with or without autonomy. 3.  To produce one M-Instance populated by Objects
4.  To process Objects from the M-Instance or potentially from other M-Instances to affect U- and/or M-Environments using Objects in ways that are: 4.  To process (subsets of) BERs from the CAV’s M-Instance and potentially from other CAVs’ M-Instances in ways that are:
4.1.  Consistent with the goals set for the M-Instance. 4.1. Consistent with the goals set to the CAVs to reach a destination.
4.2.  Effected within the capabilities of the M-Instance. 4.2. Effected within the CAV’s capabilities (processing but also physical).
4.3.  Complying with the Rules set for the M-Instance and applicable laws. 4.3.  Complying with the Rules (law and traffic regulations).

We need to look more in detail into this “similarities”. Before proceeding, let’s recall two assumptions at the basis of MPAI-MMM – Architecture:

  1. User is a type of Process that represents and acts on behalf of a human. A human may have more than one User in an M-Instance.
  2. Persona is a rendered User.
  3. User may have or acquire the Rights to perform an Action, e.g., to authenticate another User.

To do that, let’s consider the simple case of two CAVs: CAVA and CAVB respectively owned by humanA and humanB, where humanA is friend to humanB. humanA has two Users: UserA.1 who represents humanA in the Human-CAV Interaction (HCI) Subsystem (or M-EnvironmentA.1) and UserA.2 who represents humanA in the Autonomous Motion Subsystem (or M-EnvironmentA.2). Similarly, for humanB.

humanA wants to see the landscape seen by humanB in their CAVB.

This is a simplified description of the workflow (a fuller workflow is in the MPAI=CAV – Architecture standard)

  1. humanA requests User1 (HCI) to take them to a destination.
  2. User1 requests UserA.2 (AMS) to take CAVA to destination.
  3. User2
    • Gets the BER from CAVA’s ESS (or M-Environment3).
    • Computes the Route to Destination.
    • Issues a series of Commands to the MAS.
    • Authenticates its peer User2.
    • Gets a subset of the BER from User2.
    • Produces CAVA’s FER.
  4. User1
    • Authenticates its peer User2.
    • Renders their Persona in CAVB (e.g., using advanced 3D rendering technologies).
    • Converses with humanB.
    • Watches CAVB’s M-Location corresponding to the environment currently traversed by CAVB.

This example is a first demonstration of the compatibility of an M-Instance produced by a CAV implementing the MPAI-CAV – Architecture standard with the MPAI-MMM – Architecture standard.


MPAI releases new version of Neural Network Watermarking Reference Software; starts new project on XR Venues – Live Theatrical Stage Performance

Geneva, Switzerland – 22 November 2023. MPAI, Moving Picture, Audio and Data Coding by Artificial Intelligence, the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 38th General Assembly (MPAI-38) approving the release of a new version of its Neural Network Watermarking reference software and the start of the development of the new XR Venues – Live Theatrical Stage Performance standard.

The new version of the Neural Network Watermarking (MPAI-NNW) reference software makes it possible to upgrade conventional AI-based processing workflows with traceability and integrity checking functions. For instance, it is now possible to add AI Modules to an MPAI-AIF workflow to detect whether a particular text was indeed produced by the expected service or AI Module (AIM). Register to attend the online presentation on 2023/12/12T15:00 UTC.

The XR Venues (MPAI-XRV) – Live Theatrical Stage Performance standard project specifies functions and interfaces of AI Modules designed to automate live multisensory immersive stage performances which ordinarily require extensive on-site show control staff to operate. By running AI Workflows (AIW) composed of AIMs, it will be possible to obtain a more direct, precise yet spontaneous show implementation and control of multiple complex systems to achieve the show director’s vision.

MPAI is continuing its work plan that involve the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. AI for Health (MPAI-AIH) development of the standard.
  3. Context-based Audio Enhancement (CAE-DC): new projects are bewing.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of data used by the CAV architecture.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for an extension to existing standard.
  6. Multimodal Conversation (MPAI-MMC): reference software, drafting conformance testing, and new areas.
  7. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  8. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  9. Portable Avatar Format (MPAI-PAF): reference software, conformance testing and new areas.
  10. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  11. AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
  12. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  13. XR Venues (MPAI-XRV): development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Visiting MPAI standards – the MPAI Metaverse Model foundations

Much has been and is being said about the vagueness of the notion of “metaverse”. To compensate for this, the current trend is to add an adjective to the “metaverse” name. So, now we have studies on industrial metaverse, medical metaverse, tourist metaverse, and more.

In the early phases of its metaverse studies, when it was scoping the field, MPAI did consider 18 metaverse domains (use cases). Now, however, that phase is over because the right approach to standards is to identify what is common first and the differences (profiles) later.

In this paper we will try to identify features that are expected to be common across metaverse instances (M-Instances).

The basic metaverse features are the ability to:

  • Sense U-Environments (i.e., portions of the Universe) and their elements: inanimate and animate objects, and measurable features (temperature, pressure, etc.). By animate we mean humans, animals, and machines that move such as robots.
  • Create a virtual space (M-Instance) and its subsets (M-Environments).
  • Populate virtual spaces with digitised objects (captured from U-Environments) and virtual objects (created in the M-Instance).
  • Communicate with other M-Instances.
  • Actuate U-Environments as a result of the activities taking place in the M-Instance.

How will such an M-Instance be implemented?

We assume that an M-Instance is composed of a set of processes running on a computing environment. Of course, the M-Instance could be implemented as a single process, but this is a detail. What is important is that the M-Instance implements a variety of functions. Here we assume that functions correspond to processes. These are individually activated where they are accessible at the atomic level or by a single large process.

While a process is a process is a process, it is useful to characterise some processes. The first type of process is a Device, having the task to “connect” a U-Environment with an M-Environment. We assume that to achieve a safe governance, a Device should be connected to an M-Instance under the responsibility of a human. The second type of process is the User, a process that “represents” a human in the M-Instance and acts on their behalf. The third type is a Service, a process able to perform specific functions such as creating objects. The fourth type is an App, a process running on a Device. An example of App is a User that is not executing in the metaverse platform but rather on the Device.

An M-Instance includes objects connected with a U-Instance; some objects, like digitised humans, mirror activities carried out in the Universe; activities in the M-Instance may have effects on U-Environments. There are sufficient reasons to assume that the operation of an M-Instance be governed by Rules. A reasonable application of the notion of Rules is that a human wishing to connect a Device to, or deploy Users in an M-Instance should register and provide some data.

Data required to register could be a subset of the human’s Personal Profile, Device IDs, and User IDs. There are, however, other important elements that may have to be provided for a fuller experience. One is what we call Persona, i.e., an Avatar Model that the User process can utilise to render itself. Obviously, a User can be rendered as different Personae, if the Rules so allow. A second important element is the Wallet: a registering human may decide to allow one of their Users to access a particular Wallet to carry out its economic activity in the M-Instance.

Figure 1 pictorially represents some of the points made so far.

Figure 1 – Universe-Metaverse interaction

The activities of a human in a U-Environment captured by a Device may drive the activities of a User in the M-Instance. The human can let one of User:

  1. Just execute in the M-Instance without rendering itself.
  2. Render itself as an autonomously animated Persona.
  3. Render itself as a Persona animated by the movements of the human.

We have treated the important case of a human and their User agent. What about other objects?

Besides processes performing various functions, an M-Instance is populated by Items, i.e., Data and Metadata supported by the M-Instance and bearing an Identifier. An Item may be produced by Identifying imported Data/Metadata or internally produced by an Authoring Service. This is depicted in Figure 2 where User produces:

  1. Item1 by calling the Authoring Service1
  2. Item2 by importing data and metadata and then calling Identification Service2.

Figure 2 – Objects in an M-Instance

A more complete view of an M-Instance is provided by Figure 3.

Figure 3 – M-Instance Model

In Figure 3 we see that:

  1. Human1 and Human3 are connected to the M-Instance via a Device, but Human2 is connected to the M-Instance with two Device.
  2. Human1 has deployed one User, User 2 two Users and Human3
  3. User1.1 of Human1 is rendered as one Persona1.1.1, User 2.1 of Human2 as two Personae (Persona2.1.1 and Persona2.1.2), and User2.2 as one Persona2.2.1.
  4. Object1 in the U-Environment is captured by 1 and Device3.1 and mapped as two distinct Objects: Object1.2 and Object3.1.
  5. Users and Services variously interact.

What are the interactions referred to in point 5. above? We assume that an M-Instance is populated by Services performing functions that are useful for the life of the M-Instance. We call standard functions “Actions”.  MPAI has specified the functional requirements of a set of Actions:

  1. General Actions (Register, Change, Hide, Authenticate, Identify, Modify, Validate, Execute).
  2. Call a Service (Author, Discover, Inform, Interpret, Post, Transact, Convert, Resolve).
  3. M-Instance to M-Instance (MM-Add, MM-Animate, MM-Disable, MM-Embed, MM-Enable, MM-Send).
  4. M-Instance to U-Environment (MU-Actuate, MU-Render, MU-Send, Track).
  5. U-Environment to M-Instance (UM-Animate, UM-Capture, UM-Render, UM-Send).

The semantics of some of these Actions are:

  1. Identify: convert data and metadata into an Item bearing an Identifier.
  2. Discover: request a Service to provide Items and/or Processes with certain features.
  3. MM-Embed: place an Item at a particular M-Instance location (M-Location).
  4. MU-Render: select Items at an M-Location and render them at a U-Environment.
  5. UM-Animate: use a captured animation stream to animate a Persona.

How do interactions take place in the M-Instance?

A User may have the capability to perform certain Actions on certain Items but more commonly a User may ask a Device to do something for it, like capture an animation stream and use it to animate a Persona. The help of the Device may not be sufficient because MPAI assumes that an animation stream is not an Item until it gets Identified as such. Hence, the help of the Identify Service is also needed.

MPAI has defined the Inter-Process Communication Protocol (Figure 4) whereby

  1. A process creates, identifies and sends a Request-Action Item to the destination process.
  2. The receiving process
    1. May or may not perform the action requested
    2. Sends a Response-Action.

Figure 4 – The Inter-Process Interaction Protocol

Table 1 – The Inter-Process Interaction Protocol

Request-Action Response-Action Comments
Request-Action ID Response-Action ID Unique ID
Emission Time Emission Time Time of Issuance
Source Process ID Source Process ID Requesting Process ID
Destination Process ID Destination Process ID Requested Process ID
Action The Action requested
InItems OutItems In/Output Items of Action
InLocations Locations of InItems
OutLocations Locations of OutItems
OutRights Expected Rights on OutItems

The Request-Action payload includes the ID, the time, the requesting process and destination process IDs, the Action requested, the InItems on which the action is applied, where the InItems are found and the resulting OutItems are found, and the Rights the requesting process needs to have in order to act on the OutItems.

Having defined standard Actions, here is how standard Items are defined:

  1. General (M-Instance, M-Capabilities, M-Environment, Identifier, Rules, Rights, Program, Contract)
  2. Human/User-related (Account, Activity Data, Personal Profile, Social Graph, User Data).
  3. Process Interaction (Message, P-Capabilities, Request-Action, Response-Action).
  4. Service Access (AuthenticateIn, AuthenticateOut, DiscoverIn, DiscoverOut, InformIn, InformOut, InterpretIn, InterpretOut).
  5. Finance-related (Asset, Ledger, Provenance, Transaction, Value, Wallet).
  6. Perception-related (Event, Experience, Interaction, Map, Model, Object, Scene, Stream, Summary).
  7. Space-related (M-Location, U-Location).

Here are a few examples of the Item semantics:

  1. Rights: the Item describes the ability of a process to perform an Action on an Item at a time and at M-Location.
  2. Social Graph: the log of a process, e.g., a User.
  3. P-Capabilities: the Item describes the Rights held by a process and related abilities.
  4. DiscoverIn: the description of the User’s request.
  5. Asset: an Item that can be transacted.
  6. Model: data exposing animation interfaces.
  7. MLocation: delimits a space in the M-Instance.

We also need to define several entities – called Data Types – used in the M-Instance:

  1. Location and time (Address, Coordinates, Orientation, Point of View, Position, Spatial Attitude, Time).
  2. Transaction-related (Amount, Currency).
  3. Internal state of a User (Cognitive State, Emotion, Social Attitude, Personal Status).

Finally, we need to address the issue of a process in M-InstanceA requesting a process in another M-InstanceB to perform Actions on Items. In general, it is not possible for a process in M-InstanceA to communicate with a process in M-InstanceB because of security concerns, but also because the other M-InstanceB may use different data types. MPAI solves this process by extending the Inter-Process Interaction Protocol and introducing two Services:

  1. Resolution ServiceA: can talk to Resolution ServiceB.
  2. Conversion Service: can convert the format of M-InstanceA data into the format of M-InstanceB.

Figure 5 – The Inter-Process Interaction Protocol between M-Instances

This is a very high-level description of the MPAI Metaverse Model – Architecture standard that enables Interoperability of two or more M-Instances if they:

  1. Rely on the Operation Model, and
  2. Use the same Profile Architecture, and
    1. Either the same technologies, or
    2. Independent technologies while accessing Conversion Services that losslessly transform Data of an M-InstanceA to Data of an M-InstanceB.

Visiting MPAI standards – Connected Autonomous Vehicles (MPAI-CAV) – Architecture

Introduction

MMPAI-36 (September 2023) has approved the publication of five standards. Two are extensions of already published standards (and adopted by IEEE without modifications) and three brand new. This is an overview of the main content of one of the new standards, Technical Specification: Connected Autonomous Vehicles – Architecture (MPAI-CAV).
MPAI works on CAV standards because replacing current vehicles with CAVs is desirable from many viewpoints. CAVs are expected to offer a safer drive by replacing human errors with machine errors that are expected to be less frequent, allowing more time for rewarding activities, offering opportunities for better use of vehicles and road infrastructure, enabling more sophisticated traffic management, reducing congestion and pollution, and helping elderly and disabled people to have a better life. On the other hand, CAVs are available today more for experimental than regular use also because, unlike the current highly componentised automotive industry, CAV companies are monolithic, and develop internally and assemble all the components they need to make their CAVs and because the level of reliability is insufficient were CAVs deployed in massive numbers.
A CAV standard would allow the acceleration of the maturity of the CAV industry. But which standard? First, the standard MPAI is targeting would not be related to the “hardware” side but to the “software” of a CAV. Second, the standard should not address CAVs in its entirety but adopt a component approach, not unlike what the automotive industry does for the “hardware” side.
A component approach is useful for the two expected stages of the process. The first stage applies now because research can concentrate on a specific component, defined by its function and interface and optimise the component performance, possibly attaching proposed revised functions and interfaces. The second stage applies to the time when CAVs will have reached a sufficient level of performance, and component-based mass production of CAVs becomes attractive. An open market of CAV components can naturally be formed where competing providers can offer components with standard functions and interfaces but with alternative performance compared to what is available on the market at that time.

The MPAI-CAV – Architecture standard

MPAI-CAV – Architecture is a standard implementing the first step of this strategy. It defines a CAV Reference Model composed of four subsystems, each composed of interconnected components. Technical Specification: AI Framework (MPAI-AIF) is the natural selection for implementing this Reference Model. The AI Framework (AIF) specified by the standard is the environment executing AI Workflows (AIW) that correspond to the subsystems of the Reference Model. Each subsystem is composed of components called AI Modules (AIM).
The Subsystem-level Reference model is represented in Figure 1.

Figure 1 – The MPAI-CAV – Architecture Reference Model

There are four subsystem-level reference models, each specified in term of:

  1. The functions the subsystem performs.
  2. The AIF-based Reference Model.
  3. The input/output data exchanged by the CAV subsystem with other subsystems and the environment.
  4. The functions of each subsystem components to be implemented as AI Modules.
  5. The input/output data exchanged by the component with other components.

The functions and reference models of the MPAI-CAV – Architecture Subsystems will be presented next.

Human-CAV Interaction (HCI)

The HCI functions are:

  1.  To authenticates humans, e.g., to let them into the CAV.
  2. To converse with humans by interpreting utterances, e.g., to go to a destination, or during a conversation.
  3. To converse with the Autonomous Motion Subsystem to implement and execute human commands.
  4. To enable passengers to navigate the Full Environment Representation (FER), i.e., the best representation of the external environment achieved by the CAV.
  5. Appears as a speaking avatar showing a Personal Status, i.e., a simulated internal status of the machine represented according to the criteria used by humans (see https://mpai.community/standards/mpai-mmc/about-mpai-mmc/).

The HCI Reference Model is depicted in Figure 2.

Figure 2 – Human CAV Interaction Reference Model

Environment Sensing Subsystem (ESS)

The ESS functions are:

  1. To acquire Environment information using Subsystem’s RADAR, LiDAR, Cameras, Ultrasound, Offline Map, Audio, GNSS, …
  2. To receive Ego CAV’s position, orientation, and environment data (temperature, humidity, etc.) from Motion Actuation Subsystem.
  3. To produce Scene Descriptors for each sensor technology in a common format.
  4. To produce the Basic Environment Representation (BER) by integrating the sensor-specific Scene Descriptors produced during the travel.
  5. To hand over the BERs, including Alerts, to the Autonomous Motion Subsystem.

The ESS Reference Model is depicted in Figure 3.Ăą

Figure 3 – Environment Sensing Subsystem Reference Model

Autonomous Motion Subsystem (AMS)

The AMS functions are:

  1. To compute and execute the human-requested Route(s).
  2. To receive current BER from Environment Sensing Subsystem.
  3. To communicate with other CAVs’ AMSs (e.g., to exchange subsets of BER and other data).
  4. To produce the Full Environment Representation by fusing its own BER with info from other CAVs in range.
  5. To send Commands to Motion Actuation Subsystem to take the CAV to the next Pose.
  6. To receive and analyse responses from the MAS.

The AMS Reference Model is depicted in Figure 4.

Figure 4 – Autonomous Motion Subsystem Reference Model

Motion Actuation Subsystem

The MAS functions are:

  1. To transmit spatial/environmental information from sensors/mechanical subsystems to the Environment Sensing Subsystem.
  2. To receive Autonomous Motion Subsystem Commands.
  3. To translates Commands into specific Commands to its own mechanical subsystems, e.g., steering brakes, wheel directions, and wheel motors.
  4. To receive and analyse Responses from its mechanical subsystems.
  5. To Sends responses to Autonomous Motion Subsystem about execution of commands.

The MAS Reference Model is depicted in Figure 5.

Figure 5 – Motion Actuation Subsystem Reference Model

Conclusions

The MPAI-CAV Architecture standard is the starting point for the next steps of the MPAI-CAV roadmap addressing functional requirements of the data exchanged by subsystems and components


XR Venues – Live Theatrical Stage Performance

Scope

MPAI has issued a Call for Technologies seeking proposals for MPAI-XRV – Live Theatrical Stage Performance, an MPAI standard project seeking to define interfaces of AI Modules (AIM) facilitating live multisensory immersive stage performances. By running AI Workflows (AIW) composed of AIMs, it will be possible to obtain a more direct, precise yet spontaneous show implementation and control of multiple complex systems to achieve the show director’s vision.

Setting

An implementation of MPAI-XRV – Live Theatrical Stage Performance includes:

  1. A physical stage.
  2. Lighting, projections (e.g., dome or immersive display, holograms, AR goggles), and spatialised audio and Special Effects (FX) including touch, smell, pyro, motion seats etc.
  3. Audience (standing or seated) in the real and virtual venue and external audiences via interactive streaming.
  4. Interactive interfaces to allow audience participation (e.g., voting, branching, real-virtual action generation).
  5. Performers on stage, platforms around domes or moving through the audience (immersive theatres).
  6. Capture of biometric data from audience and/or performers from wearables, sensors embedded in the seat, remote sensing (e.g., audio, video, lidar), and from VR headsets.
  7. Show operator(s) to allow manual augmentation and oversight of an AI that has been trained by show operator activity.
  8. Virtual Environment (metaverse) that mirrors selected elements of the Real Environment. For example, performers on the stage are mirrored by digital twins in the metaverse, using motion capture (MoCap), green screen or volumetrically captured 3D images.
  9. Real Environment can also mirror selected elements of the Metaverse similar to in-camera visual effects/virtual production techniques. Elements of the Metaverse such as avatars, landscape, sky, objects can be represented in the Real Environment through immersive displays, video mapped stages and set pieces, AR overlays, lighting, and FX.
  10. The physical stage and set pieces blend seamlessly into the virtual 3D backdrop projected onto the immersive display such that the spectators perceive as a single immersive environment.
  11. The Script or cue list describes the show events, guiding and synchronising the actions of all AI Modules (AIM) as the show evolves from cue to cue and scene to scene. In addition to performing the show, the AIMs might spontaneously innovate show variations, amplify the actions of performers or respond to commands from operators by modifying the Real or Virtual Environment within scripted guidelines.

Requirements

Respondents to Call for Technologies should consider the system requirements depicted in the Figure and in the following text. Note that terms in bold underlined refer to AIMs and those in bold refer to data types.

Participants Description capture the mood, engagement, and choices of participants (audience) and produce Participant Descriptors with the following features:

  1. Use a known (i.e., standard) format to enable processing by subsequent AIMs.
  2. Express Participants’:
    1. Visual behaviour (hand waving, standing, etc.)
    2. Audio reaction (clapping, laughing, booing, etc.)
    3. Choice (voting, motion controller, text, etc.)

Performance Description extracts information from the stage regarding position and orientation of performers and immersive objects and produce the Scene Descriptors with the following features:

  1. Use existing or new formats independent of the capturing technology, e.g.,
    1. For audio: Multichannel Audio, Spatial Audio, Ambisonics, etc.
    2. For visual: audio and raw video, volumetric, MoCap, etc.
  2. Allow for the following features:
    1. Accurate description of spatial and AV components.
    2. Individual accessibility and processability of objects.
    3. Unique association of objects with their digital representations.
    4. Association of the audio and visual components of audio-visual objects.

Participants Status interprets the Participants Descriptors and provides the Participants Status data type expressed either by:

  1. A Format supporting the semantics of a set of statuses over time in terms of:
    1. Sentiment (e.g., measurement of spatial position-based audience reaction).
    2. Expression of choice (e.g., voting, physical movement of audience).
    3. Emergent behaviour (e.g., pattern emerging from coordinated movement).
  2. A Language describing the Participant Status both at a time and as a trend.

 Performance Status interprets the Scene Descriptors and indicates the current Cue point determined by a real or virtual phrase, gesture, dance motion, prop status etc. according to the Script.

Operator Command Interpreter interprets data and commands such as Audio/DJ/VJ, Show Control, Lighting/FX and generates Interpreted Operator Controls, a data type independent of the specific input format and includes:

  1. Show control consoles (e.g., rigging, elevator stages, prop motions, and pyro).
  2. Audio control consoles (e.g., controls audio mixing and effects).
  3. DJ/VJ control consoles (e.g., real-time AV playback and effects).

All consoles may include sliders, buttons, knobs, gesture/haptic interfaces, joystick, touch pads.

Interpreted Controls may be Script-dependent.

Script includes written Show Script including character dialog, song lyrics, stage action, etc. plus a Master Cue Sheet with a corresponding technical description of all experiential elements including sound, lighting, follow spots, set movements, cue number, and show time of cue.

The cue sheet advances to the next cue point based on quantifiable or clearly defined actions such as spoken word, gesture, etc.

Various formats for show scripts and cue sheets exist.

Script is a data type with the following features:

  1. It is machine readable and actionable.
  2. It has an extensible set of clearly defined events/criteria for triggering the cue.
  3. Uses a language for expressing Action Descriptors, which define the experiential elements associated with each cue.

Action Generation

Action Generation uses Participants Status, Scene Descriptors, Cue Points and Interpreted Operator Controls to produce Action Descriptors, a data type that may be either existing or known (e.g., text prompts) or new with the following features:

  1. Ability to describe the Actions necessary to create the complete experience – in both the Real and Virtual Environments – in accordance with the Script.
  2. Ability to express all aspects of the experience including the performers’ and objects’ position, orientation, gesture, costume, etc.
  3. Independence of the specifications of a particular venue.

Virtual Environment Experience Generation processes Action Descriptors and produces commands that are actionable in the Virtual Environment. It produces the following data:

  1. Virtual Environment Descriptors A variety of controls for 3D geometry, shading, lighting, materials, cameras, physics, etc. as required to affect the Virtual Environment, e.g., OpenUSD. The actual format used is dependent upon the current Virtual Environment Venue Specification.
  2. Audio-Visual (A/V) Data and commands for all A/V experiential elements with the virtual environment, including audio and video.
  3. Audio: M Audio channels (generated by or passed through the AIM), placement of audio channels, including attachments to objects or characters, Virtual Audio device commands, e.g., MIDI.
  4. Video: N Video channels (generated by or passed through the AIM), Placement of Video channels, including mapping to objects and characters, Virtual VJ console commands/data, Virtual Video mixing console commands.
  5. Virtual Environment Venue Specification An input to the Virtual Experience Generation AIM defining protocols, and data and command structures for the specific Virtual Environment Venue.

Real Environment Experience Generation processes Action Descriptors and produces commands that are actionable in the Real Environment:

  1. Real Environment Venue Specification defines protocols, data, and command structures for the specific Real Environment Venue. This could include number, type, and placement of lighting fixtures, special effects, sound, and video display resources.
  2. FX (Effects) Commands and data for all FX generators (e.g., fog, rain, pyro, mist, etc. machines, 4D seating, stage props, rigging etc.), typically using various standard protocols. FX systems may include video and audio provided by Real Experience Generation.
  3. Lighting Commands and data for all lighting systems, devices, and elements, typically using the DMX protocol or similar. Lighting systems may include video provided by Real Experience Generation.
  4. Audio-Visual (A/V) Data and commands for all A/V experiential elements, including audio, video, and capture cameras/microphones. They include:
  5. Camera control: Camera #n Camera on/off, Keyframe based (Spatial Attitude, Optical parameters (aperture, focus, zoom), Frame rate, Camera)
  6. Audio: M Audio channels (generated by or passed through the AIM), Audio source location designation (channel number or spatial orientation of STEM), MIDI device commands, Audio server commands, Mixing console commands.
  7. Video: N Video channels (generated by or passed through the AIM), Video display location designation (display number or spatial orientation and mapping details), Video server commands, VJ console commands/data, Video mixing console commands.

Image by pch.vector on Freepik

 


MPAI starts new standard project on “AI for Health”

Geneva, Switzerland – 25 October 2023. MPAI, Moving Picture, Audio and Data Coding by Artificial Intelligence, the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 37th General Assembly (MPAI-37) approving the start of a new project on AI for Health.

AI for Health (MPAI-AIH) envisages a system where clients acquire and process individuals’ health data using shared AI models and upload data to the backend with attached licences expressed by smart contracts. Third parties may process health data based on the relevant licence. From time to time the backend uses federated learning to collect and use the AI models to retrain the common models. These are redistributed to all clients.

MPAI is continuing its work plan that involve the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. Avatar Representation and Animation (MPAI-PAF): reference software, conformance testing and new areas.
  3. Context-based Audio Enhancement (CAE-DC): new projects.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of CAV architecture.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for extension of existing standard.
  6. Multimodal Conversation (MPAI-MMC): reference software, drafting conformance testing, and new areas.
  7. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  8. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  9. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  10. AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
  11. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  12. XR Venues (MPAI-XRV): preparation for the development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


MPAI is running at full speed

Established in September 2020, MPAI has published five standard this week bringing the total to nine. Let’s see what they are about.

MPAI Metaverse Model (MPAI-MMM) – Architecture is the first technical metaverse standard published by a standard body. MPAI MMM specifies technologies enabling two metaverse instances M-InstanceA and M-InstanceB to interoperate if they: rely on the same Operation Model, use the same Profile, and either use the same technologies, or use independent technologies while accessing Conversion Services that losslessly transform data of an M-InstanceA to data of an M-InstanceB.

AI Framework (MPAI-AIF) V2 specifies a secure environment called AI Framework (AIF) enabling dynamic configuration, initialisation, and control of AI Workflows (AIW) composed of AI Modules (AIM). AIMs and AIWs are defined by function and interfaces; AIWs also by AIM topology.

Connected Autonomous Vehicle (MPAI-CAV) – Architecture  is the first technical standard on connected autonomous vehicles published by a standard body. MPAI-CAV specifies the Architecture of a CAV based on a Reference Model comprising a CAV composed of Subsystems (AIW) with specified Functions, I/O Data, and Topology. Each Subsystem is made up of Components with specified Functions and I/O Data.

Multimodal Conversation (MPAI-MMC) V2 specifies data formats for analysis of text, speech, and other non-verbal components as used in human-machine and machine-machine conversation applications and Multimodal Conversation-related AIWs and AIWs using data formats from MPAI-MMC and other MPAI standards.

Portable Avatar Format (MPAI-PAF) specifies the Portable Avatar and related data formats allowing a sender to enable a receiver to decode and render an Avatar as intended by the sender; the Personal Status Display Composite AI Module allowing the conversion of a Text and a Personal Status to a Portable Avatar; and the AIWs and AIMs used by the Avatar-Based Videoconference Use Case.

Let’s see now which are the previously developed standards.

Context-based Audio Enhancement (MPAI-CAE) specifies data types for the improvement of the user experience in audio-related applications for a variety of contexts using context information and Audio-related AIWs and AIWs using data formats from MPAI-CAE and other MPAI standards.

Neural Network Watermarking (MPAI-NNW) specifies methodologies to evaluate the following aspects of neural network (NN) watermarking-related technologies: The impact on the performance of a watermarked NN and its inference; The ability of an NN watermarking detector/decoder to detect/decode a payload of a modified watermarked NN; The computational cost of injecting, detecting, or decoding a payload in the watermarked NN.

Compression and Understanding of Industrial Data (MPAI-CUI) specifies data formats, AIMs and an AIW to predict a company’s probability of default and business discontinuity, and to provide an organisational model index (Company Performance Prediction Use Case).

Governance of the MPAI Ecosystem (MPAI-GME) specifies the roles and rules of Ecosystem players: MPAI, Implementers, MPAI Store, Performance Assessors, Users.

MPAI was established to develop AI-enabled data coding standards across industry domains and is keeping its promise. Time to join MPAI!

Image by starline on Freepik