Moving Picture, Audio and Data Coding
by Artificial Intelligence

Archives: 2021-11-21

Visiting MPAI standards – the MPAI Metaverse Model foundations

Much has been and is being said about the vagueness of the notion of “metaverse”. To compensate for this, the current trend is to add an adjective to the “metaverse” name. So, now we have studies on industrial metaverse, medical metaverse, tourist metaverse, and more.

In the early phases of its metaverse studies, when it was scoping the field, MPAI did consider 18 metaverse domains (use cases). Now, however, that phase is over because the right approach to standards is to identify what is common first and the differences (profiles) later.

In this paper we will try to identify features that are expected to be common across metaverse instances (M-Instances).

The basic metaverse features are the ability to:

  • Sense U-Environments (i.e., portions of the Universe) and their elements: inanimate and animate objects, and measurable features (temperature, pressure, etc.). By animate we mean humans, animals, and machines that move such as robots.
  • Create a virtual space (M-Instance) and its subsets (M-Environments).
  • Populate virtual spaces with digitised objects (captured from U-Environments) and virtual objects (created in the M-Instance).
  • Communicate with other M-Instances.
  • Actuate U-Environments as a result of the activities taking place in the M-Instance.

How will such an M-Instance be implemented?

We assume that an M-Instance is composed of a set of processes running on a computing environment. Of course, the M-Instance could be implemented as a single process, but this is a detail. What is important is that the M-Instance implements a variety of functions. Here we assume that functions correspond to processes. These are individually activated where they are accessible at the atomic level or by a single large process.

While a process is a process is a process, it is useful to characterise some processes. The first type of process is a Device, having the task to “connect” a U-Environment with an M-Environment. We assume that to achieve a safe governance, a Device should be connected to an M-Instance under the responsibility of a human. The second type of process is the User, a process that “represents” a human in the M-Instance and acts on their behalf. The third type is a Service, a process able to perform specific functions such as creating objects. The fourth type is an App, a process running on a Device. An example of App is a User that is not executing in the metaverse platform but rather on the Device.

An M-Instance includes objects connected with a U-Instance; some objects, like digitised humans, mirror activities carried out in the Universe; activities in the M-Instance may have effects on U-Environments. There are sufficient reasons to assume that the operation of an M-Instance be governed by Rules. A reasonable application of the notion of Rules is that a human wishing to connect a Device to, or deploy Users in an M-Instance should register and provide some data.

Data required to register could be a subset of the human’s Personal Profile, Device IDs, and User IDs. There are, however, other important elements that may have to be provided for a fuller experience. One is what we call Persona, i.e., an Avatar Model that the User process can utilise to render itself. Obviously, a User can be rendered as different Personae, if the Rules so allow. A second important element is the Wallet: a registering human may decide to allow one of their Users to access a particular Wallet to carry out its economic activity in the M-Instance.

Figure 1 pictorially represents some of the points made so far.

Figure 1 – Universe-Metaverse interaction

The activities of a human in a U-Environment captured by a Device may drive the activities of a User in the M-Instance. The human can let one of User:

  1. Just execute in the M-Instance without rendering itself.
  2. Render itself as an autonomously animated Persona.
  3. Render itself as a Persona animated by the movements of the human.

We have treated the important case of a human and their User agent. What about other objects?

Besides processes performing various functions, an M-Instance is populated by Items, i.e., Data and Metadata supported by the M-Instance and bearing an Identifier. An Item may be produced by Identifying imported Data/Metadata or internally produced by an Authoring Service. This is depicted in Figure 2 where User produces:

  1. Item1 by calling the Authoring Service1
  2. Item2 by importing data and metadata and then calling Identification Service2.

Figure 2 – Objects in an M-Instance

A more complete view of an M-Instance is provided by Figure 3.

Figure 3 – M-Instance Model

In Figure 3 we see that:

  1. Human1 and Human3 are connected to the M-Instance via a Device, but Human2 is connected to the M-Instance with two Device.
  2. Human1 has deployed one User, User 2 two Users and Human3
  3. User1.1 of Human1 is rendered as one Persona1.1.1, User 2.1 of Human2 as two Personae (Persona2.1.1 and Persona2.1.2), and User2.2 as one Persona2.2.1.
  4. Object1 in the U-Environment is captured by 1 and Device3.1 and mapped as two distinct Objects: Object1.2 and Object3.1.
  5. Users and Services variously interact.

What are the interactions referred to in point 5. above? We assume that an M-Instance is populated by Services performing functions that are useful for the life of the M-Instance. We call standard functions “Actions”.  MPAI has specified the functional requirements of a set of Actions:

  1. General Actions (Register, Change, Hide, Authenticate, Identify, Modify, Validate, Execute).
  2. Call a Service (Author, Discover, Inform, Interpret, Post, Transact, Convert, Resolve).
  3. M-Instance to M-Instance (MM-Add, MM-Animate, MM-Disable, MM-Embed, MM-Enable, MM-Send).
  4. M-Instance to U-Environment (MU-Actuate, MU-Render, MU-Send, Track).
  5. U-Environment to M-Instance (UM-Animate, UM-Capture, UM-Render, UM-Send).

The semantics of some of these Actions are:

  1. Identify: convert data and metadata into an Item bearing an Identifier.
  2. Discover: request a Service to provide Items and/or Processes with certain features.
  3. MM-Embed: place an Item at a particular M-Instance location (M-Location).
  4. MU-Render: select Items at an M-Location and render them at a U-Environment.
  5. UM-Animate: use a captured animation stream to animate a Persona.

How do interactions take place in the M-Instance?

A User may have the capability to perform certain Actions on certain Items but more commonly a User may ask a Device to do something for it, like capture an animation stream and use it to animate a Persona. The help of the Device may not be sufficient because MPAI assumes that an animation stream is not an Item until it gets Identified as such. Hence, the help of the Identify Service is also needed.

MPAI has defined the Inter-Process Communication Protocol (Figure 4) whereby

  1. A process creates, identifies and sends a Request-Action Item to the destination process.
  2. The receiving process
    1. May or may not perform the action requested
    2. Sends a Response-Action.

Figure 4 – The Inter-Process Interaction Protocol

Table 1 – The Inter-Process Interaction Protocol

Request-Action Response-Action Comments
Request-Action ID Response-Action ID Unique ID
Emission Time Emission Time Time of Issuance
Source Process ID Source Process ID Requesting Process ID
Destination Process ID Destination Process ID Requested Process ID
Action The Action requested
InItems OutItems In/Output Items of Action
InLocations Locations of InItems
OutLocations Locations of OutItems
OutRights Expected Rights on OutItems

The Request-Action payload includes the ID, the time, the requesting process and destination process IDs, the Action requested, the InItems on which the action is applied, where the InItems are found and the resulting OutItems are found, and the Rights the requesting process needs to have in order to act on the OutItems.

Having defined standard Actions, here is how standard Items are defined:

  1. General (M-Instance, M-Capabilities, M-Environment, Identifier, Rules, Rights, Program, Contract)
  2. Human/User-related (Account, Activity Data, Personal Profile, Social Graph, User Data).
  3. Process Interaction (Message, P-Capabilities, Request-Action, Response-Action).
  4. Service Access (AuthenticateIn, AuthenticateOut, DiscoverIn, DiscoverOut, InformIn, InformOut, InterpretIn, InterpretOut).
  5. Finance-related (Asset, Ledger, Provenance, Transaction, Value, Wallet).
  6. Perception-related (Event, Experience, Interaction, Map, Model, Object, Scene, Stream, Summary).
  7. Space-related (M-Location, U-Location).

Here are a few examples of the Item semantics:

  1. Rights: the Item describes the ability of a process to perform an Action on an Item at a time and at M-Location.
  2. Social Graph: the log of a process, e.g., a User.
  3. P-Capabilities: the Item describes the Rights held by a process and related abilities.
  4. DiscoverIn: the description of the User’s request.
  5. Asset: an Item that can be transacted.
  6. Model: data exposing animation interfaces.
  7. MLocation: delimits a space in the M-Instance.

We also need to define several entities – called Data Types – used in the M-Instance:

  1. Location and time (Address, Coordinates, Orientation, Point of View, Position, Spatial Attitude, Time).
  2. Transaction-related (Amount, Currency).
  3. Internal state of a User (Cognitive State, Emotion, Social Attitude, Personal Status).

Finally, we need to address the issue of a process in M-InstanceA requesting a process in another M-InstanceB to perform Actions on Items. In general, it is not possible for a process in M-InstanceA to communicate with a process in M-InstanceB because of security concerns, but also because the other M-InstanceB may use different data types. MPAI solves this process by extending the Inter-Process Interaction Protocol and introducing two Services:

  1. Resolution ServiceA: can talk to Resolution ServiceB.
  2. Conversion Service: can convert the format of M-InstanceA data into the format of M-InstanceB.

Figure 5 – The Inter-Process Interaction Protocol between M-Instances

This is a very high-level description of the MPAI Metaverse Model – Architecture standard that enables Interoperability of two or more M-Instances if they:

  1. Rely on the Operation Model, and
  2. Use the same Profile Architecture, and
    1. Either the same technologies, or
    2. Independent technologies while accessing Conversion Services that losslessly transform Data of an M-InstanceA to Data of an M-InstanceB.

Visiting MPAI standards – Connected Autonomous Vehicles (MPAI-CAV) – Architecture

Introduction

MMPAI-36 (September 2023) has approved the publication of five standards. Two are extensions of already published standards (and adopted by IEEE without modifications) and three brand new. This is an overview of the main content of one of the new standards, Technical Specification: Connected Autonomous Vehicles – Architecture (MPAI-CAV).
MPAI works on CAV standards because replacing current vehicles with CAVs is desirable from many viewpoints. CAVs are expected to offer a safer drive by replacing human errors with machine errors that are expected to be less frequent, allowing more time for rewarding activities, offering opportunities for better use of vehicles and road infrastructure, enabling more sophisticated traffic management, reducing congestion and pollution, and helping elderly and disabled people to have a better life. On the other hand, CAVs are available today more for experimental than regular use also because, unlike the current highly componentised automotive industry, CAV companies are monolithic, and develop internally and assemble all the components they need to make their CAVs and because the level of reliability is insufficient were CAVs deployed in massive numbers.
A CAV standard would allow the acceleration of the maturity of the CAV industry. But which standard? First, the standard MPAI is targeting would not be related to the “hardware” side but to the “software” of a CAV. Second, the standard should not address CAVs in its entirety but adopt a component approach, not unlike what the automotive industry does for the “hardware” side.
A component approach is useful for the two expected stages of the process. The first stage applies now because research can concentrate on a specific component, defined by its function and interface and optimise the component performance, possibly attaching proposed revised functions and interfaces. The second stage applies to the time when CAVs will have reached a sufficient level of performance, and component-based mass production of CAVs becomes attractive. An open market of CAV components can naturally be formed where competing providers can offer components with standard functions and interfaces but with alternative performance compared to what is available on the market at that time.

The MPAI-CAV – Architecture standard

MPAI-CAV – Architecture is a standard implementing the first step of this strategy. It defines a CAV Reference Model composed of four subsystems, each composed of interconnected components. Technical Specification: AI Framework (MPAI-AIF) is the natural selection for implementing this Reference Model. The AI Framework (AIF) specified by the standard is the environment executing AI Workflows (AIW) that correspond to the subsystems of the Reference Model. Each subsystem is composed of components called AI Modules (AIM).
The Subsystem-level Reference model is represented in Figure 1.

Figure 1 – The MPAI-CAV – Architecture Reference Model

There are four subsystem-level reference models, each specified in term of:

  1. The functions the subsystem performs.
  2. The AIF-based Reference Model.
  3. The input/output data exchanged by the CAV subsystem with other subsystems and the environment.
  4. The functions of each subsystem components to be implemented as AI Modules.
  5. The input/output data exchanged by the component with other components.

The functions and reference models of the MPAI-CAV – Architecture Subsystems will be presented next.

Human-CAV Interaction (HCI)

The HCI functions are:

  1.  To authenticates humans, e.g., to let them into the CAV.
  2. To converse with humans by interpreting utterances, e.g., to go to a destination, or during a conversation.
  3. To converse with the Autonomous Motion Subsystem to implement and execute human commands.
  4. To enable passengers to navigate the Full Environment Representation (FER), i.e., the best representation of the external environment achieved by the CAV.
  5. Appears as a speaking avatar showing a Personal Status, i.e., a simulated internal status of the machine represented according to the criteria used by humans (see https://mpai.community/standards/mpai-mmc/about-mpai-mmc/).

The HCI Reference Model is depicted in Figure 2.

Figure 2 – Human CAV Interaction Reference Model

Environment Sensing Subsystem (ESS)

The ESS functions are:

  1. To acquire Environment information using Subsystem’s RADAR, LiDAR, Cameras, Ultrasound, Offline Map, Audio, GNSS, …
  2. To receive Ego CAV’s position, orientation, and environment data (temperature, humidity, etc.) from Motion Actuation Subsystem.
  3. To produce Scene Descriptors for each sensor technology in a common format.
  4. To produce the Basic Environment Representation (BER) by integrating the sensor-specific Scene Descriptors produced during the travel.
  5. To hand over the BERs, including Alerts, to the Autonomous Motion Subsystem.

The ESS Reference Model is depicted in Figure 3.ù

Figure 3 – Environment Sensing Subsystem Reference Model

Autonomous Motion Subsystem (AMS)

The AMS functions are:

  1. To compute and execute the human-requested Route(s).
  2. To receive current BER from Environment Sensing Subsystem.
  3. To communicate with other CAVs’ AMSs (e.g., to exchange subsets of BER and other data).
  4. To produce the Full Environment Representation by fusing its own BER with info from other CAVs in range.
  5. To send Commands to Motion Actuation Subsystem to take the CAV to the next Pose.
  6. To receive and analyse responses from the MAS.

The AMS Reference Model is depicted in Figure 4.

Figure 4 – Autonomous Motion Subsystem Reference Model

Motion Actuation Subsystem

The MAS functions are:

  1. To transmit spatial/environmental information from sensors/mechanical subsystems to the Environment Sensing Subsystem.
  2. To receive Autonomous Motion Subsystem Commands.
  3. To translates Commands into specific Commands to its own mechanical subsystems, e.g., steering brakes, wheel directions, and wheel motors.
  4. To receive and analyse Responses from its mechanical subsystems.
  5. To Sends responses to Autonomous Motion Subsystem about execution of commands.

The MAS Reference Model is depicted in Figure 5.

Figure 5 – Motion Actuation Subsystem Reference Model

Conclusions

The MPAI-CAV Architecture standard is the starting point for the next steps of the MPAI-CAV roadmap addressing functional requirements of the data exchanged by subsystems and components


XR Venues – Live Theatrical Stage Performance

Scope

MPAI has issued a Call for Technologies seeking proposals for MPAI-XRV – Live Theatrical Stage Performance, an MPAI standard project seeking to define interfaces of AI Modules (AIM) facilitating live multisensory immersive stage performances. By running AI Workflows (AIW) composed of AIMs, it will be possible to obtain a more direct, precise yet spontaneous show implementation and control of multiple complex systems to achieve the show director’s vision.

Setting

An implementation of MPAI-XRV – Live Theatrical Stage Performance includes:

  1. A physical stage.
  2. Lighting, projections (e.g., dome or immersive display, holograms, AR goggles), and spatialised audio and Special Effects (FX) including touch, smell, pyro, motion seats etc.
  3. Audience (standing or seated) in the real and virtual venue and external audiences via interactive streaming.
  4. Interactive interfaces to allow audience participation (e.g., voting, branching, real-virtual action generation).
  5. Performers on stage, platforms around domes or moving through the audience (immersive theatres).
  6. Capture of biometric data from audience and/or performers from wearables, sensors embedded in the seat, remote sensing (e.g., audio, video, lidar), and from VR headsets.
  7. Show operator(s) to allow manual augmentation and oversight of an AI that has been trained by show operator activity.
  8. Virtual Environment (metaverse) that mirrors selected elements of the Real Environment. For example, performers on the stage are mirrored by digital twins in the metaverse, using motion capture (MoCap), green screen or volumetrically captured 3D images.
  9. Real Environment can also mirror selected elements of the Metaverse similar to in-camera visual effects/virtual production techniques. Elements of the Metaverse such as avatars, landscape, sky, objects can be represented in the Real Environment through immersive displays, video mapped stages and set pieces, AR overlays, lighting, and FX.
  10. The physical stage and set pieces blend seamlessly into the virtual 3D backdrop projected onto the immersive display such that the spectators perceive as a single immersive environment.
  11. The Script or cue list describes the show events, guiding and synchronising the actions of all AI Modules (AIM) as the show evolves from cue to cue and scene to scene. In addition to performing the show, the AIMs might spontaneously innovate show variations, amplify the actions of performers or respond to commands from operators by modifying the Real or Virtual Environment within scripted guidelines.

Requirements

Respondents to Call for Technologies should consider the system requirements depicted in the Figure and in the following text. Note that terms in bold underlined refer to AIMs and those in bold refer to data types.

Participants Description capture the mood, engagement, and choices of participants (audience) and produce Participant Descriptors with the following features:

  1. Use a known (i.e., standard) format to enable processing by subsequent AIMs.
  2. Express Participants’:
    1. Visual behaviour (hand waving, standing, etc.)
    2. Audio reaction (clapping, laughing, booing, etc.)
    3. Choice (voting, motion controller, text, etc.)

Performance Description extracts information from the stage regarding position and orientation of performers and immersive objects and produce the Scene Descriptors with the following features:

  1. Use existing or new formats independent of the capturing technology, e.g.,
    1. For audio: Multichannel Audio, Spatial Audio, Ambisonics, etc.
    2. For visual: audio and raw video, volumetric, MoCap, etc.
  2. Allow for the following features:
    1. Accurate description of spatial and AV components.
    2. Individual accessibility and processability of objects.
    3. Unique association of objects with their digital representations.
    4. Association of the audio and visual components of audio-visual objects.

Participants Status interprets the Participants Descriptors and provides the Participants Status data type expressed either by:

  1. A Format supporting the semantics of a set of statuses over time in terms of:
    1. Sentiment (e.g., measurement of spatial position-based audience reaction).
    2. Expression of choice (e.g., voting, physical movement of audience).
    3. Emergent behaviour (e.g., pattern emerging from coordinated movement).
  2. A Language describing the Participant Status both at a time and as a trend.

 Performance Status interprets the Scene Descriptors and indicates the current Cue point determined by a real or virtual phrase, gesture, dance motion, prop status etc. according to the Script.

Operator Command Interpreter interprets data and commands such as Audio/DJ/VJ, Show Control, Lighting/FX and generates Interpreted Operator Controls, a data type independent of the specific input format and includes:

  1. Show control consoles (e.g., rigging, elevator stages, prop motions, and pyro).
  2. Audio control consoles (e.g., controls audio mixing and effects).
  3. DJ/VJ control consoles (e.g., real-time AV playback and effects).

All consoles may include sliders, buttons, knobs, gesture/haptic interfaces, joystick, touch pads.

Interpreted Controls may be Script-dependent.

Script includes written Show Script including character dialog, song lyrics, stage action, etc. plus a Master Cue Sheet with a corresponding technical description of all experiential elements including sound, lighting, follow spots, set movements, cue number, and show time of cue.

The cue sheet advances to the next cue point based on quantifiable or clearly defined actions such as spoken word, gesture, etc.

Various formats for show scripts and cue sheets exist.

Script is a data type with the following features:

  1. It is machine readable and actionable.
  2. It has an extensible set of clearly defined events/criteria for triggering the cue.
  3. Uses a language for expressing Action Descriptors, which define the experiential elements associated with each cue.

Action Generation

Action Generation uses Participants Status, Scene Descriptors, Cue Points and Interpreted Operator Controls to produce Action Descriptors, a data type that may be either existing or known (e.g., text prompts) or new with the following features:

  1. Ability to describe the Actions necessary to create the complete experience – in both the Real and Virtual Environments – in accordance with the Script.
  2. Ability to express all aspects of the experience including the performers’ and objects’ position, orientation, gesture, costume, etc.
  3. Independence of the specifications of a particular venue.

Virtual Environment Experience Generation processes Action Descriptors and produces commands that are actionable in the Virtual Environment. It produces the following data:

  1. Virtual Environment Descriptors A variety of controls for 3D geometry, shading, lighting, materials, cameras, physics, etc. as required to affect the Virtual Environment, e.g., OpenUSD. The actual format used is dependent upon the current Virtual Environment Venue Specification.
  2. Audio-Visual (A/V) Data and commands for all A/V experiential elements with the virtual environment, including audio and video.
  3. Audio: M Audio channels (generated by or passed through the AIM), placement of audio channels, including attachments to objects or characters, Virtual Audio device commands, e.g., MIDI.
  4. Video: N Video channels (generated by or passed through the AIM), Placement of Video channels, including mapping to objects and characters, Virtual VJ console commands/data, Virtual Video mixing console commands.
  5. Virtual Environment Venue Specification An input to the Virtual Experience Generation AIM defining protocols, and data and command structures for the specific Virtual Environment Venue.

Real Environment Experience Generation processes Action Descriptors and produces commands that are actionable in the Real Environment:

  1. Real Environment Venue Specification defines protocols, data, and command structures for the specific Real Environment Venue. This could include number, type, and placement of lighting fixtures, special effects, sound, and video display resources.
  2. FX (Effects) Commands and data for all FX generators (e.g., fog, rain, pyro, mist, etc. machines, 4D seating, stage props, rigging etc.), typically using various standard protocols. FX systems may include video and audio provided by Real Experience Generation.
  3. Lighting Commands and data for all lighting systems, devices, and elements, typically using the DMX protocol or similar. Lighting systems may include video provided by Real Experience Generation.
  4. Audio-Visual (A/V) Data and commands for all A/V experiential elements, including audio, video, and capture cameras/microphones. They include:
  5. Camera control: Camera #n Camera on/off, Keyframe based (Spatial Attitude, Optical parameters (aperture, focus, zoom), Frame rate, Camera)
  6. Audio: M Audio channels (generated by or passed through the AIM), Audio source location designation (channel number or spatial orientation of STEM), MIDI device commands, Audio server commands, Mixing console commands.
  7. Video: N Video channels (generated by or passed through the AIM), Video display location designation (display number or spatial orientation and mapping details), Video server commands, VJ console commands/data, Video mixing console commands.

Image by pch.vector on Freepik

 


MPAI starts new standard project on “AI for Health”

Geneva, Switzerland – 25 October 2023. MPAI, Moving Picture, Audio and Data Coding by Artificial Intelligence, the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 37th General Assembly (MPAI-37) approving the start of a new project on AI for Health.

AI for Health (MPAI-AIH) envisages a system where clients acquire and process individuals’ health data using shared AI models and upload data to the backend with attached licences expressed by smart contracts. Third parties may process health data based on the relevant licence. From time to time the backend uses federated learning to collect and use the AI models to retrain the common models. These are redistributed to all clients.

MPAI is continuing its work plan that involve the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. Avatar Representation and Animation (MPAI-PAF): reference software, conformance testing and new areas.
  3. Context-based Audio Enhancement (CAE-DC): new projects.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of CAV architecture.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for extension of existing standard.
  6. Multimodal Conversation (MPAI-MMC): reference software, drafting conformance testing, and new areas.
  7. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  8. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  9. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  10. AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
  11. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  12. XR Venues (MPAI-XRV): preparation for the development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


MPAI is running at full speed

Established in September 2020, MPAI has published five standard this week bringing the total to nine. Let’s see what they are about.

MPAI Metaverse Model (MPAI-MMM) – Architecture is the first technical metaverse standard published by a standard body. MPAI MMM specifies technologies enabling two metaverse instances M-InstanceA and M-InstanceB to interoperate if they: rely on the same Operation Model, use the same Profile, and either use the same technologies, or use independent technologies while accessing Conversion Services that losslessly transform data of an M-InstanceA to data of an M-InstanceB.

AI Framework (MPAI-AIF) V2 specifies a secure environment called AI Framework (AIF) enabling dynamic configuration, initialisation, and control of AI Workflows (AIW) composed of AI Modules (AIM). AIMs and AIWs are defined by function and interfaces; AIWs also by AIM topology.

Connected Autonomous Vehicle (MPAI-CAV) – Architecture  is the first technical standard on connected autonomous vehicles published by a standard body. MPAI-CAV specifies the Architecture of a CAV based on a Reference Model comprising a CAV composed of Subsystems (AIW) with specified Functions, I/O Data, and Topology. Each Subsystem is made up of Components with specified Functions and I/O Data.

Multimodal Conversation (MPAI-MMC) V2 specifies data formats for analysis of text, speech, and other non-verbal components as used in human-machine and machine-machine conversation applications and Multimodal Conversation-related AIWs and AIWs using data formats from MPAI-MMC and other MPAI standards.

Portable Avatar Format (MPAI-PAF) specifies the Portable Avatar and related data formats allowing a sender to enable a receiver to decode and render an Avatar as intended by the sender; the Personal Status Display Composite AI Module allowing the conversion of a Text and a Personal Status to a Portable Avatar; and the AIWs and AIMs used by the Avatar-Based Videoconference Use Case.

Let’s see now which are the previously developed standards.

Context-based Audio Enhancement (MPAI-CAE) specifies data types for the improvement of the user experience in audio-related applications for a variety of contexts using context information and Audio-related AIWs and AIWs using data formats from MPAI-CAE and other MPAI standards.

Neural Network Watermarking (MPAI-NNW) specifies methodologies to evaluate the following aspects of neural network (NN) watermarking-related technologies: The impact on the performance of a watermarked NN and its inference; The ability of an NN watermarking detector/decoder to detect/decode a payload of a modified watermarked NN; The computational cost of injecting, detecting, or decoding a payload in the watermarked NN.

Compression and Understanding of Industrial Data (MPAI-CUI) specifies data formats, AIMs and an AIW to predict a company’s probability of default and business discontinuity, and to provide an organisational model index (Company Performance Prediction Use Case).

Governance of the MPAI Ecosystem (MPAI-GME) specifies the roles and rules of Ecosystem players: MPAI, Implementers, MPAI Store, Performance Assessors, Users.

MPAI was established to develop AI-enabled data coding standards across industry domains and is keeping its promise. Time to join MPAI!

Image by starline on Freepik


New AI-driven standards pioneer the future of immersive entertainment

Press Release

Geneva, 2023/09/29 – The founder of MPEG, Leonardo Chiariglione, inspired by the prospects of AI, is leading an initiative – MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) – to drive AI standards that will supercharge next-generation immersive entertainment venues.  They have already developed a range of AI standards for audio enhancement and more natural forms of human-machine conversation and others which have subsequently been adopted by the IEEE.

The MPAI community is now focused on developing standards for XR Venues – particularly venues supporting live theatrical performances where the user experience spans both real and virtual environments.

The purpose of the planned MPAI-XRV – Live Theatrical Stage Performance standard is to address AI functions that facilitate live multisensory immersive performances. Broadway theatres, musicals, dramas, operas, and other performing arts are increasingly using video scrims, backdrops, as well as projection mapping to create digital sets. Such shows ordinarily require extensive digital set design and on-site show control staff to operate. The use of AI will allow faster mounting of shows, more direct, precise yet spontaneous show implementation and control to achieve the show director’s vision. It will also free staff from repetitive and technical tasks allowing them to amplify their artistic and creative skills.

Ultimately, the MPAI-XRV standard will allow the entire performance stage to become an immersive digital virtual environment which, when merged with a metaverse environment, creates a “digital twin” representation of live performers within the virtual world. Major metaverse concert events can therefore originate as a live performance with an in-person audience while simultaneously being enjoyed by millions in virtual reality. Emerging immersive venues such as MSG Sphere and COSM and various immersive art galleries are already well suited to such an approach.

MPAI recently issued a Call for Technologies, inviting industry participation in the MPAI-XRV Live Theatrical Stage Performance standards effort. Participating companies are encouraged to respond to the call at https://mpai.community/standards/mpai-xrv/. Individuals wishing to participate may also join MPAI by contacting secretariat@mpai.community.

MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence is an international not-for-profit association with the mission to develop AI-based data coding standards with clear IPR frameworks.


MPAI celebrates its third anniversary by publishing five standards

Geneva, Switzerland – 29 September 2023. MPAI, Moving Picture, Audio and Data Coding by Artificial Intelligence, the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 36th General Assembly (MPAI-36) approving the publication of five standards: AI Framework V2, Connected Autonomous Vehicle Architecture, Multimodal Conversation V2, MPAI Metaverse Model, and Portable Avatar Format, and one Conformance Testing of the Context-based Audio Enhancement standard.

In three years, MPAI has been able to produce nine standards in the areas of execution of AI applications, audio enhancement, autonomous vehicles, financial data, ecosystem governance, MPAI multimodal conversation, metaverse, neural network watermarking, and portable avatars, produced a second extended version for three, and is now looking forward to receiving responses to two Calls for Technologies on AI for Health and XR Venues – Live Theatrical Stage performance. More information about standards and projects can be found here.

Ominously, the date of the 36th General Assembly falls on the eve of the third anniversary of the MPAI foundation.

MPAI is continuing its work plan that involve the following activities:

  1. AI Framework (MPAI-AIF): reference software, conformance testing, and application areas.
  2. Avatar Representation and Animation (MPAI-PAF): reference software, conformance testing and new areas.
  3. Context-based Audio Enhancement (CAE-DC): new projects.
  4. Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of CAV architecture.
  5. Compression and Understanding of Industrial Data (MPAI-CUI): preparation for extension of existing standard.
  6. Multimodal Conversation (MPAI-MMC): reference software, drafting conformance testing, and new areas.
  7. MPAI Metaverse Model (MPAI-MMM): reference software and metaverse technologies requiring standards.
  8. Neural Network Watermarking (MPAI-NNW): reference software for enhanced applications.
  9. AI Health (MPAI-AIH): preparation for the development of the standard.
  10. End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
  11. AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
  12. Server-based Predictive Multiplayer Gaming (MPAI-SPG): technical report on mitigation of data loss and cheating.
  13. XR Venues (MPAI-XRV): preparation for the development of the standard.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

 

 


Summary of MPAI Calls for Technologies and Standards

MPAI has concluded a series of presentations illustrating 3 Calls for Technologies and 5 Technical Specifications posted for Community Comments planned for final approval on 29 September 2023.

Whether you intend to respond to a Call or make comments on a Standard, get familiar with the website (1st column), the overview (2nd column), the slides (3rd column), and the video recording (4th column) of the presentation of each Call  or Standard.

Calls for Technologies

Artificial Intelligence for Health Data Overview Slides Video
Object and Scene Description Overview Slides Video
XR Venues – Live Theatrical Stage Performance Overview Slides Video

Standards open to Community Comments

AI Framework V2 Overview Slides Video
Avatar Representation and Animation Overview Slides Video
Connected Autonomous Vehicle – Architecture Overview Slides Video
MPAI Metaverse Model – Architecture Overview Slides Video
Multimodal Conversation V2 Overview Slides Video

There are a few weeks to respond to a Call for Technologies and a few more days to comment on the Standards posted for Community Comments.


What is the Object and Scene Description (MPAI-OSD) Call for Technologies about?

Object and Scene Description (MPAI-OSD) is a project for a standard specifying technologies for object description and their localisation in space. Such technologies are used across several use cases of several MPAI standards.

Figure 1 gives two examples that assume the types of output to Audio and Visual Scene Descriptors.

Figure 1 – Audio and Visual Scene Description

The next Figure 2 provides one solution to the problem of assigning identifiers to the Objects – extracted from an audio-visual scene, especially for the purpose of identifying those that are audio-visual such as a human and their speech.

Figure 2 – Audio-Visual Alignment

Another example is provided by Figure 3.

Figure 3 – Visual Spatial Object Identification

Figure 4 is an example of the Conversation with Personal Status use case that makes use of all the (Composite) AI Modules described above.

Figure 4 – Reference Model of Conversation with Personal Status (MPAI-CPS)

MPAI has sought proposals for data formats and reference models for the identified application areas.

Call for Technologies (closed) html,  pdf
Use Cases and Functional Requirements htmlpdf
Framework Licence htmlpdf
Template for responses htmldocx

See also the video recordings (YouTubeWimTV) and the slides of the presentation made on 07 September.

 


An overview of AI Framework (MPAI-AIF)

From its early days, MPAI realised that AI-based data coding standards could facilitate AI explainability if monolithic AI applications could be broken down to individual components with identified functionality processing and producing data with semantics known as far as possible. An important side effect of this approach was identified in the possibility for developers to provide components with standard interfaces and potentially better performance than that provided by other developers. Version 1 of AI Framework (MPAI-AIF) published in September 2021 was the first standard supporting the original vision. Most MPAI application standards are built on top of MPAI-AIF.

Figure 1 – Reference Model of AI Framework (MPAI-AIF) V1

The main features of MPAI-AIF V1 were:

  1. Independence of the Operating System.
  2. Components have specified interfaces encapsulating Components to abstract them from the development environment.
  3. Interface with the MPAI Store enabling access to validated Components.
  4. Components implementable as software only, hardware only, and Hybrid hardware-software.
  5. Execution in local and distributed Zero-Trust architectures.
  6. Possibility to interact with other AIF implementations operating in proximity.

The Components have the following functionalities:

  • Access: to access to static or slowly changing data such as domain knowledge data, data models.
  • AI Module (AIM): a data processing element receiving Inputs and producing Outputs according to its Function. An AIM may be an aggregation of AIMs.
  • AI Workflow (AIW): an organised aggregation of AIMs implementing a Use Case.
  • Communication: connects the Components of an AIF.
  • Controller: can run May run one or more AIWs and exposes three APIs:
    • AIM API modules can register, communicate and access the AIF environment; can start, stop, and suspend AIMs.
    • User API User or other Controllers can perform high-level tasks (e.g., switch the Controller on and off, give inputs to the AIW through the Controller).
    • MPAI Store API communication between the AIF and the Store.
  • Global Storage: stores data shared by AIMs.
  • AIM/AIW Storage: stores data of the individual AIMs (securely/non-securely).
  • MPAI Store: stores Implementations for users to download.
  • User Agent: The Component interfacing the user with an AIF through the Controller

Table 1 gives the APIs exposed by MPAI-AIF V1 are:

Table 1 – APIs of MPAI-AIF V1

# API
8.1 Store API called by Controller
8.1.1 Get and parse archive
8.2 Controller API called by User Agent
8.2.1 General
8.2.2 Start/Pause/Resume/Stop Messages to other AIWs
8.2.3 Inquire about state of AIWs and AIMs
8.2.4 Management of Shared and AIM Storage for AIWs
8.2.5 Communication management
8.2.6 Resource allocation management
8.3 Controller API called by AIMs
8.3.1 General
8.3.2 Resource allocation management
8.3.3 Register/deregister AIMs with the Controller
8.3.4 Send Start/Pause/Resume/Stop Messages to other AIMs
8.3.5 Register Connections between AIMs
8.3.6 Using Ports
8.3.7 Operations on messages
8.3.8 Functions specific to machine learning
8.3.9 Controller API called by Controller

Version 1 assumed that the environment was Zero-Trust but its implementation was left to the developer. Version 2 extends Version 1 by making V1 as the Basic Profile of MPAI-AIF. The Basic Profile utilises:

  1. Non-Secure Controller.
  2. Non-Secure Storage.
  3. Secure Communication enabled by secure communication libraries.
  4. Basic API.

The Secure Profile utilises all the technologies in this Technical Specification.

V2 adds the necessary technology for a new Secure Profile offering the following functionalities:

  • The Framework provides access to the following Trusted Services:
    • A selected range of cyphering algorithms.
    • A basic attestation function.
    • Secure storage.
    • Certificate-based secure communication.
  • The AIF can execute only one AIW containing only one AIM. The sole AIM has the following features:
    • The AIM may be a Composite AIM.
    • The AIMs of the Composite AIM cannot access the Secure API.
  • The AIF Trusted Services may rely on hardware and OS security features already existing in the hardware and software of the environment in which the AIF is implemented.

Figure 2 – Reference Model of AI Framework (MPAI-AIF) V2

By virtue of incorporating the Secure Abstraction Layer, MPAI-AIF V2 adds the following features to V1:

  1. The AIMs of a Composite AIM must run on the same computing platform.
  2. The AIW
    • The AIMs in the AIW trust each other and communicate without special security concerns.
    • Communication among AIMs in the Composite AIM is non-secure.
    • The AIW/AIMs call the Secure Abstraction Layer via API.
  3. The Controller
    • Communicates securely with the MPAI-Store and the User Agent (Authentication, Attestation, and Encryption).
    • Accesses Communication, Global Storage, Access and MPAI Store via Trusted Services API.
    • Is split in two parts:
      • Secure Controller accesses Secure Communication and Secure Storage.
      • Non-Secure Controller can access the non-secure parts of the AIF.
    • Interfaces with the User Agent in the area where non-secure code is executed.
    • Interface with the Composite AIM in the area where secure code is executed,
  4. The AIM/AIW Storage
    • Secure Storage functionality is provided through key exchange.
    • Non-secure functionality is provided without reference to secure API calls.

The capabilities of the AIF, AIW, and AIM are described by a standard JSON metadata format that enables an AIF to download suitable AIW and AIMs from the MPAI Store.

Table 2 gives the APIs exposed by MPAI-AIF V2:

Table 1 – APIs of MPAI-AIF V2

# API
9.1 Data characterization structure.
9.2 API called by User Agent
9.3 API to access Secure Storage
9.3.1 User Agent initialises Secure Storage API
9.3.2 User Agent writes Secure Storage API
9.3.3 User Agent reads Secure Storage API
9.3.4 User Agent gets info from Secure Storage API
9.3.5 User Agent deletes a p_data in Secure Storage API
9.4 API to access Attestation
9.5 API to access cryptographic functions
9.5.1 Hashing
9.5.2 Key management
9.5.3 Key exchange
9.5.4 Message Authentication Code
9.5.5 Cyphers
9.5.6 Authenticated encryption with associated data (AEAD)
9.5.7 Signature
9.5.8 Asymmetric Encryption
9.6 API to enable secure communication

MPAI has published a Working Draft of Version 2 (html, pdfrequesting Community Comments. Version 2 extends the capabilities of Version 1 making it easier to AI application developers to support security in their applications. Comments should be sent to the MPAI Secretariat by 2023/09/24T23:59 UTCAn online presentation of the AI Framework V2 WD will be held on September 11 at 08 and 15 UTC. Register here for the 08 UTC and here for the 15 UTC presentations.

For the future, MPAI plans on:

  • Publishing MPAI-AIF as a Technical Specification at the 36th General Assembly (29 September 2023).
  • Continuing the implementation of AIF V1 for more OSs and programming languages than currently available.
  • Implementing the Reference Software of MPAI-AIF V2.