Moving Picture, Audio and Data Coding
by Artificial Intelligence

Archives: 2022-07-08

What is the AI for Health Call for Technologies about?

AI for Health (MPAI-AIH) is a project addressing the interfaces and data types involved in an AIH Platform. End Users acquire and process health data on their handsets (AIH Front ends), which are equipped with an AI Framework executing AI Workflows enabled by models distributed by an AIH Back end. Figure 1 depicts the AIH Front end.

Figure 1 – The AIH Front-End

End Users upload their processed health data with associated Smart Contracts granting the AIH Back end the Rights to use the data.

AIH Back end:

  1. Stores/processes health data delivered by AIH Front ends.
  2. Collects AI Models trained by AIH Front ends with End Users’ health data, updates the common Model and distributes it to AIH Front ends (federated learning).
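The federated learning step in item 2 can be illustrated with a minimal sketch. It assumes sample-weighted averaging (FedAvg) as the update rule; the text above does not name the aggregation algorithm, and `federated_average` and its arguments are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Combine Front-end model weights into a common model.

    Each update is (weights, num_samples). Weighted averaging (FedAvg) is an
    assumption; the MPAI-AIH text does not specify the aggregation rule.
    """
    if not updates:
        raise ValueError("at least one Front-end update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    size = len(updates[0][0])
    if any(len(weights) != size for weights, _ in updates):
        raise ValueError("all weight vectors must have the same length")
    common = [0.0] * size
    for weights, n in updates:
        for i, w in enumerate(weights):
            common[i] += w * n / total  # contribution proportional to data size
    return common


# Two hypothetical Front ends: one trained on 1 sample, the other on 3.
common_model = federated_average([([0.0, 4.0], 1), ([4.0, 0.0], 3)])
```

In this sketch only model weights travel to the Back end; the health data used for training stays on the handset.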

The Back end is depicted in Figure 2.

Figure 2 – The MPAI-AIH Platform

Third-Party Users may access the Back end to process their own or End User-provided processed health data based on the rights granted by End Users via smart contracts. External Data Sources may provide subsidiary data to the AIH Back end. This data is also governed by smart contracts.

The Call for Technologies requests proposals for the following:

Templates of smart contracts between the following parties:

  1. End User and AIH Back-End
  2. AIH Back-End and Third-Party Entity
  3. Third-Party Entity and AIH Back-End

Data Types and Usage

See Table 1.

Table 1 – Data Types and Usage

Data Type | Short Description
Historical User Health Data | End User’s medical history, lab results, etc.
Time series | Vital sign measurements (such as heart rate and blood pressure)
Sensor | Data from wearable devices: smartwatches, fitness trackers, etc.
Geolocation | Geographic location of individuals/samples
Social media | Chats, posts, comments, and other related data
Text | Unstructured data, e.g., clinical notes and patient-generated data
Audio | Speech and audio recordings
Video | Data from endoscopic procedures, laparoscopic surgeries, etc.
Medical images | X-ray, CT, MRI, and ultrasound images
Genomic | DNA sequencing data and other types of genetic information
Medical imaging | 3D images, 4D images (e.g., MRI over time), and multimodal images

Aggregated Health Data Format with the following features:

  • A container to carry data from a Front-End to the Back-End.
  • Electronic Health Records (EHR) improve the efficiency and quality of healthcare by offering comprehensive, up-to-date, and accurate information about a patient’s health history to healthcare providers.
  • Fast Healthcare Interoperability Resources (FHIR): one example of a data standard used for exchanging healthcare information electronically.
  • The Aggregated Health Data Format should be wrapped in a secure envelope along with associated encryption methods and containing the user’s health data records.
  • MPAI AIH healthcare information should be exchanged electronically and wrapped in an adequate envelope.
  • The envelope format should be independent of the data it contains.
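As an illustration of an envelope that is independent of the data it carries, here is a minimal sketch. The `Envelope` fields, the base64 encoding and the SHA-256 integrity digest are all assumptions made for this example; the Call does not prescribe them. Encryption is deliberately left to the caller, so the payload is treated as opaque, already-encrypted bytes.

```python
import base64
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope: it never parses the payload it carries."""
    content_type: str  # label only, e.g. "application/fhir+json"
    payload_b64: str   # opaque (already encrypted) payload, base64-encoded
    sha256: str        # integrity digest of the payload bytes


def wrap(payload: bytes, content_type: str) -> Envelope:
    """Wrap opaque bytes without inspecting their format."""
    return Envelope(
        content_type=content_type,
        payload_b64=base64.b64encode(payload).decode("ascii"),
        sha256=hashlib.sha256(payload).hexdigest(),
    )


def unwrap(envelope: Envelope) -> bytes:
    """Return the payload bytes after checking the integrity digest."""
    payload = base64.b64decode(envelope.payload_b64)
    if hashlib.sha256(payload).hexdigest() != envelope.sha256:
        raise ValueError("payload digest mismatch")
    return payload


def to_json(envelope: Envelope) -> str:
    """Serialise the envelope for transfer from a Front end to the Back end."""
    return json.dumps(asdict(envelope))


envelope = wrap(b"ciphertext-bytes", "application/fhir+json")
```

Because `wrap` and `unwrap` only handle bytes, the same envelope could carry FHIR resources, time series or images without change.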

APIs

  1. AIH Back end ↔ Platform Front end
  2. AIH Back end (Federated Learning) ↔ AIH Back end
  3. AIH Back end ↔ Third-Party User
  4. AIH Back end System ↔ Blockchain

MPAI is seeking proposals of technologies that enable the implementation of standard components (AI Modules) to make real the vision described above. The deadline for submitting a response is October 19 at 23:59 UTC. Those intending to submit a response should become fully familiar with the following documents:

Call for Technologies html, pdf
Use Cases and Functional Requirements html, pdf
Framework Licence html, pdf
Template for responses html, docx

See also the video recordings (YouTube, WimTV) and the slides of the presentation made on 07 September.


An overview of Portable Avatar Format (MPAI-PAF)

“Digital humans” are computer-created digital objects that can be rendered with a human appearance and called Avatars. As Avatars have mostly been created, animated, and rendered in closed environments, it is no surprise that there has been very little need for standards.

In a communication context, say, in an interoperable metaverse, digital humans may not be constrained to be in a closed environment. Therefore, if a sender requires that a remote receiving client reproduce a digital human as intended by the sender, standards are needed.

Technical Specification: Portable Avatar Format is a first response to this need, with the following goals:

  • Objective 1: to enable a user to reproduce a virtual environment as intended.
  • Objective 2: to enable a user to reproduce a sender’s avatar and its animation as intended by the sender.
  • Objective 3: to estimate the personal status of a human or avatar.
  • Objective 4: to display an avatar with a selected personal status.

Personal Status is a data type standardised by Multimodal Conversation V2 representing the ensemble of the information internal to a person, including Emotion, Cognitive State, and Attitude. See more on Personal Status here.

The MPAI-PAF standard has been designed to provide all the standards that are required to implement the Avatar-Based Videoconference Use Case where Avatars, having the visual appearance and uttering the real voice of human participants, meet in a virtual environment (Figure 1).

Figure 1 – Avatar-Based Videoconference

MPAI-PAF assumes that the system is composed of four subsystems, as depicted in Figure 2.

Figure 2 – Avatar-Based Videoconference System

This is how the system works:

Remotely located Transmitting Clients send to the Server:

  1. At the beginning:
    1. Avatar Model(s) and Language Preferences.
    2. Speech Object and Face Object for Authentication.
  2. Continuously:
    1. Avatar Descriptors and Speech.

The Server:

  1. At the beginning:
    1. Selects an Environment, e.g., a meeting room.
    2. Equips the room with objects, i.e., meeting table and chairs.
    3. Places Avatar Models around the table.
    4. Distributes Environment, Avatars, and their positions to all receiving Clients.
    5. Authenticates Speech and Face Objects.
  2. Continuously:
    1. Translates Speech from participants according to Language Preferences.
    2. Sends Avatar Descriptors and Speech to receiving Clients.
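The Server step "Places Avatar Models around the table" can be pictured with a small sketch. It assumes a round table and represents each placement as a hypothetical `(x, y, yaw)` tuple; the actual Spatial Attitude format is specified by MPAI-OSD and is not reproduced here.

```python
import math
from typing import List, Tuple


def seat_positions(num_avatars: int, radius: float = 1.5) -> List[Tuple[float, float, float]]:
    """Return one (x, y, yaw) per avatar, evenly spaced on a circle.

    The table centre is the origin; yaw is the heading, in radians, that makes
    the avatar face the centre. Layout and tuple format are assumptions.
    """
    if num_avatars <= 0:
        return []
    positions = []
    for k in range(num_avatars):
        angle = 2 * math.pi * k / num_avatars
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        yaw = math.atan2(-y, -x)  # direction from the seat toward the centre
        positions.append((x, y, yaw))
    return positions


seats = seat_positions(4)
```

The Server would distribute such positions once, at the beginning, together with the Environment and the Avatar Models.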

The Virtual Secretary

  1. Receives Text, Speech, and Avatar Descriptors of conference participants.
  2. Recognises Speech streams.
  3. Refines Recognised Text and extracts Meaning.
  4. Extracts Avatars’ Personal Status.
  5. Produces a Summary.
  6. Produces Edited Summary using the comments received from participants.
  7. Produces Text and Personal Status.
  8. Creates Speech and Avatar Descriptors from Text and Personal Status.

The Receiving Clients:

  1. At the beginning, receive:
    1. Environment Model.
    2. Avatar Models.
    3. Spatial Attitudes.
  2. Continuously:
    1. Create Audio and Visual Scene Descriptors.
    2. Render the Audio-Visual Scene from the Point of View selected by the Participant.

Only the Receiving Client of Avatar-Based Videoconference is depicted in Figure 3.

Figure 3 – Receiving Client of Avatar-Based Videoconference

The data types used by the Avatar-Based Videoconference use case are given in Table 1.

Table 1 – Data Types used by PAF-ABV

Name of Data Format | Specified by
Environment | OSD
Body Model | ARA
Body Descriptors | ARA
Face Model | ARA
Face Descriptors | ARA
Avatar Model | ARA
Avatar Descriptors | ARA
Spatial Attitude | OSD
Audio Scene Descriptors | CAE
Visual Scene Descriptors | OSD
Text | MMC
Language identifier | MMC
Meaning | MMC
Personal Status | MMC

We note that MPAI-PAF only specifies Body Model and Descriptors, Face Model and Descriptors, and Avatar Model and Descriptors. Three other MPAI standards provide the needed specifications.

The MPAI-PAF Working Draft (html, pdf) is published with a request for Community Comments. See also the video recordings (YT, WimTV) and the slides of the presentation made on 07 September. Comments should be sent to the MPAI Secretariat by 2023/09/26T23:59 UTC. MPAI will use the Comments received to develop the final draft planned to be published at the 36th General Assembly (29 September 2023).

As we said, this is a first contribution to avatar interoperability. MPAI will continue the development of Reference Software, start the development of Conformance Testing and study extensions of MPAI-PAF (e.g., compression of Avatar Description).


An overview of Connected Autonomous Vehicle (MPAI-CAV) – Architecture

Connected Autonomous Vehicles (CAV) promise to replace human errors with a lower machine error rate, give more time to human brains for rewarding activities, optimise the use of vehicles, infrastructure, and traffic management, reduce congestion and pollution, and help elderly and disabled people have a better life.

MPAI believes that standards can accelerate the coming of CAVs as an established reality and so the first MPAI standard for this is “Connected Autonomous Vehicles – Architecture”. It specifies a CAV Reference Model broken down into Subsystems for which it specifies the Functions and the data exchanged between subsystems. Each subsystem is further broken down into components for which it specifies the Functions, the data exchanged between components and the topology.

The Subsystem-level Reference model is represented in Figure 1.

Figure 1 – The MPAI-CAV – Architecture Reference Model

There are four subsystem-level reference models. Each subsystem is specified in terms of:

  1. The functions the subsystem performs.
  2. The Reference model designed to be compatible with the AI Framework (MPAI-AIF) Technical Specification.
  3. The input/output data exchanged by the subsystem with other subsystems and the environment.
  4. The functions of each of the subsystem components, intended to be implemented as AI Modules.
  5. The input/output data exchanged by the component with other components.

The following sections give the functions and the reference model of each MPAI-CAV – Architecture subsystem. The other three elements of each subsystem specification can be found in the draft Technical Specification (html, pdf).

Human-CAV Interactions (HCI)

The HCI functions are:

  1. To authenticate humans, e.g., to let them into the CAV.
  2. To converse with humans, interpreting utterances, e.g., a request to go to a destination, or during a conversation. HCI makes use of the MPAI-MMC “Personal Status” data type.
  3. To converse with the Autonomous Motion Subsystem to implement human conversation and execute commands.
  4. To enable passengers to navigate the Full Environment Representation.
  5. To appear as a speaking avatar showing a Personal Status.

The HCI Reference Model is depicted in Figure 2.

Figure 2 – HCI Reference Model

The full HCI specification is available here.

Environment Sensing Subsystem (ESS)

The ESS functions are:

  1. To acquire Environment information using Subsystem’s RADAR, LiDAR, Cameras, Ultrasound, Offline Map, Audio, GNSS, …
  2. To receive Ego CAV’s position, orientation, and environment data (temperature, humidity, etc.) from Motion Actuation Subsystem.
  3. To produce Scene Descriptors for each sensor technology in a common format.
  4. To produce the Basic Environment Representation (BER) by integrating the sensor-specific Scene Descriptors during the travel.
  5. To hand over the BERs, including Alerts, to the Autonomous Motion Subsystem.

The ESS Reference Model is depicted in Figure 3.

Figure 3 – ESS Reference Model

The full ESS specification is available here.

Autonomous Motion Subsystem (AMS)

The AMS functions are:

  1. To compute human-requested Route(s).
  2. To receive current BER from Environment Sensing Subsystem.
  3. To communicate with other CAVs’ AMSs (e.g., to exchange subsets of BER and other data).
  4. To produce the Full Environment Representation by fusing its own BER with info from other CAVs in range.
  5. To send Commands to Motion Actuation Subsystem to take the CAV to the next Pose.
  6. To receive and analyse responses from MAS.

The AMS Reference Model is depicted in Figure 4.

Figure 4 – AMS Reference Model

The full AMS specification is available here.

Motion Actuation Subsystem (MAS)

The MAS functions are:

  1. To transmit spatial/environmental information from sensors/mechanical subsystems to the Environment Sensing Subsystem.
  2. To receive Autonomous Motion Subsystem Commands.
  3. To translate Commands into specific Commands to its own mechanical subsystems, e.g., brakes, wheel directions, and wheel motors.
  4. To receive Responses from its mechanical subsystems.
  5. To send Responses to the Autonomous Motion Subsystem about the execution of Commands.

The MAS Reference Model is depicted in Figure 5.

Figure 5 – MAS Reference Model

The full MAS specification is available here.
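The data flow among the ESS, AMS and MAS described above can be summarised in a short sketch. Every type and function name here is hypothetical, and the fusion rule (the Ego CAV's own observation takes precedence) is an assumption; the draft does not yet specify the data formats exchanged between subsystems.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[float, float]


@dataclass
class EnvironmentRepresentation:
    """Stand-in for a BER or FER: object id -> position, plus Alerts."""
    objects: Dict[str, Position] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)


def ess_build_ber(scene_descriptors: List[Dict[str, Position]]) -> EnvironmentRepresentation:
    """ESS: integrate sensor-specific Scene Descriptors into one BER."""
    ber = EnvironmentRepresentation()
    for descriptors in scene_descriptors:
        ber.objects.update(descriptors)
    return ber


def ams_build_fer(own_ber: EnvironmentRepresentation,
                  other_bers: List[EnvironmentRepresentation]) -> EnvironmentRepresentation:
    """AMS: fuse the Ego CAV's BER with BER subsets from other CAVs in range."""
    fer = EnvironmentRepresentation(dict(own_ber.objects), list(own_ber.alerts))
    for ber in other_bers:
        for object_id, position in ber.objects.items():
            fer.objects.setdefault(object_id, position)  # own observation wins (assumption)
        fer.alerts.extend(ber.alerts)
    return fer


def mas_execute(command: str) -> str:
    """MAS: stand-in for translating a Command and reporting its execution."""
    return f"done: {command}"


lidar = {"car-1": (10.0, 2.0)}
camera = {"pedestrian-1": (4.0, -1.0)}
ber = ess_build_ber([lidar, camera])
neighbour = EnvironmentRepresentation(
    {"car-1": (10.5, 2.1), "truck-7": (40.0, 0.0)}, ["ice ahead"]
)
fer = ams_build_fer(ber, [neighbour])
response = mas_execute("move to next Pose")
```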

The MPAI-CAV – Architecture Working Draft (html, pdf) is published with a request for Community Comments. See also the video recordings (YT, WimTV) and the slides of the presentation made on 06 September. Anybody may comment on the WD; no specific format is required. Comments should reach the MPAI Secretariat by 2023/09/26T23:59 UTC. MPAI plans to publish MPAI-CAV – Architecture at the 36th General Assembly (29 September 2023).

The MPAI-CAV Architecture standard is the starting point for the next steps of the MPAI-CAV roadmap. The current specification does not include the Functional Requirements of the data exchanged between subsystems and components and this is exactly the activity that will start in October 2023.

Visit How to join to join MPAI.


What is the XR Venues – Live Theatrical Stage Performance Call for Technologies about?

XR Venues is an MPAI project addressing contexts enabled by Extended Reality (XR) – any combination of Augmented Reality (AR), Virtual Reality (VR) and Mixed Reality (MR) technologies – and enhanced by Artificial Intelligence (AI) technologies. The word “Venue” is used as a synonym for Real and Virtual Environments.

MPAI thinks that the Live Theatrical Stage Performance use case fits well with the current trend that sees theatrical stage performances such as Broadway theatres, musicals, dramas, operas, and other performing arts increasingly using video scrims, backdrops, and projection mapping to create digital sets rather than constructing physical stage sets, allowing the entire stage and theatre to become a digital virtual environment thus reducing the cost of mounting shows.

The use of immersion domes – especially LED volumes – can completely surround audiences with virtual environments that live performers can inhabit and interact with. In addition, Live Theatrical Stage Performance can extend into the metaverse as a digital twin. Elements of the Virtual Environment experience can be projected in the Real Environment and elements of the Real Environment experience can be rendered in the Virtual Environment (metaverse).

The purpose of the planned MPAI-XRV – Live Theatrical Stage Performance Technical Specification is to address AI Modules performing functions that facilitate live multisensory immersive performances which ordinarily require extensive on-site show control staff to operate. Use of the AI Modules organised in AI Workflows (see details here) enabled by the MPAI-XRV – LTSP Technical Specification will allow more direct, precise yet spontaneous show implementation and control to achieve the show director’s vision. It will also free staff from repetitive and technical tasks allowing them to amplify their artistic and creative skills.

Figure 1 provides the Reference Model of the Live Theatrical Stage Performance Use Case incorporating AI Modules (AIMs). In this diagram, data extracted from the Real and Virtual Environments (on the left) are processed and injected into the same Real and Virtual Environments (on the right).

Data is collected from both the Real and Virtual Environments. This includes audio, video, volumetric or motion capture (mocap) data from stage performers, audio and video from participants, signals from control surfaces (e.g., audio, lighting, show control), and more. One or more AIMs extract features from participants (i.e., the audience) and performers which are output as Participant and Scene Descriptors. These Descriptors are further interpreted by Performance and Participant Status AIMs to determine the Cue Point in the show (according to the Script) and Participants Status (in general, an assessment of the audience’s reactions).

Figure 1 – Live theatrical stage performance architecture (AI Modules shown in green)

Likewise, data from the Show Control computer or control surface, consoles for audio, DJ, VJ, lighting and FX (typically commanded by operators) – if needed – are interpreted by the Operator Command Interpreter AIM and output as Interpreted Operator Control. The Action Generation AIM accepts Participant Status, Cue Point and Interpreted Operator Controls and uses them to direct action in both the Real and Virtual Environments via Scene and Action Descriptors. These general descriptors are converted into actionable commands (e.g., DMX, MIDI, USD) required by the Real and Virtual Environments – according to their Venue Specifications – to enable multisensory Experience Generation in both the Real and Virtual Environments. In this manner, the desired experience can automatically be adapted to a variety of specific real and virtual venue instances.

MPAI is seeking proposals of technologies that enable the implementation of standard components (AI Modules) to make real the vision described above. The deadline for submitting a response is November 20 at 23:59 UTC.

Those intending to submit a response should familiarise themselves with the following documents:

Call for Technologies html, pdf
Use Cases and Functional Requirements html, pdf
Framework Licence html, pdf
Template for responses html, docx

See the video recordings (YouTube, WimTV) and the slides from the presentation made on 12 September.


An overview of MPAI Metaverse Model (MPAI-MMM) – Architecture

There is no common definition of the word metaverse. MPAI characterises a Metaverse Instance (M-Instance) as a system that captures data from the real world (in the following, called Universe), processes it, and combines it with internally generated data to create virtual environments that users can interact with.

Examples of M-Instances are plentiful. So far, their developers have made technology decisions that best responded to their needs, often without considering the choices that other developers might have made for similar purposes. Recently, however, many have expressed concerns that “walled gardens” do not fully exploit the opportunities offered by current and expected technologies and have called for M-Instances to be “interoperable”.

MPAI has studied the issue of M-Instance interoperability. What can be called direct interoperability is the most desirable and effective but metaverse technologies are still rapidly evolving. Mediated interoperability is a solution that lets M-Instance implementers make their own technology decisions assigning to a conversion service the task of converting data from one format to another. The practical use of mediated interoperability is limited because the data semantics of different M-Instances may greatly vary. As indicated in Figure 1, dataA.1 of M-InstanceA may be converted to dataB.1 and dataB.2 but a part of dataA.1 may not be converted at all.

Figure 1 – Data conversion between M-Instances
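The limits of mediated interoperability can be made concrete with a small sketch of a conversion service. The field names and the mapping table are invented for illustration and do not come from any M-Instance; the point is only that part of the source data may have no counterpart in the target.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping from M-Instance A fields to (M-Instance B field, converter).
A_TO_B: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "avatar_height_cm": ("avatar_height_m", lambda value: value / 100),
    "display_name": ("name", str),
}


def convert(record_a: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Convert what can be converted; return the remainder separately."""
    converted: Dict[str, Any] = {}
    unconverted: Dict[str, Any] = {}
    for key, value in record_a.items():
        if key in A_TO_B:
            target_key, converter = A_TO_B[key]
            converted[target_key] = converter(value)
        else:
            unconverted[key] = value  # semantics unknown to M-Instance B
    return converted, unconverted


converted, unconverted = convert(
    {"avatar_height_cm": 180, "display_name": "Ada", "mood_aura": "blue"}
)
```

Here `mood_aura` is left unconverted, which mirrors the part of dataA.1 that has no equivalent in M-InstanceB.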

If technology standardisation is too early and data conversion not a solution, a standard for the functional requirements of data can provide a form of interoperability. After publishing two Technical Reports on Functionalities and Functionality Profiles, MPAI is now publishing MPAI-Metaverse Model (MPAI-MMM) – Architecture Technical Specification. M-Instances can interoperate if they:

  1. Rely on the same Operation Model.
  2. Use:
    • The same Profile specified by MPAI-MMM – Architecture, and
    • Either the same Technologies, or
    • Independent Technologies while accessing appropriate Conversion Services.

The above numbered list is contained in the first normative chapter of the standard – Scope. The next chapter – Terms and Definitions – collecting a large number (about 100) of specified Terms is also normative.

The next chapter – Functional Requirements – is informative. It contains thoroughly revised and integrated Functionalities of the Technical Report and provides the basic elements on which the next (normative) chapter – M-Instance Operation – is based. In essence, the chapter:

  1. Defines an M-Instance as a set of Processes performing Actions on Items at Locations and Times.
  2. Specifies the minimum metadata format of Processes, Actions, and Items (metadata are extensible to cope with specific cases).
  3. Specifies the protocol whereby a Process can request another Process – potentially in another M-Instance – to perform Actions on Items.
  4. Specifies four types of Process: Device, User, Service, and App.
  5. Renders Users as Personae.
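The model of Processes requesting Actions on Items can be sketched as plain data structures. The field names, the `"ExampleAction"` action and the `"accepted"`/`"refused"` replies are hypothetical; the normative metadata format and protocol are those in the Technical Specification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

PROCESS_TYPES = frozenset({"Device", "User", "Service", "App"})


@dataclass(frozen=True)
class Process:
    process_id: str
    process_type: str
    m_instance: str

    def __post_init__(self) -> None:
        if self.process_type not in PROCESS_TYPES:
            raise ValueError(f"unknown Process type: {self.process_type}")


@dataclass(frozen=True)
class ActionRequest:
    """One Process asks another to perform an Action on Items."""
    source: Process
    destination: Process  # may live in a different M-Instance
    action: str
    items: Tuple[str, ...]
    location: str
    time: str


def respond(request: ActionRequest, allowed_actions: FrozenSet[str]) -> str:
    """Destination-side stand-in: accept only Actions it is willing to perform."""
    return "accepted" if request.action in allowed_actions else "refused"


user = Process("user-1", "User", "m-instance-A")
service = Process("service-9", "Service", "m-instance-B")
request = ActionRequest(
    user, service, "ExampleAction", ("item-42",), "room-1", "2023-09-01T10:00:00Z"
)
```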

Figure 2 graphically represents the last two points: Devices are Processes connecting the real world with the M-Instance, Users are Processes representing and acting on behalf of a registered human, and Apps are Processes running on a Device. Personae are rendered Users.

Figure 2 – Universe, Metaverse, Human, Device, User, Persona, App

The following (normative) chapter identifies the Actions, Items, and Data Types and provides their Functional Requirements.

The following (informative) chapter uses the MPAI Metaverse Use Case Description Language to describe nine Use Cases:

  1. Virtual Lecture
  2. Virtual Meeting
  3. Hybrid Working
  4. eSports Tournament
  5. Virtual Performance
  6. AR Tourist Guide
  7. Virtual Dance
  8. Virtual Car Showroom
  9. Drive a Connected Autonomous Vehicle

All relevant elements of the M-Instance Operation Model, and Actions, Items, and Data Types have been used. The nine Use Cases do not need more elements than those specified in the MPAI-MMM – Architecture standard.

The last (normative) chapter – Functional Profiles – revisits and extends the four profiles of different complexity identified as examples in the Functionality Profiles Technical Report. The Baseline Profile supports basic forms of lecture, meeting, and hang-out, the Finance Profile enables a User to post assets and make transactions, the Management Profile adds the management of registration and rights to the Baseline Profile, and the High Profile supports all currently identified Functionalities.

The MPAI-MMM – Architecture Working Draft (html, pdf) is published with a request for Community Comments. See the video recordings (YouTube, WimTV) of the presentation made on 1 September 2023. Comments should be sent to the MPAI Secretariat by 2023/09/21T23:59 UTC. MPAI will use the Comments received to develop the final draft planned to be published at the 36th General Assembly (29 September 2023).


An overview of Multimodal Conversation (MPAI-MMC) V2

The goal of the Multimodal Conversation (MPAI-MMC) standard is to provide technologies that enable a human-machine conversation that is more human-like, richer in content, and able to emulate human-human conversation in completeness and intensity.

By learning from human interaction, machines can improve their “conversational” capabilities in the two main phases of conversation: understanding the meaning of an element and generating a pertinent response.

Multimodal Conversation Version 2 achieves this goal by providing, among other technologies, a new standard data type – Personal Status – that represents the “internal status” of a conversing human conveyed by text, speech, face, and gesture. The same Personal Status data type is used by the machine to represent its own internal status as if it were a human.

Currently, Personal Status is composed of three Personal Status Factors:

  1. Emotion, the coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”.
  2. Cognitive State, the coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”.
  3. Social Attitude, the coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment, e.g., “Respectful”, “Confrontational”, “Soothing”.

Each Factor is represented by a standard set of labels and associated semantics with 2 tables:

  1. Table 1: Label Set contains descriptive labels relevant to the Factor in a three-level format:
    1. The CATEGORIES column specifies the relevant categories using nouns (e.g., “ANGER”).
    2. The GENERAL ADJECTIVAL column gives adjectival labels for general or basic labels within a category (e.g., “angry”).
    3. The SPECIFIC ADJECTIVAL column gives more specific (sub-categorised) labels in the relevant category (e.g., “furious”).
  2. Table 2: Label Semantics provides the semantics for each label in the GENERAL ADJECTIVAL and SPECIFIC ADJECTIVAL columns of the Label Set Table. For example, for “angry” the semantic gloss is “emotion due to perception of physical or emotional damage or threat.”
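The two tables can be pictured as a pair of lookups. The sketch below holds only the example labels quoted above ("ANGER", "angry", "furious") and the one gloss given; the dictionary layout and the `category_of` helper are assumptions for illustration, not the format defined by MPAI-MMC.

```python
from typing import Dict, List, Optional

# Label Set: CATEGORY -> GENERAL ADJECTIVAL -> list of SPECIFIC ADJECTIVAL labels.
LABEL_SET: Dict[str, Dict[str, List[str]]] = {
    "ANGER": {"angry": ["furious"]},
}

# Label Semantics: label -> semantic gloss (only the gloss quoted in the text).
LABEL_SEMANTICS: Dict[str, str] = {
    "angry": "emotion due to perception of physical or emotional damage or threat",
}


def category_of(label: str) -> Optional[str]:
    """Return the category of a general or specific label, if it is known."""
    for category, generals in LABEL_SET.items():
        for general, specifics in generals.items():
            if label == general or label in specifics:
                return category
    return None
```

With this layout, `category_of("furious")` resolves a specific label to its category, so a receiver that does not know the specific label could fall back to the general one.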

The mission of the international, unaffiliated, non-profit Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) Standards Developing Organisation is to develop AI-enabled data coding standards. MPAI believes that its standards should enable humans to select machines whose internal operation they understand to some degree, rather than machines that are just “black boxes” resulting from unknown training with unknown data. Thus, an implemented MPAI standard breaks up monolithic AI applications, yielding a set of interacting components with identified data whose semantics is known, as far as possible. The AI Framework (MPAI-AIF) standard (html, pdf) specifies an environment where AI Workflows (AIW) composed of AI Modules (AIM) are executed in environments implemented as an AI Framework (AIF).

Figure 1 – Reference Model of AI Framework (AIF)

MPAI-MMC has defined two “Composite AI Modules (AIM)” that achieve the goal of representing the internal state of the conversing human and machine (or machine and machine). The first is called “Personal Status Extraction (PSE)” and is represented by Figure 2.

Figure 2 – Personal Status Extraction

PSE computes the Descriptors of each considered Modality – Text, Speech, Face, and Gesture. The Personal Status embedded in the Modality is obtained by interpreting the Modality Descriptors. The needs of several MPAI Use Cases suggest that Descriptors may also be computed outside the PSE and provided by another AIM, with the same semantics. This is signalled by Input Selection. Personal Status Combination is an AIM that integrates the Personal Status of the four Modalities in a standard Personal Status Format.
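The PSE flow (describe each Modality, interpret the Descriptors, then combine) can be sketched as follows. The function signature is hypothetical, Input Selection is modelled simply as "use externally provided Descriptors when present", and combining by majority vote is an assumption; the text does not state how Personal Status Combination integrates the Modalities.

```python
from collections import Counter
from typing import Any, Callable, Dict, Optional

MODALITIES = ("Text", "Speech", "Face", "Gesture")


def extract_personal_status(
    raw: Dict[str, Any],
    provided: Dict[str, Any],
    describe: Dict[str, Callable[[Any], Any]],
    interpret: Dict[str, Callable[[Any], str]],
) -> Optional[str]:
    """Return one combined label, or None if no Modality is available."""
    per_modality: Dict[str, str] = {}
    for modality in MODALITIES:
        if modality in provided:      # Descriptors supplied by another AIM
            descriptors = provided[modality]
        elif modality in raw:         # Descriptors computed inside the PSE
            descriptors = describe[modality](raw[modality])
        else:
            continue
        per_modality[modality] = interpret[modality](descriptors)
    if not per_modality:
        return None
    # Majority vote across Modalities: an assumption made for this sketch.
    return Counter(per_modality.values()).most_common(1)[0][0]


# Toy stand-ins for the description and interpretation AIMs.
describe = {m: (lambda data: data.lower()) for m in MODALITIES}
interpret = {m: (lambda d: "angry" if "!" in d else "calm") for m in MODALITIES}
status = extract_personal_status(
    raw={"Text": "STOP!", "Speech": "stop!"},
    provided={"Face": "neutral"},
    describe=describe,
    interpret=interpret,
)
```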

The second Composite AIM – “Personal Status Display (PSD)” – converts the Text produced by the Machine, together with the associated Personal Status, into a multimodal response. It is represented in Figure 3.

Figure 3 – Personal Status Display

Machine Text is passed as PSD output and synthesised as Speech using the Personal Status provided by PS-Speech. Face Descriptors are produced using Machine Speech and PS-Face. Body Descriptors are produced using Avatar Model, PS-Gesture, and Text. Avatar Descriptors, the combination of Face Descriptors and Body Descriptors, are produced by the Avatar Description AIM. The ready-to-render Machine Avatar is produced by Avatar Synthesis.

Input Selection is used to indicate whether the PSD should produce Avatar Descriptors or ready-to-render Machine Avatar.

Let’s now see how the definition of the PSE and PSD Composite AIMs enables a compact representation of human-machine conversation use cases by considering the reference model of Conversation with Personal Status use case depicted in Figure 4.

Figure 4 – Conversation with Personal Status

The visual frontend (Visual Scene Description) describes the visual scene providing the Descriptors of the Face and the Body of the human and a digital representation of the objects in the scene. The Audio frontend (Audio Scene Description) separates the speech from other sound sources. Spatial Object Identification analyses the arm, hand, and fingers of the human to identify which of the objects the human refers to in a conversation. Audio-Visual Alignment assigns identifiers to audio, visual, and audio-visual objects. Speech Recognition, Language Understanding, and Dialogue Processing are the usual elements of the conversation chain. In MPAI-MMC, however, Dialogue Processing deals with additional information in the form of Object ID and Personal Status of the human. Therefore, Dialogue Processing is requested not only to produce Machine Text in response to the human, but also its own Personal Status. Both Text and Personal Status are provided to the Personal Status Display Composite AIM that provides the full multimodal response as Machine Text, Machine Speech, and Machine Avatar.

What does Multimodal Conversation specify?

  1. Technologies required to analyse the text and/or the speech and other non-verbal components exchanged in human-machine and machine-machine conversation.
  2. Use Cases that apply the technologies:
    • Conversation with Personal Status.
    • Conversation with Emotion.
    • Multimodal Question Answering.
    • Conversation About a Scene.
    • Human-CAV Interaction.
    • Virtual Secretary for Videoconference.
    • Text and Speech Translation (one way, two ways, one to many).

Each Use Case normatively defines:

  1. The Functions of the AIW and (Composite) AIMs implementing the Use Case.
  2. The Connections between and among the AIMs.
  3. The Semantics and the Formats of the input and output data of the AIW and the AIMs.
  4. The JSON Metadata of the AIW.

Each AIM normatively defines:

  1. The Functions of the AIM.
  2. The Connections between and among the AIMs in case the AIM is Composite.
  3. The Semantics and the Formats of the input and output data of the AIMs.
  4. The JSON Metadata of the AIMs.

MPAI deals with all aspects of “AI-enabled Data Coding” in a unified way. Some of the data formats required by the MPAI-MMC Use Cases are specified by MPAI-MMC and some data formats by the standards produced by other MPAI groups. For instance, the Face and Body Descriptors are specified by the Avatar Representation and Animation standard and the “Spatial Object Identification” Composite AIM is specified by “Object and Scene Description” (MPAI-OSD).

The MPAI-MMC Version 2 Working Draft (html, pdf) is published with a request for Community Comments. Comments should be sent to the MPAI Secretariat by 2023/09/25T23:59 UTC. MPAI will use the Comments received to develop the final draft planned to be published at the 36th General Assembly (29 September 2023). An online presentation of the WD will be held on September 05 at 08 and 15 UTC. Register here for the 08 UTC and here for the 15 UTC presentations.


The MPAI standards portfolio

1          Introduction

Established in September 2020 with its mission defined by the MPAI Statutes as promotion of efficient data use by developing AI-enabled data coding standards and bridging the gap between MPAI standards and their practical use through Intellectual Property Rights Guidelines, MPAI has developed nine Technical Specifications. The purpose of this document is to provide a short overview of the nine Technical Specifications.

1       Introduction

2       AI Framework (MPAI-AIF)

2.1        Version 1

2.2        Version 2

3       Avatar Representation and Animation (MPAI-ARA)

4       Context-based Audio Enhancement (MPAI-CAE)

4.1        Version 1

4.2        Version 2

5       Connected Autonomous Vehicle (MPAI-CAV)

6      Compression and understanding of industrial data (MPAI-CUI)

7       Governance of the MPAI Ecosystem (MPAI-GME)

8       Multimodal Conversation (MPAI-MMC)

8.1        Version 1

8.2        Version 2

9       MPAI Metaverse Model (MPAI-MMM) – Architecture

10          Neural Network Watermarking (MPAI-NNW)

2          AI Framework (MPAI-AIF)

MPAI believes that its standards should enable humans to select machines whose internal operation they understand to some degree, rather than machines that are “black boxes” resulting from unknown training with unknown data. Thus, an implemented MPAI standard breaks up monolithic AI applications, yielding a set of interacting components, called AI Modules (AIM), with identified data. AIMs are combined in AI Workflows (AIW) and exchange data with known semantics to the extent possible. This improves the explainability of AI applications and also promotes a competitive market of components with standard interfaces, possibly with improved performance compared to other implementations.

Artificial Intelligence Framework (MPAI-AIF) is the standard that implements this vision by enabling creation and automation of mixed Artificial Intelligence – Machine Learning – Data Processing workflows.

2.1        Version 1

Figure 1 shows the MPAI-AIF V1 Reference Model.

Figure 1 – Reference model of the MPAI AI Framework (MPAI-AIF) V1

The MPAI-AIF Technical Specification V1 and Reference Software are available here.

2.2        Version 2

MPAI-AIF V1 assumed that the AI Framework was secure but did not provide support to developers wishing to execute an AI application in a secure environment. MPAI-AIF V2 responds to this requirement. As shown in Figure 2, the standard defines a Security Abstraction Layer (SAL). By accessing the SAL APIs, a developer can create the required level of security with the desired functionalities.

Figure 2 – Reference model of the MPAI AI Framework (MPAI-AIF) V2

MPAI has published a Working Draft of Version 2 (html, pdf) requesting Community Comments. Comments should be sent to the MPAI Secretariat by 2023/09/24T23:59 UTC. MPAI-AIF V2 is expected to be published on 29 September 2023.

3          Avatar Representation and Animation (MPAI-ARA)

“Digital humans” are computer-created digital objects that can be rendered to show a human appearance. In most cases, the underlying assumption has been that their creation, animation, and rendering are done in a closed environment. Such digital humans used to have little or no need for standards. However, in a communication context, and even more in a metaverse context, there are many cases where a digital human is not constrained within a closed environment. For instance, a client may send data to a remote client that should be able to unambiguously interpret and use the data to reproduce a digital human as intended by the transmitting client.

These new usage scenarios require forms of standardisation. Technical Specification: Avatar Representation and Animation (MPAI-ARA) is a first response to the need of a user wishing to enable their transmitting client to send data that a remote client can interpret to render a digital human, having the intended body movement and facial expression faithfully represented by the remote client.

Figure 2 is the system diagram of the Avatar-Based Videoconference Use Case enabled by MPAI-ARA.

Figure 2 – System diagram of ARA-ABV

Figure 3 is the Reference Model of the Transmitting Client.

Figure 3 – Reference Model of the ARA-ABV Transmitting Client

The MPAI-ARA Working Draft (html, pdf) is published with a request for Community Comments. The final version is expected to be published on 29 September 2023.

4          Context-based Audio Enhancement (MPAI-CAE)

Context-based Audio Enhancement (MPAI-CAE) uses AI to improve the user experience for several audio-related applications including entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. using context information to act on the input audio content.

4.1        Version 1

Figure 4 is the reference model of Emotion-Enhanced Speech, a Use Case specified by Version 1.

Figure 4 – An MPAI-CAE Use Case: Emotion-Enhanced Speech

The MPAI-CAE Technical Specification V1 and Reference Software are available here.

4.2        Version 2

MPAI has developed the specification of the Audio Scene Description Composite AIM as part of the MPAI-CAE V2 standard. Figure 5 depicts the architecture of the CAE-ASD Composite AIM.

Figure 5 – Audio Scene Description Composite AIM

The MPAI-CAE Technical Specification V2 is available here.

5          Connected Autonomous Vehicle (MPAI-CAV)

Connected Autonomous Vehicle (MPAI-CAV) is a project addressing the Connected Autonomous Vehicle (CAV) domain and the four main subsystems of a CAV:

  1. Human-CAV interaction (HCI) responds to humans’ commands and queries, senses human activities in the CAV passenger compartment and activates other subsystems as required by humans or as deemed necessary under the identified conditions.
  2. CAV-Environment interaction, performed by the Environment Sensing Subsystem (ESS), acquires information from the physical environment via a variety of sensors and creates a representation of the environment.
  3. Autonomous Motion Subsystem (AMS) uses different sources of information – ESS, other CAVs, Roadside Units, etc. – to improve the CAV’s understanding of the environment and instructs the CAV how to reach the intended destination.
  4. Motion Actuation Subsystem (MAS) is the subsystem that operates and actuates the motion instructions in the environment.

The interaction of the 4 subsystems is depicted in Figure 7.

Figure 7 – The MPAI-CAV subsystems
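
The interaction of the four subsystems can be pictured as a simple data flow: a human command and sensor readings go in, and an actuation comes out. The following is a minimal sketch under stated assumptions, not the MPAI-CAV reference model: every function name, data field and threshold is hypothetical, and each subsystem is reduced to a toy rule.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentRepresentation:
    """Toy stand-in for the environment representation built from sensor data."""
    nearest_obstacle_m: float


@dataclass
class MotionInstruction:
    """Toy stand-in for the instruction passed on for actuation."""
    destination: str
    target_speed_mps: float


def human_cav_interaction(command):
    """HCI: turn a human command into a destination request (toy parsing)."""
    prefix = "go to "
    if not command.lower().startswith(prefix):
        raise ValueError(f"unsupported command: {command!r}")
    return command[len(prefix):].strip()


def environment_sensing(obstacle_distances_m):
    """Environment sensing: reduce raw sensor readings to a representation."""
    return EnvironmentRepresentation(nearest_obstacle_m=min(obstacle_distances_m))


def autonomous_motion(environment, destination, cruise_mps=10.0, safe_distance_m=20.0):
    """AMS: decide how to move towards the destination given the environment."""
    clear = environment.nearest_obstacle_m > safe_distance_m
    return MotionInstruction(destination, cruise_mps if clear else 0.0)


def motion_actuation(instruction):
    """MAS: carry out the motion instruction (here, only described as text)."""
    return f"set speed to {instruction.target_speed_mps:.1f} m/s"


# One pass through the chain: HCI -> environment sensing -> AMS -> MAS.
destination = human_cav_interaction("Go to Geneva")
environment = environment_sensing([50.0, 35.0])
actuation = motion_actuation(autonomous_motion(environment, destination))
```

Each step only depends on the data handed over by the previous one, which is the property that lets subsystems from different sources be combined.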

The CAV-HCI subsystem (Figure 8) is specified as an MPAI-MMC V2 Use Case.

Figure 8 – Reference Model of the Human-CAV Interaction Subsystem

The MPAI-CAV – Architecture Working Draft (html, pdf) is published with a request for Community Comments. The final version is expected to be published on 29 September 2023.

6          Compression and understanding of industrial data (MPAI-CUI)

Compression and understanding of industrial data (MPAI-CUI) aims to enable AI-based filtering and extraction of key information to predict the performance of a company by applying Artificial Intelligence to governance, financial and risk data. This is depicted in Figure 6.

Figure 6 – The MPAI-CUI Company Performance Prediction Use Case

The set of specifications composing the MPAI-CUI standard is available here.

7          Governance of the MPAI Ecosystem (MPAI-GME)

Governance of the MPAI Ecosystem (MPAI-GME) is a foundational MPAI standard specifying the operation of the Ecosystem enabling:

  1. Implementers to develop components and solutions.
  2. Performance Assessors to assess the Performance of an Implementation.
  3. The MPAI Store to Test an Implementation for Conformance and post the Implementation to the MPAI Store website together with the results of Performance Assessment.
  4. End Users to download Implementations and report Experience Scores.

Figure 9 depicts the operation of the MPAI Ecosystem.

Figure 9 – Governance of the MPAI Ecosystem.

8          Multimodal Conversation (MPAI-MMC)

Multimodal Conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Its Use Cases are implementations of the MPAI “make available explainable AI applications” vision for human-machine conversation.

8.1        Version 1

Technical Specification: Multimodal Conversation (MPAI-MMC) V1 includes five Use Cases: Conversation with Emotion, Multimodal Question Answering (QA), and three Automatic Speech Translation Use Cases.

Figure 10 depicts the Reference Model of the Conversation with Emotion Use Case.

Figure 10 – An MPAI-MMC V1 Use Case: Conversation with Emotion

The MPAI-MMC Technical Specification V1.2 is available here.

8.2        Version 2

Extending the role of emotion as introduced in Version 1.2 of the standard, MPAI-MMC V2 introduces Personal Status, i.e., the internal status of humans that a machine needs to estimate and that it artificially creates for itself with the goal of improving its conversation with the human or even with another machine. Personal Status is applied to several new MPAI-MMC V2 Use Cases: Conversation About a Scene, Virtual Secretary for Videoconference, and Human-Connected Autonomous Vehicle Interaction.
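
Personal Status generalises Emotion by adding Cognitive State and Social Attitude. A hedged way to picture it is as a record with three factors that a machine estimates from several sources and then fuses. The sketch below is illustrative only: the class, the majority-vote fusion rule, the idea of one estimate per analyser, and the labels are assumptions and are not taken from the MPAI-MMC V2 specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonalStatus:
    """Illustrative record with the three factors named in the text."""
    emotion: str
    cognitive_state: str
    social_attitude: str


def fuse(estimates):
    """Fuse several estimates by majority vote on each factor.

    Ties go to the label seen first, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    def vote(factor):
        labels = [getattr(estimate, factor) for estimate in estimates]
        return Counter(labels).most_common(1)[0][0]

    return PersonalStatus(vote("emotion"), vote("cognitive_state"), vote("social_attitude"))


# Hypothetical estimates, e.g. one per analyser looking at the same human.
estimates = [
    PersonalStatus("happy", "confused", "polite"),
    PersonalStatus("happy", "attentive", "polite"),
    PersonalStatus("sad", "confused", "polite"),
]
fused = fuse(estimates)
```

A real system would more likely weight each estimate by its confidence; a plain vote is used here only to keep the example short.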

Figure 11 is the reference model of the Conversation About a Scene (CAS) Use Case.

Figure 11 – An MPAI-MMC V2 Use Case: Conversation About a Scene

Figure 12 gives the Reference Model of a second use case: Virtual Secretary (used in the Avatar-Based Videoconference Use Case).

Figure 12 – Reference Model of Virtual Secretary for Videoconference

The MPAI-MMC Version 2 Working Draft (html, pdf) is published with a request for Community Comments and expected to be finally approved on 29 September 2023.

9          MPAI Metaverse Model – Architecture (MPAI-MMM)

MPAI characterises the metaverse as a system that captures data from the real world, processes it, and combines it with internally generated data to create virtual environments that users can interact with.

MPAI Metaverse Model (MMM) targets a series of Technical Reports and Specifications promoting Metaverse Interoperability. Two MPAI Technical Reports, on Metaverse Functionalities and Functionality Profiles, have laid the groundwork. With the Technical Specification – MPAI Metaverse Model – Architecture, MPAI provides initial Interoperability tools by specifying the Functional Requirements of Processes, Items, Actions, and Data Types that allow two or more Metaverse Instances to Interoperate, possibly via a Conversion Service, if they implement the Technical Specification’s Operation Model and produce Data whose Format complies with the Specification’s Functional Requirements.

Figure 13 depicts one aspect of the Specification where a Process in a Metaverse Instance requests a Process in another Metaverse Instance to perform an Action by relying on the Instances’ Resolution Service.

Figure 13 – Resolution and Conversion Services
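
The request path described above can be sketched in a few lines: a Process in one Metaverse Instance asks a resolution step to locate a Process in another Instance, and the Data is converted on the way if the two Instances use different formats. This is a toy model under stated assumptions, not the MPAI-MMM Operation Model: the class and method names are hypothetical, and the Conversion Service is reduced to a table of functions held by the resolution object.

```python
class MetaverseInstance:
    """Toy Metaverse Instance: a name, a data format label and its Processes."""

    def __init__(self, name, data_format):
        self.name = name
        self.data_format = data_format
        self.processes = {}  # process name -> callable performing an Action

    def register_process(self, process_name, handler):
        self.processes[process_name] = handler


class ResolutionService:
    """Toy resolution step that also applies a conversion when formats differ."""

    def __init__(self):
        self.instances = {}
        self.converters = {}  # (source format, target format) -> conversion function

    def register_instance(self, instance):
        self.instances[instance.name] = instance

    def register_converter(self, source_format, target_format, convert):
        self.converters[(source_format, target_format)] = convert

    def request_action(self, source, target_name, process_name, data):
        """Locate the target Process, convert the data if needed, run the Action."""
        target = self.instances[target_name]
        if source.data_format != target.data_format:
            key = (source.data_format, target.data_format)
            if key not in self.converters:
                raise LookupError(f"no conversion from {key[0]} to {key[1]}")
            data = self.converters[key](data)
        return target.processes[process_name](data)


# Instance A works in metres, Instance B in centimetres (hypothetical formats).
instance_a = MetaverseInstance("A", "metres")
instance_b = MetaverseInstance("B", "centimetres")
instance_b.register_process("place_object", lambda d: f"object placed at x={d['x']} cm")

resolver = ResolutionService()
resolver.register_instance(instance_a)
resolver.register_instance(instance_b)
resolver.register_converter("metres", "centimetres", lambda d: {"x": d["x"] * 100})

result = resolver.request_action(instance_a, "B", "place_object", {"x": 1.5})
```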

The MPAI-MMM – Architecture Working Draft (html, pdf) is published with a request for Community Comments and is expected to be finally approved on 29 September 2023.

10     Neural Network Watermarking (MPAI-NNW)

Neural Network Watermarking is a standard whose purpose is to enable watermarking technology providers to qualify their products by providing the means to measure, for a given size of the watermarking payload, the ability of:

  • The watermark inserter to inject a payload without deteriorating the NN performance.
  • The watermark detector to recognise the presence of the inserted watermark when applied to:
    • A watermarked network that has been modified (e.g., by transfer learning or pruning)
    • An inference of the modified model.
  • The watermark decoder to successfully retrieve the payload when applied to:
    • A watermarked network that has been modified (e.g., by transfer learning or pruning)
    • An inference of the modified model.
  • The watermark inserter to inject a payload at a measured computational cost in a given processing environment.
  • The watermark detector/decoder to detect/decode a payload from a watermarked model or from any of its inferences, at a measurable computational cost, e.g., execution time in a given processing environment.

Figure 14 depicts the configuration of one particular use of MPAI-NNW.

Figure 14 – Robustness evaluation of a Neural Network
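
One of the measurements listed above, the decoder's ability to retrieve the payload from a modified network, can be illustrated with a toy experiment. The sketch assumes a deliberately naive watermarking scheme (payload bits stored in the signs of the first weights) and magnitude pruning as the modification. Neither the scheme nor the function names come from MPAI-NNW, which specifies how to evaluate watermarking technologies rather than a watermarking method. The impact on network performance and the computational cost are not measured here.

```python
def insert_watermark(weights, payload, strength=0.05):
    """Toy inserter: encode each bit in the sign of one weight (1 -> +, 0 -> -)."""
    marked = list(weights)
    for i, bit in enumerate(payload):
        magnitude = max(abs(marked[i]), strength)
        marked[i] = magnitude if bit else -magnitude
    return marked


def decode_watermark(weights, length):
    """Toy decoder: read the bits back from the signs of the first weights."""
    return [1 if w > 0 else 0 for w in weights[:length]]


def prune(weights, fraction):
    """Modification: zero out the given fraction of smallest-magnitude weights."""
    count = int(len(weights) * fraction)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:count]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def bit_error_rate(sent, received):
    """Fraction of payload bits that the decoder got wrong."""
    return sum(s != r for s, r in zip(sent, received)) / len(sent)


# Hypothetical weights and payload, chosen so the result is easy to follow.
weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.3, -0.6, 0.8]
payload = [1, 1, 0, 1]
marked = insert_watermark(weights, payload)

# Robustness: compare the decoded payload before and after the modification.
ber_unmodified = bit_error_rate(payload, decode_watermark(marked, len(payload)))
ber_pruned = bit_error_rate(payload, decode_watermark(prune(marked, 0.25), len(payload)))
```

In this example the two bits carried by the weakest weights are lost when a quarter of the weights is pruned, so the error rate rises from 0 to 0.5. A measurement of this kind lets two watermarking technologies be compared for the same payload size.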

The Neural Network Watermarking Technical Specification is available here.

 


MPAI issues three Calls for Technologies and publishes five standards for Community Comments

Geneva, Switzerland – 23 August 2023. Today, the international, non-profit, and unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) organisation developing AI-based data coding standards has concluded its 35th General Assembly (MPAI-35) approving the publication of three Calls for Technologies and five Technical Specifications with a request for Community Comments. The table gives links to the documents, the dates of and registration links to the online presentations, and the deadlines for submitting responses to the Calls and comments on the Technical Specifications.

Call for Technologies | Link | Presentation (UTC) | Deadline (UTC)
AI for Health Data (AIH) | X | Sep 08, 08 & 15 | Oct 19, 23:59
Object and Scene Description (OSD) | X | Sep 07, 09 & 16 | Sep 20, 23:59
XR Venues – Live Theatrical Stage Performance (XRV) | X | Sep 12, 07 & 17 | Nov 20, 23:59

Standard for Community Comments | Link | Presentation (UTC) | Deadline (UTC)
AI Framework (AIF) V2 | X | Sep 11, 08 & 15 | Sep 24, 23:59
Avatar Representation and Animation (ARA) | X | Sep 07, 08 & 15 | Sep 27, 23:59
Connected Autonomous Vehicles – Architecture (CAV) | X | Sep 06, 08 & 15 | Sep 26, 23:59
Multimodal Conversation (MMC) V2 | X | Sep 05, 08 & 15 | Sep 25, 23:59
MPAI Metaverse Model – Architecture (MMM) | X | Sep 01, 08 & 15 | Sep 21, 23:59

Additional information about the purpose of the projects can be found here.
Anybody may respond to any of the three Calls for Technologies. However, non-members should join MPAI to participate in the development of the relevant standards.
Anybody can make comments on the Technical Specifications published with a request for Community Comments.
MPAI is continuing its work plan that includes the development of the following Technical Specifications:

  • AIF-DC, the group in charge of AI Framework (MPAI-AIF), is now working on the review of comments made on MPAI-AIF V2, developing the reference software and drafting the conformance testing.
  • Requirements (ARA), the group in charge of Avatar Representation and Animation (MPAI-ARA), is now working on the review of comments made on MPAI-ARA, developing the reference software and drafting the conformance testing.
  • MMC-DC, the group in charge of Multimodal Conversation (MPAI-MMC), is now working on the review of comments made on MPAI-MMC V2, developing the reference software and drafting the conformance testing.
  • Requirements (MMM), the group in charge of MPAI Metaverse Model (MPAI-MMM) – Architecture, is now working on the review of comments made on MPAI-MMM – Architecture, developing the reference software and drafting the conformance testing.

The MPAI work plan also includes exploratory activities, some of which are close to becoming standard or technical report projects:

  • AI Health (MPAI-AIH). Targets an architecture where smartphones store users’ health data processed using AI and AI Models are updated using Federated Learning.
  • End-to-End Video Coding (MPAI-EEV). Extends the video coding frontiers using AI-based End-to-End Video coding.
  • AI-Enhanced Video Coding (MPAI-EVC). Improves existing video coding with AI tools for short-to-medium term applications.
  • Server-based Predictive Multiplayer Gaming (MPAI-SPG). Uses AI to train neural networks that help an online gaming server to compensate for data losses and detect false data.
  • XR Venues (MPAI-XRV). Identifies common AI Modules used across various XR-enabled and AI-enhanced use cases where venues may be both real and virtual.
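
The Federated Learning idea mentioned for MPAI-AIH can be sketched as follows: each handset improves the shared model on its own data, and only the updated model parameters, never the health data, are sent back to be averaged. This is a minimal sketch of federated averaging on a one-parameter model, with made-up data and hypothetical function names. It is not the MPAI-AIH design.

```python
def local_update(weight, data, learning_rate=0.1):
    """One gradient step of the least-squares fit y = weight * x on local data."""
    gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - learning_rate * gradient


def federated_average(local_weights, sample_counts):
    """Average the local models, weighting each by its number of samples."""
    total = sum(sample_counts)
    return sum(w * n for w, n in zip(local_weights, sample_counts)) / total


def federated_round(global_weight, datasets):
    """One round: every front end trains locally, the back end averages."""
    local_weights = [local_update(global_weight, data) for data in datasets]
    return federated_average(local_weights, [len(data) for data in datasets])


# Two hypothetical front ends whose private data both follow y = 2 * x.
datasets = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

weight = 0.0
for _ in range(50):
    weight = federated_round(weight, datasets)
```

After the rounds, `weight` is close to 2, the value that fits both datasets, although neither dataset was ever shared.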

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.
Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Do we need standards for Connected Autonomous Vehicles?

Enabling individuals or groups of people to move independently has been a major achievement that has changed human life for the better. Motor vehicles, however, have created a number of negative consequences, such as accidents causing damage, injuries, and deaths; congestion on the roads; millions of cars carrying a single person for a couple of hours and then sitting unused; air pollution; and the worsening of urban environments.

Connected autonomous vehicles (CAV) have the potential to eliminate human error, replacing it with a rate of machine errors orders of magnitude lower, optimise the use of vehicles and infrastructure, give more time to human brains for rewarding activities, optimise traffic management, reduce congestion and pollution, help the elderly or disabled people to have a better life, and more.

Much has been happening since the first 1939 attempt at creating an autonomous vehicle. Today CAVs are technically feasible, and prototypes are driving on public roads and streets. The Society of Automotive Engineers in the USA has published a classification of autonomous vehicles based on levels.

Should we just wait for the industry to produce higher SAE-Level vehicles until one day we will only see CAVs around us? This is an option, but not necessarily the one that will let us reach the CAV holy grail in the most efficient and timely way.

Some 35 years ago, most public authorities, “owners” of their countries’ VHF and UHF bands, realised that digital television would allow them to keep their cherished terrestrial television service while getting a “digital dividend” in the form of VHF and UHF slots that could be re-assigned to other purposes. Especially in the United States, digital television was a national goal and steps were taken to implement it. Some enlightened people understood the value of a global digital television standard (MPEG-2) and things simply “happened”, not just for terrestrial, but also for satellite and cable television, and packaged media as well.

Of course, cars are not television sets, but the game-changing role of standards can be the same. Standards can convert today’s niche market of CAVs (if we can call it a “market”) into a mass market. They can accelerate the availability of technology, promote competition, yield better and cheaper products, assuage consumer concerns, and provide tools for regulation.

Artificial Intelligence (AI) is the technology that can provide the solutions we need. MPAI can provide AI-based standards that are explainable.

MPAI intends to publish a standard called Connected Autonomous Vehicle (MPAI-CAV) – Architecture. This will enable component manufacturers to put their standard components on the market and car manufacturers to access an open global market of components with standard functions and interfaces that can be tested for conformance using standard procedures.

Register for one of the two online presentations on 26 July at 8 UTC and 15 UTC or read an overview of MPAI-CAV – Architecture.


MPAI issues Call for Technologies: Connected Autonomous Vehicle – Architecture

Geneva, Switzerland – 12 July 2023. Today, the international, non-profit, and unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) organisation developing AI-based data coding standards has concluded its 34th General Assembly (MPAI-34) approving the Call for Technologies: Connected Autonomous Vehicle (MPAI-CAV) – Architecture. Two online presentations of the Call will be made on 26 July at 8 and 15 UTC. Responses are due by 15 August.

The goal of the MPAI-CAV standard is to promote the development of a CAV industry by specifying components that can be easily integrated into larger subsystems. To achieve this goal, MPAI intends to develop the MPAI-CAV standard as a series of standards each adding more details to enhance CAV component interoperability. The first issue, MPAI-CAV – Architecture, to be developed using the results of the Call, aims to partition CAVs into subsystems and to further partition those subsystems into components. Both subsystems and components are identified by their function and interfaces, i.e., data exchanged between subsystems and components.

Three documents are attached to the Call: the first is Use Cases and Functional Requirements. It includes an initial set of Functionalities that the Architecture should provide.

The second document is the Framework Licence designed to facilitate the timely access to IP that is essential to implement the planned MPAI-CAV – Architecture standard. Finally, the third document is a Template for responses that respondents to the Call may wish to use in their responses.

Anybody may respond to the Call. However, non-members should join MPAI to participate in the development of the MPAI-CAV – Architecture standard.

MPAI is continuing its work plan comprising the development of the following Technical Specifications:

  1. The AI Framework (MPAI-AIF) V2 Technical Specification will enable an implementer to establish a secure AIF environment to execute AI Workflows (AIW) composed of AI Modules (AIM).
  2. The Avatar Representation and Animation (MPAI-ARA) V1 Technical Specification will support creation and animation of interoperable human-like avatar models able to understand and express a Personal Status.
  3. The Multimodal Conversation (MPAI-MMC) V2 Technical Specification will generalise the notion of Emotion by adding Cognitive State and Social Attitude and specify a new data type called Personal Status.
  4. The MPAI Metaverse Model (MPAI-MMM) – Architecture V1 Technical Specification will specify the Operation Model and its components Actions, Items, and Data Types.

The MPAI work plan also includes exploratory activities, some of which are close to becoming standard or technical report projects:

  1. AI Health (MPAI-AIH). Targets an architecture where smartphones store users’ health data processed using AI and AI Models are updated using Federated Learning.
  2. End-to-End Video Coding (MPAI-EEV). Extends the video coding frontiers using AI-based End-to-End Video coding.
  3. AI-Enhanced Video Coding (MPAI-EVC). Improves existing video coding with AI tools for short-to-medium term applications.
  4. Server-based Predictive Multiplayer Gaming (MPAI-SPG). Uses AI to train neural networks that help an online gaming server to compensate for data losses and detect false data.
  5. XR Venues (MPAI-XRV). Identifies common AI Modules used across various XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.
