Moving Picture, Audio and Data Coding
by Artificial Intelligence

Archives: 2022-01-26

An overview of Multimodal Conversation (MPAI-MMC) V2

The goal of the Multimodal Conversation (MPAI-MMC) standard is to provide technologies that enable a human-machine conversation that is more human-like, richer in content, and able to emulate human-human conversation in completeness and intensity.

By learning from human interaction, machines can improve their “conversational” capabilities in the two main phases of conversation: understanding the meaning of an element and generating a pertinent response.

Multimodal Conversation Version 2 achieves this goal by providing, among other technologies, a new standard data type, Personal Status, that represents the “internal status” of a conversing human as conveyed by text, speech, face, and gesture. The machine uses the same Personal Status data type to represent its own internal status as if it were a human.

Currently, Personal Status is composed of three Personal Status Factors:

  1. Emotion, the coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”.
  2. Cognitive State, the coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”.
  3. Social Attitude, the coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment, such as “Respectful”, “Confrontational”, “Soothing”.

Each Factor is represented by a standard set of labels and associated semantics, organised in two tables:

  1. Table 1: Label Set contains descriptive labels relevant to the Factor in a three-level format:
    1. The CATEGORIES column specifies the relevant categories using nouns (e.g., “ANGER”).
    2. The GENERAL ADJECTIVAL column gives adjectival labels for general or basic labels within a category (e.g., “angry”).
    3. The SPECIFIC ADJECTIVAL column gives more specific (sub-categorised) labels in the relevant category (e.g., “furious”).
  2. Table 2: Label Semantics provides the semantics for each label in the GENERAL ADJECTIVAL and SPECIFIC ADJECTIVAL columns of the Label Set Table. For example, for “angry” the semantic gloss is “emotion due to perception of physical or emotional damage or threat.”
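The three-level Label Set and its Label Semantics can be pictured as a small data structure. The sketch below is a minimal illustration under stated assumptions: the class and field names (`LabelSet`, `PersonalStatus`, `category_of`) are hypothetical and do not reproduce the normative MPAI-MMC format, and only the “angry”/“furious” entries are taken from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LabelSet:
    """Hypothetical model of one Factor's Label Set (Table 1) and Label Semantics (Table 2)."""
    factor: str
    # CATEGORY noun -> {GENERAL ADJECTIVAL label -> [SPECIFIC ADJECTIVAL labels]}
    categories: dict[str, dict[str, list[str]]]
    # Semantic gloss for general and specific adjectival labels
    semantics: dict[str, str] = field(default_factory=dict)

    def category_of(self, label: str) -> str | None:
        """Return the CATEGORY containing a general or specific label, if any."""
        for category, generals in self.categories.items():
            for general, specifics in generals.items():
                if label == general or label in specifics:
                    return category
        return None


@dataclass
class PersonalStatus:
    """One label per Personal Status Factor; None if a Factor was not estimated."""
    emotion: str | None = None
    cognitive_state: str | None = None
    social_attitude: str | None = None


# Only the "angry"/"furious" entries come from the text; the normative tables are larger.
EMOTION = LabelSet(
    factor="Emotion",
    categories={"ANGER": {"angry": ["furious"]}},
    semantics={"angry": "emotion due to perception of physical or emotional damage or threat."},
)

status = PersonalStatus(emotion="furious", cognitive_state="Confused", social_attitude="Respectful")
print(EMOTION.category_of(status.emotion))  # ANGER
```

Keeping general and specific labels under their category lets a receiver fall back from a specific label (“furious”) to its category (“ANGER”) when it only understands the coarser level.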

The mission of the international, unaffiliated, non-profit Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) Standards Developing Organisation is to develop AI-enabled data coding standards. MPAI believes that its standards should enable humans to select machines whose internal operation they understand to some degree, rather than machines that are just “black boxes” resulting from unknown training with unknown data. Thus, an implemented MPAI standard breaks up monolithic AI applications, yielding a set of interacting components with identified data whose semantics is known, as far as possible. The AI Framework (MPAI-AIF) standard (html, pdf) specifies an environment, the AI Framework (AIF), where AI Workflows (AIW) composed of AI Modules (AIM) are executed.
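The decomposition into AIMs connected in an AIW can be illustrated with a toy framework that runs each AIM once the data it consumes is available. This is a minimal sketch, not the MPAI-AIF API: `AIM`, `run_aiw`, and the two stand-in AIMs are hypothetical, and dependency ordering uses Python's standard `graphlib.TopologicalSorter`.

```python
from __future__ import annotations

from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable


@dataclass
class AIM:
    """Hypothetical AIM: a named function from named input data to named output data."""
    name: str
    inputs: list[str]
    outputs: list[str]
    fn: Callable[..., dict[str, Any]]


def run_aiw(aims: list[AIM], aiw_inputs: dict[str, Any]) -> dict[str, Any]:
    """Toy AIF: execute each AIM after the AIMs that produce its inputs."""
    producer = {out: aim.name for aim in aims for out in aim.outputs}
    # Each AIM depends on the AIMs producing its inputs (AIW inputs have no producer).
    graph = {aim.name: {producer[i] for i in aim.inputs if i in producer} for aim in aims}
    by_name = {aim.name: aim for aim in aims}
    data = dict(aiw_inputs)
    for name in TopologicalSorter(graph).static_order():
        aim = by_name[name]
        data.update(aim.fn(*(data[i] for i in aim.inputs)))
    return data


# Usage with two stand-in AIMs (real AIMs would wrap trained models).
aiw = [
    AIM("LanguageUnderstanding", ["Text"], ["Meaning"], lambda t: {"Meaning": t.split()}),
    AIM("SpeechRecognition", ["Speech"], ["Text"], lambda s: {"Text": s.lower()}),
]
result = run_aiw(aiw, {"Speech": "HELLO MACHINE"})
print(result["Meaning"])  # ['hello', 'machine']
```

Because each AIM declares its named inputs and outputs, the execution order is derived from the data rather than from the order in which AIMs are listed, which mirrors the idea of components with identified data.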

Figure 1 – Reference Model of AI Framework (AIF)

MPAI-MMC has defined two “Composite AI Modules (AIM)” that achieve the goal of representing the internal state of the conversing human and machine (or machine and machine). The first is called “Personal Status Extraction (PSE)” and is represented by Figure 2.

Figure 2 – Personal Status Extraction

PSE computes the Descriptors of each considered Modality – Text, Speech, Face, and Gesture. The Personal Status embedded in each Modality is obtained by interpreting the Modality Descriptors. The needs of several MPAI Use Cases have suggested that Descriptors may instead be computed outside the PSE and provided by another AIM, with the same semantics. This is signalled by Input Selection. Personal Status Combination is an AIM that integrates the Personal Status of the four Modalities into a standard Personal Status Format.
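The PSE data flow, including Input Selection, can be sketched as follows. This assumes that Input Selection amounts to supplying externally computed Descriptors per Modality, and it replaces the actual Personal Status Combination with a simple majority vote per Factor; `describe`, `interpret`, and `combine` are hypothetical placeholders for the Descriptor and Interpretation AIMs.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional

MODALITIES = ("Text", "Speech", "Face", "Gesture")
FACTORS = ("Emotion", "CognitiveState", "SocialAttitude")


def personal_status_extraction(
    modality_data: dict[str, object],
    describe: Callable[[str, object], dict],
    interpret: Callable[[str, dict], dict[str, str]],
    external_descriptors: Optional[dict[str, dict]] = None,
) -> dict:
    """Hypothetical PSE: per-Modality Personal Status plus their combination."""
    per_modality: dict[str, dict[str, str]] = {}
    for modality in MODALITIES:
        if external_descriptors and modality in external_descriptors:
            # Input Selection: Descriptors were computed by another AIM.
            descriptors = external_descriptors[modality]
        elif modality in modality_data:
            descriptors = describe(modality, modality_data[modality])
        else:
            continue  # Modality not available in this conversation
        per_modality[modality] = interpret(modality, descriptors)
    return {"per_modality": per_modality, "combined": combine(per_modality)}


def combine(per_modality: dict[str, dict[str, str]]) -> dict[str, str]:
    """Stand-in Personal Status Combination: majority vote per Factor."""
    combined = {}
    for factor in FACTORS:
        votes = [ps[factor] for ps in per_modality.values() if factor in ps]
        if votes:
            combined[factor] = Counter(votes).most_common(1)[0][0]
    return combined


# Toy Descriptor and Interpretation AIMs: the "Descriptor" is just the label itself.
def describe(modality: str, data: object) -> dict:
    return {"label": data}


def interpret(modality: str, descriptors: dict) -> dict[str, str]:
    return {"Emotion": descriptors["label"]}


out = personal_status_extraction(
    {"Text": "angry", "Speech": "angry", "Face": "sad"},
    describe,
    interpret,
    external_descriptors={"Gesture": {"label": "angry"}},
)
print(out["combined"])  # {'Emotion': 'angry'}
```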

The second Composite AIM, “Personal Status Display (PSD)”, converts the Text produced by the Machine, together with the associated Personal Status, into a multimodal output, and is represented in Figure 3.

Figure 3 – Personal Status Display

Machine Text is passed as PSD output and synthesised as Speech using the Personal Status provided by PS-Speech. Face Descriptors are produced using Machine Speech and PS-Face. Body Descriptors are produced using Avatar Model, PS-Gesture, and Text. Avatar Descriptors, the combination of Face Descriptors and Body Descriptors, are produced by the Avatar Description AIM. The ready-to-render Machine Avatar is produced by Avatar Synthesis.

Input Selection is used to indicate whether the PSD should produce Avatar Descriptors or ready-to-render Machine Avatar.
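A sketch of the PSD chain described above, with Input Selection modelled as a boolean flag (`render_avatar`). All AIM functions here are hypothetical stubs that only pass data along, and each per-Modality Personal Status is reduced to a single label for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PSDOutput:
    machine_text: str
    machine_speech: bytes
    avatar_descriptors: Optional[dict] = None  # when Input Selection asks for descriptors
    machine_avatar: Optional[bytes] = None     # when it asks for a ready-to-render avatar


# Placeholder AIMs; real ones would be trained models, these only thread data through.
def speech_synthesis(text: str, ps_speech: str) -> bytes:
    return f"[{ps_speech}] {text}".encode()


def face_description(speech: bytes, ps_face: str) -> dict:
    return {"ps": ps_face, "speech_bytes": len(speech)}


def body_description(avatar_model: str, ps_gesture: str, text: str) -> dict:
    return {"model": avatar_model, "ps": ps_gesture, "words": len(text.split())}


def avatar_synthesis(avatar_descriptors: dict) -> bytes:
    return repr(avatar_descriptors).encode()


def personal_status_display(text: str, ps: dict[str, str], avatar_model: str,
                            render_avatar: bool) -> PSDOutput:
    speech = speech_synthesis(text, ps["Speech"])
    face = face_description(speech, ps["Face"])
    body = body_description(avatar_model, ps["Gesture"], text)
    descriptors = {"face": face, "body": body}  # Avatar Description
    if render_avatar:
        return PSDOutput(text, speech, machine_avatar=avatar_synthesis(descriptors))
    return PSDOutput(text, speech, avatar_descriptors=descriptors)


ps = {"Speech": "Soothing", "Face": "Soothing", "Gesture": "Respectful"}
out = personal_status_display("Here is the answer.", ps, "avatar-01", render_avatar=False)
print(out.avatar_descriptors["body"]["words"])  # 4
```

Returning descriptors rather than a rendered avatar lets a downstream client (for instance, a remote renderer) do the rendering itself.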

Let’s now see how the definition of the PSE and PSD Composite AIMs enables a compact representation of human-machine conversation use cases by considering the reference model of Conversation with Personal Status use case depicted in Figure 4.

Figure 4 – Conversation with Personal Status

The visual frontend (Visual Scene Description) describes the visual scene, providing the Descriptors of the Face and the Body of the human and a digital representation of the objects in the scene. The audio frontend (Audio Scene Description) separates the speech from other sound sources. Spatial Object Identification analyses the arm, hand, and fingers of the human to identify which of the objects the human refers to in the conversation. Audio-Visual Alignment assigns identifiers to audio, visual, and audio-visual objects. Speech Recognition, Language Understanding, and Dialogue Processing are the usual elements of the conversation chain. In MPAI-MMC, however, Dialogue Processing deals with additional information in the form of the Object ID and the Personal Status of the human. Therefore, Dialogue Processing is required to produce not only Machine Text in response to the human, but also its own Personal Status. Both Text and Personal Status are provided to the Personal Status Display Composite AIM, which produces the full multimodal response as Machine Text, Machine Speech, and Machine Avatar.

What does Multimodal Conversation specify?

  1. Technologies required to analyse the text and/or the speech and other non-verbal components exchanged in human-machine and machine-machine conversation.
  2. Use Cases that apply the technologies:
    • Conversation with Personal Status.
    • Conversation with Emotion.
    • Multimodal Question Answering.
    • Conversation About a Scene.
    • Human-CAV Interaction.
    • Virtual Secretary for Videoconference.
    • Text and Speech Translation (one way, two ways, one to many).

Each Use Case normatively defines:

  1. The Functions of the AIW and (Composite) AIMs implementing the Use Case.
  2. The Connections between and among the AIMs.
  3. The Semantics and the Formats of the input and output data of the AIW and the AIMs.
  4. The JSON Metadata of the AIW.

Each AIM normatively defines:

  1. The Functions of the AIM.
  2. The Connections between and among the AIMs in case the AIM is Composite.
  3. The Semantics and the Formats of the input and output data of the AIMs.
  4. The JSON Metadata of the AIMs.
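Since Use Cases normatively define the Connections between AIMs and the JSON Metadata of the AIW, a natural check is that every AIM input has a source. The metadata below is a hypothetical, heavily simplified rendering of part of the Conversation with Personal Status Use Case; its field names do not follow the actual MPAI-AIF JSON schema, and Language Understanding, Spatial Object Identification, and Audio-Visual Alignment are omitted.

```python
from __future__ import annotations

import json

# Hypothetical, heavily simplified metadata; the actual MPAI-AIF JSON schema differs.
AIW_METADATA = json.loads("""
{
  "name": "ConversationWithPersonalStatus",
  "inputs": ["Input Audio", "Input Video"],
  "outputs": ["Machine Text", "Machine Speech", "Machine Avatar"],
  "aims": [
    {"name": "AudioSceneDescription", "inputs": ["Input Audio"], "outputs": ["Speech"]},
    {"name": "VisualSceneDescription", "inputs": ["Input Video"],
     "outputs": ["Face Descriptors", "Body Descriptors"]},
    {"name": "SpeechRecognition", "inputs": ["Speech"], "outputs": ["Recognised Text"]},
    {"name": "PersonalStatusExtraction", "composite": true,
     "inputs": ["Recognised Text", "Speech", "Face Descriptors", "Body Descriptors"],
     "outputs": ["Personal Status"]},
    {"name": "DialogueProcessing", "inputs": ["Recognised Text", "Personal Status"],
     "outputs": ["Machine Text", "Machine Personal Status"]},
    {"name": "PersonalStatusDisplay", "composite": true,
     "inputs": ["Machine Text", "Machine Personal Status"],
     "outputs": ["Machine Text", "Machine Speech", "Machine Avatar"]}
  ]
}
""")


def check_connections(meta: dict) -> list[str]:
    """Report AIM inputs that nothing produces and AIW outputs that no AIM produces."""
    produced = {d for aim in meta["aims"] for d in aim["outputs"]}
    available = produced | set(meta["inputs"])
    problems = [f"{aim['name']}: no source for '{d}'"
                for aim in meta["aims"] for d in aim["inputs"] if d not in available]
    problems += [f"AIW: no AIM produces '{d}'" for d in meta["outputs"] if d not in produced]
    return problems


print(check_connections(AIW_METADATA))  # []
```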

MPAI deals with all aspects of “AI-enabled Data Coding” in a unified way. Some of the data formats required by the MPAI-MMC Use Cases are specified by MPAI-MMC and others by the standards produced by other MPAI groups. For instance, the Face and Body Descriptors are specified by the Avatar Representation and Animation (MPAI-ARA) standard, and the “Spatial Object Identification” Composite AIM is specified by Object and Scene Description (MPAI-OSD).

The MPAI-MMC Version 2 Working Draft (html, pdf) is published with a request for Community Comments. Comments should be sent to the MPAI Secretariat by 2023/09/25T23:59 UTC. MPAI will use the Comments received to develop the final draft planned to be published at the 36th General Assembly (29 September 2023). An online presentation of the WD will be held on September 05 at 08 and 15 UTC. Register here for the 08 UTC and here for the 15 UTC presentations.


The MPAI standards portfolio

1          Introduction

Established in September 2020, MPAI has the mission, defined by its Statutes, of promoting the efficient use of data by developing AI-enabled data coding standards and of bridging the gap between its standards and their practical use through Intellectual Property Rights Guidelines. MPAI has developed nine Technical Specifications, and the purpose of this document is to provide a short overview of them.

1       Introduction

2       AI Framework (MPAI-AIF)

2.1        Version 1

2.2        Version 2

3       Avatar Representation and Animation (MPAI-ARA)

4       Context-based Audio Enhancement (MPAI-CAE)

4.1        Version 1

4.2        Version 2

5       Connected Autonomous Vehicle (MPAI-CAV)

6      Compression and understanding of industrial data (MPAI-CUI)

7       Governance of the MPAI Ecosystem (MPAI-GME)

8       Multimodal Conversation (MPAI-MMC)

8.1        Version 1

8.2        Version 2

9       MPAI Metaverse Model (MPAI-MMM) – Architecture

10          Neural Network Watermarking (MPAI-NNW)

2          AI Framework (MPAI-AIF)

MPAI believes that its standards should enable humans to select machines whose internal operation they understand to some degree, rather than machines that are “black boxes” resulting from unknown training with unknown data. Thus, an implemented MPAI standard breaks up monolithic AI applications, yielding a set of interacting components (AI Modules, AIM) with identified data. AIMs are combined in workflows (AIW) and exchange data with known semantics to the extent possible. This not only improves the explainability of AI applications but also promotes a competitive market of components with standard interfaces, possibly with improved performance compared to other implementations.

Artificial Intelligence Framework (MPAI-AIF) is the standard that implements this vision by enabling the creation and automation of mixed Artificial Intelligence – Machine Learning – Data Processing workflows.

2.1        Version 1

Figure 1 shows the MPAI-AIF V1 Reference Model.

Figure 1 – Reference model of the MPAI AI Framework (MPAI-AIF) V1

The MPAI-AIF Technical Specification V1 and Reference Software are available here.

2.2        Version 2

MPAI-AIF V1 assumed that the AI Framework was secure but did not provide support to developers wishing to execute an AI application in a secure environment. MPAI-AIF V2 responds to this requirement. As shown in Figure 2, the standard defines a Security Abstraction Layer (SAL). By accessing the SAL APIs, a developer can create the required level of security with the desired functionalities.

Figure 2 – Reference model of the MPAI AI Framework (MPAI-AIF) V2

MPAI has published a Working Draft of Version 2 (html, pdf) requesting Community Comments. Comments should be sent to the MPAI Secretariat by 2023/09/24T23:59 UTC. MPAI-AIF V2 is expected to be published on 29 September 2023.

3          Avatar Representation and Animation (MPAI-ARA)

For computer-created objects called “digital humans”, i.e., digital objects that can be rendered to show a human appearance, the underlying assumption has mostly been that creation, animation, and rendering are done in a closed environment. Such digital humans used to have little or no need for standards. However, in a communication context, and even more in a metaverse context, there are many cases where a digital human is not constrained within a closed environment. For instance, a client may send data to a remote client that should be able to unambiguously interpret and use the data to reproduce a digital human as intended by the transmitting client.

These new usage scenarios require forms of standardisation. Technical Specification: Avatar Representation and Animation (MPAI-ARA) is a first response to the need of a user wishing to enable their transmitting client to send data that a remote client can interpret to render a digital human, having the intended body movement and facial expression faithfully represented by the remote client.

Figure 2 is the system diagram of the Avatar-Based Videoconference Use Case enabled by MPAI-ARA.

Figure 2 – System diagram of ARA-ABV

Figure 3 is the Reference Model of the Transmitting Client.

Figure 3 – Reference Model of the ARA-ABV Transmitting Client

The MPAI-ARA Working Draft (html, pdf) is published with a request for Community Comments, and the final standard is expected to be published on 29 September 2023.

4          Context-based Audio Enhancement (MPAI-CAE)

Context-based Audio Enhancement (MPAI-CAE) uses AI to improve the user experience of several audio-related applications, including entertainment, communication, teleconferencing, gaming, post-production, and restoration, in a variety of contexts such as the home, the car, on the go, and the studio, by using context information to act on the input audio content.

4.1        Version 1

Figure 4 is the reference model of Emotion-Enhanced Speech, a Use Case specified by Version 1.

Figure 4 – An MPAI-CAE Use Case: Emotion-Enhanced Speech

The MPAI-CAE Technical Specification V1 and Reference Software are available here.

4.2        Version 2

MPAI has developed the specification of the Audio Scene Description Composite AIM as part of the MPAI-CAE V2 standard. Figure 5 depicts the architecture of the CAE-ASD Composite AIM.

Figure 5 – Audio Scene Description Composite AIM

The MPAI-CAE Technical Specification V2 is available here.

5          Connected Autonomous Vehicle (MPAI-CAV)

Connected Autonomous Vehicle (MPAI-CAV) is a project addressing the CAV domain and the four main subsystems of a CAV:

  1. Human-CAV Interaction (HCI) responds to humans’ commands and queries, senses human activities in the CAV passenger compartment, and activates other subsystems as required by humans or as deemed necessary under the identified conditions.
  2. Environment Sensing Subsystem (ESS) acquires information from the physical environment via a variety of sensors and creates a representation of the environment.
  3. Autonomous Motion Subsystem (AMS) uses different sources of information – ESS, other CAVs, Roadside Units, etc. – to improve the CAV’s understanding of the environment and instructs the CAV how to reach the intended destination.
  4. Motion Actuation Subsystem (MAS) operates and actuates the motion instructions in the environment.

The interaction of the 4 subsystems is depicted in Figure 7.

Figure 7 – The MPAI-CAV subsystems
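The division of labour among ESS, AMS, and MAS can be illustrated with a toy control cycle. This is only an assumption-laden sketch: the coordinate frame, the rule that the CAV's own measurements override remote reports, the 10 m braking box, and all function names are hypothetical and not part of MPAI-CAV; the HCI is represented only by the destination it would supply.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EnvironmentRepresentation:
    # object identifier -> (x, y) in metres; x ahead of the CAV, y lateral (assumed frame)
    objects: dict[str, tuple[float, float]] = field(default_factory=dict)


def environment_sensing(detections: dict[str, tuple[float, float]]) -> EnvironmentRepresentation:
    """ESS stand-in: sensor processing is reduced to copying detections."""
    return EnvironmentRepresentation(dict(detections))


def fuse(own: EnvironmentRepresentation,
         remote: list[EnvironmentRepresentation]) -> EnvironmentRepresentation:
    """AMS step: extend the ESS view with reports from other CAVs and Roadside Units.
    Toy policy: the CAV's own measurement wins when an object is reported twice."""
    fused: dict[str, tuple[float, float]] = {}
    for report in remote:
        fused.update(report.objects)
    fused.update(own.objects)
    return EnvironmentRepresentation(fused)


def autonomous_motion(destination: str, env: EnvironmentRepresentation) -> list[str]:
    """AMS stand-in: brake for objects in a 10 m x 4 m box ahead, then route on."""
    ahead = sorted(oid for oid, (x, y) in env.objects.items() if 0 < x < 10 and abs(y) < 2)
    return [f"brake_for:{oid}" for oid in ahead] + [f"route_to:{destination}"]


def motion_actuation(instructions: list[str]) -> list[str]:
    """MAS stand-in: acknowledge each instruction as executed."""
    return [f"executed {i}" for i in instructions]


# The destination would come from the HCI after a passenger's request.
own = environment_sensing({"car1": (5.0, 0.5)})
roadside_unit = EnvironmentRepresentation({"ped7": (8.0, -1.0), "car1": (6.0, 0.5)})
env = fuse(own, [roadside_unit])
plan = autonomous_motion("Station", env)
print(motion_actuation(plan))
# ['executed brake_for:car1', 'executed brake_for:ped7', 'executed route_to:Station']
```

The point of the sketch is the data exchanged between subsystems (an environment representation, then motion instructions), which is what an architecture standard has to pin down, rather than the toy logic inside each function.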

The CAV-HCI subsystem (Figure 8) is specified as an MPAI-MMC V2 Use Case.

Figure 8 – Reference Model of the Human-CAV Interaction Subsystem

The MPAI-CAV – Architecture Working Draft (html, pdf) is published with a request for Community Comments, and the final standard is expected to be published on 29 September 2023.

6          Compression and understanding of industrial data (MPAI-CUI)

Compression and understanding of industrial data (MPAI-CUI) aims to enable AI-based filtering and extraction of key information to predict the performance of a company by applying Artificial Intelligence to governance, financial, and risk data. This is depicted in Figure 6.

Figure 6 – The MPAI-CUI Company Performance Prediction Use Case

The set of specifications composing the MPAI-CUI standard is available here.

7          Governance of the MPAI Ecosystem (MPAI-GME)

Governance of the MPAI Ecosystem (MPAI-GME) is a foundational MPAI standard specifying the operation of the Ecosystem enabling:

  1. Implementers to develop components and solutions.
  2. Performance Assessors to assess the Performance of an Implementation.
  3. The MPAI Store to Test an Implementation for Conformance and post the Implementation to the MPAI Store website together with the results of Performance Assessment.
  4. End Users to download Implementations and report Experience Scores.

Figure 9 depicts the operation of the MPAI Ecosystem.

Figure 9 – Governance of the MPAI Ecosystem.
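The roles listed above imply an order in which an Implementation progresses through the Ecosystem. The sketch below models it as a small state machine, assuming the order in which the list presents the steps (Performance Assessment, then Conformance Testing, then posting); the `Implementation` record and its methods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    DEVELOPED = auto()
    PERFORMANCE_ASSESSED = auto()
    CONFORMANCE_TESTED = auto()
    POSTED = auto()


@dataclass
class Implementation:
    """Hypothetical record an MPAI Store might keep for one Implementation."""
    name: str
    stage: Stage = Stage.DEVELOPED
    performance_report: Optional[str] = None
    experience_scores: list[int] = field(default_factory=list)

    def _require(self, expected: Stage) -> None:
        if self.stage is not expected:
            raise ValueError(f"{self.name}: expected {expected.name}, found {self.stage.name}")

    def assess_performance(self, report: str) -> None:  # Performance Assessor
        self._require(Stage.DEVELOPED)
        self.performance_report = report
        self.stage = Stage.PERFORMANCE_ASSESSED

    def test_conformance(self, passed: bool) -> None:  # MPAI Store
        self._require(Stage.PERFORMANCE_ASSESSED)
        if not passed:
            raise ValueError(f"{self.name}: failed Conformance Testing")
        self.stage = Stage.CONFORMANCE_TESTED

    def post(self) -> None:  # MPAI Store website, with the Performance Assessment attached
        self._require(Stage.CONFORMANCE_TESTED)
        self.stage = Stage.POSTED

    def report_experience(self, score: int) -> None:  # End User
        self._require(Stage.POSTED)
        self.experience_scores.append(score)


impl = Implementation("example-implementation")
impl.assess_performance("assessment report")
impl.test_conformance(passed=True)
impl.post()
impl.report_experience(4)
print(impl.stage.name, impl.experience_scores)  # POSTED [4]
```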

8          Multimodal Conversation (MPAI-MMC)

Multimodal Conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Its Use Cases are an implementation of the MPAI “make available explainable AI applications” vision for human-machine conversation.

8.1        Version 1

Technical Specification: Multimodal Conversation (MPAI-MMC) V1 includes five Use Cases: Conversation with Emotion, Multimodal Question Answering (QA), and three Automatic Speech Translation Use Cases.

Figure 10 depicts the Reference Model of the Conversation with Emotion Use Case.

Figure 10 – An MPAI-MMC V1 Use Case: Conversation with Emotion

The MPAI-MMC Technical Specification V1.2 is available here.

8.2        Version 2

Extending the role of emotion as introduced in Version 1.2 of the standard, MPAI-MMC V2 introduces Personal Status, i.e., the internal status of humans that a machine needs to estimate and that it artificially creates for itself with the goal of improving its conversation with the human or even with another machine. Personal Status is applied to several new Multi-modal conversation V2 (MPAI-MMC V2) Use Cases: Conversation About a Scene, Virtual Secretary for Videoconference, and Human-Connected Autonomous Vehicle Interaction.

Figure 11 is the reference model of the Conversation About a Scene (CAS) Use Case.

Figure 11 – An MPAI-MMC V2 Use Case: Conversation About a Scene

Figure 12 gives the Reference Model of a second use case: Virtual Secretary (used in the Avatar-Based Videoconference Use Case).

Figure 12 – Reference Model of Virtual Secretary for Videoconference

The MPAI-MMC Version 2 Working Draft (html, pdf) is published with a request for Community Comments and expected to be finally approved on 29 September 2023.

9          MPAI Metaverse Model – Architecture (MPAI-MMM)

MPAI characterises the metaverse as a system that captures data from the real world, processes it, and combines it with internally generated data to create virtual environments that users can interact with.

MPAI Metaverse Model (MMM) targets a series of Technical Reports and Specifications promoting Metaverse Interoperability. Two MPAI Technical Reports, on Metaverse Functionalities and on Functionality Profiles, have laid the groundwork. With the Technical Specification – MPAI Metaverse Model – Architecture, MPAI provides initial Interoperability tools by specifying the Functional Requirements of Processes, Items, Actions, and Data Types that allow two or more Metaverse Instances to Interoperate, possibly via a Conversion Service, if they implement the Technical Specification’s Operation Model and produce Data whose Format complies with the Specification’s Functional Requirements.

Figure 13 depicts one aspect of the Specification where a Process in a Metaverse Instance requests a Process in another Metaverse Instance to perform an Action by relying on the Instances’ Resolution Service.

Figure 13 – Resolution and Conversion Services
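The interaction in Figure 13 can be approximated in a few classes: a Resolution Service locates the M-Instance hosting the target Process, and a Conversion Service adapts Data when the two M-Instances use different Data Formats. Everything here (class names, the metre/centimetre conversion, the `MoveTo` Action) is a hypothetical illustration, not the MPAI-MMM Operation Model.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

Handler = Callable[[str, Any], Any]


class ResolutionService:
    """Toy registry: which M-Instance hosts a given Process Identifier."""
    def __init__(self) -> None:
        self._hosts: dict[str, MInstance] = {}

    def register(self, process_id: str, host: MInstance) -> None:
        self._hosts[process_id] = host

    def resolve(self, process_id: str) -> MInstance:
        return self._hosts[process_id]


class ConversionService:
    """Toy converter between the Data Formats of two M-Instances."""
    def __init__(self) -> None:
        self._converters: dict[tuple[str, str], Callable[[Any], Any]] = {}

    def add(self, src: str, dst: str, fn: Callable[[Any], Any]) -> None:
        self._converters[(src, dst)] = fn

    def convert(self, src: str, dst: str, payload: Any) -> Any:
        return self._converters[(src, dst)](payload)


class MInstance:
    def __init__(self, instance_id: str, data_format: str, resolver: ResolutionService,
                 converter: Optional[ConversionService] = None) -> None:
        self.instance_id = instance_id
        self.data_format = data_format
        self.resolver = resolver
        self.converter = converter
        self._processes: dict[str, Handler] = {}

    def add_process(self, process_id: str, handler: Handler) -> None:
        self._processes[process_id] = handler
        self.resolver.register(process_id, self)

    def perform(self, process_id: str, action: str, payload: Any) -> Any:
        return self._processes[process_id](action, payload)

    def request(self, target_process_id: str, action: str, payload: Any) -> Any:
        """A local Process asks a Process, possibly in another M-Instance, to perform an Action."""
        host = self.resolver.resolve(target_process_id)
        if host.data_format != self.data_format:
            if self.converter is None:
                raise ValueError("Data Formats differ and no Conversion Service is available")
            payload = self.converter.convert(self.data_format, host.data_format, payload)
        return host.perform(target_process_id, action, payload)


resolver, converter = ResolutionService(), ConversionService()
converter.add("metres", "centimetres", lambda p: {k: v * 100 for k, v in p.items()})
a = MInstance("MI-A", "metres", resolver, converter)
b = MInstance("MI-B", "centimetres", resolver)
b.add_process("MI-B/Process1", lambda action, payload: (action, payload))
print(a.request("MI-B/Process1", "MoveTo", {"x": 1.5}))  # ('MoveTo', {'x': 150.0})
```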

The MPAI-MMM – Architecture Working Draft (html, pdf) is published with a request for Community Comments and is expected to be finally approved on 29 September 2023.

10     Neural Network Watermarking (MPAI-NNW)

Neural Network Watermarking is a standard whose purpose is to enable watermarking technology providers to qualify their products by providing the means to measure, for a given size of the watermarking payload, the ability of:

  • The watermark inserter to inject a payload without deteriorating the NN performance.
  • The watermark detector to recognise the presence of the inserted watermark when applied to:
    • A watermarked network that has been modified (e.g., by transfer learning or pruning)
    • An inference of the modified model.
  • The watermark decoder to successfully retrieve the payload when applied to:
    • A watermarked network that has been modified (e.g., by transfer learning or pruning)
    • An inference of the modified model.
  • The watermark inserter to inject a payload at a measured computational cost in a given processing environment.
  • The watermark detector/decoder to detect/decode a payload from a watermarked model or from any of its inferences, at a measurable computational cost, e.g., execution time in a given processing environment.
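The quantities listed above can be made concrete with a toy harness. The sign-based watermark below is a deliberately simple stand-in, not a technique specified or endorsed by MPAI-NNW; it only shows how the bit error rate of a decoded payload can be measured after a modification such as pruning, and how computational cost can be measured as execution time.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Sequence


def insert_watermark(weights: Sequence[float], payload: Sequence[int],
                     strength: float = 0.05) -> list[float]:
    """Toy inserter: encode bit i in the sign of weight i, with magnitude >= strength."""
    marked = list(weights)
    for i, bit in enumerate(payload):
        magnitude = max(abs(marked[i]), strength)
        marked[i] = magnitude if bit else -magnitude
    return marked


def decode_watermark(weights: Sequence[float], n_bits: int) -> list[int]:
    """Toy decoder: read bits back from the signs of the first n_bits weights."""
    return [1 if w > 0 else 0 for w in weights[:n_bits]]


def prune(weights: Sequence[float], threshold: float) -> list[float]:
    """A modification the decoder should survive: magnitude pruning."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def bit_error_rate(inserted: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of payload bits that were not recovered."""
    if len(inserted) != len(decoded):
        raise ValueError("payload lengths differ")
    return sum(a != b for a, b in zip(inserted, decoded)) / len(inserted)


def timed(fn: Callable[..., Any], *args: Any) -> tuple[Any, float]:
    """Return fn(*args) and its wall-clock execution time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


payload = [1, 1, 0, 1]
marked, insert_seconds = timed(insert_watermark, [0.2, -0.01, 0.03, -0.5], payload)
print(bit_error_rate(payload, decode_watermark(marked, 4)))              # 0.0
print(bit_error_rate(payload, decode_watermark(prune(marked, 0.1), 4)))  # 0.25
```

In the pruned case the weight carrying the second bit falls below the threshold and is zeroed, so one of four bits is lost; a real evaluation would also measure the watermarked network's task performance against the original.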

Figure 14 depicts the configuration of one particular use of MPAI-NNW.

Figure 14 – Robustness evaluation of a Neural Network

The Neural Network Watermarking Technical Specification is available here.

 


MPAI issues three Calls for Technologies and publishes five standards for Community Comments

Geneva, Switzerland – 23 August 2023. Today, the international, non-profit, and unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) organisation developing AI-based data coding standards has concluded its 35th General Assembly (MPAI-35) approving the publication of three Calls for Technologies and five Technical Specifications with a request for Community Comments. The table gives links to the documents, the dates of and registration links to the online presentations, and the deadlines for submitting responses to the Calls and comments on the Technical Specifications.

Call for Technologies                               | Link | Presentation (UTC) | Deadline (UTC)
AI for Health Data (AIH)                            | X    | Sep 08, 08 & 15    | Oct 19, 23:59
Object and Scene Description (OSD)                  | X    | Sep 07, 09 & 16    | Sep 20, 23:59
XR Venues – Live Theatrical Stage Performance (XRV) | X    | Sep 12, 07 & 17    | Nov 20, 23:59
Standard for Community Comments                     | Link | Presentation (UTC) | Deadline (UTC)
AI Framework (AIF) V2                               | X    | Sep 11, 08 & 15    | Sep 24, 23:59
Avatar Representation and Animation (ARA)           | X    | Sep 07, 08 & 15    | Sep 27, 23:59
Connected Autonomous Vehicles – Architecture (CAV)  | X    | Sep 06, 08 & 15    | Sep 26, 23:59
Multimodal Conversation (MMC) V2                    | X    | Sep 05, 08 & 15    | Sep 25, 23:59
MPAI Metaverse Model – Architecture (MMM)           | X    | Sep 01, 08 & 15    | Sep 21, 23:59

Additional information about the purpose of the projects can be found here.
Anybody may respond to any of the three Calls for Technologies. However, non-members should join MPAI to participate in the development of the relevant standards.
Anybody can make comments on the Technical Specifications published with a request for Community Comments.
MPAI is continuing its work plan that includes the development of the following Technical Specifications:

  • AIF-DC, the group in charge of AI Framework (MPAI-AIF), is now working on the review of comments made on MPAI-AIF V2, developing the reference software and drafting the conformance testing.
  • Requirements (ARA), the group in charge of Avatar Representation and Animation (MPAI-ARA), is now working on the review of comments made on MPAI-ARA V1, developing the reference software and drafting the conformance testing.
  • MMC-DC, the group in charge of Multimodal Conversation (MPAI-MMC), is now working on the review of comments made on MPAI-MMC V2, developing the reference software and drafting the conformance testing.
  • Requirements (MMM), the group in charge of MPAI Metaverse Model (MPAI-MMM) – Architecture, is now working on the review of comments made on MPAI-MMM – Architecture, developing the reference software and drafting the conformance testing.

The MPAI work plan also includes exploratory activities, some of which are close to becoming standard or technical report projects:

  • AI Health (MPAI-AIH). Targets an architecture where smartphones store users’ health data processed using AI and AI Models are updated using Federated Learning.
  • End-to-End Video Coding (MPAI-EEV). Extends the video coding frontiers using AI-based End-to-End Video coding.
  • AI-Enhanced Video Coding (MPAI-EVC). Improves existing video coding with AI tools for short-to-medium term applications.
  • Server-based Predictive Multiplayer Gaming (MPAI-SPG). Uses AI to train neural networks that help an online gaming server compensate for data losses and detect false data.
  • XR Venues (MPAI-XRV). Identifies common AI Modules used across various XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.
Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Do we need standards for Connected Autonomous Vehicles?

Enabling individuals or groups of people to move independently has been a major achievement that has changed human life for the better. Motor vehicles, however, have created a number of negative consequences: accidents causing damage, injuries, and deaths; congestion on the roads; millions of cars carrying a single person for a couple of hours and then staying unused; air pollution; worsening of urban environments; etc.

Connected Autonomous Vehicles (CAV) have the potential to eliminate human error, replacing it with machine errors at a rate orders of magnitude lower, to optimise the use of vehicles and infrastructure, to give human brains more time for rewarding activities, to optimise traffic management, to reduce congestion and pollution, to help elderly or disabled people have a better life, and more.

Much has happened since the first attempt at creating an autonomous vehicle in 1939. Today CAVs are technically feasible, and prototypes are driving on public roads and streets. The Society of Automotive Engineers (SAE) in the USA has published a classification of driving automation into six levels, from 0 (no automation) to 5 (full automation).

Should we just wait for the industry to produce higher SAE-Level vehicles until one day we will only see CAVs around us? This is an option, but not necessarily the one that will let us reach the CAV holy grail in the most efficient and timely way.

Some 35 years ago, most public authorities, “owners” of their countries’ VHF and UHF bands, realised that digital television would allow them to keep their cherished terrestrial television service while obtaining a “digital dividend” in the form of VHF and UHF slots that could be reassigned to other purposes. Especially in the United States, digital television was a national goal and steps were taken to implement it. Some enlightened people understood the value of a global digital television standard (MPEG-2), and things simply “happened”, not just for terrestrial, but also for satellite and cable television, and for packaged media as well.

Of course, cars are not television sets, but the game-changing role of standards can be the same. Standards can convert today’s niche market of CAVs (if we can call it a “market”) into a mass market. They can accelerate the availability of technology, promote competition, yield better and cheaper products, assuage consumer concerns, and provide tools for regulation.

Artificial Intelligence (AI) is the technology that can provide the solutions we need. MPAI can provide AI-based standards that are explainable.

MPAI intends to publish a standard called Connected Autonomous Vehicle (MPAI-CAV) – Architecture. This will enable component manufacturers to put their standard components on the market and car manufacturers to access an open global market of components with standard functions and interfaces that can be tested for conformance using standard procedures.

Register for one of the two online presentations on 26 July at 8 UTC and 15 UTC or read an overview of MPAI-CAV – Architecture.


MPAI issues Call for Technologies: Connected Autonomous Vehicle – Architecture

Geneva, Switzerland – 12 July 2023. Today, the international, non-profit, and unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) organisation developing AI-based data coding standards has concluded its 34th General Assembly (MPAI-34) approving the Call for Technologies: Connected Autonomous Vehicle (MPAI-CAV) – Architecture. Two online presentations of the Call will be made on 26 July at 8 and 15 UTC. Responses are due by 15 August.

The goal of the MPAI-CAV standard is to promote the development of a CAV industry by specifying components that can be easily integrated into larger subsystems. To achieve this goal, MPAI intends to develop the MPAI-CAV standard as a series of standards each adding more details to enhance CAV component interoperability. The first issue, MPAI-CAV – Architecture, to be developed using the results of the Call, aims to partition CAVs into subsystems and to further partition those subsystems into components. Both subsystems and components are identified by their function and interfaces, i.e., data exchanged between subsystems and components.

Three documents are attached to the Call: the first is Use Cases and Functional Requirements. It includes an initial set of Functionalities that the Architecture should provide.

The second document is the Framework Licence designed to facilitate the timely access to IP that is essential to implement the planned MPAI-CAV – Architecture standard. Finally, the third document is a Template for responses that respondents to the Call may wish to use in their responses.

Anybody may respond to the Call. However, non-members should join MPAI to participate in the development of the MPAI-CAV – Architecture standard.

MPAI is continuing its work plan comprising the development of the following Technical Specifications:

  1. The AI Framework (MPAI-AIF) V2 Technical Specification will enable an implementer to establish a secure AIF environment to execute AI Workflows (AIW) composed of AI Modules (AIM).
  2. The Avatar Representation and Animation (MPAI-ARA) V1 Technical Specification will support creation and animation of interoperable human-like avatar models able to understand and express a Personal Status.
  3. The Multimodal Conversation (MPAI-MMC) V2 Technical Specification will generalise the notion of Emotion by adding Cognitive State and Social Attitude and specify a new data type called Personal Status.
  4. The MPAI Metaverse Model (MPAI-MMM) – Architecture V1 Technical Specification will specify the Operation Model and its components Actions, Items, and Data Types.

The MPAI work plan also includes exploratory activities, some of which are close to becoming standard or technical report projects:

  1. AI Health (MPAI-AIH). Targets an architecture where smartphones store users’ health data processed using AI and AI Models are updated using Federated Learning.
  2. End-to-End Video Coding (MPAI-EEV). Extends the video coding frontiers using AI-based End-to-End Video coding.
  3. AI-Enhanced Video Coding (MPAI-EVC). Improves existing video coding with AI tools for short-to-medium term applications.
  4. Server-based Predictive Multiplayer Gaming (MPAI-SPG). Uses AI to train neural networks that help an online gaming server compensate for data losses and detect false data.
  5. XR Venues (MPAI-XRV). Identifies common AI Modules used across various XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

 

 


An Introduction to the MPAI Metaverse Model Architecture – Part III

In parts I and II of this series of posts, we have highlighted:

  1. The basic elements that enable the operation of an M-Instance, especially Processes, Items, Actions, and Data Types. In particular, a Process can take the shape of a User (a representative of a human in an M-Instance), a Device (which connects an M-Instance with the real world, called Universe), or a Service.
  2. The functional requirements of an initial list of Actions that enable a Process to do useful things in an M-Instance.

We are now going to identify the functional requirements of an initial list of Items that enable a Process to do useful things in an M-Instance. For convenience, Items will be grouped in classes.

Remember the online presentations at 8 and 15 UTC on 23 June, where you can learn more about the MPAI Metaverse Model Architecture and the Call for Technologies. Register here for the first and here for the second presentation.

Here are Items with a general applicability.

Functional requirements Item
An M-Instance is an abstract entity bearing an Identifier. An M-Instance may expose its Capabilities. M-Instance
An M-Environment is an abstract entity bearing an Identifier.

1.      An M-Environment is hosted by an M-Instance.

2.      An M-Environment may expose its Capabilities.

3.      The Capabilities of an M-Environment may extend the Capabilities of its hosting M-Instance.

M-Environment
An Item or a Process shall bear Identifiers in such a way that:

1.      An Identifier uniquely references an Item or Process.

2.      An Item can have more than one Identifier.

An Identifier may have a hierarchical structure, such as:

Item: M-InstanceID, M-EnvironmentID, M-LocationID, ItemID.

Process: M-InstanceID, M-EnvironmentID, ProcessID.

Identifier
With the Rights Item we can express the Actions that a Process can perform on Items, at M-/U-Locations, during a period, e.g.,

Action1 Item1 Location1 T11-T12
Action2 Item2 Location2 T21-T22
Rights
Program is Data (and Metadata) that can be executed.

A Program Item shall be executable in the M-Instance.

A Program Item may be subject to certification before being admitted to an M-Instance.

Program
Contract is a special Program that can be activated (Executed) by an external entity, e.g., a User or another already activated Contract. The contract shall include:

1.      Offer: Rights.

2.      Acceptance: By both parties.

3.      Consideration: There may be a Transaction.

The terms of the Contract are enforced in the jurisdiction of the M-Instance.

Contract
An M-Instance/M-Environment may show its Capabilities, i.e., an Item describing the characteristics of an M-Instance/M-Environment, including:

1.      Currencies supported.

2.      Items supported with Data Formats.

3.      Data Types supported.

Capabilities
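To make the Identifier and Rights rows above more concrete, here is a minimal Python sketch. It assumes an Identifier is simply the four hierarchical components joined by "/" and that a Rights entry is the (Action, Item, Location, time window) tuple of the example; the class names, the "/" separator and the `is_permitted` helper are hypothetical, not part of the MPAI-MMM text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemIdentifier:
    """Hierarchical Item Identifier: M-InstanceID, M-EnvironmentID, M-LocationID, ItemID."""
    m_instance_id: str
    m_environment_id: str
    m_location_id: str
    item_id: str

    def __str__(self) -> str:
        # Hypothetical textual form: the four components joined with "/".
        return "/".join((self.m_instance_id, self.m_environment_id,
                         self.m_location_id, self.item_id))


@dataclass(frozen=True)
class Right:
    """One Rights entry: an Action on an Item at a Location during [t_start, t_end]."""
    action: str
    item: str
    location: str
    t_start: float
    t_end: float


def is_permitted(rights: list[Right], action: str, item: str,
                 location: str, t: float) -> bool:
    """True if some Rights entry covers this Action on this Item, there, at time t."""
    return any(r.action == action and r.item == item and r.location == location
               and r.t_start <= t <= r.t_end for r in rights)


# Usage: a Process holding the Right to MM-Animate one Item between t=10 and t=20.
item = ItemIdentifier("MI-1", "ME-2", "ML-3", "IT-4")
rights = [Right("MM-Animate", str(item), "ML-3", 10.0, 20.0)]
print(is_permitted(rights, "MM-Animate", str(item), "ML-3", 15.0))  # True
print(is_permitted(rights, "MM-Animate", str(item), "ML-3", 25.0))  # False
```

Because the Identifier is a frozen dataclass, it can also serve directly as a dictionary key when an M-Instance indexes its Items.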

Here are Items related to the interaction between Processes.

Functional requirements Item
Processes may need to exchange application-level Messages. Message
A Process should be able to expose its Capabilities, i.e., an Item containing a description of its characteristics including:

1.        List of Actions that can be performed.

2.        List of Items supported with Data Formats.

3.        List of Data Types supported.

4.        The cost of performing an Action.

5.        Human represented (User)

6.        Apps on board (Device).

Capabilities
When a Process requests another Process to perform an Action on its behalf, it issues a Request-Action, an Item including:

1.      Time the Request-Action was issued.

2.      The Source ProcessID.

3.      The Destination ProcessID.

4.      The Action requested.

5.      The ItemIDs relevant to the Action.

6.      The Location of the Items.

7.      The Location of the output Items produced by the Request-Action.

8.      The requested Rights on the output Items.

Request-Action
When a Process has received a Request-Action and succeeds in performing it, it provides a Response-Action, an Item containing:

1.      Time the Response-Action was issued.

2.      The Source ProcessID (Source refers to the Process that issued the request).

3.      The Destination ProcessID.

4.      The output Items produced by performing the Request-Action.

Response-Action
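The Request-Action and Response-Action Items above can be sketched as plain records. This is an illustration only: the field names and the `respond` helper are hypothetical, Rights are reduced to strings, and, following the note that Source refers to the Process that issued the request, the sketch copies Source and Destination unchanged from the request into the response.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RequestAction:
    """Fields follow the Request-Action requirements listed above."""
    issued_at: float                 # Time the Request-Action was issued
    source_process_id: str           # requesting Process
    destination_process_id: str      # Process asked to perform the Action
    action: str
    item_ids: list[str]
    input_location: str              # Location of the Items
    output_location: str             # Location of the output Items
    requested_rights: list[str] = field(default_factory=list)


@dataclass
class ResponseAction:
    issued_at: float
    source_process_id: str           # the Process that issued the request
    destination_process_id: str
    output_items: list[str]


def respond(request: RequestAction, output_items: list[str]) -> ResponseAction:
    """Build the Response-Action sent after successfully performing a Request-Action."""
    return ResponseAction(
        issued_at=time.time(),
        source_process_id=request.source_process_id,
        destination_process_id=request.destination_process_id,
        output_items=output_items,
    )


req = RequestAction(time.time(), "User-A", "Service-B", "Convert",
                    ["IT-4"], "ML-3", "ML-3", ["Display"])
resp = respond(req, ["IT-4-converted"])
print(resp.source_process_id, resp.output_items)  # User-A ['IT-4-converted']
```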

Here are Items related to the use of an M-Instance.

Functional requirements Item
Account is an Item that uniquely references a human who has Registered.

A User may have more than one Account with one or more M-Instances or M-Environments.

An Account shall include:

1.      The ID of the Registered human.

2.      An M-Instance-specific subset of the Registered human’s User Data.

3.      The Rights held by each User in the M-Instance/M-Environment.

4.      The IDs of Devices, Apps, Users, and Personae.

5.      The validity of:

5.1.   Rights.

5.2.   Account.

Account
Activity Data is an Item containing the record of the Actions performed by a User at all M-Locations during a period. Therefore, Activity Data shall include a list of Activities and, for each Activity:

1.      The M-LocationID the Activity Data refer to.

2.      The duration (t1-t2) the Activity Data refer to.

3.      The list of Actions.

Activity Data
Personal Profile is an Item containing the Data about the human represented by a User. It may include:

1.      First Name

2.      Last Name

3.      Address

4.      Country

5.      Age

6.      Biometric data

Personal Profile
The Manager of an M-Instance sets Rules, an Item expressing the terms and conditions under which Processes operate in the M-Instance. The Rules may express:

1.      The ability of a User to perform Actions on Items for which it has Rights.

2.      The inability of a User to perform Actions on Items for which it has no Rights.

3.      The duty of a User to perform Actions on Items.

4.      The ability of a User to make Transactions on the Rights of Items.

Rules
Social Graph is a representation of a User’s network of connections with Items and Processes representing the following:

1.      The types and the connections with Items and their M-Locations.

2.      The types and the connections with Devices (frequency of use, etc.).

3.      The types and the connections with Services (frequency of use, etc.).

4.      The types and the connections with Users and groups of Users in terms of:

4.1.   Time

4.2.   M-Locations.

4.3.   Declared purpose.

Social Graph
User Data is an Item that collects all the Data related to a human and their Users:

1.      Rights held by the human’s Users in the M-Instance.

2.      The Personal Profile of the human.

3.      The Personae that the human’s Users impersonate.

4.      The Activity Data of the human’s Users.

5.      The Social Graphs of the human’s Users.

User Data should have a representation that allows easy identification, extraction, and sharing of subsets of the User Data.

User Data
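The requirement that User Data allow easy extraction and sharing of subsets can be illustrated with a small Python sketch. It assumes, purely for illustration, that User Data is a record whose top-level components are the five listed above and that an Activity is the (M-LocationID, t1-t2, Actions) triple from the Activity Data row; `extract` is a hypothetical helper, not an MPAI-defined operation.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ActivityRecord:
    """One Activity: where, when (t1-t2), and which Actions."""
    m_location_id: str
    t1: float
    t2: float
    actions: list[str]


@dataclass
class UserData:
    rights: list[str] = field(default_factory=list)
    personal_profile: dict[str, str] = field(default_factory=dict)
    personae: list[str] = field(default_factory=list)
    activity_data: list[ActivityRecord] = field(default_factory=list)
    social_graphs: dict[str, list[str]] = field(default_factory=dict)

    def extract(self, *components: str) -> dict:
        """Return a plain-dict copy of only the named components, e.g. for sharing."""
        full = asdict(self)          # recursively converts nested dataclasses
        unknown = set(components) - set(full)
        if unknown:
            raise KeyError(f"unknown User Data components: {sorted(unknown)}")
        return {name: full[name] for name in components}


data = UserData(
    personal_profile={"First Name": "Ada", "Country": "IT"},
    activity_data=[ActivityRecord("ML-3", 0.0, 60.0, ["MM-Add", "MM-Enable"])],
)
shared = data.extract("activity_data")
print(shared["activity_data"][0]["actions"])  # ['MM-Add', 'MM-Enable']
```

Sharing only `activity_data` keeps the Personal Profile out of the exported subset, which is the kind of selective disclosure the requirement seems to aim at.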

Here are Items with a financial impact.

Functional requirements Item
An Item that may be the object of a Transaction is called Asset. An Asset may be:

1.        MM-Embedded at an M-Location.

2.        Posted to a Service.

An Asset shall:

1.      Preserve the Data Formats of the Item that has spawned it.

2.      Include the date it was created.

Asset
It is useful to consider the Ledger associated with a specific Asset. This Item includes the list of all Transactions executed:

1.      On an Asset.

2.      Starting from the first Transaction and including the last.

3.      The Marketplace on which a Transaction was performed.

Ledger
The Provenance Item shall include the list of all Transactions executed:

1.      On an Asset.

2.      Starting from the first Transaction and including the last.

3.      The Marketplace on which a Transaction was performed.

Provenance
Transaction is an Item representing the changed state of:

1.        The Rights on an Asset held by a seller User and a buyer User.

2.        The Accounts of the Users and of the Service facilitating/enabling the Transaction (Optional).

The Transaction shall represent:

1.      The Time the Transaction is performed.

2.      The Value moved into the Wallet of User1 (seller).

3.      The Value moved from the Wallet of User2 (buyer).

4.      Optionally, the Value moved into the Wallet of User3 (service).

5.      The Time the Values were moved.

6.      The Rights to Act owned by User1 after Time.

7.      The Rights to Act owned by User2 after Time.

Transaction
Value is expressed by an Amount and the Currency related to the Amount. It shall have a representation that enables the expression of the Amount and the Currency used to represent the Amount.

Value
A Wallet is a container of Currency units. A Wallet shall enable the representation of:

1.      The Amount contained in the Wallet for each Currency.

2.      The Transactions performed.

Wallet
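The Value, Wallet and Transaction rows fit together as in the following Python sketch. It is a simplification under stated assumptions: Rights on the Asset are modelled as sets of strings per User, the optional Service fee is assumed to use the same Currency as the price, a single time stands for both the Transaction time and the time the Values were moved, and all names (`transact`, `to_seller`, etc.) are hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Value:
    """An Amount and the Currency in which it is expressed."""
    amount: Decimal
    currency: str


@dataclass
class Wallet:
    """A container of Currency units, plus the Transactions it took part in."""
    owner: str
    balances: dict[str, Decimal] = field(default_factory=dict)
    transactions: list["Transaction"] = field(default_factory=list)

    def withdraw(self, v: Value) -> None:
        held = self.balances.get(v.currency, Decimal("0"))
        if held < v.amount:
            raise ValueError(f"{self.owner}: insufficient {v.currency}")
        self.balances[v.currency] = held - v.amount

    def deposit(self, v: Value) -> None:
        self.balances[v.currency] = self.balances.get(v.currency, Decimal("0")) + v.amount


@dataclass(frozen=True)
class Transaction:
    time: float
    asset_id: str
    to_seller: Value                 # Value moved into the seller's Wallet
    from_buyer: Value                # Value moved from the buyer's Wallet
    to_service: Optional[Value]      # optional Value moved to the facilitating Service
    seller_rights_after: frozenset[str]
    buyer_rights_after: frozenset[str]


def transact(asset_id: str, seller: Wallet, buyer: Wallet,
             rights: dict[str, set[str]], transferred: set[str], price: Value,
             t: float, service: Optional[Wallet] = None,
             fee: Optional[Value] = None) -> Transaction:
    """Move Value buyer -> seller (and optionally -> Service), then move Rights."""
    seller_rights = rights.get(seller.owner, set())
    if not transferred <= seller_rights:
        raise ValueError("seller does not hold the Rights being transferred")
    if fee is not None and fee.currency != price.currency:
        raise ValueError("this sketch assumes fee and price share one Currency")
    charge_fee = service is not None and fee is not None
    paid = Value(price.amount + (fee.amount if charge_fee else Decimal("0")),
                 price.currency)
    buyer.withdraw(paid)             # raises before any Wallet is changed
    seller.deposit(price)
    if charge_fee:
        service.deposit(fee)
    rights[seller.owner] = seller_rights - transferred
    rights[buyer.owner] = rights.get(buyer.owner, set()) | transferred
    tx = Transaction(t, asset_id, price, paid, fee if charge_fee else None,
                     frozenset(rights[seller.owner]), frozenset(rights[buyer.owner]))
    parties = (seller, buyer, service) if charge_fee else (seller, buyer)
    for wallet in parties:
        wallet.transactions.append(tx)
    return tx


seller, buyer, market = Wallet("User1"), Wallet("User2", {"EUR": Decimal("100")}), Wallet("User3")
rights = {"User1": {"Display", "Transact"}}
tx = transact("Asset-7", seller, buyer, rights, {"Display"},
              Value(Decimal("50"), "EUR"), t=1.0, service=market,
              fee=Value(Decimal("5"), "EUR"))
print(buyer.balances["EUR"], seller.balances["EUR"], market.balances["EUR"])  # 45 50 5
```

`Decimal` is used instead of `float` so that Amounts do not pick up binary rounding errors; a real M-Instance would delegate this to whatever Currency representation it supports.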

Here are Items specifically used to access a group of Services.

Functional requirements Item
To Authenticate an Entity (an Item that can be perceived), a special Item called AuthenticateIn is produced. It contains:

1.      The (ID of the) Entity Authenticated.

2.      (Optionally) information related to the way AuthenticateOut is rendered.

The Entity to be Authenticated can be:

1.      Speech produced by a User.

2.      The visual appearance of a User, etc.

Information on the rendering of AuthenticateOut may be provided by:

1.      Media type (text, speech, image, etc.) used for rendering.

2.      Spatial Attitude of the Object rendering AuthenticateOut.

AuthenticateIn
AuthenticateOut is the Item containing the result of the Service Acting on the Request-Authenticate Item and information about its rendering. It is rendered as requested in AuthenticateIn. AuthenticateOut
To Discover Items, an Item called DiscoverIn is produced that contains:

1.      A description of the Items to be Discovered.

2.      Information related to the rendering of DiscoverOut.

Items candidate for Discovery may be described by:

1.      Verbal/text description.

2.      Similar Items.

3.      Belonging to specific M-Instances/M-Environments/M-Locations.

4.      Belonging to specific sections of Activity Data.

Information on DiscoverOut Rendering may be provided by:

1.      Media type used for rendering.

2.      Spatial Attitude of the Object rendering DiscoverOut.

DiscoverIn
DiscoverOut is the Item containing the result of the Service Acting on the Request-Discover Item and information about its rendering. It is rendered as requested in DiscoverIn. DiscoverOut
To obtain information on an Item, a User produces InformIn, an Item containing:

1.      A description of the Item about which information is requested.

2.      Information related to the rendering of InformOut.

InformIn may refer to:

1.      Item Metadata

2.      Any other information that a Service may have on the Item.

Information on rendering of InformOut may be provided by:

1.      Media type used for rendering.

2.      Spatial Attitude of the InformOut rendered Object.

InformIn
InformOut is the Item containing the result of the Service Acting on the Request-Inform Item and information about its rendering. It is rendered as requested in InformIn. InformOut
To obtain the interpretation of an Item, a User produces InterpretIn, an Item containing:

1.      The Item to be Interpreted or its ID.

2.      Information related to the rendering of InterpretOut.

Items candidate for interpretation may be identified by: Item or ItemID.

Information on InterpretOut Rendering may be provided by:

1.      Media type used for rendering.

2.      Spatial Attitude of InterpretOut rendered Object.

InterpretIn
InterpretOut is the Item containing the result of the Service Acting on the Request-Interpret Item and information about its rendering. It is rendered as requested in InterpretIn. InterpretOut
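All four Service Items follow the same In/Out pattern: the In Item carries a request plus rendering information (Media type and Spatial Attitude), and the Out Item carries the result rendered as requested. The Python sketch below shows this for Discover. The keyword matching over a metadata catalogue is a toy stand-in for a real Discover Service, and every name in it (`RenderingInfo`, `discover`, the catalogue keys) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderingInfo:
    """How the Out Item should be rendered: Media type and Spatial Attitude."""
    media_type: str = "text"                                    # e.g. text, speech, image
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)
    orientation: tuple[float, float, float] = (0.0, 0.0, 0.0)


@dataclass(frozen=True)
class DiscoverIn:
    description: str                        # verbal/text description of the Items
    m_location_id: Optional[str] = None     # optionally restrict to one M-Location
    rendering: RenderingInfo = RenderingInfo()


@dataclass(frozen=True)
class DiscoverOut:
    item_ids: tuple[str, ...]
    rendering: RenderingInfo                # copied from DiscoverIn, as requested


def discover(request: DiscoverIn, catalogue: dict[str, dict[str, str]]) -> DiscoverOut:
    """Toy Discover Service: keep Items whose description contains every query word."""
    words = request.description.lower().split()
    hits = [item_id for item_id, meta in catalogue.items()
            if (request.m_location_id is None
                or meta.get("m_location") == request.m_location_id)
            and all(w in meta.get("description", "").lower() for w in words)]
    return DiscoverOut(tuple(sorted(hits)), request.rendering)


catalogue = {
    "IT-1": {"description": "Red sports car", "m_location": "ML-1"},
    "IT-2": {"description": "Red hat", "m_location": "ML-2"},
}
print(discover(DiscoverIn("red car"), catalogue).item_ids)  # ('IT-1',)
```

AuthenticateIn, InformIn and InterpretIn could reuse the same `RenderingInfo` record, changing only the request part.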

Here are Items producing a perceptible experience.

Functional requirements Item
An Entity is an Item that can be perceived. MPAI introduces the following perceptible Items: Object, Model, Scene, Event, and Experience. Entity
It is useful to introduce Event, an Entity that includes selected Entities at an M-Location and their Animations during a period. Therefore, an Event shall include:

1.      M-LocationID.

2.      Start Time and End Time.

3.      List of Entities, their Animations, and Interactions.

Event
It is also useful to introduce the Entity Experience, comprising selected Entities of an Event and User Interactions with the Entities of the Event. Therefore, an Experience shall include:

1.      Start Time and End Time

2.      EventID

3.      List of selected Entities, their Animations, and User Interactions.

Experience
Object is an Entity representing an object including:

1.      The type(s) of Media (Audio-Visual-Haptic) composing the Object.

2.      The Data representation.

3.      The Data Format used.

Object
Model is an Object representing an object with its features ready to be MM-Animated or UM-Animated. Model
Persona is a Model representing a human. Persona
Scene is a composition of Objects with the following features:

1.      May be hierarchical.

2.      May be MM-Embedded at a specified M-Location.

3.      Represent Objects:

3.1.   With a Spatial Attitude.

3.2.   Animated by a stream or by an autonomous agent.

Scene
A Stream is an Item made of a continuous flow of Data with the following features:

1.      May be scalable in space and time.

2.      May be used to:

2.1.   Animate a Model.

2.2.   Represent a Digitised Object in an M-Instance.

Stream
Interaction is an Item containing the Request-Action issued by a User on an Entity at an M-Location and the corresponding Time. Interaction
Map is the basic Item of an AR application. It is an Item containing a structure establishing a correspondence between U-Locations and M-Locations. Therefore, a Map shall include:

1.      The M-Instance the Map refers to.

2.      For each U-Location having one correspondence with an M-Location:

2.1.   The ID of the M-Location corresponding to the U-LocationID.

2.2.   Metadata related to the U-LocationID.

2.3.   Metadata related to the M-LocationID.

Map
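A Map as described above can be sketched as a lookup table from U-LocationIDs to M-LocationIDs, with metadata attached to each side. This Python sketch assumes string IDs and free-form metadata dictionaries; the class and method names are hypothetical, and the sketch does not forbid several U-Locations sharing one M-Location, since the text does not say either way.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Map:
    """Correspondence between U-Locations and M-Locations of one M-Instance."""
    m_instance_id: str
    u_to_m: dict[str, str] = field(default_factory=dict)         # U-LocationID -> M-LocationID
    u_metadata: dict[str, dict] = field(default_factory=dict)    # per U-LocationID
    m_metadata: dict[str, dict] = field(default_factory=dict)    # per M-LocationID

    def add(self, u_location_id: str, m_location_id: str,
            u_meta: Optional[dict] = None, m_meta: Optional[dict] = None) -> None:
        # Each U-Location has one corresponding M-Location.
        if u_location_id in self.u_to_m:
            raise ValueError(f"{u_location_id} already mapped")
        self.u_to_m[u_location_id] = m_location_id
        self.u_metadata[u_location_id] = u_meta or {}
        self.m_metadata.setdefault(m_location_id, m_meta or {})

    def m_location_for(self, u_location_id: str) -> Optional[str]:
        return self.u_to_m.get(u_location_id)

    def u_locations_for(self, m_location_id: str) -> list[str]:
        # Reverse lookup, e.g. to find where in the Universe an M-Location is anchored.
        return [u for u, m in self.u_to_m.items() if m == m_location_id]


ar_map = Map("MI-1")
ar_map.add("U-Room-12", "ML-Gallery", {"building": "B"}, {"theme": "Renaissance"})
print(ar_map.m_location_for("U-Room-12"))  # ML-Gallery
```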

Here are Items with a spatial impact.

Functional requirements Item
M-Location is an Identifiable delimited spatial portion of an M-Instance, e.g., the place occupied by the representation of a human. An M-Location:

1.      Shall define the space of the M-Instance belonging to the M-Location.

2.      May enable the creation of sub-spaces defining sub-M-Locations.

M-Location
U-Location is an Identifiable delimited spatial portion of the Universe, e.g., the place occupied by the human. A U-Location shall:

1.      Define the space in the Universe belonging to the U-Location.

2.      Enable the definition of sub-spaces (sub-U-Locations) comprised in the U-Location.

The enforcement of Rights to a U-Location is not intended to be part of the MPAI-MMM Architecture.

U-Location
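The text does not say how the space of an M-Location is defined. Purely as an assumption, the Python sketch below models each M-Location as an axis-aligned box, checks that a sub-M-Location lies inside its parent, and finds the innermost M-Location containing a point. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

Point = tuple[float, float, float]


@dataclass
class MLocation:
    """An M-Location modelled, as an assumption, by an axis-aligned box."""
    location_id: str
    lower: Point
    upper: Point
    sub_locations: list["MLocation"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if any(lo > hi for lo, hi in zip(self.lower, self.upper)):
            raise ValueError(f"{self.location_id}: lower corner exceeds upper corner")

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, p, self.upper))

    def add_sub_location(self, child: "MLocation") -> None:
        # For boxes, the child lies inside the parent iff both of its corners do.
        if not (self.contains(child.lower) and self.contains(child.upper)):
            raise ValueError(f"{child.location_id} is not inside {self.location_id}")
        self.sub_locations.append(child)

    def locate(self, p: Point) -> Optional[str]:
        """ID of the innermost (sub-)M-Location containing p, or None."""
        if not self.contains(p):
            return None
        for child in self.sub_locations:
            found = child.locate(p)
            if found is not None:
                return found
        return self.location_id


room = MLocation("ML-Room", (0, 0, 0), (10, 10, 3))
room.add_sub_location(MLocation("ML-Stage", (2, 2, 0), (4, 4, 1)))
print(room.locate((3, 3, 0.5)), room.locate((8, 8, 1)))  # ML-Stage ML-Room
```

The same structure would serve for U-Locations and sub-U-Locations; enforcing Rights on them is, as stated above, outside the scope of the Architecture.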

Of course, more Items can be identified but those introduced above have been tested to cover a significantly large number of use cases in a variety of application domains.


An Introduction to the MPAI Metaverse Model Architecture – Part II

In part I of this series of posts, we have highlighted the basic elements that enable operation of an M-Instance, especially Processes, Items, Actions and Data Type. In particular, Processes can take the shape of a User (a representative of a human in an M-Instance), a Device (to enable the connection of an M-Instance with the real world, called Universe), and a Service.

We are now going to identify the functional requirements of an initial list of Actions that enable a Process to do useful things in an M-Instance.

Functional requirements Actions
To Register with an M-Instance or M-Environment. This Action may only be performed by a human or a legal entity, not by a User Register (human)
To transmit a Request-Action to a Process. The Request-Action should contain: the Action, the Items involved in the Action, the location where the necessary input Items are located, the location where the produced output Items are located, and the Rights that the requesting Process wishes to hold to be able to perform Actions on the Items produced. Request (Request-Action)
To transmit a Response-Action to a Process. The Response-Action should contain the output Items or an error message. Respond (Response-Action)
To enable a User to increase or diminish the Rights of a Process, e.g., because new Rights have been acquired or because a User has not complied with the Rules, we need the Action Change (Rights of a Process)
To confirm that the speech or the face of a human or an object imported into an M-Instance is from a specific human or U-Location (place in the real world), we need the Action. Note: The User requesting Authentication may also request Rights to use the information received, e.g., to publish it. Authenticate (Item).
To make an Item no longer accessible to all Processes (assuming that Rights have not been irrevocably granted to a Process), we need the Action. Note: The Item may be made accessible again depending on the Rights of the User Hiding the Item. Hide (Item).
To create an Item out of Data and Metadata. For instance, a Device may capture Media as Data subject to certain Rights for use in an M-Instance. Identify converts them to an Item usable in the M-Instance because an M-Instance can only Act on Items. Identify (Item)
To create a new Item by modifying an original Item with new or partially new Data and Metadata. For instance, a User with Rights on an Item may wish to clone and then modify the components of an existing Item. Modify (Item)
To author an Item by calling a Service and providing it with Data and Metadata. Note: An M-Instance can provide a Service, internal or external to the M-Instance, that Users can call to create Items for use in the M-Instance. Author (Item)
To find Items by giving a description of the Items. Comments: An M-Instance can provide a Service that Users can call to find Items or Processes they need. Alternatively, the M-Instance may allow a User to Call an external Service to find Items of interest also outside of the M-Instance. Discover (Item)
To inform about an Item. A User may wish to know more about an Item, starting from its Metadata but potentially including other information that a Service has collected on the Item. Inform (Item)
To interpret an Item. For instance, a User may need the translation of an utterance produced by an avatar, the recognition of the face of an avatar, or the conversion of its own message expressed in sign language into a speech segment. Interpret (Item)
To display an Asset. For instance, a User may wish to manifest its intention to surrender (part of) its Rights on an Asset. This can be done by placing the Asset at an M-Location that other Users can see or by posting it to a marketplace. Post (Asset)
To make a Transaction on an Asset. A User may like to surrender (part of) the Rights to an Asset to another User, possibly recognising the facilitation role of a Service in the Transaction. At the end of the Transaction the parties making the Transaction have different Rights on the Asset and the status of their Wallets may have changed. Transact (Asset)
To place an Entity (an Item that can be perceived) at an M-Location in such a way that other Users may not perceive it. This may be useful when the User needs to add more Entities to the M-Location without showing the preparations. MM-Add (Entity)
To make perceptible an Entity that was not perceptible until that moment, e.g., because the User did not want to show the Entity before a given time. MM-Enable (Entity)
To stop making an Entity perceptible, e.g., because the User does not want to show the setting of an event when the event is over. MM-Disable (Entity)
To transmit an Item to a Process. This is done, e.g., when a Device sends Data and Metadata coming from the Universe to a Service that Identifies the Item created from Data and Metadata, when a User captures an Entity at an M-Location (i.e., it asks the Service to MM-Send the Entity) or when a User transmits an Entity to a Device for rendering in the Universe. MM-Send (Item or Data and Metadata)
To activate a Contract, i.e., a Program and its Metadata stored on a Device and activated by an external entity, e.g., a User, or another activated Contract. Of course, Contracts may be Executed by an underlying Blockchain. Execute (Contract)
To animate a Model, i.e., an Object in the M-Instance representing an object at a U-Location with its features ready to be animated using a Process that is an autonomous agent. MM-Animate (Item)
To animate a Model using a Process that receives a Stream from a U-Location and animates the Model. The Process may be provided by the M-Instance, the human, or a third-party. UM-Animate
To present Media (i.e., Data and Metadata representing perceptible information) available at a Device to a U-Location as an Entity with an associated Spatial Attitude (i.e., Position and Orientation). For instance, a User may request that a Device present the Media it has received as an Entity from the M-Instance via an MM-Send Action. MU-Actuate (Media)
To present an Entity that is at an M-Location to a U-Location as an Entity with an associated Spatial Attitude. This operation is performed in two steps: MM-Send the Entity to a Device and MU-Actuate the Media from the Device. MU-Render (Entity)
To capture a scene at a U-Location as Media. A User may ask a Device to capture a scene at a U-Location as Media. UM-Capture (scene)
To transmit Data and Metadata to a Process. A User may ask a Device to transmit the Data corresponding to the Media and Device Metadata. UM-Send (Data and Metadata)
To present a scene that is at a U-Location to an M-Location as an Entity with an associated Spatial Attitude. This operation is performed in three steps: the Device captures the scene that is at the U-Location as Media (UM-Capture), then it UM-Sends the Data and Device Metadata to a Service that Identifies the Entity and MM-Embeds it at the M-Location. UM-Render (scene)
To send an Item to a Device or to store an Item at an Address. MU-Send (Item)
To place a Model at an M-Location, animate it with a Stream, and present the animated Model at a U-Location with an associated Spatial Attitude. In other terms, the round trip real-virtual-real is established. Track (Model)
To verify that a Process has Rights to make an Action on an Item, to preserve the integrity of M-Instance operation. Validate (Process)
To convert an Item of a Request-Action or Response-Action to another Data Format. As for other Services, Convert can be a Service offered by the M-Instance or available outside of the M-Instance. Convert (Item)
To transmit a Request-Action to a Resolution Service to enable a ProcessA in M-InstanceA to communicate with ProcessB in a different M-InstanceB. As for other Services, Resolution can be a Service offered by the M-Instance or available outside of the M-Instance. Request (Request-Action)
To transmit a Response-Action to a Resolution Service that has sent a Request-Action. Respond (Response-Action)
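UM-Render and MU-Render are described above as compositions of simpler Actions. The Python sketch below only records the sequence of Actions each composite performs, so that the decomposition can be read off directly; the Device, Service and Location names and the Entity ID format are hypothetical, and no real capture or rendering takes place.

```python
Step = tuple[str, str]   # (Action, detail)


def um_render(trace: list[Step], device: str, service: str,
              u_location: str, m_location: str) -> str:
    """UM-Render (scene): UM-Capture, UM-Send, Identify, MM-Embed."""
    trace.append(("UM-Capture", f"{device} captures the scene at {u_location} as Media"))
    trace.append(("UM-Send", f"{device} sends Data and Device Metadata to {service}"))
    entity_id = f"Entity@{m_location}"       # hypothetical ID produced by Identify
    trace.append(("Identify", f"{service} Identifies {entity_id}"))
    trace.append(("MM-Embed", f"{service} MM-Embeds {entity_id} at {m_location}"))
    return entity_id


def mu_render(trace: list[Step], entity_id: str, device: str, u_location: str) -> None:
    """MU-Render (Entity): MM-Send to a Device, then MU-Actuate at a U-Location."""
    trace.append(("MM-Send", f"{entity_id} sent to {device}"))
    trace.append(("MU-Actuate", f"{device} presents the Media at {u_location} "
                                "with a Spatial Attitude"))


trace: list[Step] = []
entity = um_render(trace, "Camera-1", "Identify-Service", "U-Studio", "ML-Stage")
mu_render(trace, entity, "Headset-2", "U-Home")
print([action for action, _ in trace])
# ['UM-Capture', 'UM-Send', 'Identify', 'MM-Embed', 'MM-Send', 'MU-Actuate']
```

Chaining the two functions gives a real-virtual-real round trip, loosely similar in spirit to what the Track (Model) Action describes, although Track animates a Model with a Stream rather than embedding a captured scene.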

Of course, more Actions can be identified but those introduced above have been tested to cover a significantly large number of use cases in a variety of application domains.

To know more about the MPAI Metaverse Model Architecture and the Call for Technologies, join the online presentations at 8 and 15 UTC on 23 June. Register here for the first and here for the second presentation.


An Introduction to the MPAI Metaverse Model Architecture – Part I

This is the first of a series of posts that illustrate the Call for Technologies: MPAI Metaverse Model – Architecture, a document inviting interested parties to submit comments to and proposals for Use Cases and Functional Requirements: MPAI Metaverse Model – Architecture. The goal is to facilitate the task of those who wish to contribute to the first “Metaverse Architecture” standard ever attempted by a standards body.

Before starting, let’s clarify why MPAI, a standards body developing standards for AI-based data coding, is engaged in the “metaverse”. The answer is manifold:

  1. The metaverse will require a range of technologies that deal with data and their transformations, i.e., data coding.
  2. The metaverse is thus an excellent source of valuable standardisation projects.
  3. MPAI already has a few standards that respond to specific metaverse needs.
  4. The metaverse is thus also an excellent technology integration platform.

Note that the word metaverse is used here to mean the “metaverse notion” while Metaverse Instance (M-Instance) indicates a “specific implementation” of the MPAI Metaverse Model Architecture (in the following: MPAI-MMM). An M-Instance is considered as a set of Processes providing some or all the following functions:

  1. To sense data from U-Locations.
  2. To process the sensed data and produce Data from the sensed data and/or autonomously.
  3. To produce one or more M-Environments populated by Objects that can be either digitised or virtual, the latter with or without autonomy.
  4. To process Objects from the M-Instance or potentially from other M-Instances to affect U- and/or M-Environments using Objects in ways that are:
    • Consistent with the goals set for the M-Instance.
    • Effected within:
      • The capabilities of the M-Instance
      • The Rules set for the M-Instance.

The above gives the opportunity to highlight the convention that words beginning with a capital letter are defined here, while those beginning with a small letter have the normal meaning of the context.

Some terms are more important than others, so let’s report a few of them here:

  1. A Process is a Program and Metadata that can be executed in the M-Instance to perform Actions on Items.
  2. An Action is a supported Functionality that is performed in an M-Instance.
  3. An Item is Data and Metadata supported by the M-Instance where the Item exists.
  4. Metadata adds information on a Process or an Item and may include Rights.
  5. Rights define:
    • The ability of a Process to perform Actions on Items.
    • The possibility that an Item be subjected to an Action by a Process.
  6. An Item may include Rights held by Users and Rights that it may be possible to acquire on the Item.
  7. Data Types are data referenced by Actions and Items.

We are now able to list the first Functionalities of an M-Instance:

An M-Instance is a set of Processes providing some or all the following functions:

  1. Senses data from U-Locations.
  2. Produces Items autonomously or by processing the sensed data.
  3. Hosts one or more M-Environments populated by Objects that can be either digitised or virtual, the latter with or without autonomy.
  4. Processes Objects from the M-Instance or potentially from other M-Instances to affect U- and/or M-Environments using Objects in ways that are:
    • Consistent with the goals set for the M-Instance.
    • Effected within the capabilities and Rules of the M-Instance, and in accordance with applicable laws and regulations.
  5. Identifies Processes and Items with one or more than one Identifier each of which uniquely refers to one Process or Item.
  6. May contain one or more M-Environments each of which:
    • Includes an Identifier.
    • May include M-Locations with space and time attributes.
    • May require a Registration specific to the M-Environment.
  7. May make available information regarding its Capabilities.
  8. May require Registration for use:
    • A human can request to deploy one or more Users and one or more Personae in an M-Instance.
    • An M-Instance may request a subset of the Personal Profile of the Registering human.
  9. Establishes Rules that human’s Users in the M-Instance shall comply with.
  10. May penalise a User for lack of compliance with the Rules.

MPAI-MMM – Architecture identifies the following types of Process (see Figure 1):

  1. User represents a human rendered as:
    • A Model (Persona) animated by a stream generated by the human or by an autonomous agent. A User may be rendered by one or more Personae.
    • An Object rendering the human.
  2. Device connects User with a human or a U-Location:
    • From the Universe to an M-Instance: captures a scene as Media and provides Media as Data and Metadata.
    • From an M-Instance to the Universe: receives an Entity and renders it as Media with a Spatial Attitude.
  3. Service provides Functionalities.
  4. App is a Program executed on a Device.

Figure 1 – Relationship of Human-Device-User-Service-Persona

Figure 1 highlights a few basic aspects:

  1. A human is connected to one of its potentially many digital counterparts (Users).
  2. An object has one or more digital correspondents (Objects).
  3. A User can be rendered as a Persona (possibly more than one).
  4. Processes, in particular Users, can interact with one another.

Processes play an important role. Here are some features of a Process:

  1. Performs Actions on Items if it has the Rights to do that.
  2. Can make available information about its Capabilities.
    • The Actions it can Perform.
    • The Items on which Actions can be performed.
    • The time during which they can be performed.
    • The M-Locations where they can be performed.
  3. Can request another Process to perform Actions on Items by transmitting to it a Request-Action Item.
  4. Can be requested to perform an Action and it does so if:
    • The requesting Process has the Rights required to perform that request, e.g., it has made a Transaction to acquire the Rights, the Rights are part of the set of Rights assigned at Registration time, etc.
    • The requested Process has the Rights to perform the requested Action on the Item.
  5. Can respond to the Process requesting an Action with a Response-Action Item (see Figure 2).
  6. Uses a supported format:
    • To request another Process to perform Actions on Items (Request-Action).
    • To respond to another Process that has requested an Action (Response-Action).
  7. May perform, or request other Processes to perform, Actions on Items even in the absence of Rights, if the Rules so allow.
  8. May need to be certified by the M-Instance Manager for use in an M-Instance.
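The conditions under which a Process performs a requested Action can be condensed into a small decision function. In this Python sketch Rights are simplified to (Action, ItemID) pairs, the Capabilities of a Process are reduced to the set of Actions it supports, and the responses are plain dictionaries; all names are hypothetical. The `rules_allow_without_rights` flag stands for the case where the Rules permit acting in the absence of Rights.

```python
from dataclasses import dataclass, field

Right = tuple[str, str]      # (Action, ItemID): a deliberately simplified Rights entry


@dataclass
class Process:
    process_id: str
    actions: set[str] = field(default_factory=set)      # Capabilities: Actions it can perform
    rights: set[Right] = field(default_factory=set)

    def handle(self, requester: "Process", action: str, item_id: str,
               rules_allow_without_rights: bool = False) -> dict:
        """Decide whether to perform a requested Action and build the response."""
        if action not in self.actions:
            return {"status": "error", "reason": f"{action} not supported"}
        needed = (action, item_id)
        # Both the requesting and the requested Process must hold the Rights...
        has_rights = needed in requester.rights and needed in self.rights
        # ...unless the Rules of the M-Instance allow acting without them.
        if not (has_rights or rules_allow_without_rights):
            return {"status": "error", "reason": "missing Rights"}
        return {"status": "ok", "source": requester.process_id,
                "destination": self.process_id, "output": f"{action}({item_id})"}


user = Process("User-A", rights={("Interpret", "IT-9")})
translator = Process("Service-T", actions={"Interpret"}, rights={("Interpret", "IT-9")})
print(translator.handle(user, "Interpret", "IT-9")["status"])    # ok
print(translator.handle(user, "Interpret", "IT-10")["reason"])   # missing Rights
```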

Figure 2 – Processes requesting Action and responding to request

An M-Instance is typically administered by a single Manager that makes decisions about the technologies that best fit the entity’s needs. Two independent Managers need not make the same technology choices. The following workflow enables interoperability between MetaverseA and MetaverseB when ProcessA in MetaverseA requests ProcessB in MetaverseB to perform an Action on ItemA.1 (Note: RS = Resolution Service, CS = Conversion Service):

  1. ProcessA transmits Request-Action1 to RSA.
  2. RSA transmits Request-Action1 to RSB.
  3. RSB transmits Item1 to CS.
  4. CS produces and transmits Item2 containing Converted Data to RSB.
  5. RSB transmits the new Request-Action2 to ProcessB.
  6. ProcessB:
    • Performs the Action specified in Request-Action2 using ItemA.2.
    • Produces Response-Action2.
    • Requests RSB to transmit to RSA Response-Action2 containing ItemA.3 (result of performing Request-Action2).
  7. RSB transmits Response-Action2 to RSA.
  8. RSA transmits Item3 to CS.
  9. CS produces and transmits to RSA Item4, corresponding to ItemA.3 with converted Data.
  10. RSA produces and transmits to ProcessA a new Response-Action4 that references ItemA.4.

It should be noted that an M-Instance may allow Processes to communicate directly with Processes in other M-Instances without calling ResolutionServiceA.
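The resolution workflow above can be traced with a small Python simulation. Each hop is logged as a string, and the Conversion Service is modelled as a function that re-labels an Item's Data Format while keeping its content. The format names "formatA"/"formatB", the `Translate` Action and the function names are hypothetical, and ProcessB's work is a stub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    data_format: str
    payload: str


def convert(item: Item, target_format: str, new_id: str, log: list[str]) -> Item:
    """Conversion Service (CS): same content, different Data Format."""
    log.append(f"CS: {item.item_id} ({item.data_format}) -> {new_id} ({target_format})")
    return Item(new_id, target_format, item.payload)


def cross_instance_request(item_a1: Item, action: str, log: list[str]) -> Item:
    """ProcessA (M-InstanceA) asks ProcessB (M-InstanceB) to Act on item_a1."""
    log.append("ProcessA -> RSA: Request-Action1")
    log.append("RSA -> RSB: Request-Action1")
    item_a2 = convert(item_a1, "formatB", "ItemA.2", log)            # steps 3-4
    log.append("RSB -> ProcessB: Request-Action2")
    # ProcessB performs the Action (stubbed) and returns ItemA.3 in its own format.
    item_a3 = Item("ItemA.3", "formatB", f"{action}({item_a2.payload})")
    log.append("ProcessB -> RSB: Response-Action2")
    log.append("RSB -> RSA: Response-Action2")
    item_a4 = convert(item_a3, "formatA", "ItemA.4", log)            # back to A's format
    log.append("RSA -> ProcessA: Response-Action4")
    return item_a4


log: list[str] = []
result = cross_instance_request(Item("ItemA.1", "formatA", "hello"), "Translate", log)
print(result)  # Item(item_id='ItemA.4', data_format='formatA', payload='Translate(hello)')
```

When an M-Instance allows direct communication, the four RS hops collapse and only the conversions (if any) remain.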

Given the above, what is then the scope of the MPAI-MMM Architecture Call for Technologies?

To make comments on, add functionalities to, or propose new elements to the MPAI Metaverse Model Architecture.

Join the online presentations on 23 June at 08 and 15 UTC. Register here for the first or here for the second presentation.

 


MPAI issues MPAI Metaverse Model – Architecture Call for Technologies

Geneva, Switzerland – 14 June 2023. Today, the international, non-profit, and unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) organisation developing AI-based data coding standards has concluded its 33rd General Assembly (MPAI-33) approving the Call for Technologies: MPAI Metaverse Model (MPAI-MMM) – Architecture. Responses are due by 10 July. Two online presentations of the Call will be made on 23 June at 8 and 15 UTC.

After publishing two Technical Reports on Functionalities and Functionality Profiles of the MPAI Metaverse Model, MPAI is now kicking off an ambitious plan to develop a Technical Specification on MPAI Metaverse Model – Architecture. No standards body has attempted such a project so far.

The first step of the plan is the publication of the Call for Technologies, as mandated by the MPAI standard development process. Note that the Call does not address data formats, only metaverse functionalities.

Three documents are attached to the Call: the first is Use Cases and Functional Requirements. It includes a reference to some thirty metaverse use cases explored by MPAI, a set of Functionalities that the Architecture should provide, and the functional requirements of its key elements: Processes, Items, Actions and Data Types.

The second document is the Framework Licence designed to facilitate the timely access to IP that is essential to implement the planned MPAI-MMM – Architecture standard. Finally, the third document is a Template for responses that respondents to the Call may wish to use in their responses.

Anybody may respond to the Call. However, non-members should join MPAI to participate in the development of the MPAI-MMM – Architecture standard.

MPAI is continuing its work plan comprising the development of the following Technical Specifications:

  1. The AI Framework (MPAI-AIF) V2 Technical Specification will enable an implementer to establish a secure AIF environment to execute AI Workflows (AIW) composed of AI Modules (AIM).
  2. The Avatar Representation and Animation (MPAI-ARA) V1 Technical Specification will support creation and animation of interoperable human-like avatar models able to understand and express a Personal Status.
  3. The Multimodal Conversation (MPAI-MMC) V2 Technical Specification will generalise the notion of Emotion by adding Cognitive State and Social Attitude and specify a new data type called Personal Status.

The MPAI work plan also includes exploratory activities, some of which are close to becoming standard or technical report projects:

  1. AI Health (MPAI-AIH). Targets an architecture where smartphones store users’ health data processed using AI and AI Models are updated using Federated Learning.
  2. Connected Autonomous Vehicles (MPAI-CAV). Targets the Human-CAV Interaction, Environment Sensing, Autonomous Motion, and Motion Actuation subsystems implemented as AI Workflows.
  3. End-to-End Video Coding (MPAI-EEV). Extends the video coding frontiers using AI-based End-to-End Video coding.
  4. AI-Enhanced Video Coding (MPAI-EVC). Improves existing video coding with AI tools for short-to-medium term applications.
  5. Server-based Predictive Multiplayer Gaming (MPAI-SPG). Uses AI to train neural networks that help an online gaming server compensate for data losses and detect false data.
  6. XR Venues (MPAI-XRV). Identifies common AI Modules used across various XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

 

 


The MPAI Metaverse Model – Status Report

  1. Introduction

Many people use the word “metaverse”, but they are unlikely to all mean the same thing. In general, a metaverse instance can be described as a rather complex communication and interaction environment with features such as synchronous and persistent experiences, as well as virtual reality features such as avatars that may or may not be controlled by humans or by objects of the real world.

MPAI Metaverse Model – MPAI-MMM – is the MPAI project developing technical documents – so far Technical Reports – that can be applied to as many kinds of metaverse instances as possible and enable varied metaverse implementations to interoperate.

The first document, Technical Report – MPAI Metaverse Model – Functionalities, collects the functionalities that potential metaverse users expect a metaverse instance to provide, rather than trying to define what the metaverse is. It includes definitions, assumptions guiding the project, potential sources of functionalities, an organised list of commented functionalities, and an analysis of some of the main technology areas underpinning the development of the metaverse.

MPAI-MMM is based on the notion of Profiles and Levels, which digital media standardisation has successfully employed for three decades to cope with the wide variety of expected application domains. As some metaverse technologies are not yet available, the second document, Technical Report – MPAI Metaverse Model – Functionality Profiles, develops Functionality Profiles, a new notion in standardisation that defines profiles by what they do (“functionalities”) rather than by how they do it (“technologies”).

The second document reaches another important milestone by:

  1. Extending the existing collection of definitions.
  2. Developing a functional metaverse operation model based on Sources requesting Destinations to perform Actions on Items both containing Data Types.
  3. Specifying the Actions that Sources request Destinations to perform on Items and the responses of Destinations.
  4. Specifying the Metadata of the Items but not their Data Formats, in line with the Functionality approach.
  5. Developing nine Use Cases to test the suitability of Actions and Items.
  6. Developing four Functionality Profiles.

  2. MPAI-MMM Functional Operation Model

As it would be impractical to describe here the many terms defined in the document, we rely on the common meaning of words. When in doubt about the meaning of a term (defined terms start with a capital letter), please use the search window.

Figure 1 shows a simple example of the connection between the real world (right-hand side, called Universe) and the representations of U-Environments in M-Instances (left-hand side). Green indicates that a User/Object represents a real-world human/object. Users are visualised as Personae: light blue indicates that a Persona or Object is driven by an autonomous agent, and brick red that the Persona moves according to its real twin’s movements.

Figure 1 – An example of Metaverse Scenario

An M-Instance is populated by Processes; e.g., a real or virtual Persona is driven by a Process. A Process may request another Process to perform an Action by sending it a Request-Action and receiving a Response-Action. The Request-Action is an Item, i.e., Data and Metadata, possibly with Rights. It contains the Time the request was issued, the Source Process, the Destination Process, the Action requested, the InItems provided as input and their InLocations, the OutLocations of the output Items, and the OutRights requested to Act on the produced Items.

Figure 2 – Processes interacting within and without M-Instances
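The Request-Action fields listed above map naturally onto a record type. The following Python sketch is illustrative only: the class name `RequestAction`, the field names, their Python types, and the sample values are assumptions, since the document specifies Item Metadata but not Data Formats. It adds one plausible consistency check, also an assumption, that every InItem has a matching InLocation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RequestAction:
    """Illustrative Request-Action Metadata; names and types are assumptions."""
    time: datetime              # when the request was issued
    source_process: str         # Process issuing the request
    destination_process: str    # Process asked to perform the Action
    action: str                 # Action requested, e.g. "MM-Embed"
    in_items: list[str]         # identifiers of the InItems provided as input
    in_locations: list[str]     # InLocation of each InItem, in the same order
    out_locations: list[str]    # OutLocations where output Items are placed
    out_rights: list[str] = field(default_factory=list)  # OutRights requested on produced Items

    def __post_init__(self) -> None:
        # Assumed rule: each InItem is paired with exactly one InLocation.
        if len(self.in_items) != len(self.in_locations):
            raise ValueError("each InItem needs exactly one InLocation")


if __name__ == "__main__":
    # Hypothetical example: a teacher's Process asks for a 3D Model to be embedded.
    req = RequestAction(
        time=datetime(2023, 6, 14, 8, 0, tzinfo=timezone.utc),
        source_process="teacher-user",
        destination_process="classroom-service",
        action="MM-Embed",
        in_items=["lecture-3d-model"],
        in_locations=["teacher-home"],
        out_locations=["classroom-desk"],
        out_rights=["MM-Animate"],
    )
    print(req.action, req.out_locations)  # MM-Embed ['classroom-desk']
```

Modelling OutRights as a list of Action names is a simplification; the document describes Rights more generally as what Actions can be done on an Item.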

So far, the following elements have been identified and specified:

  1. 4 Processes: App, Device, Service, and User.
  2. 27 Actions, such as Authenticate an Entity (an Item that can be perceived), Discover (request a Service to find Items responding to certain criteria), MM-Embed (place and make perceptible an Entity at an M-Location), UM-Animate (animate an Entity with data from the real world), etc.
  3. 33 Items such as Account, Asset (an Item that can be Transacted), Map (a list of connections between U-Locations and M-Locations), Model (a representation of an object ready to be UM-Animated by a Stream or to be MM-Animated by an autonomous agent), Rights (a description of what Actions can be done on an Item), etc.
  4. 13 Data Types, such as Currency, Emotion, Spatial Attitude, Time, etc.

  3. Use Cases

Nine use cases have been developed. Here is a simple one that shows the descriptive capabilities of the MPAI-MMM scene description language.

Figure 3 – The Virtual lecture use case

Here is a description of the workflow.

  1. The meeting manager authors and embeds a virtual classroom.
  2. The student
    1. Connects its place in the M-Instance (“home”) with the place where the human is.
    2. Pays for the right to attend the lecture and save the Experience of the lecture.
    3. Places its Persona in the virtual classroom and stops the rendering of the Persona at home.
  3. The teacher
    1. Does likewise (but does not pay).
    2. Places a 3D Model used in the lecture and animates it.
  4. The student
    1. Moves close to the teacher’s desk, without changing the display of its Persona, to feel the audio, visual, and haptic components of the 3D Model.
    2. Saves the lecture as they experienced it.
  5. The meeting manager pays lecture fees to the teacher.
  6. Both student and teacher go back home.

The other use cases are: Virtual Meeting, Hybrid Working, eSports Tournament, Virtual Performance, AR Tourist Guide, Virtual Dance, Virtual Car Showroom, and Drive a Connected Autonomous Vehicle.

  4. Functionality Profiles

The structure of the Metaverse Functionality Profiles is derived from the Use Cases and includes hierarchical Profiles and independent Profiles. Profiles may have Levels. As depicted in Figure 4, the currently identified Profiles are Baseline, Management, Finance, and High. The currently identified Levels for the Baseline, Management, and High Profiles are Audio only, Audio-Visual, and Audio-Visual-Haptic. The Finance Profile does not have Levels.

Figure 4 – The currently identified Functionality Profiles
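The Profile and Level structure described above can be captured as a lookup table. The Python sketch below is an illustration under stated assumptions: the `Level` enum, the `PROFILE_LEVELS` table, and the `is_valid_profile_level` function are hypothetical names, and the rule that a Profile with Levels must always be used at one of them is an assumption. It deliberately does not encode which Profiles are hierarchical, since the text does not say.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    AUDIO = "Audio only"
    AUDIO_VISUAL = "Audio-Visual"
    AUDIO_VISUAL_HAPTIC = "Audio-Visual-Haptic"


# Levels available for each currently identified Profile; Finance has none.
PROFILE_LEVELS: dict[str, tuple[Level, ...]] = {
    "Baseline": tuple(Level),
    "Management": tuple(Level),
    "High": tuple(Level),
    "Finance": (),
}


def is_valid_profile_level(profile: str, level: Optional[Level]) -> bool:
    """Check a Profile@Level combination against PROFILE_LEVELS.

    Assumption: a Profile that has Levels must be paired with one of them,
    and a Profile without Levels must be paired with None.
    """
    if profile not in PROFILE_LEVELS:
        return False
    levels = PROFILE_LEVELS[profile]
    if not levels:
        return level is None
    return level in levels


if __name__ == "__main__":
    print(is_valid_profile_level("High", Level.AUDIO_VISUAL_HAPTIC))  # True
    print(is_valid_profile_level("Finance", Level.AUDIO))             # False
```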

  5. What is next

MPAI has now laid down the basic elements and can start the development of the Technical Specification – Metaverse Architecture. This will contain the main components of an M-Instance, their interconnections, and the types of data they exchange. It will also contain the APIs called by Processes to enable the implementation of M-Instances.