
Leonardo Chiariglione
2022-08-24

MPAI adds documents and clarification to its currently open three Calls for Technologies

Geneva, Switzerland – 23 August 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 23rd General Assembly (MPAI-23). Among the outcomes are three documents produced to facilitate the task of drafting a response to the currently open Calls for Technologies and one document that will facilitate the identification and positioning of the technologies defined in the Multimodal Conversation Use Cases and Functional Requirements V2. MPAI-23 has also decided to extend the deadline for submitting responses to the Calls until the 24th of October. The link to all documents relevant to the Calls can be found on the MPAI website.

MPAI-23 has also been informed that a group of individuals active in MPAI has decided to establish the MPAI Store, the entity envisaged by the Governance of the MPAI Ecosystem to have the task to assign identifiers to implementers of MPAI standards, receive implementations of MPAI standards, verify their security, test their conformance to an MPAI standard or to one of its use cases, make available for download implementations labelled with their interoperability level and publish reviews of implementation user experiences. The MPAI blog provides a description of the MPAI Store mission in the context of the Governance of the MPAI Ecosystem.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity supporting the MPAI mission may join MPAI, if able to contribute to the development of standards for the efficient use of data.

So far, MPAI has developed 5 standards (the first five rows of the table below, MPAI-AIF to MPAI-MMC), is currently engaged in extending 2 of these approved standards (MPAI-AIF and MPAI-MMC) and is developing another 10 standards (the remaining rows, MPAI-ARA to MPAI-XRV).

Name of standard	Acronym	Brief description
AI Framework	MPAI-AIF	Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement	MPAI-CAE	Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data	MPAI-CUI	Predicts the company’s performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem	MPAI-GME	Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation	MPAI-MMC	Enables human-machine conversation emulating human-human conversation.
Avatar Representation and Animation	MPAI-ARA	Specifies descriptors of avatars impersonating real humans.
Connected Autonomous Vehicles	MPAI-CAV	Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
End-to-End Video Coding	MPAI-EEV	Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
AI-Enhanced Video Coding	MPAI-EVC	Improves existing video coding with AI tools for short-to-medium term applications.
Integrative Genomic/Sensor Analysis	MPAI-GSA	Compresses high-throughput experiments’ data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces	MPAI-MCS	Supports collaboration of humans represented by avatars in virtual-reality spaces.
Neural Network Watermarking	MPAI-NNW	Measures the impact of adding ownership and licensing information to models and inferences.
Visual Object and Scene Description	MPAI-OSD	Describes objects and their attributes in a scene.
Server-based Predictive Multiplayer Gaming	MPAI-SPG	Trains a network to compensate data losses and detects false data in online multiplayer gaming.
XR Venues	MPAI-XRV	XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: please join MPAI, share the fun, build the future.


Leonardo Chiariglione
2022-08-20

The MPAI 2022 Calls for Technologies – Part 1 (AI Framework)

A foundational premise of the MPAI architecture is that monolithic AI applications have characteristics that make them undesirable. For instance, they are single-use, i.e., it is hard to reuse the technologies of one application in another, and they are obscure, i.e., it is hard to understand why a machine has produced a certain output when subjected to a certain input. The first characteristic makes complex applications hard to build, because an implementer must possess know-how of all features of the application; the second means that such applications are often “unexplainable”.

MPAI launched AI Framework (AIF), its first official standardisation activity, in December 2020, less than 3 months after its establishment. AIF is a standard environment where it is possible to execute AI Workflows (AIW) composed of AI Modules (AIM). Both AIWs and AIMs are defined by their function and their interfaces. AIF is unconcerned with the technology used by an AIM but needs to know the topology of an AIW.

Ten months later (October 2021), the MPAI-AIF standard was approved. Its structure is represented in Figure 1.

Figure 1 – The MPAI-AIF Reference Model

MPAI’s AI Framework (MPAI-AIF) specifies the architecture, interfaces, protocols, and Application Programming Interfaces (API) of the AI Framework (AIF), an environment specially designed for execution of AI-based implementations, but also suitable for mixed AI and traditional data processing workflows.

The AIF, the AIW and the AIMs are represented by JSON Metadata. The User Agent and the AIMs call the Controller through a set of standard APIs. Likewise, the Controller calls standard APIs to interact with Communication (a service for inter-AIM communication), Global Storage (a service for AIMs to store data for access by other AIMs) and the MPAI Store (a service for downloading AIMs required by an AIW). Access represents access to application-specific data.

Through the JSON Metadata, an AIF with appropriate resources (specified in the AIF JSON Metadata) can execute an AIW requiring AIMs (specified in the AIW JSON Metadata) that can be downloaded from the MPAI Store.
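The idea that an AIW is a topology of AIMs, each known to the framework only through its interfaces, can be illustrated with a minimal Python sketch. The class names, field names and the toy modules below are hypothetical and are not taken from the MPAI-AIF specification; the sketch only shows how a controller-like component could run modules in dependency order without knowing what is inside them.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List


@dataclass
class AIM:
    """Hypothetical AI Module: the framework sees only its name, inputs and callable."""
    name: str
    inputs: List[str]           # names of upstream AIMs or of workflow inputs
    run: Callable[..., object]  # internal technology is opaque to the framework


@dataclass
class AIW:
    """Hypothetical AI Workflow: a set of AIMs whose inputs define the topology."""
    name: str
    aims: Dict[str, AIM] = field(default_factory=dict)

    def add(self, aim: AIM) -> None:
        self.aims[aim.name] = aim

    def execute(self, workflow_inputs: Dict[str, object]) -> Dict[str, object]:
        # Keep only AIM-to-AIM edges; workflow inputs are already available.
        graph = {
            name: [src for src in aim.inputs if src in self.aims]
            for name, aim in self.aims.items()
        }
        results = dict(workflow_inputs)
        # static_order() yields every AIM after all the AIMs it depends on.
        for name in TopologicalSorter(graph).static_order():
            aim = self.aims[name]
            results[name] = aim.run(*(results[src] for src in aim.inputs))
        return results


# Toy workflow: two AIMs read the same input, a third combines their outputs.
aiw = AIW("toy_conversation")
aiw.add(AIM("speech_recognition", ["audio"], lambda audio: audio.upper()))
aiw.add(AIM("emotion_extraction", ["audio"], lambda audio: "calm"))
aiw.add(AIM("reply", ["speech_recognition", "emotion_extraction"],
            lambda text, emotion: f"{text} ({emotion})"))
out = aiw.execute({"audio": "hello"})
print(out["reply"])  # HELLO (calm)
```

Because each module is reached only through its declared inputs and output, any of the three toy modules could be swapped for another implementation with the same interface without touching the rest of the workflow.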

The MPAI-AIF standard has the following main features:

1. Independence of the Operating System.
2. Modular component-based architecture with specified interfaces.
3. Encapsulation of component interfaces to abstract them from the development environment.
4. Interface with the MPAI Store enabling access to validated components.
5. Components can be implemented as software, hardware or mixed hardware-software.
6. Components can execute in local and distributed Zero-Trust architectures, interact with other implementations operating in proximity and support Machine Learning functionalities.

The MPAI-AIF standard achieves much of the original MPAI vision because AI applications:

Need not be monolithic but can be composed of independently developed modules with standard interfaces
Are more explainable
Can be found in an open market.

Feature #6 above is a requirement, but the standard does not provide practical means for an application developer to ensure that the execution of the AIW takes place in a secure environment. Version 2 of MPAI-AIF intends to provide exactly that. As MPAI-AIF V1 does not specify any trusted service that an implementer can rely on, MPAI-AIF V2 identifies specific trusted services supporting the implementation of a Trusted Zone meeting a set of functional requirements that enable AIF Components to access trusted services via APIs, such as:

AIM Security Engine.
Trusted AIM Model Services.
Attestation Service.
Trusted Communication Service.
Trusted AIM Storage Service.
Encryption Service.

Figure 2 represents the Reference Models of MPAI-AIF V2.

Figure 2 – Reference Models of MPAI-AIF V2

The AIF Components shall be able to call Trusted Services APIs after establishing the developer-specified security regime based on the following requirements:

The AIF Components shall access high-level implementation-independent Trusted Services API to handle:
1. Encryption Service.
2. Attestation Service.
3. Trusted Communication Service.
4. Trusted AIM Storage Service including the following functionalities:
  1. AIM Storage Initialisation (secure and non-secure flash and RAM)
  2. AIM Storage Read/Write.
  3. AIM Storage release.
5. Trusted AIM Model Services including the following functionalities:
  1. Secure and non-secure Machine Learning Model Storage.
  2. Machine Learning Model Update (i.e., full, or partial update of the weights of the Model).
  3. Machine Learning Model Validation (i.e., verification that the model is the one that is expected to be used and that the appropriate rights have been acquired).
6. AIM Security Engine including the following functionalities:
  1. Machine Learning Model Encryption.
  2. Machine Learning Model Signature.
  3. Machine Learning Model Watermarking.
The AIF Components shall be easily integrated with the above Services.
The AIF Trusted Services shall be able to use hardware and OS security features already existing in the hardware and software of the environment in which the AIF is implemented.
Application developers shall be able to select the application’s security either or both by:
1. Level of security that includes a defined set of security features for each level, i.e., APIs are available to either select individual security services or to select one of the standard security levels available in the implementation.
2. Developer-defined security, i.e., a combination of a developer-defined set of security features.
The specification of the AIF V2 Metadata shall be an extension of the AIF V1 Metadata supporting security with either or both standardised levels and a developer-defined combination of security features.
MPAI welcomes the submission of use cases and their respective threat models.
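The requirement that developers select security "either or both" by standard level and by a developer-defined set of features can be sketched as follows. This is only an illustration: the service names come from the list above, but the contents of each level, the function name `select_security` and its parameters are assumptions, since the Call asks respondents to propose the actual APIs.

```python
from enum import Enum
from typing import FrozenSet, Iterable, Optional


class TrustedService(Enum):
    ENCRYPTION = "Encryption Service"
    ATTESTATION = "Attestation Service"
    TRUSTED_COMMUNICATION = "Trusted Communication Service"
    TRUSTED_AIM_STORAGE = "Trusted AIM Storage Service"
    TRUSTED_AIM_MODEL = "Trusted AIM Model Services"
    AIM_SECURITY_ENGINE = "AIM Security Engine"


# Hypothetical levels: the text does not say which services each level contains.
SECURITY_LEVELS = {
    1: frozenset({TrustedService.ENCRYPTION}),
    2: frozenset({TrustedService.ENCRYPTION,
                  TrustedService.ATTESTATION,
                  TrustedService.TRUSTED_COMMUNICATION}),
    3: frozenset(TrustedService),  # every service listed above
}


def select_security(level: Optional[int] = None,
                    custom: Iterable[TrustedService] = ()) -> FrozenSet[TrustedService]:
    """Combine a standard level and/or a developer-defined set of services."""
    selected = set(custom)
    if level is not None:
        if level not in SECURITY_LEVELS:
            raise ValueError(f"unknown security level: {level}")
        selected |= SECURITY_LEVELS[level]
    return frozenset(selected)


# A standard level extended with one developer-chosen service.
regime = select_security(level=1, custom=[TrustedService.ATTESTATION])
print(sorted(service.value for service in regime))
```

Taking the union of the two selections is one possible reading of "either or both"; a real implementation might instead reject combinations that a given platform cannot honour.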

MPAI has rigorously followed its standard development process in producing the Use Cases and Functional Requirements summarised in this post. MPAI has additionally produced the Commercial Requirements (Framework Licence) and the text of the Call for Technologies.

Below are a few useful links for those wishing to know more about the MPAI-AIF V2 Call for Technologies and how to respond to it:

The “About MPAI-AIF” web page provides some general information about MPAI-AIF.
The MPAI-AIF V1 standard can be downloaded from here.
The 1 min 20 sec video (YouTube, non-YouTube) concisely illustrates the MPAI-AIF V2 Call for Technologies.
The slides and the video recording (YouTube, non-YouTube) of the 11 July online presentation give a complete overview of MPAI-AIF V2.

The MPAI secretariat shall receive the responses to the MPAI-AIF V2 Call for Technologies by 10 October 2022 at 23:39 UTC. For any need, please contact the MPAI secretariat.


Leonardo Chiariglione
2022-08-08

Personal Status in human-machine conversation

MPAI has a Development Committee in the area of human-machine conversation (MMC-DC). In September 2021, MMC-DC produced its first standard, titled Multimodal Conversation (MPAI-MMC). That standard provides a standard way to represent Emotion with the following syntax:

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "definitions": {
    "emotionType": {
      "type": "object",
      "properties": {
        "emotionDegree": {
          "enum": ["High", "Medium", "Low"]
        },
        "emotionName": {
          "type": "number"
        },
        "emotionSetName": {
          "type": "string"
        }
      }
    }
  },
  "type": "object",
  "properties": {
    "primary": {
      "$ref": "#/definitions/emotionType"
    },
    "secondary": {
      "$ref": "#/definitions/emotionType"
    }
  }
}

The semantics is given by:

Name	Definition
emotionType	Specifies the Emotion that the input carries.
emotionDegree	Specifies the Degree of Emotion as one of “Low,” “Medium,” and “High.”
emotionName	Specifies the ID of an Emotion listed in Table 2.
emotionSetName	Specifies the name of the Emotion set which contains the Emotion. Emotion set of Table 2 is used as a baseline, but other sets are possible.
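To make the syntax and semantics above concrete, here is a small Python sketch that checks an Emotion instance against the three constraints the schema expresses. It is hand-written with the standard library only, not a general JSON Schema validator, and the function names and the set name "Basic Emotion Set" used as a string value are assumptions made for illustration. Since the schema lists no required properties, all fields are treated as optional.

```python
ALLOWED_DEGREES = {"High", "Medium", "Low"}


def check_emotion(emotion: dict) -> list:
    """Return the problems found in one emotionType object (empty list if none)."""
    problems = []
    if "emotionDegree" in emotion and emotion["emotionDegree"] not in ALLOWED_DEGREES:
        problems.append("emotionDegree must be High, Medium or Low")
    if "emotionName" in emotion:
        name = emotion["emotionName"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(name, bool) or not isinstance(name, (int, float)):
            problems.append("emotionName must be a number (an Emotion ID)")
    if "emotionSetName" in emotion and not isinstance(emotion["emotionSetName"], str):
        problems.append("emotionSetName must be a string")
    return problems


def check_document(doc: dict) -> list:
    """Check the optional 'primary' and 'secondary' emotions of a document."""
    problems = []
    for slot in ("primary", "secondary"):
        if slot in doc:
            if not isinstance(doc[slot], dict):
                problems.append(f"{slot}: must be an object")
            else:
                problems += [f"{slot}: {p}" for p in check_emotion(doc[slot])]
    return problems


# Primary emotion: ID 3 (anger in Table 2), high degree; secondary: ID 4, low degree.
good = {
    "primary": {"emotionDegree": "High", "emotionName": 3,
                "emotionSetName": "Basic Emotion Set"},  # set name is an assumed value
    "secondary": {"emotionDegree": "Low", "emotionName": 4},
}
bad = {"primary": {"emotionDegree": "Extreme", "emotionName": "anger"}}

print(check_document(good))  # []
print(check_document(bad))
```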

Table 1 gives some examples of the MPAI standardised three-level Basic Emotion Set partly based on:

Table 1 – Basic Emotion Set

EMOTION CATEGORIES	GENERAL ADJECTIVAL	SPECIFIC ADJECTIVAL
ANGER	angry	furious irritated frustrated
APPROVAL, DISAPPROVAL	admiring/approving disapproving indifferent	awed contemptuous
AROUSAL	aroused/excited/energetic	cheerful playful lethargic sleepy
ATTENTION	attentive	expectant/anticipating thoughtful distracted/absent-minded vigilant hopeful/optimistic
BELIEF	credulous	sceptical
CALMNESS	calm	peaceful/serene resigned

The semantics of some elements in Table 1 is provided by Table 2.

Table 2 – Semantics of the Basic Emotion Set

ID	Emotion	Meaning
1	admiring/approving	emotion due to perception that others’ actions or results are valuable
2	amused	positive emotion combined with interest (cognitive)
3	anger	emotion due to perception of physical or emotional damage or threat
4	anxious/uneasy	low or medium degree of fear, often continuing rather than instant
5	aroused/excited/energetic	cognitive state of alertness and energy
6	arrogant	emotion communicating social dominance
7	astounded	high degree of surprised
8	attentive	cognitive state of paying attention
9	awed	approval combined with incomprehension or fear
10	bewildered/puzzled	high degree of incomprehension
11	bored	not interested
12	calm	relative lack of emotion

In July 2022, MPAI issued a Call for Technologies to extend the MPAI-MMC standard. One of the technologies requested is Personal Status, defined as “The ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude”. The 3 components are defined as follows:

Attitude	An element of the internal status related to the way a human or avatar intends to position vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”.
Cognitive State	An element of the internal status reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”.
Emotion	An element of the internal status resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”.

The Personal Status is conveyed by one or more Modalities, currently, Text, Speech, Face and Gesture.

Respondents to the call are requested to propose the following:

A Personal Status format capable of describing the evolution of Personal Status over time.
A Fused Personal Status format supporting the requirements to:
1. Include the Emotion, Cognitive State, and Attitude making up a Personal Status.
2. Retain information on the measured values of the different factors in a Personal Status conveyed by the different Modalities.
3. Describe the evolution of Personal Status over time.
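The three requirements on a Fused Personal Status can be pictured with a short Python sketch. Everything in it is hypothetical: the Call asks respondents to propose the actual formats, so the `Observation` record, the timestamp in seconds and the majority-vote fusion below are placeholders chosen only to show the three factors, the per-Modality values being retained, and the evolution over time.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

FACTORS = ("Emotion", "Cognitive State", "Attitude")


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: one factor estimated from one Modality at one instant."""
    time_s: float   # assumed timestamp in seconds
    modality: str   # "Text", "Speech", "Face" or "Gesture"
    factor: str     # one of FACTORS
    label: str      # e.g. "Angry", "Confused", "Respectful"


def fuse_at(observations: List[Observation], time_s: float) -> Dict[str, dict]:
    """Fuse all observations at one instant, keeping the per-Modality values."""
    fused = {}
    for factor in FACTORS:
        by_modality = {o.modality: o.label for o in observations
                       if o.time_s == time_s and o.factor == factor}
        if by_modality:
            # Placeholder fusion rule: the label reported by most Modalities wins.
            winner = Counter(by_modality.values()).most_common(1)[0][0]
            fused[factor] = {"fused": winner, "by_modality": by_modality}
    return fused


def timeline(observations: List[Observation]) -> List[Tuple[float, Dict[str, dict]]]:
    """Describe the evolution over time as (time, fused status) pairs."""
    times = sorted({o.time_s for o in observations})
    return [(t, fuse_at(observations, t)) for t in times]


obs = [
    Observation(0.0, "Text", "Emotion", "Angry"),
    Observation(0.0, "Speech", "Emotion", "Angry"),
    Observation(0.0, "Face", "Emotion", "Sad"),
    Observation(0.0, "Speech", "Cognitive State", "Confused"),
    Observation(1.0, "Face", "Attitude", "Respectful"),
]
tl = timeline(obs)
print(tl[0][1]["Emotion"]["fused"])  # Angry
```

A real proposal would likely carry confidence values or degrees rather than bare labels; the majority vote is used here only because it needs no extra data.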

A Personal Status standard can be used as a standard component in human-machine conversation. One such component is Personal Status Extraction, depicted in Figure 2.

Figure 2 – Personal Status Extraction

Another component is Personal Status Display depicted in Figure 3.

Figure 3 – Personal Status Display


Leonardo Chiariglione
2022-07-23

MPAI 101

The 19th of July 2022 was the second anniversary of the launch of the MPAI idea. After two years of existence, it is useful to have a summary of MPAI’s vision, mission, processes, achievements, plans, and the sister organisation MPAI Store. Those in a hurry can have a look at a 2 min video about MPAI (YouTube – non-YouTube).

Vision. The MPAI idea was driven by the impact digital media standards had on the media industry. While traditionally not very inclined to adopt “official” standards, that industry has seen relentless development in the last 1/3 of a century since digital media standards came to the fore and the industry began adopting them.

The state of Artificial Intelligence today is like the state of digital media some 1/3 of a century ago. Many players hold many technologies, but none has the power alone to create a level playing field where different players can deploy interoperable products, services, and applications.

Mission. The international, non-profit, and unaffiliated MPAI organisation develops standards for AI-based data coding and seeks to play the role of enabling that level playing field. 1/3 of a century ago the blocking factor was the high amount of data generated by the digitisation of analogue media. Today this remains an issue, but Artificial Intelligence can also be applied to all sorts of data when it is convenient to transform it from one format into another format.

Processes. Developing standards is a challenging business because standards are often based on sophisticated technologies that result from large research investments and have the potential to be used by millions of people. MPAI takes the following approach:

Anybody should be allowed to propose standards and contribute to the definition of their functional requirements.
Before the development of a standard starts users should know as many details of functional and commercial requirements as legally possible.
Investments that have produced good research results should be remunerated.
Once approved, the terms and conditions for using a standard should be known in a timely and simple fashion.

MPAI is developing its standards using a process that accommodates such requirements:

Anybody can propose standards, attend online meetings, and develop functional requirements.
MPAI Principal Members develop and approve the Framework Licence of a standard. Unlike Fair, Reasonable and Non-Discriminatory (FRAND) declarations, the Framework Licence includes terms and conditions without values (dollars, percentages, rates, dates, etc.) and a declaration that:
1. The licence will be issued before commercial implementations are available on the market.
2. The total cost will be in line with the total cost of the licenses for similar data coding technologies.
3. The market value of the specific standardised technology will be considered.
MPAI issues Calls for Technologies requesting proposals satisfying functional and commercial requirements.
Anybody can respond to a Call and participate in the integration of technologies for a standard on the condition of membership in MPAI and acceptance of the Framework Licence for proposals submitted.

Achievements. MPAI has developed 4 technical specifications and 1 standard, i.e., the full set of technical specification, reference software, conformance testing and performance assessment:

AI Framework (MPAI-AIF) enables the creation of environments (AIF) that execute AI Workflows (AIW) composed of basic components called AI Modules (AIM). It is a foundational MPAI standard on which other MPAI application standards are built.
Context-based Audio Enhancement (MPAI-CAE) uses AI to improve the user experience for audio-related entertainment, teleconferencing, restoration, and other applications in contexts such as in the home, in the car, on the go, in the studio, etc.
Compression and Understanding of Industrial Data (MPAI-CUI) uses AI to handle financial data for such purposes as assessing adequacy of governance and predicting the default and business discontinuity probabilities of a company.
Multimodal Conversation (MPAI-MMC) uses AI to enable conversation between humans and machines emulating human-human conversation in completeness and intensity.

MPAI has also developed Governance of the MPAI Ecosystem (MPAI-GME), a foundational standard laying down the rules that govern the submission of and access to MPAI standard implementations with attributes of Reliability, Robustness, Replicability, and Fairness, available from the MPAI Store.

Plans. MPAI is engaged in 3 projects which have just reached the Call for Technologies stage and aim at:

Providing the AI Framework standard with a security infrastructure so that AIF V2 components can access security services. Please have a look at the 1 min 20 sec video about the MPAI-AIF V2 Call for Technologies (YouTube – non-YouTube); the slides presented at the online meeting on 2022/07/11; the video recording (YouTube, non-YouTube) of that 11 July presentation; and the Call for Technologies, Use Cases and Functional Requirements, and Framework Licence.
Extending the Multimodal Conversation standard. Please have a look at the 2 min video (YouTube, non-YouTube) illustrating MPAI-MMC V2; the slides presented at the online meeting on 2022/07/12; the video recording (YouTube, non-YouTube) of that 12 July presentation; and the Call for Technologies, Use Cases and Functional Requirements, and Framework Licence. MPAI-MMC V2 calls for a range of technologies, such as:
1. Extraction of Personal Status, a set of internal characteristics from a person or avatar, currently Emotion, Cognitive State, and Attitude, conveyed by Modalities: Text, Speech, Face, and Gesture.
2. Generation of a speaking avatar from Text and Personal Status, typically generated by a machine conversing with a human.
3. Audio-Visual Scene Description to describe the structured composition of the audio-visual objects in a scene.
4. Avatar Model to describe a static avatar from the waist up displaying movements in face and gesture.
5. Avatar Descriptors to represent the instantaneous alterations of the face, head, arms, hands, and fingers of an Avatar Model.
6. Extraction of Speech and Face Descriptors for remote authentication.
Developing the Neural Network Watermarking (MPAI-NNW) standard providing the means to measure the performance of a neural network watermarking technology. Please have a look at the 1 min 30 sec video (YouTube, non-YouTube) illustrating MPAI-NNW; the slides presented at the online meeting on 2022/07/12; the video recording (YouTube, non-YouTube) of that 12 July presentation; and the Call for Technologies, Use Cases and Functional Requirements, and Framework Licence.

MPAI is also engaged in several other projects which have not reached the Call for Technologies stage:

AI Health (MPAI-AIH): addresses users equipped with an AIF-enabled smartphone who collect, process, and license health data to a central service which satisfies data processing requests from third parties in line with the data licence. Improved neural network models are shared and improved via federated learning.
Avatar Representation and Animation (MPAI-ARA): addresses the extraction of visual human features to animate a speaking avatar which accurately reproduces the features and the movements of a human.
Connected Autonomous Vehicle (MPAI-CAV): addresses the AI Modules and AI Workflows of a CAV, i.e., a system capable of moving autonomously based on the analysis of the data produced by a range of sensors exploring the environment and the information transmitted by other sources in range.
AI-based End-to-End Video Coding (MPAI-EEV): seeks to reduce the number of bits required to represent 2D video by exploiting AI-based end-to-end data coding technologies without being constrained by how data coding has traditionally been used for video coding.
AI-Enhanced Video Coding (MPAI-EVC): aims at substantially enhancing the performance of a traditional video codec (MPEG-5 EVC) by improving or replacing traditional tools with AI-based tools.
Integrative Genomic/Sensor Analysis (MPAI-GSA): aims at understanding and compressing the result of high-throughput experiments combining genomic/proteomic and other data, e.g., from video, motion, location, weather, and medical sensors.
Mixed-Reality Collaborative Spaces (MPAI-MCS): addresses virtual spaces where humans and avatars collaborate to achieve common goals, such as Conversation About a Scene (CAS) and Avatar-Based Videoconference (ABV). These are two use cases enabled by MPAI-MMC V2.
Visual Object and Scene Description (MPAI-OSD): addresses use cases sharing the goal of describing visual objects and locating them in space. Scene description includes the description of objects, their attributes in a scene and their semantic description.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): aims to mitigate the gameplay discontinuities caused by high latency or packet losses in online and cloud gaming applications and to detect game players who are getting an unfair advantage by manipulating the data generated by their game client.
XR Venues (MPAI-XRV): addresses use cases enabled by AR/VR/MR (XR) and enhanced by Artificial Intelligence technologies. Examples are eSports, Experiential retail/shopping, and Immersive art experiences.

MPAI Store: Standards are about interoperability, but what is MPAI Interoperability? MPAI defines it as the ability to replace an Implementation of an AI Workflow or an AI Module with a functionally equivalent and conforming Implementation. MPAI defines 3 Interoperability Levels of an AIW executed in an AIF:

Level 1 – The AIW is implementer-specific and satisfies the MPAI-AIF Standard.

Level 2 – The AIW is specified by an MPAI Application Standard.

Level 3 – The AIW is specified by an MPAI Application Standard and validated by a Performance Assessor.
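The three levels can be read as a simple decision rule, sketched below in Python. The record fields and the function name are hypothetical, and the sketch assumes that every level presupposes an AIW that satisfies the MPAI-AIF Standard, since the levels are defined for an AIW executed in an AIF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Implementation:
    """Hypothetical description of an AIW implementation."""
    satisfies_aif_standard: bool
    specified_by_application_standard: bool
    validated_by_performance_assessor: bool


def interoperability_level(impl: Implementation) -> Optional[int]:
    """Return 1, 2 or 3, or None if the AIW does not satisfy MPAI-AIF (assumed precondition)."""
    if not impl.satisfies_aif_standard:
        return None
    if not impl.specified_by_application_standard:
        return 1  # implementer-specific AIW
    return 3 if impl.validated_by_performance_assessor else 2


print(interoperability_level(Implementation(True, True, False)))  # 2
```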

Implementations should be labelled so as not to confuse users. The Governance of the MPAI Ecosystem assigns this task to the MPAI Store, a not-for-profit organisation that verifies the security of implementations, tests the claimed conformance to an MPAI technical specification, records the result of a Performance Assessor, and makes the implementation available for download. The MPAI Store also manages a reputation system recording reviews of MPAI implementation.

MPAI offers Users access to the promised benefits of AI with a guarantee of increased transparency, trust and reliability as the Interoperability Level of an Implementation moves from Level 1 to 3.


Leonardo Chiariglione
2022-07-20

The second round of MPAI standardisation begins

On 19 July 2020 – two years ago – the wild idea of an organisation dedicated to the development of AI-based data coding standards was made public. What has happened in these two years?

MPAI was established in September 2020.
Four Calls for Technologies were published in December 2020, and January-February 2021.
The corresponding four Technical Specifications were published in September-November-December 2021:
1. AI Framework (MPAI-AIF, a standard environment to execute AI workflows composed of AI Modules).
2. Compression and Understanding of Industrial Data (MPAI-CUI, standard AI-based financial data processing technologies and their application to Company Performance Prediction).
3. Multimodal Conversation (MPAI-MMC, standard AI-based human-machine conversation technologies and their application to 5 use cases).
4. Context-based Audio Enhancement (MPAI-CAE, standard AI-based audio experience-enhancement technologies and their application to 4 use cases).
Completion of the set of specifications composing an MPAI standard, namely: Reference Software, Conformance Testing and Performance Assessment in addition to Technical Specification. So far this has been partly achieved.
IEEE adoption without modification of the Technical Specifications. The first MPAI technical specification converted to an IEEE standard is expected to be approved in the second half of September 2022.
Publication of three Calls for Technologies and associated Functional and Commercial Requirements for data formats and technologies:
1. The extended AI Framework standard (MPAI-AIF V2) will retain the functionalities specified by Version 1 and will enable the components of the Framework to access security functionalities.
2. The extended Multimodal Conversation standard (MPAI-MMC V2) will enable a variety of new use cases such as separation and location of audio-visual objects in a scene (e.g., human beings, their voices and generic objects); the ability of a party in metaverse1 to import an environmental setting and a group of avatars from metaverse2; representation and interpretation of the visual features of a human to extract information about their internal state (e.g., emotion) or to accurately reproduce the human as an avatar.
3. The Neural Network Watermarking standard (MPAI-NNW) will provide the means to assess if the insertion of a watermark deteriorates the performance of a neural network; how well a watermark detector can detect the presence of a watermark and a watermark decoder can retrieve the payload; and how to quantify the computational cost to inject, detect, and decode a payload.
Finally, MPAI has decided to establish the MPAI Store. This is the place where implementations of MPAI technical specifications will be submitted, validated, tested, and made available for download.

A short life with many results. Much more to accomplish.


Leonardo Chiariglione
2022-07-19

MPAI calls for technologies supporting three new standards

Geneva, Switzerland – 19 July 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 22nd General Assembly. Among the outcomes is the publication of three Calls for Technologies supporting the Use Cases and Functional Requirements identified for extensions of two existing standards – AI Framework and Multimodal Conversation – and for a new standard – Neural Network Watermarking.

Each of the three Calls is accompanied by two documents. The first document identifies the Use Cases whose implementation the standard is intended to enable and the Functional Requirements that the proposed data formats and associated technologies are expected to support.

The extended AI Framework standard (MPAI-AIF V2) will retain the functionalities specified by Version 1 and will enable the components of the Framework to access security functionalities.

The extended Multimodal Conversation (MPAI-MMC V2) will specify a variety of new technologies such as separation and location of audio-visual objects in a scene (e.g., human beings, their voices and generic objects); the ability of a party in metaverse1 to import an environmental setting and a group of avatars from metaverse2; representation and interpretation of the visual features of a human to extract information about their internal state (e.g., emotion) or to accurately reproduce the human as an avatar.

Neural Network Watermarking (MPAI-NNW) will provide the means to assess if the insertion of a watermark deteriorates the performance of a neural network; how well a watermark detector can detect the presence of a watermark and a watermark decoder can retrieve the payload; and how to quantify the computational cost to inject, detect, and decode a payload.
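The three kinds of assessment named above (performance deterioration, payload retrieval, computational cost) can be illustrated with generic measures. The Python sketch below is not the MPAI-NNW methodology, which is still to be developed from the responses to the Call; the function names, the use of accuracy as the performance figure, the bit error rate and the wall-clock timing are all assumptions picked as common, simple stand-ins.

```python
import time
from typing import Callable, Tuple


def accuracy_drop(baseline_accuracy: float, watermarked_accuracy: float) -> float:
    """Performance lost by the network after the watermark is inserted."""
    return baseline_accuracy - watermarked_accuracy


def bit_error_rate(embedded: str, decoded: str) -> float:
    """Fraction of payload bits that the decoder retrieved incorrectly."""
    if not embedded or len(embedded) != len(decoded):
        raise ValueError("payloads must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(embedded, decoded))
    return errors / len(embedded)


def timed(operation: Callable[[], object]) -> Tuple[object, float]:
    """Run an inject, detect or decode step and return (result, seconds taken)."""
    start = time.perf_counter()
    result = operation()
    return result, time.perf_counter() - start


# Made-up figures for illustration only.
drop = accuracy_drop(baseline_accuracy=0.91, watermarked_accuracy=0.90)
ber = bit_error_rate("10110010", "10100011")  # two of eight bits differ
decoded, seconds = timed(lambda: "10100011")  # stand-in for a real decoder call
print(ber)  # 0.25
```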

The second document accompanying a Call for Technologies is the Framework Licence for the standard that will be developed from the technologies submitted in response to the Call. The Framework Licence is a licence without critical data such as cost, dates, rates etc.

The document packages of the Calls can be found on the MPAI website.

Those intending to respond to the Calls should do so by submitting their responses to the MPAI secretariat by 23:39 UTC on 10 October 2022.

So far, MPAI has developed 5 standards (the first five rows of the table below, MPAI-AIF to MPAI-MMC), is currently engaged in extending 2 of these approved standards (MPAI-AIF and MPAI-MMC) and is developing another 10 standards (the remaining rows, MPAI-ARA to MPAI-XRV).

Name of standard	Acronym	Brief description
AI Framework	MPAI-AIF	Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement	MPAI-CAE	Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data	MPAI-CUI	Predicts the company’s performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem	MPAI-GME	Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation	MPAI-MMC	Enables human-machine conversation emulating human-human conversation.
Avatar Representation and Animation	MPAI-ARA	Specifies descriptors of avatars impersonating real humans.
Connected Autonomous Vehicles	MPAI-CAV	Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
End-to-End Video Coding	MPAI-EEV	Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
AI-Enhanced Video Coding	MPAI-EVC	Improves existing video coding with AI tools for short-to-medium term applications.
Integrative Genomic/Sensor Analysis	MPAI-GSA	Compresses high-throughput experiments’ data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces	MPAI-MCS	Supports collaboration of humans represented by avatars in virtual-reality spaces.
Neural Network Watermarking	MPAI-NNW	Measures the impact of adding ownership and licensing information to models and inferences.
Visual Object and Scene Description	MPAI-OSD	Describes objects and their attributes in a scene.
Server-based Predictive Multiplayer Gaming	MPAI-SPG	Trains a network to compensate data losses and detects false data in online multiplayer gaming.
XR Venues	MPAI-XRV	XR-enabled and AI-enhanced use cases where venues may be both real and virtual.

Most importantly: please join MPAI, share the fun, build the future.


Leonardo Chiariglione
2022-07-08

What is new in MPAI Multimodal Conversation

The MPAI project called Multimodal Conversation (MPAI-MMC), one of the earliest MPAI projects, has the ambitious goal of using AI to enable forms of conversation between humans and machines that emulate the conversation between humans in completeness and intensity. An important element to achieving this goal is the leveraging of all modalities used by a human when talking to another human: speech, but also text, face, and gesture.

In the Conversation with Emotion use case standardised in Version 1 (V1) of MPAI-MMC, the machine activates different modules, which MPAI calls AI Modules (AIMs), that produce data in response to the data generated by a human:

AI Module	Action	Output data	Input data
Speech Recognition (Emotion)	Extracts	Text, Human speech emotion	Speech
Language Understanding	Produces	Refined text	Recognised text
	Extracts	Meaning, Text emotion	Recognised text
Video Analysis	Extracts	Face emotion	Face Object
Emotion Fusion	Produces	Fused emotion	Text Emotion, Speech Emotion, Face Emotion
Dialogue Processing	Produces	Machine text, Machine emotion	Meaning, Refined Text, Fused Emotion
Speech Synthesis (Emotion)	Produces	Machine speech with Emotion	Text, Emotion
Lips Animation	Produces	Machine Face with Emotion	Speech, Emotion

This is graphically depicted in Figure 1 where the green blocks correspond to the AIMs.

Figure 1 – Conversation with Emotion (V1)
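The data flow in the table above can be sketched as a dependency graph from which an execution order is derived. This is only an illustration: the data names are simplified, and it is assumed here that the Text and Emotion fed to Speech Synthesis and Lips Animation are the machine text and machine emotion produced by Dialogue Processing.

```python
from graphlib import TopologicalSorter

# AIM name -> (input data, output data), transcribed from the table.
# Data names are simplified labels chosen for this sketch.
AIMS = {
    "Speech Recognition": ({"Speech"}, {"Recognised Text", "Speech Emotion"}),
    "Language Understanding": ({"Recognised Text"},
                               {"Refined Text", "Meaning", "Text Emotion"}),
    "Video Analysis": ({"Face Object"}, {"Face Emotion"}),
    "Emotion Fusion": ({"Text Emotion", "Speech Emotion", "Face Emotion"},
                       {"Fused Emotion"}),
    "Dialogue Processing": ({"Meaning", "Refined Text", "Fused Emotion"},
                            {"Machine Text", "Machine Emotion"}),
    "Speech Synthesis": ({"Machine Text", "Machine Emotion"}, {"Machine Speech"}),
    "Lips Animation": ({"Machine Speech", "Machine Emotion"}, {"Machine Face"}),
}


def execution_order(aims):
    """Return AIM names ordered so that every AIM runs after its data producers."""
    producer = {data: name
                for name, (_, outputs) in aims.items()
                for data in outputs}
    # Each AIM depends on the AIMs that produce its inputs (if any do).
    graph = {name: {producer[d] for d in inputs if d in producer}
             for name, (inputs, _) in aims.items()}
    return list(TopologicalSorter(graph).static_order())


def external_inputs(aims):
    """Data consumed by some AIM but produced by none, i.e. coming from the human."""
    produced = set().union(*(outputs for _, outputs in aims.values()))
    consumed = set().union(*(inputs for inputs, _ in aims.values()))
    return consumed - produced
```

Under these assumptions, `external_inputs(AIMS)` contains only the human's speech and face object, and any order returned by `execution_order(AIMS)` places Emotion Fusion after the three emotion-extracting AIMs.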

Multimodal Conversation Version 2 (V2), for which a Call for Technologies is planned to be issued on 19 July 2022, intends to improve MPAI-MMC V1 by extending the notion of Emotion with the notion of Personal Status. This is the ensemble of personal information that includes Emotion, Cognitive State, and Attitude. The first two – Emotion and Cognitive State – result from the interaction with the environment, while the last – Attitude – is the stance that will be taken for new interactions based on the achieved Emotion and Cognitive State.

Figure 2 shows the composite AI Module introduced in MPAI-MMC V2: Personal Status Extraction (PSE). This contains specific AIMs that describe the individual text, speech, face and gesture modalities and interpret descriptors. PSE plays a fundamental role in the human-machine conversation as we will see soon.

Figure 2 – Personal Status Extraction
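As a rough illustration of the idea, Personal Status can be modelled as a record with three factors, estimated separately for each modality and then combined. The class, the label strings, and the majority-vote fusion rule below are assumptions made for this sketch; the text does not say how PSE combines its modality-specific AIMs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PersonalStatus:
    """The three factors of Personal Status, as plain labels (hypothetical)."""
    emotion: str
    cognitive_state: str
    attitude: str


def fuse(estimates: Mapping[str, PersonalStatus]) -> PersonalStatus:
    """Combine per-modality estimates by majority vote on each factor.

    Majority voting is an assumption of this sketch, not the MPAI method.
    Ties go to the label seen first in iteration order.
    """
    if not estimates:
        raise ValueError("at least one modality estimate is required")

    def vote(factor: str) -> str:
        labels = [getattr(status, factor) for status in estimates.values()]
        return Counter(labels).most_common(1)[0][0]

    return PersonalStatus(vote("emotion"), vote("cognitive_state"), vote("attitude"))


# Hypothetical estimates, one per modality named in the text.
per_modality = {
    "text": PersonalStatus("Sad", "Confused", "Respectful"),
    "speech": PersonalStatus("Angry", "Confused", "Confrontational"),
    "face": PersonalStatus("Angry", "Dubious", "Confrontational"),
    "gesture": PersonalStatus("Angry", "Confused", "Confrontational"),
}
fused = fuse(per_modality)
```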

A second fundamental component – Personal Status Display (PSD) – is depicted in Figure 3. Its role is to enable the machine to manifest itself to the party it is conversing with. The manifestation is driven by the words generated by the machine and by the Personal Status it intends to attach to its speech, face, and gesture.

Figure 3 – Personal Status Display

Is there a reason why the word “party” has been used in lieu of “human”? Yes, there is. The Personal Status Display can be used to manifest a machine to a human, but potentially also to another avatar. The same can be said of Personal Status Extraction, which can extract the Personal Status of a human but could do the same for an avatar. MPAI-MMC V2 has examples of both.

Figure 4 shows how we can leverage the Personal Status Extraction and Personal Status Display AIMs to enhance the performance of Conversation with Emotion – pardon – Conversation with Personal Status.

Figure 4 – Conversation with Personal Status V2.0

In Figure 4, Speech Recognition extracts the text from speech. Language Understanding and Question and Dialogue Processing can do a better job because they have access to Personal Status. Finally, the Personal Status Display is a re-usable component that generates a speaking avatar from text and the Personal Status conveyed by the three modalities of speech, face, and gesture.

Figure 4 assumes that the outside world provides clean speech, face and gesture. Most often, unfortunately, this is not the case. There may be more than one speech source and, even if there is just one, it is embedded in all sorts of sounds surrounding us. The same can be said of face and gesture. There may be more than one person, and extracting the face or the head, arms, hands, and fingers making up the gesture of a human is anything but simple. Figure 5 introduces two critical components: Audio Scene Description (ASD) and Visual Scene Description (VSD).

Figure 5 – Conversation with Personal Status and Audio-Visual Scene Description

The task of Audio-Visual Scene Description (AVSD) can be described as “digitally describe a portion of the world with a level of clarity and precision achievable by a human”. Expressed in this form, the goal can be unattainable with today’s technology because the description of “any” scene is too general. On the other hand, it can also be insufficient for some purposes because very often the world can be described by using sensors a human does not have.

The scope of Multimodal Conversation V2, however, is currently limited to 3 use cases:

A human has a conversation with a machine about the objects in a room.
A group of humans has a conversation with a Connected Autonomous Vehicle (CAV) outside and inside it (in the cabin).
Groups of humans have a videoconference where humans are individually represented by avatars having a high similarity with the humans they represent.

VSD should provide a description of the visual scene as composed of visual objects classified as human and generic objects. The human object should be decomposable in face, head, arm, hand, and finger objects and should have position and velocity information. The ASD should provide a description of the speech sources as audio objects with their position and velocity.

The first use case is well represented by Figure 6.

Figure 6 – Conversation About a Scene

The machine sees the human as a human object. The Object Identification AIM uses the Gesture Descriptors to understand where the finger of the human is pointing. If there is an object at that position, the Object Identification AIM uses the Physical Object Descriptors to assign an ID to the object. The machine also feeds Face Object and Human Object into the Personal Status Extraction AIM to understand the human’s Emotion, Cognitive State and Attitude in order to enable the Question and Dialogue Processing AIM to fine-tune its answer.
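One simple way to picture the pointing step is as a ray cast from the finger: the object closest to the ray, within some tolerance, is the one being indicated. The sketch below assumes 3D positions in metres, a hypothetical `max_distance` tolerance, and made-up object IDs; it is not the Object Identification AIM itself, whose descriptors are not specified in this article.

```python
import math


def point_ray_distance(origin, direction, point):
    """Shortest distance from `point` to the ray starting at `origin` along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("pointing direction must be non-zero")
    unit = [d / norm for d in direction]
    offset = [p - o for p, o in zip(point, origin)]
    # Length of the projection of the offset onto the ray.
    along = sum(u * v for u, v in zip(unit, offset))
    if along < 0:
        # The point lies behind the finger: the nearest ray point is the origin.
        return math.dist(origin, point)
    closest = [o + along * u for o, u in zip(origin, unit)]
    return math.dist(closest, point)


def identify_object(origin, direction, objects, max_distance=0.2):
    """Return the ID of the object nearest to the pointing ray, or None.

    `objects` maps an object ID to its (x, y, z) position. `max_distance`
    is a hypothetical tolerance, not a value from the MPAI documents.
    """
    best_id, best_distance = None, max_distance
    for object_id, position in objects.items():
        distance = point_ray_distance(origin, direction, position)
        if distance <= best_distance:
            best_id, best_distance = object_id, distance
    return best_id


# Hypothetical scene: the finger is at the origin and points along the x axis.
scene = {"lamp": (2.0, 0.05, 0.0), "chair": (1.0, 1.0, 0.0), "door": (-3.0, 0.0, 0.0)}
pointed_at = identify_object((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), scene)
```

Here the door lies on the pointing line but behind the finger, so it is not selected.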

Is this all we have to say about Multimodal Conversation V2.0? Well, no, this is the beginning. So, stay tuned for more news or, better, attend the MPAI-MMC V2 online presentation on Tuesday 12 July 2022 at 14 UTC. Please register here to attend.


Leonardo Chiariglione
2022-07-04

An introduction to MPAI Multimodal Conversation V2

The MPAI project called Multimodal Conversation (MPAI-MMC) has the ambitious goal to use AI to enable forms of human-machine conversation that emulate human-human conversation in completeness and intensity. This means that MMC will leverage all modalities that a human uses when talking to another human: of course, speech, but also text, face and gesture.

In the Conversation with Emotion use case of MMC V1 the machine activates different modules (in italic) to produce data (underlined) in response to a human:

Speech Recognition (Emotion) extracts text and speech emotion.
Language Understanding produces refined text, and extracts meaning and text emotion.
Video Analysis extracts face emotion.
Emotion Fusion fuses the 3 emotions into fused emotion.
Dialogue Processing produces machine text and machine emotion.
Speech Synthesis (Emotion) produces speech with machine emotion.
Lips Animation produces machine face (an avatar) with facial emotion and lips in sync with speech.

This is depicted in Figure 1.

Multimodal Conversation Version 2 (V2) intends to substantially improve MPAI-MMC V1 by adding Cognitive State and Attitude to Emotion. The combination of the three is called Personal Status, the ensemble of information internal to a person. Emotion and Cognitive State are the result of an interaction with the environment, while Attitude is the stance for new interactions.

Figure 1 shows one component – Personal Status Extraction (PSE) – identified for MPAI-MMC V2. PSE, a Composite AIM containing other specific AIMs that describe modalities and interpret descriptors, plays a fundamental role in human-machine conversation.

Figure 1 – Personal Status Extraction

A second fundamental component – Personal Status Display – is depicted in Figure 2.

Figure 2 – Personal Status Display


Leonardo Chiariglione
2022-06-22

Functional requirements for 3 new standards published

Geneva, Switzerland – 22 June 2022. Today the international, non-profit, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards developing organisation has concluded its 21st General Assembly. Among the outcomes is the approval of three Use Cases and Functional Requirements documents for AI Framework V2, Multimodal Conversation V2 and Neural Network Watermarking V1.

This milestone is important because MPAI Principal Members intending to participate in the development of the standards can develop the Framework Licences of the three planned standards. The Framework Licence has been devised by MPAI to facilitate the practical availability of approved standards (see here for an example). It is a licence without critical data such as cost, dates, rates etc. MPAI is now drafting the Calls for Technologies for the 3 standards and plans to adopt and publish them on 2022/07/19, the 2nd anniversary of the launch of the MPAI project.

AI Framework (MPAI-AIF) V1 specifies an infrastructure enabling the execution of implementations and access to the MPAI Store. V2 will add security support to the framework and is the next step following today’s release of the MPAI-AIF V1 Reference Software.

Multimodal Conversation (MPAI-MMC) V1 enables human-machine conversation emulating human-human conversation. V2 will specify technologies supporting 5 new use cases:

Personal Status Extraction: provides an estimate of the Personal Status (PS) – of a human or an avatar – conveyed by Text, Speech, Face, and Gesture. PS is the ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude.
Personal Status Display: generates an avatar from Text and PS that utters speech with the intended PS while the face and gesture show the intended PS.
Conversation About a Scene: a human holds a conversation with a machine about objects in a scene. While conversing, the human points their fingers to indicate their interest in a particular object. The machine is helped by the understanding of the human’s PS.
Human-Connected Autonomous Vehicle (CAV) Interaction: a group of humans converse with a CAV which understands the utterances and the PSs of the humans it converses with and manifests itself as the output of a Personal Status Display.
Avatar-Based Videoconference: avatars representing humans with a high degree of accuracy participate in a videoconference. A virtual secretary (VS) represented as an avatar displaying PS creates an online summary of the meeting with a quality enhanced by the virtual secretary’s ability to understand the PS of the avatar it converses with.

Neural Network Watermarking (MPAI-NNW): will provide the means to measure, for a given size of the watermarking payload, the ability of 1) the watermark inserter to inject a payload without deteriorating the NN performance, 2) the watermark detector to recognise the presence and the watermark decoder to successfully retrieve the payload of the inserted watermark, 3) the watermark inserter to inject a payload and the watermark detector/decoder to detect/decode a payload from a watermarked model or from any of its inferences at a measured computational cost.

MPAI will hold four online presentations of the documents on the following dates:

Title	Acronym	Day of July	Time	Note
AI Framework V2	MPAI-AIF	11	15:00 UTC	Register
Multimodal Conversation V2	MPAI-MMC	07	14:00 UTC	Register
Multimodal Conversation V2	MPAI-MMC	12	14:00 UTC	Register
Neural Network Watermarking	MPAI-NNW	12	15:00 UTC	Register

MPAI-MMC will be presented in two sessions because of the number and scope of the use cases and of the supporting technologies.

Those intending to attend a presentation event are invited to register at the link above.

So far, MPAI has developed 5 standards (normal font in the list below), is currently engaged in extending two approved standards (underlined) and is developing another 9 standards (italic).

Name of standard	Acronym	Brief description
AI Framework	MPAI-AIF	Specifies an infrastructure enabling the execution of implementations and access to the MPAI Store.
Context-based Audio Enhancement	MPAI-CAE	Improves the user experience of audio-related applications in a variety of contexts.
Compression and Understanding of Industrial Data	MPAI-CUI	Predicts the company performance from governance, financial, and risk data.
Governance of the MPAI Ecosystem	MPAI-GME	Establishes the rules governing the submission of and access to interoperable implementations.
Multimodal Conversation	MPAI-MMC	Enables human-machine conversation emulating human-human conversation.
Server-based Predictive Multiplayer Gaming	MPAI-SPG	Trains a network to compensate data losses and detects false data in online multiplayer gaming.
AI-Enhanced Video Coding	MPAI-EVC	Improves existing video coding with AI tools for short-to-medium term applications.
End-to-End Video Coding	MPAI-EEV	Explores the promising area of AI-based “end-to-end” video coding for longer-term applications.
Connected Autonomous Vehicles	MPAI-CAV	Specifies components for Environment Sensing, Autonomous Motion, and Motion Actuation.
Avatar Representation and Animation	MPAI-ARA	Specifies descriptors of avatars impersonating real humans.
Neural Network Watermarking	MPAI-NNW	Measures the impact of adding ownership and licensing information to models and inferences.
Integrative Genomic/Sensor Analysis	MPAI-GSA	Compresses high-throughput experiments’ data combining genomic/proteomic and other data.
Mixed-reality Collaborative Spaces	MPAI-MCS	Supports collaboration of humans represented by avatars in virtual-reality spaces.
Visual Object and Scene Description	MPAI-OSD	Describes objects and their attributes in a scene.

Visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

Most importantly: join MPAI, share the fun, build the future.


Leonardo Chiariglione
2022-06-04

MPAI wants to do it again

On the 30th of September 2021, on the first anniversary of its incorporation, MPAI approved Version 1 of its Multimodal Conversation standard (MPAI-MMC). The standard included 5 use cases: Conversation with Emotion, Multimodal Question Answering and three Automatic Speech Translation Use Cases. Three months later, MPAI approved Version 1 of Context-based Audio Enhancement (MPAI-CAE). The standard included 4 use cases: Emotion-Enhanced Speech, Audio Recording Preservation, Speech Restoration System and Enhanced Audioconference Experience.

A lot more has happened in MPAI beyond these two standards, even before the approval of the two standards, and now MPAI is ready to launch a new project that includes 5 use cases:

Personal Status Extraction (PSE).
Personal Status-driven Avatar (PSA).
Conversation About a Scene (CAS).
Human-CAV (Connected Autonomous Vehicle) Interaction (HCI).
Avatar-Based Videoconference (ABV).

This article will give a brief introduction to the 5 use cases.

Personal Status Extraction (PSE). Personal Status is a set of internal characteristics of a person, currently, Emotion, Cognitive State, and Attitude. Emotion and Cognitive State result from the interaction of a human with the Environment. Cognitive State is more rational (e.g., “Confused”, “Dubious”, “Convinced”). Emotion is less rational (e.g., “Angry”, “Sad”, “Determined”). Attitude is the stance that a human takes when s/he has reached an Emotion and Cognitive State (e.g., “Confrontational”, “Respectful”, “Soothing”). The PSE use case is about how Personal Status can be extracted from its Manifestations: Text, Speech, Face and Gesture.
Personal Status-driven Avatar (PSA). In Conversation with Emotion (MPAI-MMC V1) a machine was represented by an avatar whose speech and face displayed an emotion congruent with the emotion displayed by a human the machine is conversing with. The PSA use case is about the interaction of a machine with humans in different use cases. The machine is represented by an avatar whose text, speech, face, and gesture display a Personal Status congruent with the Personal Status manifested by the human the machine is conversing with.
Conversation About a Scene (CAS). A human and a machine converse about the objects in a room with little or no noise. The human uses a finger to indicate their interest in a particular object. The machine understands the Personal Status shown by the human in their speech, face, and gesture, e.g., the human’s satisfaction because the machine understands their question. The machine manifests itself as the head-and-shoulders of an avatar whose face and gesture (head) convey the machine’s Personal Status resulting from the conversation in a way that is congruent with the speech it utters.
Human-CAV (Connected Autonomous Vehicle) Interaction (HCI). A group of humans converse with a Connected Autonomous Vehicle (CAV) on a domain-specific subject (travel by car). The conversation can be held both outside of the CAV when the CAV recognises the humans to let them into the CAV or inside when the humans are sitting in the cabin. The two Environments are assumed to be noisy. The machine understands the Speech, and the humans’ Personal Status shown in their Text, Speech, Face, and Gesture. The machine appears as the head and shoulders of an avatar whose Text, Speech, Face, and Gesture (Head) convey a Personal Status congruent with the Speech it utters.
Avatar-Based Videoconference (ABV). Avatars representing geographically distributed humans participate in a videoconference reproducing the movements of the upper part of the human participants (from the waist up) with a high degree of accuracy. Some locations may have more than one participant. A special participant in the Virtual Environment where the Videoconference is held can be the Virtual Secretary. This is an entity displayed as an avatar not representing a human participant whose role is to: 1) make and visually share a summary of what other avatars say; 2) receive comments on the summary; 3) process the vocal and textual comments taking into account the avatars’ Personal Status showing in their text, speech, face, and gesture; 4) edit the summary accordingly; and 5) display the summary. A human participant or the meeting manager composes the avatars’ meeting room and assigns each avatar’s position and speech as they see fit.

These use cases imply a wide range of technologies (more than 40). While the requirements for these technologies and the full description of the use cases are planned to be approved at the next General Assembly (22 June), MPAI is preparing the Framework Licence and the Call for Technologies. The latter two are planned to be approved at the following General Assembly on 19 July. MPAI gives respondents about 3 months to complete their submissions.

More information about the MPAI process and the Framework Licence is available on the MPAI website.

