1        Introduction

MPAI’s standards development is based on projects that evolve through a workflow of 6 + 1 stages (stage 0 to stage 6).

#  Acronym  Name                     Description
0  IC       Interest Collection      Collection and harmonisation of use cases proposed
1  UC       Use Cases                Proposals of use cases, their description and merger of compatible use cases
2  FR       Functional Requirements  Identification of the functional requirements that the standard should satisfy
3  CR       Commercial Requirements  Development and approval of the framework licence of the standard
4  CfT      Call for Technologies    Preparation and publication of a document calling for technologies supporting the requirements
5  SD       Standard Development     Development of the standard in a specific Development Committee (DC)
6  MS       MPAI Standard            The standard has been successfully completed and all Members have made the appropriate declarations

A project progresses from one stage to the next by resolution of the General Assembly.
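The stage workflow can be pictured as a small state machine. The following is a minimal sketch only: the names `Stage` and `advance` and the boolean flag are illustrative assumptions, not part of any MPAI document.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The 6 + 1 stages, numbered as in the stage list above."""
    IC = 0   # Interest Collection
    UC = 1   # Use Cases
    FR = 2   # Functional Requirements
    CR = 3   # Commercial Requirements
    CFT = 4  # Call for Technologies
    SD = 5   # Standard Development
    MS = 6   # MPAI Standard


def advance(stage: Stage, approved_by_general_assembly: bool) -> Stage:
    """Return the next stage only if the General Assembly resolved to advance."""
    if not approved_by_general_assembly or stage is Stage.MS:
        return stage  # no resolution, or already at the final stage
    return Stage(stage + 1)


print(advance(Stage.CFT, approved_by_general_assembly=True).name)   # SD
print(advance(Stage.FR, approved_by_general_assembly=False).name)   # FR
```

The sketch moves one stage at a time and never skips a stage, matching the "from one stage to the next" wording.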

The stages of the currently active MPAI projects (MPAI-11) are represented graphically in Figure 1.

Figure 1 – Snapshot of the MPAI work plan

2        Areas at stage 5 (SD)

2.1       MPAI-AIF

Artificial Intelligence Framework (MPAI-AIF) enables creation and automation of mixed ML-AI-DP processing and inference workflows for the areas of work currently considered at stages 1, 2 and 3 of the MPAI work plan. MPAI-AIF will be extended to support new application areas if the need arises.

These areas of work share the notion of an environment (the Framework) that includes 6 components: Management and Control, Execution, AI Modules (AIM), Communication, Storage and Access. AIMs are connected in a variety of topologies and executed under the supervision of Management and Control. AIMs expose standard interfaces that make them re-usable in different applications. Figure 2 shows the general MPAI-AIF Reference Model.

Figure 2 – Reference model of the MPAI AI Framework
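The idea of re-usable AIMs executed under Management and Control can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `AIM` type, the `Controller` class and the two example modules are hypothetical and do not reproduce the MPAI-AIF interfaces. Only a linear chain is modelled, and the Execution, Communication, Storage and Access components are left out.

```python
from typing import Callable, Dict, List

# Hypothetical AIM interface: a function mapping one payload to the next.
AIM = Callable[[dict], dict]


class Controller:
    """Toy stand-in for Management and Control: registers AIMs and runs a chain."""

    def __init__(self) -> None:
        self._aims: Dict[str, AIM] = {}

    def register(self, name: str, aim: AIM) -> None:
        self._aims[name] = aim

    def run(self, chain: List[str], payload: dict) -> dict:
        # Execute the AIMs in the order given by the topology (a linear chain here).
        for name in chain:
            payload = self._aims[name](payload)
        return payload


def lowercase(p: dict) -> dict:
    """Example AIM: normalise the text field."""
    return {**p, "text": p["text"].lower()}


def tokenise(p: dict) -> dict:
    """Example AIM: split the text field into tokens."""
    return {**p, "tokens": p["text"].split()}


controller = Controller()
controller.register("lowercase", lowercase)
controller.register("tokenise", tokenise)

result = controller.run(["lowercase", "tokenise"], {"text": "Hello MPAI"})
print(result)  # {'text': 'hello mpai', 'tokens': ['hello', 'mpai']}
```

Because both modules share the same payload-in, payload-out signature, the same `tokenise` module can be placed in a different chain without modification, which is the re-usability property the text describes.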

See the MPAI-AIF web page.

Stage 6 is expected to be reached in October 2021.

2.2       MPAI-CAE

Context-based Audio Enhancement (MPAI-CAE) improves the user experience for several audio-related applications, including entertainment, communication, teleconferencing, gaming, post-production and restoration, in a variety of contexts such as in the home, in the car, on-the-go and in the studio. It uses context information to act on the input audio content with AI, processes that content via AIMs, and may deliver the processed output via the most appropriate protocol.

So far, MPAI-CAE has been found applicable to 11 usage examples, for 4 of which the definition of AIM interfaces is at an advanced stage: Emotion-enhanced speech, Audio Recording Preservation, Enhanced Audioconference Experience and Audio-on-the-go. Figure 3 addresses the Emotion-enhanced speech Use Case.

Figure 3 – An MPAI-CAE Use Case: Emotion-enhanced speech

See the MPAI-CAE web page.

Stage 6 is expected to be reached in October 2021.

2.3       MPAI-MMC

Multi-modal conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI.

So far, 3 Use Cases have been identified for MPAI-MMC: Conversation with emotion, Multimodal Question Answering (QA) and Personalized Automatic Speech Translation.

Figure 4 addresses the Conversation with emotion Use Case.

Figure 4 – An MPAI-MMC Use Case: Conversation with emotion

See the MPAI-MMC web page.

Stage 6 is expected to be reached in September 2021.

2.4       MPAI-CUI

Compression and understanding of industrial data (MPAI-CUI) aims to enable AI-based filtering and extraction of key information to predict company performance by applying Artificial Intelligence to governance, financial and risk data.

MPAI-CUI requires standardisation of all data formats to be fed into an AI machine to extract information that is relevant to the intended use. Converted data undergo a further conversion and are then fed to specific neural networks. This is depicted in Figure 5.

Figure 5 – The MPAI-CUI Use Case

See the MPAI-CUI web page.

Stage 6 is expected to be reached in September 2021.

3        Areas at stage 2 (FR)

3.1       MPAI-SPG

Server-based Predictive Multiplayer Gaming (MPAI-SPG) aims to minimise the audio-visual and gameplay discontinuities caused by high latency or packet losses during an online real-time game. In case information from a client is missing, the data collected from the clients involved in a particular game are fed to an AI-based system that predicts the moves of the client whose data are missing.

Figure 7 depicts the MPAI-SPG reference model connected to a cloud gaming server.

Figure 7 – MPAI-SPG standardisation area (left)
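The gap-filling step can be sketched as follows. This is not the MPAI-SPG method: plain linear extrapolation stands in for the AI-based predictor, and it uses only the history of the client whose data are missing rather than the data of all clients. The function names, the 2D position representation and the assumption that each client has at least one known position are all illustrative.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]


def predict_missing(history: List[Position]) -> Position:
    """Stand-in predictor: extrapolate linearly from the last two known positions."""
    if len(history) < 2:
        return history[-1]  # nothing to extrapolate from; repeat the last position
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def server_tick(histories: Dict[str, List[Position]],
                updates: Dict[str, Optional[Position]]) -> Dict[str, Position]:
    """Build the game state for one tick, filling in clients whose data are missing."""
    state: Dict[str, Position] = {}
    for client, pos in updates.items():
        if pos is None:  # packet lost or arrived too late
            pos = predict_missing(histories[client])
        histories[client].append(pos)
        state[client] = pos
    return state


histories = {"a": [(0.0, 0.0), (1.0, 1.0)], "b": [(5.0, 5.0), (5.0, 4.0)]}
state = server_tick(histories, {"a": (2.0, 2.0), "b": None})
print(state)  # {'a': (2.0, 2.0), 'b': (5.0, 3.0)}
```

Client "b" sent nothing in this tick, so the server continues its last observed motion. A learned predictor would replace `predict_missing` while the surrounding server loop stays the same.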

See the MPAI-SPG web page.

3.2       MPAI-EVC

AI-Enhanced Video Coding (MPAI-EVC) is a video compression standard that substantially enhances the performance of a traditional video codec by improving or replacing traditional tools with AI-based tools. Two approaches – Horizontal Hybrid and Vertical Hybrid – are envisaged. The Horizontal Hybrid approach combines AI-based algorithms with a traditional image/video codec, replacing one block of the traditional scheme with a machine learning-based one. This case is described by Figure 8, where green circles represent tools that can be replaced or enhanced with their AI-based equivalent.

Figure 8 – A reference diagram for the Horizontal Hybrid approach

The Vertical Hybrid approach envisages an AVC/HEVC/EVC/VVC base layer plus an enhanced machine learning-based layer. This case can be represented by Figure 9.

Figure 9 – A reference diagram for the Vertical Hybrid approach
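The two-layer structure of the Vertical Hybrid approach can be illustrated with a toy example. It shows only the layering, not MPAI-EVC itself: coarse uniform quantisation stands in for the AVC/HEVC/EVC/VVC base layer, and a plain lossless residual stands in for the machine learning-based layer. The step size and all function names are assumptions.

```python
from typing import List

STEP = 16  # quantisation step of the toy base layer (assumption)


def base_encode(samples: List[int]) -> List[int]:
    """Toy base layer: coarse uniform quantisation in place of a real codec."""
    return [s // STEP for s in samples]


def base_decode(codes: List[int]) -> List[int]:
    """Reconstruct each sample at the centre of its quantisation bin."""
    return [c * STEP + STEP // 2 for c in codes]


def enhancement_encode(samples: List[int], base_recon: List[int]) -> List[int]:
    """Toy enhancement layer: the residual the base layer failed to capture."""
    return [s - b for s, b in zip(samples, base_recon)]


def hybrid_decode(codes: List[int], residuals: List[int]) -> List[int]:
    """Add the enhancement layer on top of the decoded base layer."""
    return [b + r for b, r in zip(base_decode(codes), residuals)]


samples = [3, 40, 100, 255]
codes = base_encode(samples)                    # [0, 2, 6, 15]
base = base_decode(codes)                       # [8, 40, 104, 248]
residuals = enhancement_encode(samples, base)   # [-5, 0, -4, 7]
assert hybrid_decode(codes, residuals) == samples
```

A decoder that understands only the base layer can still use `base_decode` alone. In a machine learning-based layer, a learned model would take the place of the plain residual used here.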

See the MPAI-EVC web page.

3.3       MPAI-GSA

Integrative Genomic/Sensor Analysis (MPAI-GSA) uses AI to understand and compress the result of high-throughput experiments combining genomic/proteomic and other data, e.g., from video, motion, location, weather, medical sensors.

So far, MPAI-GSA has been found applicable to 4 Use Areas (collections of compatible Use Cases):

  1. Integrative analysis of ‘omics datasets
  2. Smart Farming
  3. Genomics and phenotypic/spatial data
  4. Genomics and behaviour

Figure 6 addresses the Use Case Smart Farming.

Figure 6 – An MPAI-GSA Use Case: Smart Farming

See the MPAI-GSA web page.


Connected Autonomous Vehicles (MPAI-CAV) is a standard project seeking to standardise the components that enable the implementation of a Connected Autonomous Vehicle (CAV), i.e., a mechanical system capable of executing the command to move its body autonomously – save for the exceptional intervention of a human – based on the analysis of the data produced by a range of sensors exploring the environment and the information transmitted by other sources in range, e.g., other CAVs and roadside units (RSU).

Figure 10 – The MPAI-CAV subsystems

See the MPAI-CAV web page.

3.5       MPAI-MCS


Mixed-Reality Collaborative Spaces (MPAI-MCS) is a standard project that uses Artificial Intelligence throughout Mixed-Reality Collaborative Space (MCS) systems for immersive presence, rendering of spatial maps (e.g. Lidar scans, inside-out tracking), multiuser synchronization, etc.

The current focus is on Avatar videoconference in a local 3D audio-visual space, a subsystem of which is shown in Figure 11.

Figure 11 – The Avatar videoconference transmission side

See the MPAI-MCS web page.

4        Areas at stage 1 (UC)

4.1       MPAI-OSD

Visual Object and Scene Description (MPAI-OSD) is a collection of Use Cases sharing the goal of describing visual objects and locating them in space. Scene description includes the usual description of objects and their attributes in a scene and the semantic description of the objects.

Unlike proprietary solutions that address the needs of the use cases but lack interoperability or force all users to adopt a single technology or application, a standard representation of the objects in a scene allows the requirements to be better satisfied.

The approved MPAI document supporting the MPAI-OSD work area is:

  1. MPAI Application Note #8 – MPAI-OSD, N158 [19]

5        Areas at stage 0 (IC)

5.1       Vision-to-Sound Transformation

It is possible to give a spatial representation of an image that visually impaired people can hear with two headphones as a localization and description medium. It is a conversion (compression) technique from one space to a different interpretation space.

6        Other possible areas

Several potential areas for standardisation are likely to emerge from [22].

6.1       Anomalous service access

A machine that has learnt “typical” service access values for a particular service provider can detect attempts beyond “typical” values.
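One simple way to picture this is a model that learns the mean and spread of past access values and flags values far outside them. This is a sketch under stated assumptions, not a proposed MPAI technique: the class name `AccessModel`, the use of requests per minute as the access value and the three-standard-deviation threshold are all illustrative choices.

```python
from statistics import mean, stdev
from typing import List


class AccessModel:
    """Learns "typical" access values and flags values far outside them."""

    def __init__(self, typical: List[float], k: float = 3.0) -> None:
        self.mu = mean(typical)      # centre of the typical values
        self.sigma = stdev(typical)  # sample standard deviation
        self.k = k                   # tolerance, in standard deviations (assumption)

    def is_anomalous(self, value: float) -> bool:
        return abs(value - self.mu) > self.k * self.sigma


# Hypothetical training data: requests per minute observed for one service.
model = AccessModel([10, 12, 11, 9, 13, 10, 12])
print(model.is_anomalous(12))  # False
print(model.is_anomalous(40))  # True
```

With this training data the mean is 11 and the standard deviation is about 1.41, so values more than roughly 4.2 away from 11 are flagged.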

6.2       Anomalous vibrations

A machine learns from the data generated by inertial sensors (accelerometer with gyroscope) to distinguish between regular and anomalous vibrations.