MPAI develops its standards using a 6-stage workflow:
| Stage | Acronym | Name | Description |
|---|---|---|---|
| 1 | UC | Use Cases | Proposals of use cases, their description and merger of compatible use cases |
| 2 | FR | Functional Requirements | Identification of the functional requirements that the standard should satisfy |
| 3 | CR | Commercial Requirements | Development and approval of the framework licence of the standard |
| 4 | CfT | Call for Technologies | Preparation and publication of a document calling for technologies supporting the requirements |
| 5 | SD | Standard Development | Development of the standard in a specific Development Committee (DC) |
| 6 | MS | MPAI Standard | The standard has been successfully completed and all Members have made the appropriate declarations |
2 Areas at stage 3 (CR)
Artificial Intelligence Framework (MPAI-AIF) enables the creation and automation of mixed ML-AI-DP processing and inference workflows at scale for the areas of work currently considered at stage 2 of the MPAI work plan.
These areas of work share the notion of an environment (the Framework) that includes six components – Management and Control, Execution, AI Modules (AIM), Communication, Storage and Access. AIMs are connected in a variety of topologies and executed under the supervision of Management and Control. AIMs expose standard interfaces that make them reusable in different applications. Figure 1 shows the general MPAI-AIF Reference Model.
Approved MPAI documents supporting the MPAI-AIF project are  and .
Figure 1 – Reference model of the MPAI AI Framework
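The reference model above can be sketched in a few lines of code. The following is an illustrative sketch only, not part of the MPAI-AIF specification; all class, function and AIM names are assumptions chosen for the example.

```python
# Hypothetical sketch of the MPAI-AIF idea: AI Modules (AIMs) with a
# uniform interface, connected in a topology and executed under the
# supervision of a Management-and-Control component.

class AIM:
    """An AI Module: a named processing step exposing a standard interface."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def process(self, data):
        return self.fn(data)

class ManagementAndControl:
    """Runs AIMs in a (here: linear) topology and logs each executed step."""
    def __init__(self, aims):
        self.aims = aims
        self.log = []

    def run(self, data):
        for aim in self.aims:
            data = aim.process(data)
            self.log.append(aim.name)
        return data

# Example: two toy AIMs chained into a workflow.
pipeline = ManagementAndControl([
    AIM("normalize", lambda xs: [x / 10 for x in xs]),
    AIM("denoise", lambda xs: [round(x, 2) for x in xs]),
])
result = pipeline.run([5, 10, 15])   # -> [0.5, 1.0, 1.5]
```

Because every AIM exposes the same interface, the supervisor can rearrange or replace modules without changing the rest of the workflow, which is what makes AIMs reusable across applications.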
3 Areas at stage 2 (FR)
Context-based Audio Enhancement (MPAI-CAE) improves the user experience for several audio-related applications, including entertainment, communication, teleconferencing, gaming, post-production and restoration, in a variety of contexts such as the home, the car, on-the-go and the studio. It uses context information to act on the input audio content, processes that content via AIMs, and may deliver the processed output via the most appropriate protocol.
So far, MPAI-CAE has been found applicable to 11 usage examples, for 4 of which the definition of AIM interfaces is under way: Enhanced audio experience in a conference call, Audio-on-the-go, Emotion enhanced synthesized voice and AI for audio documents cultural heritage. Figure 2 addresses the Audio-on-the-go usage example.
Approved MPAI documents supporting the MPAI-CAE work area are  and .
Figure 2 – An MPAI-CAE usage example: Audio-on-the-go
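The core idea of context-based enhancement can be illustrated with a toy sketch: context information selects which processing is applied to the input audio. This is not MPAI-CAE code; the contexts and processing policies below are invented for illustration.

```python
# Illustrative sketch: a context label drives which enhancement policy
# is applied to a list of audio samples (values in [-1, 1]).

def enhance(samples, context):
    """Apply a context-dependent policy to audio samples."""
    if context == "on-the-go":   # noisy environment: gate out quiet noise
        return [s if abs(s) > 0.1 else 0.0 for s in samples]
    if context == "home":        # quiet environment: gentle gain
        return [s * 1.2 for s in samples]
    return list(samples)         # unknown context: pass through unchanged

out = enhance([0.05, 0.5, -0.02], "on-the-go")   # -> [0.0, 0.5, 0.0]
```

In the actual framework each such policy would be an AIM, and the context detector itself could be another AIM feeding it.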
Integrative Genomic/Sensor Analysis (MPAI-GSA) uses AI to understand and compress the results of high-throughput experiments combining genomic/proteomic and other data, e.g. from video, motion, location, weather, medical sensors.
So far, MPAI-GSA has been found applicable to 7 usage examples ranging from personalised medicine to smart farming, for 2 of which the definition of AIM interfaces is under way: Personalised and Integrative Genomics and Automated Analysis of Animal Behaviour.
Approved MPAI documents supporting the MPAI-GSA work area are  and .
Figure 3 addresses the Smart Farming usage example.
Figure 3 – An MPAI-GSA usage example: Smart Farming
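The integrative idea behind MPAI-GSA can be sketched as joining per-subject genomic results with sensor streams so that one model can analyse them together. The record layout and field names below are assumptions made for the example, not an MPAI-GSA format.

```python
# Illustrative sketch: group sensor readings under each subject's
# genomic record, producing one integrated structure per subject.

genomic = {"subj1": {"variant": "BRCA2:c.68-7T>A"},
           "subj2": {"variant": "none"}}
sensor = [{"subject": "subj1", "heart_rate": 91},
          {"subject": "subj2", "heart_rate": 64},
          {"subject": "subj1", "heart_rate": 97}]

def integrate(genomic, sensor):
    """Attach each subject's sensor readings to its genomic record."""
    merged = {s: dict(g, readings=[]) for s, g in genomic.items()}
    for reading in sensor:
        merged[reading["subject"]]["readings"].append(reading["heart_rate"])
    return merged

merged = integrate(genomic, sensor)
# merged["subj1"] -> {'variant': 'BRCA2:c.68-7T>A', 'readings': [91, 97]}
```

Standardising the formats on both sides of such a join is precisely what would let AI modules from different vendors interoperate on the integrated data.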
Multi-modal conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI.
So far, MPAI-MMC has been found applicable to 3 usage examples: Conversation with emotion, Multimodal Question Answering (QA) and Personalized Automatic Speech Translation.
Approved MPAI documents supporting the MPAI-MMC work area are  and .
Figure 4 – An MPAI-MMC usage example: Multimodal Question Answering (QA)
AI-Enhanced Video Coding (MPAI-EVC) is a video compression standard that substantially enhances the performance of a traditional video codec by improving or replacing traditional tools with AI-based ones. Two approaches – Horizontal Hybrid and Vertical Hybrid – are envisaged. The Horizontal Hybrid approach introduces AI-based algorithms into a traditional image/video codec, replacing one block of the traditional scheme with a machine learning-based one. This case is described by Figure 5, where green circles represent tools that can be replaced or enhanced with their AI-based equivalents.
Figure 5 – A reference diagram for the Horizontal Hybrid approach
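The Horizontal Hybrid principle can be shown with a toy predictive coder where only the predictor block is swapped for a "learned" one. This is a conceptual sketch, not MPAI-EVC code; the fixed weights stand in for a trained network.

```python
# Illustrative sketch: keep the traditional codec structure (predict,
# then transmit residuals) but swap one block - the predictor - for a
# learned equivalent. Weights are illustrative constants.

def traditional_predict(prev):
    return prev[-1]                              # classic: repeat last sample

def learned_predict(prev, w=(0.6, 0.4)):
    return w[0] * prev[-1] + w[1] * prev[-2]     # stand-in for an NN block

def encode(signal, predict):
    """Predictive coding: transmit residuals instead of raw samples."""
    residuals, prev = [], [0.0, 0.0]
    for x in signal:
        residuals.append(x - predict(prev))
        prev = [prev[-1], x]
    return residuals

# Same encoder loop, two interchangeable predictor blocks:
r_trad = encode([1, 1, 1], traditional_predict)  # -> [1, 0, 0]
r_ml = encode([1, 1, 1], learned_predict)        # -> [1.0, 0.4, 0.0]
```

The point of the Horizontal Hybrid approach is exactly this modularity: the surrounding codec is untouched, so each traditional block can be benchmarked against its AI-based replacement in isolation.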
The Vertical Hybrid approach envisages an AVC/HEVC/EVC/VVC base layer plus an enhanced machine learning-based layer. This case can be represented by Figure 6.
Figure 6 – A reference diagram for the Vertical Hybrid approach
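The layered structure of the Vertical Hybrid approach can be sketched as a coarse base-layer reconstruction followed by a learned enhancement stage. In this illustrative sketch, coarse quantisation stands in for the AVC/HEVC/EVC/VVC base layer and a fixed smoothing filter stands in for the ML enhancement network.

```python
# Illustrative sketch: conventional base layer + learned enhancement layer.

def base_layer(signal, step=10):
    """Coarse reconstruction, standing in for a conventional codec."""
    return [round(x / step) * step for x in signal]

def ml_enhance(decoded):
    """Stand-in for a learned enhancement network: simple 3-tap smoothing."""
    out = []
    for i, x in enumerate(decoded):
        left = decoded[max(i - 1, 0)]
        right = decoded[min(i + 1, len(decoded) - 1)]
        out.append((left + x + right) / 3)
    return out

coarse = base_layer([13, 27, 31])    # -> [10, 30, 30]
refined = ml_enhance(coarse)         # smoothed toward the original signal
```

Unlike the Horizontal approach, the base layer here remains a standard bitstream, so legacy decoders can ignore the enhancement layer entirely.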
MPAI is engaged in the MPAI-EVC Evidence Project seeking to find evidence that AI-based technologies provide sufficient improvement to the Horizontal Hybrid approach. A second project on the Vertical Hybrid approach is being considered.
Approved MPAI documents supporting the MPAI-EVC work area are , ,  and .
Server-based Predictive Multiplayer Gaming (MPAI-SPG) aims to minimise the audio-visual and gameplay discontinuities caused by high latency or packet losses during an online real-time game. In case information from a client is missing, the data collected from the clients involved in a particular game are fed to an AI-based system that predicts the moves of the client whose data are missing.
Figure 7 – Identification of MPAI-SPG standardisation area
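The prediction step described above can be sketched with a toy game tick in which a missing client update is filled in from that client's history. Constant-velocity extrapolation stands in for the AI-based predictor; all names are illustrative, not from MPAI-SPG.

```python
# Illustrative sketch: when a client's update is lost, predict it so
# the server-side game state continues without a discontinuity.

def predict_missing(history):
    """Predict the next (x, y) position from the last two known updates."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)   # constant-velocity extrapolation

def game_tick(updates, histories):
    """Build this tick's state, predicting any missing client update."""
    state = {}
    for client, hist in histories.items():
        state[client] = updates.get(client) or predict_missing(hist)
    return state

# Client "b" dropped a packet; its move is predicted from its history.
state = game_tick({"a": (5, 5)},
                  {"a": [(0, 0), (1, 1)], "b": [(0, 0), (1, 2)]})
# -> {'a': (5, 5), 'b': (2, 4)}
```

In the envisaged standard the predictor would be trained on the moves of all clients in the game, not just the missing client's own trajectory.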
Approved MPAI document supporting the MPAI-SPG work area is .
Compression and understanding of industrial data (MPAI-CUI) aims to enable AI-based filtering and extraction of key information from the flow of data that combines data produced by companies and external data (e.g., data on vertical risks such as seismic, cyber etc.).
MPAI-CUI requires standardisation of all data formats to be fed into an AI machine so that information relevant to the intended use can be extracted. Because the data formats are so diverse, the only practical solution appears to be an intermediate format to which any type of data generated by companies in different industries and countries can be converted.
This is depicted in Figure 8.
Figure 8 – A reference diagram for MPAI-CUI
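The intermediate-format conversion can be sketched as a mapping from source-specific records to one common layout. The source names and field names below are assumptions invented for the example, not an MPAI-CUI format.

```python
# Illustrative sketch: heterogeneous company data are converted to one
# common intermediate record before being fed to the AI machine.

def to_intermediate(record, source):
    """Map a source-specific record onto a common intermediate layout."""
    if source == "erp_csv":        # e.g. an ERP export with revenue in kEUR
        return {"company": record["name"], "year": int(record["yr"]),
                "revenue": float(record["rev_keur"]) * 1000}
    if source == "filing_json":    # e.g. a regulatory filing, already in EUR
        return {"company": record["entity"], "year": record["fiscal_year"],
                "revenue": record["revenue_eur"]}
    raise ValueError(f"unknown source: {source}")

rec = to_intermediate({"name": "Acme", "yr": "2020", "rev_keur": "1500"},
                      "erp_csv")
# -> {'company': 'Acme', 'year': 2020, 'revenue': 1500000.0}
```

Once every source converges on the same intermediate layout, a single downstream AI module can consume data from any industry or country.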
Approved MPAI document supporting the MPAI-CUI work area is .
4 Other possible areas
Several potential areas for standardisation are likely to emerge from . A selection of such promising areas has been derived from .
4.1 Visual object and scene description
Visual object and scene description addresses the “scene description” components of several use cases (Multiplayer online gaming ME.MP-09, Person matching ME.MP-11, Tracking game player’s movements ME.MP-12, AI-assisted driving TP.MP-01, Correct Posture HC.MP-02, Integrative genomic/video experiments ST.OD-06). Scene description includes both the usual description of objects and their attributes in a scene and the semantic description of the objects.
Unlike proprietary solutions that address the needs of the use cases but lack interoperability or force all users to adopt a single technology or application, a standard representation of the objects in a scene allows for better satisfaction of the requirements.
4.2 Anomalous service access
A machine that has learnt the “typical” service access values for a particular service provider can detect access attempts whose values fall outside the “typical” range.
4.3 Anomalous vibrations
A machine learns from the data generated by inertial sensors (accelerometer with gyroscope) to distinguish between regular and anomalous vibrations.
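As a toy illustration of the idea (not a proposed MPAI technique), a window of inertial-sensor samples can be classified by comparing its RMS energy against a threshold learnt from normal data:

```python
import math

# Illustrative sketch: learn an energy threshold from windows of normal
# vibration data, then flag windows whose RMS energy exceeds it.

def rms(window):
    """Root-mean-square energy of a window of sensor samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))

def learn_threshold(normal_windows, margin=1.5):
    """Threshold = the largest normal RMS, scaled by a safety margin."""
    return margin * max(rms(w) for w in normal_windows)

def is_anomalous(window, threshold):
    return rms(window) > threshold

threshold = learn_threshold([[0.1, -0.1, 0.1, -0.1]])
alarm = is_anomalous([1.0, -1.0, 1.0, -1.0], threshold)   # -> True
```

A learnt model would replace the fixed RMS statistic, but the input/output data formats are where standardisation would apply.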
4.4 Vision-to-Sound Transformation
It is possible to give an image a spatial audio representation that visually impaired people can hear through two headphones, used as a localisation and description medium. It is a conversion (compression) technique from one space to a different interpretation space.
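One classic form of such a mapping (in the spirit of systems like the vOICe, used here purely as an illustrative sketch) scans image columns over time, maps row position to pitch, brightness to loudness, and column position to stereo panning:

```python
# Illustrative sketch of a vision-to-sound transformation: convert an
# image into a list of sound events (time, frequency, amplitude, pan).

def image_to_sound_events(image):
    """image: list of rows (top row first), pixel values in [0, 1].
    Returns (time_step, frequency_hz, amplitude, pan) tuples."""
    events = []
    n_rows, n_cols = len(image), len(image[0])
    for col in range(n_cols):                    # columns scanned over time
        pan = col / max(n_cols - 1, 1)           # 0 = left ear, 1 = right ear
        for row in range(n_rows):
            amp = image[row][col]
            if amp > 0.5:                        # only audibly bright pixels
                freq = 200 + (n_rows - 1 - row) * 100   # higher pixel, higher pitch
                events.append((col, freq, amp, pan))
    return events

# A 2x2 image with a bright bottom-left and top-right pixel:
events = image_to_sound_events([[0, 1], [1, 0]])
# -> [(0, 200, 1, 0.0), (1, 300, 1, 1.0)]
```

The frequency range and scan rate chosen above are arbitrary; the compression aspect comes from reducing a dense pixel grid to a sparse set of audible events.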