1       Introduction

2       Use Cases

2.1       Context-based Audio Enhancement (MPAI-CAE)

2.2       Integrative Genomic/Sensor Analysis (MPAI-GSA)

2.3       AI-Enhanced Video Coding (MPAI-EVC)

2.4       Server-based Predictive Multiplayer Gaming (MPAI-SPG)

2.5       Multi-Modal Conversation (MPAI-MMC)

2.6       Compression and Understanding of Industrial data (MPAI-CUI)

3       Architecture

4       Requirements

4.1       Component requirements

4.2       Systems requirements

4.3       General requirements

5       Conclusions

6       References

1        Introduction

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is an international association with the mission to develop AI-enabled data coding standards. Artificial Intelligence (AI) technologies have shown they can offer more efficient data coding than existing technologies.

MPAI has analysed six use cases covering application areas that benefit from AI technologies. Even though the use cases are disparate, each of them can be implemented with a combination of processing modules performing functions that combine to achieve the intended result.

MPAI has assessed that leaving it to the market to develop individual implementations would multiply costs and delay adoption of AI technologies. Modules with standard interfaces, combined and executed within the MPAI-specified AI Framework, will instead favour the emergence of horizontal markets where proprietary, competing module implementations exposing standard interfaces will reduce cost, promote adoption and spur progress of AI technologies. MPAI calls these modules AI Modules (AIMs).

MPAI calls the planned AI Framework standard MPAI-AIF. As AI is a fast-moving field, MPAI expects that MPAI-AIF will be extended as new use cases bring new requirements and new technologies reach maturity.

To avoid the deadlock experienced in other high-technology fields, before engaging in the development of the MPAI-AIF standard MPAI will develop a Framework Licence (FWL) associated with the MPAI-AIF Architecture and Functional Requirements defined in this document. The FWL, essentially the business model that standard-essential patent (SEP) holders will apply to monetise their Intellectual Property (IP), but without values such as the amount or percentage of royalties or due dates, will act as the Commercial Requirements for the standard and provide a clear IPR licensing framework.

This document contains a summary description of the six use cases (Section 2) followed by a section describing the architecture expected to become normative (Section 3). Section 4 lists the normative requirements identified so far.

2        Use Cases

The six use cases considered cover a broad range of application areas. Therefore, it is expected that the MPAI-AIF architecture can support a wide variety of use cases of practical interest.

Each case is identified by its name and the acronym identifying the future MPAI standard. More information about MPAI-AIF can be found in [1].

2.1       Context-based Audio Enhancement (MPAI-CAE)

The overall quality of the user experience is highly dependent on the context in which audio is used, e.g.:

  1. Entertainment audio can be consumed in the home, in the car, on public transport, on-the-go (e.g. while doing sports, running, biking), etc.
  2. Voice communications can take place in the office, in the car, at home, on-the-go, etc.
  3. Audio and video conferencing can be done in the office, in the car, at home, on-the-go, etc.
  4. (Serious) gaming can be done in the office, at home, on-the-go, etc.
  5. Audio (post-)production is typically done in the studio.
  6. Audio restoration is typically done in the studio.

By using context information to act on the content using AI, it is possible to substantially improve the user experience.

The following examples describe how MPAI-CAE can make the difference.

  1. Enhanced audio experience in a conference call

Often, the user experience of a video/audio conference can be poor. Too much background noise or undesired sounds can prevent participants from understanding what other participants are saying. By using AI-based adaptive noise cancellation and sound enhancement, MPAI-CAE can virtually eliminate those kinds of noise without using complex microphone systems to capture the characteristics of the environment.

  2. Pleasant and safe music listening while biking

While the user is biking in the middle of city traffic, AI can process the signals captured from the environment by the microphones available in many earphones and earbuds (for active noise cancellation), adapt the sound rendition to the acoustic environment, provide an enhanced audio experience (e.g. by performing dynamic signal equalization), improve battery life and selectively recognize and let through relevant environment sounds (e.g. the horn of a car). The user enjoys a satisfactory listening experience without losing contact with the acoustic surroundings.

  3. Emotion-enhanced synthesized voice

Speech synthesis is constantly improving and finding applications that are part of our daily life (e.g. intelligent assistants). In addition to improving the ‘natural sounding’ of the voice, MPAI-CAE can implement expressive models of primary emotions such as fear, happiness, sadness and anger.

  4. Efficient 3D sound

MPAI-CAE can reduce the number of channels (e.g. MPEG-H 3D Audio can support up to 64 loudspeaker channels and 128 codec core channels) in an automatic (unsupervised) way, e.g. by mapping a 9.1 layout to a 5.1 or stereo layout (for radio broadcasting or DVD), while preserving the musical intent of the composer.
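
As an informative illustration of channel reduction, the following minimal sketch applies a static, ITU-R BS.775-style downmix from 5.1 to stereo. The channel ordering and the fixed coefficients are assumptions made for illustration; an MPAI-CAE implementation would derive the mapping automatically from the content rather than use fixed gains.

```python
import numpy as np

# Assumed 5.1 channel order: L, R, C, LFE, Ls, Rs (illustrative only).
# Static ITU-R BS.775-style coefficients; an AI-based downmix would adapt
# these gains to the content instead of keeping them fixed.
DOWNMIX_5_1_TO_STEREO = np.array([
    # L    R    C      LFE  Ls     Rs
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],    # left output
    [0.0, 1.0, 0.707, 0.0, 0.0,   0.707],  # right output
])

def downmix(samples_5_1: np.ndarray) -> np.ndarray:
    """Map a (6, n_samples) 5.1 signal to a (2, n_samples) stereo signal."""
    return DOWNMIX_5_1_TO_STEREO @ samples_5_1
```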

  5. Speech/audio restoration

Audio restoration is often a time-consuming process that requires skilled audio engineers with specific experience in music and recording techniques to manually go over old audio tapes. MPAI-CAE can automatically remove anomalies from recordings through broadband denoising, declicking and decrackling, as well as by removing buzzes and hums and performing spectrographic ‘retouching’ to remove discrete unwanted sounds.

  6. Normalization of volume across channels/streams

Eighty-five years after TV was first introduced as a public service, TV viewers are still struggling with the different average audio levels used by different broadcasters and, within a programme, with the different audio levels of different scenes.

MPAI-CAE can learn from the user’s reactions via the remote control, e.g. to a loud advertising spot, and control the sound level accordingly.
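
A minimal sketch of such adaptive level control is given below. It uses frame RMS as a crude loudness proxy and a fixed target; these are assumptions for illustration, as a production system would use a proper loudness model (e.g. ITU-R BS.1770) and MPAI-CAE would learn the target from the user’s reactions.

```python
import numpy as np

def normalize_volume(frames: np.ndarray, target_rms: float = 0.1,
                     smoothing: float = 0.9) -> np.ndarray:
    """Apply a slowly adapting gain so each frame approaches a target RMS.

    frames: array of shape (n_frames, frame_len).
    """
    gain = 1.0
    out = np.empty_like(frames)
    for i, frame in enumerate(frames):
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12  # crude loudness estimate
        desired = target_rms / rms
        gain = smoothing * gain + (1.0 - smoothing) * desired  # smooth changes
        out[i] = frame * gain
    return out
```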

  7. Automotive

Audio systems in cars have steadily improved in quality over the years and continue to be integrated into more critical applications. Today, a buyer takes it for granted that a car has a good automotive sound system. In addition, a car usually has at least one and sometimes two microphones to handle the voice-response system and the hands-free cell-phone capability. If the vehicle uses any noise cancellation, several other microphones are involved. MPAI-CAE can be used to improve the user experience and deliver the full quality of current audio systems by reducing the effects of the noisy automotive environment on the signals.

  8. Audio mastering

Audio mastering is still considered an ‘art’ and the prerogative of professional audio engineers. Normal users can upload an example track of their liking (possibly similar musical content) and MPAI-CAE analyzes it, extracts key features and generates from the non-mastered track a master track that ‘sounds like’ the example track. It is also possible to specify the desired style without an example, in which case the original track is adjusted accordingly.
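
As a toy illustration of the ‘sounds like’ idea, the sketch below matches only the overall loudness of the non-mastered track to that of the example track; a real mastering chain would also match spectral balance and dynamics. The function is an illustrative assumption, not an MPAI-CAE component.

```python
import numpy as np

def match_loudness(track: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Scale `track` so its overall RMS level matches the reference track's."""
    track_rms = np.sqrt(np.mean(track ** 2)) + 1e-12
    ref_rms = np.sqrt(np.mean(reference ** 2))
    return track * (ref_rms / track_rms)
```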

More details on MPAI-CAE are found in [2,7].

2.2       Integrative Genomic/Sensor Analysis (MPAI-GSA)

Most experiments in quantitative genomics consist of a setup whereby a small amount of metadata – an observable clinical score or outcome, desirable traits, observed behaviour – is correlated with, or modelled from, a set of data-rich sources. Such sources can be:

  1. Biological experiments – typically sequencing or proteomics/metabolomics data
  2. Sensor data – coming from images, movement trackers, etc.

All these data-rich sources share the following properties:

  1. They produce very large amounts of “primary” data as output.
  2. They need “primary”, experiment-dependent analysis in order to project the primary data (1) onto a single point in a “secondary”, processed space with a high dimensionality – typically a vector of thousands of values.
  3. The resulting vectors, one for each experiment, are then fed to some machine or statistical learning framework, which correlates such high-dimensional data with the low-dimensional metadata available for the experiment (see the sketch after this list). The typical purpose is either to model the high-dimensional data in order to produce a mechanistic explanation for the metadata, or to produce a predictor of the metadata from the high-dimensional data.
  4. Although not typically necessary, in some circumstances it might be useful for the statistical or machine learning algorithm to be able to go back to the primary data (1), in order to extract more detailed information than what is available as a summary in the processed high-dimensional vectors produced in (2).
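
As an informative illustration of step 3 above, the following minimal sketch feeds per-experiment high-dimensional vectors and low-dimensional metadata to a statistical learning pipeline. The use of scikit-learn and the synthetic data are assumptions made for illustration; MPAI-GSA does not prescribe any specific toolkit.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical data: one high-dimensional secondary vector per experiment
# (e.g. thousands of expression values) plus one metadata label each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))   # 100 experiments x 5000 features
y = rng.integers(0, 2, size=100)   # binary clinical outcome (metadata)

# Reduce dimensionality, then fit a predictor of the metadata from the
# high-dimensional secondary data (the purpose described in step 3).
model = Pipeline([
    ("reduce", PCA(n_components=20)),
    ("predict", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.predict(X[:5]))
```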

It would be extremely beneficial to provide a uniform framework to:

  1. Represent the results of such complex, data-rich, experiments, and
  2. Specify the way the input data is processed by the statistical or machine learning stage

Although the structure above is common to a number of experimental setups, it is conceptual and never made explicit. Each “primary” data source can consist of heterogeneous information represented in a variety of formats, especially when genomics experiments are considered, and the same source of information is usually represented in different ways depending on the analysis stage – primary or secondary. The result is ad-hoc data processing workflows: two experiments combining different sets of sources will require two different workflows, each able to process a specific combination of input/output formats. Typically, such workflows will also be laid out as a sequence of completely separate analysis stages, which makes it very difficult for the machine or statistical learning stage to go back to the primary data when necessary.

MPAI-GSA aims to create an explicit, general and reusable framework able to express as many different types of complex integrative experiments as possible. That would provide (I) a compressed, optimized and space-efficient way of storing large integrative experiments, but also (II) the possibility of specifying the AI-based analysis of such data (and, possibly, primary analysis too) as a sequence of pre-defined, standardized algorithms. Such computational blocks might be partly general and prior art (such as standard statistical algorithms to perform dimensionality reduction) and partly novel and problem-oriented, possibly provided by commercial partners. That would create a healthy arena whereby free and commercial methods could be combined into a number of application-specific “processing apps”, thus generating a market and fostering innovation. A large number of actors would ultimately benefit from the MPAI-GSA standard – researchers performing complex experiments, companies providing medical and commercial services based on data-rich quantitative technologies, and the final users who would use instances of the computational framework as deployed “apps”.

The following examples describe typical uses of the MPAI-GSA framework.

  1. Medical genomics – sequencing and variant-calling workflows

In this use case, one would like to correlate a list of genomic variants present in humans and having a known effect on health (metadata) with the variants present in a specific individual (secondary data). Such variants are derived from sequencing data for the individual (primary data) on which some variant calling workflow has been applied. Notably, there is an increasing number of companies doing just that as their core business. Their products differ by: the choice of the primary processing workflow (how to call variants from the sequencing data for the individual); the choice of the machine learning analysis (how to establish the clinical importance of the variants found); and the choice of metadata (which databases of variants with known clinical effect to use). It would be easy to re-deploy their workflows as MPAI-GSA applications.

  2. Integrative analysis of ‘omics datasets

In this use case, one would like to correlate some macroscopic variable observed during a biological process (e.g. the reaction to a drug or a vaccine – metadata) with changes in tens of thousands of cell markers (gene expression estimated from RNA; amount of proteins present in the cell – secondary data) measured through a combination of different high-throughput quantitative biological experiments (primary data – for instance, RNA-sequencing, ChIP-sequencing, mass spectrometry). This is a typical application in research environments (medical, veterinary and agricultural). Both primary and secondary analysis are performed with a variety of methods depending on the institution and the provider of bioinformatics services. Reformulating such methods in terms of MPAI-GSA would help reproducibility and standardisation immensely. It would also provide researchers with a compact way to store their heterogeneous data.

  3. Single-cell RNA-sequencing

Similar to the previous case, but here at least one of the primary data sources is RNA-sequencing performed at the same time on a number of different cells (typically hundreds of thousands) – while bulk RNA-sequencing mixes together RNAs coming from many thousands of different cells, in single-cell RNA-sequencing the RNAs coming from each cell are separately barcoded, and hence distinguishable. The DNA barcodes for each cell would be metadata here. Cells can then be clustered together according to the expression patterns present in the secondary data (vectors of expression values for all the species of RNA present in the cell) and, if sufficient metadata is present, clusters of expression patterns can be associated with different types/lineages of cells – the technique is typically used to study tissue differentiation. A number of complex algorithms exist to perform primary analysis (statistical uncertainty in single-cell RNA-sequencing is much greater than in bulk RNA-sequencing) and, in particular, secondary AI-based clustering/analysis. Again, expressing those algorithms in terms of MPAI-GSA would make them much easier to describe and much more comparable. External commercial providers might supply researchers with clever modules to do all or part of the machine learning analysis.
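
A minimal sketch of the secondary clustering step is given below, assuming scikit-learn and synthetic counts (both illustrative choices); real single-cell pipelines add quality control, feature selection and typically graph-based clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical secondary data: one expression vector per cell.
rng = np.random.default_rng(0)
expression = rng.poisson(2.0, size=(5_000, 1_000)).astype(float)  # cells x genes

# Log-normalise the counts, then cluster cells by expression pattern.
log_expr = np.log1p(expression)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(log_expr)
```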

  4. Experiments correlating genomics with animal behaviour

In this use case, one wants to correlate animal behaviour (typically of lab mice) with their genetic profile (as in the case of knock-down mice) or with the previous administration of drugs (typical in neurobiology). Hence primary data would be video data from cameras tracking the animal; secondary data would be processed video data in the form of primitives describing the animal’s movement, well-being, activity, weight, etc.; and metadata would be a description of the genetic background of the animal (for instance, the name of the gene that has been deactivated) or a timeline with the list and amounts of drugs administered to the animal. Again, there are several companies providing software tools to perform some or all of such analysis tasks – they might easily be reformulated as MPAI-GSA applications.

  5. Spatial metabolomics

One of the most data-intensive biological protocols nowadays is spatial metabolomics, whereby in-situ mass-spectrometry/metabolomics techniques are applied to “pixels”/“voxels” of a 2D/3D biological sample in order to obtain proteomics data at different locations in the sample, typically with sub-cellular resolution. This information can also be correlated with pictures/tomograms of the sample, to obtain phenotypical information about the nature of the pixel/voxel. The combined results are typically analysed with AI-based techniques. So primary data would be unprocessed metabolomics data and images; secondary data would be processed metabolomics data and cellular features extracted from the images; and metadata would be information about the sample (source, original placement within the body, etc.). Currently the processing of spatial metabolomics data is done through complex pipelines, typically in the cloud – having these as MPAI-GSA applications would be beneficial to both the researchers and potential providers of computing services.

  6. Smart farming

During the past few years, there has been increasing interest in data-rich techniques to optimise livestock and crop production (so-called “smart farming”). The range of techniques is constantly expanding, but the main idea is to combine molecular techniques (mainly high-throughput sequencing and derived protocols, such as RNA-sequencing, ChIP-sequencing, HiC, etc., and mass spectrometry – as per the ‘omics case at point 2) with monitoring by images (growth rate under different conditions, sensor data, satellite-based imaging) for both livestock species and crops. This use case can thus be seen as a combination of cases 2 and 4. Primary sources would be genomic data and images; secondary data would be vectors of values for a number of genomic tags and features (growth rate, weight, height) extracted from images; metadata would be information about environmental conditions, spatial position, etc. A growing number of companies offer services in this area – again, the possibility of deploying them as MPAI-GSA applications would open up a large arena where academic or commercial providers could meet the needs of a number of customers in a well-defined way.

More details on MPAI-GSA are found in [3,8].

2.3       AI-Enhanced Video Coding (MPAI-EVC)

MPAI has carried out an investigation into the performance improvement of AI-enhanced HEVC, AI-enhanced VVC and end-to-end AI-based video coding. Preliminary evidence from the investigation suggests that, by replacing and/or enhancing selected existing HEVC and VVC coding tools with AI-based tools, the objectively measured compression performance may be improved by up to around 30%. These results were obtained by combining somewhat heterogeneous data from experiments reported in the literature.

The reported initial results, however, do indicate that AI can bring significant improvements to existing video coding technologies. Therefore, MPAI is investigating the feasibility of improving coding efficiency by about 25% to 50% over an existing standard, with an acceptable increase in complexity, using technologies reported in the literature. If the investigation is successful, MPAI will develop a standard called MPAI AI-Enhanced Video Coding (MPAI-EVC).

The investigation showed that encouraging results can be obtained from new types of AI-based coding schemes, called end-to-end schemes. While promising, these schemes still need substantially more research.

MPAI is also aware of ongoing research targeted at hybrid schemes where AI-based technologies are added to the existing codecs as an enhancement layer without making any change to the base-layer codec itself, thus providing backward-compatible solutions.

At this stage MPAI conducts two parallel activities:

  1. A collaborative activity targeting a scientifically sound assessment of the improvements achieved by state-of-the-art research. To the extent possible, this should be done with the participation of the authors of major improvements.
  2. A thorough development of the requirements that the MPAI-EVC standard should satisfy.

The choice of the starting point (the existing codec) from which an AI-enhanced video codec should be developed is an issue, because high-performance video codecs typically have many standard-essential patent (SEP) holders, all of whom would have to be convinced to allow MPAI to extend the selected starting point with AI-based tools that satisfy the – still to be defined – MPAI-EVC framework licence. As the result of such an endeavour is not guaranteed, MPAI is planning to pick Essential Video Coding (MPEG-5 EVC) as the starting point. The EVC Baseline profile is reported not to be encumbered by IPR, and the EVC Main profile is reported to have a limited number of SEP holders. As an EVC patent holder has announced the release of a full implementation of EVC as Open Source Software (OSS), the choice of EVC as the starting point would also make a working code base available. The choice between the EVC Baseline and Main profiles is TBD.

The following figures show block diagrams of potential configurations to be adopted by the MPAI-EVC standard.

The green circles in Figure 1 indicate traditional video coding tools that could be enhanced or replaced by AI-enabled tools. This will be taken as the basis of the collaborative activity mentioned above.

Figure 1 – A reference diagram for the Horizontal Hybrid approach

In Figure 2, a traditional video codec is enhanced by an AI Enhancement codec.

Figure 2 – A reference diagram for the Vertical Hybrid approach

More details on MPAI-EVC are found in [4,10,11,12].

2.4       Server-based Predictive Multiplayer Gaming (MPAI-SPG)

There are two basic approaches to online gaming:

  1. Traditional online gaming: the server receives a sequence of data from the client(s) and sends an input-dependent sequence of data to the client(s) which use the data to create appropriate video frames.
  2. Cloud gaming: the server receives a sequence of data from the client(s) and sends an input-dependent sequence of video frames to the client(s). In a cloud gaming scenario, the game clients run in the cloud on virtual machines.

If the connection suffers temporarily from high latency or packet loss (in the following called network disruption), two strategies may be used to make up for the missing information:

  1. Client-side prediction, when information from the client does not reach the server or information from the server does not reach the client
  2. Server-side prediction, when information from the client does not reach the server

In a game, a finite-state game machine calculates the logic of the game from the inputs received from the game controllers. The client reacts to user input before the server has acknowledged the input and updated its game state. If an updated game state from the server is missing, the client predicts the game state locally and produces a video frame that is potentially wrong. When the right information from the server reaches the client, the client game state is reconciled with the server game state.

For example, in a first-person shooter game where fast-paced shooting takes place, player A aims their weapon at the position that player B held some milliseconds in the past; by the time player A fires, player B is long gone. To remedy this, when the server gets the information about the shot – which is precise because it carries timestamps – it knows exactly where player A’s weapon was aiming and what player B’s past position was. The server thus processes the shot at that past time, reconciles all the states and updates the clients. In spite of the belated reconciliation, however, the situation is not satisfactory, because player B was shot in the past and, in the few milliseconds of difference, player B may have taken cover.
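
The following sketch illustrates the client-side prediction and reconciliation loop described above on a toy one-dimensional game state; all names are illustrative and not part of MPAI-SPG.

```python
from dataclasses import dataclass, field

@dataclass
class ClientPrediction:
    """Client-side prediction with server reconciliation (toy model)."""
    state: float = 0.0                           # e.g. a 1-D position
    pending: list = field(default_factory=list)  # (sequence number, input) pairs

    def apply_input(self, seq: int, move: float) -> None:
        self.state += move                       # predict locally, before the server acks
        self.pending.append((seq, move))

    def reconcile(self, server_state: float, last_acked_seq: int) -> None:
        self.state = server_state                # adopt the authoritative state
        self.pending = [(s, m) for s, m in self.pending if s > last_acked_seq]
        for _, move in self.pending:             # replay unacknowledged inputs
            self.state += move
```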

AI and ML can provide more efficient solutions than those available today to compensate for network disruption and improve the user experience in both traditional online gaming and cloud gaming.

An AI machine can collect data from multiple clients, perform a much better prediction of each move of each participant and perform sophisticated reconciliations. Information from the game engine – inputs from clients and reconciliation information – can be used in the video encoding process to improve the encoding (e.g. motion estimation), thus making encoding at higher frame rates possible for a high-quality gaming experience even on low-performing hardware.

Here are two examples of known game genres that illustrate how MPAI standards can improve feasibility and user experience.

Example 1: Racing games

During an online racing game, players can see lagging reactions to their moves and an overall low-quality presentation because of network disruption. Usually, the information on the screen predicted by a client is wrong if the online client information cannot reach the server and the other clients involved in the online game on time. In a car racing game, the player may see at time t0 a vehicle heading straight into the wall as it approaches a curve; at time t1, a few seconds later, the same vehicle is “teleported” to its actual position, yielding an awful player experience.

AI can mitigate this issue, offering a better game experience to players. Data from the different online games are collected and used to predict a meaningful path or the correct behaviour during the time in which information does not reach the clients.

Example 2: Zombie games

In some traditional online video games, specific information is displayed differently on different clients because it is too onerous to compute all the outcomes of players’ actions in a physically consistent way. An example is provided by zombie games: the result of killing hordes of zombies in each client is visually different from client to client.

Server-based predictive input can support the online architecture and enable it to provide an equal outcome on all clients. In a massively multiplayer hack-and-slash game, the result of the different combats among players would then yield the same live visual online experience for each player.

More details on MPAI-SPG are found in [5].

2.5       Multi-Modal Conversation (MPAI-MMC)

A useful application of AI is the conversational partner, which provides the user with information, entertains, chats and answers questions through a speech interface. However, an application should include more than just a speech interface to provide a better service to the user. For example, an emotion recognizer and a gesture interpreter are needed for better multi-modal interfaces.

Multi-modal conversation (MPAI-MMC) aims to enable human-machine conversation that emulates human-human conversation in completeness and intensity by using AI.

Examples of MMC are conversations between a human user and a computer/robot, as in the following list. The input from the user can be voice, text or image, or a combination of different inputs. Considering the emotion of the human user, MMC will output responses as text, speech or music, depending on the user’s needs (a workflow sketch follows the list).

  • Chats: “I am bored. What should I do now?” – “You look tired. Why don’t you take a walk?”
  • Question Answering: “Who is the famous artist in Barcelona?” – “Do you mean Gaudi?”
  • Information Request: “What’s the weather today?” – “It is a little cloudy and cold.”
  • Action Request: “Play some classical music, please” – “OK. Do you like Brahms?”
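
A minimal sketch of such a multi-modal workflow is given below, with stub recognizers standing in for the actual AI modules; the function names and behaviour are illustrative assumptions, not part of MPAI-MMC.

```python
from dataclasses import dataclass

@dataclass
class Reply:
    text: str
    speech: bytes = b""  # synthesized audio would go here

def recognize_speech(audio: bytes) -> str:
    return "I am bored. What should I do now?"  # stub speech recognizer

def recognize_emotion(audio: bytes, text: str) -> str:
    return "tired"                              # stub emotion classifier

def generate_reply(text: str, emotion: str) -> Reply:
    # A dialogue model would condition on both the utterance and the emotion.
    if emotion == "tired":
        return Reply("You look tired. Why don't you take a walk?")
    return Reply("Tell me more.")

def converse(audio: bytes) -> Reply:
    text = recognize_speech(audio)
    emotion = recognize_emotion(audio, text)
    return generate_reply(text, emotion)
```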

More details on MPAI-MMC are found in [6,9].

2.6       Compression and Understanding of Industrial data (MPAI-CUI)

Most economic organizations, e.g. companies, produce large quantities of data, often because they are required to by regulation. Users of these data may be the company itself, or Fintech and Insurtech services that need to access the flow of company data to assess and monitor financial and organizational performance, as well as the impact of vertical risks (e.g. cyber, seismic, etc.).

The sheer amount of data that needs to be exchanged is an issue. Analysis of those data by humans is typically onerous and may miss vitally important information. Artificial Intelligence (AI) may help reduce the amount of data with a controlled loss of information and extract the most relevant information from the data. AI is considered the most promising means to achieve this goal.

Unfortunately, the syntax and semantics of the data flow are highly dependent on who has produced the data. The format of the data is typically a text file with a structure not designed for indexing, search and extraction. Therefore, in order to be able to apply AI technologies to meaningfully reduce the data flow, it is necessary to standardize the formats of the components of the data flow and make the data “AI friendly”.

Recent regulations impose constant monitoring (ideally monthly). Thus, similar blocks of data are likely to appear in temporally consecutive sequences of data.

The company generating the data flow may need to perform compression for its own needs (e.g. identifying core and non-core data). Subsequent entities may perform further data compression.

In general, compressed data should allow for easy data search and extraction.

MPAI-CUI may be used in a variety of contexts:

  1. To support the company’s board in deploying efficient strategies. A company can analyse its financial performance, identifying possible clues to a crisis or risk of bankruptcy years in advance. This may help the board of directors and decision-makers to make the proper decisions to avoid these situations, conduct what-if analyses, and devise efficient strategies.
  2. To assess the financial health of companies that apply for funds/financial help. A financial institution that receives a request for financial help from a troubled company can access its financial and organizational data and make an AI-based assessment of that company, as well as a prediction of its future performance. This helps the financial institution decide, with a broad vision of the company’s situation, whether to fund it.
  3. To assess risk in different fields by considering non-core data (e.g. non-financial data), through accurate and targeted sharing of core and non-core data ranging from financial and organizational information to other types of risks that affect business continuity (e.g. environmental, seismic, infrastructure and cyber).
  4. To analyse the effects of disruptions on the national economy, e.g. performance evaluation through pre/post-pandemic analysis.

3        Architecture

The normative MPAI-AIF architecture enables the creation and automation of mixed ML-AI-DP processing and inference workflows at scale for the use cases considered above. It includes six basic normative elements of the Architecture, called Components, addressing different modalities of operation – AI, Machine Learning (ML) and Data Processing (DP) – data pipeline jungles and computing resource allocation, including the constrained-hardware scenarios of edge AI devices.

The normative reference diagram of MPAI-AIF is given in the following figure, where the APIs between different Components at different levels are shown.

Figure 3 – Proposed normative MPAI-AIF Architecture

  1. Management and Control

Management concerns the activation/deactivation/suspension of AIMs, while Control supports complex application scenarios.

Management and Control handles both simple orchestration tasks (e.g. the execution of a script) and much more complex tasks involving a topology of networked AIMs that can be synchronised according to a given time base, as well as full ML life cycles.
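
By way of illustration only – the normative API is not yet defined – the following Python sketch shows how a Management and Control component might execute a DAG of AIMs in dependency order; all names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_workflow(aims: dict, graph: dict, inputs: dict) -> dict:
    """Execute a DAG of AIMs in dependency order.

    `graph` maps an AIM name to the names it depends on; `aims` maps a
    name to a callable taking the outputs of its predecessors. A real
    Management and Control component would add time-base synchronisation,
    ML life-cycle handling and error recovery.
    """
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in results:          # an external input, already available
            continue
        args = [results[dep] for dep in graph.get(name, ())]
        results[name] = aims[name](*args)
    return results

# Toy usage with an MPAI-CAE-like chain (names are illustrative):
aims = {"denoise": lambda x: x + " -> denoised",
        "equalize": lambda x: x + " -> equalized"}
graph = {"denoise": ["audio_in"], "equalize": ["denoise"]}
print(run_workflow(aims, graph, {"audio_in": "pcm"}))
```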

  2. Execution

The environment in which AIMs operate. It is interfaced with Management and Control and with Communication and Storage. It receives external inputs and produces the requested outputs, both of which are application-specific.

  3. AI Modules (AIM)

AIMs are units comprising at least the following three functions:

  1. The processing element (ML or traditional DP)
  2. Interface to Communication and Storage
  3. Input and output interfaces (function specific)

AIMs can implement auto-configuration or reconfiguration of their ML-based computational models.
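
The following sketch renders the three functions listed above as an abstract interface; the method names are illustrative assumptions, not the normative MPAI-AIF API.

```python
from abc import ABC, abstractmethod
from typing import Any

class AIM(ABC):
    """Sketch of an AI Module exposing the three functions listed above."""

    @abstractmethod
    def process(self, *inputs: Any) -> Any:
        """1. The processing element (ML or traditional DP)."""

    def connect(self, communication: Any, storage: Any) -> None:
        """2. Interface to Communication and Storage."""
        self.communication = communication
        self.storage = storage

    def io_spec(self) -> dict:
        """3. Function-specific input and output interfaces."""
        return {}

    def reconfigure(self, model: Any) -> None:
        """Optional: swap or update the ML-based computational model."""
        self.model = model
```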

  4. Communication

Communication is required in several cases and can be implemented accordingly, e.g. by means of a service bus. Components can communicate among themselves, as well as with outputs and Storage.

The Management and Control API implements one- and two-way signalling for computational workflow initialisation and control.

  5. Storage

Storage encompasses traditional storage and covers a variety of data types, e.g.:

  1. Inputs and outputs of the individual AIMs
  2. Data from the AIM’s state, e.g. with respect to traditional and continuous learning
  3. Data from the AIM’s intermediary results
  4. Shared data among AIMs
  5. Information used by Management and Control.

  6. Access

Access provides access to static or slowly changing data required by the application, such as domain knowledge data, data models, etc.

4        Requirements

4.1       Component requirements

  1. The MPAI-AIF standard shall include specifications of the interfaces of 6 Components
    1. Management and Control
    2. Execution
    3. AI Modules (AIM)
    4. Communication
    5. Storage
    6. Access
  2. MPAI-AIF shall support configurations where Components are distributed in the cloud and at the edge
  3. Management and Control shall enable operations on the general Machine Learning and/or traditional Data Processing life cycle of
    1. Single AIMs, e.g. instantiation-configuration-removal, internal state dumping/retrieval, start-suspend-stop, train-retrain-update, enforcement of resource limits (see the sketch after this list)
    2. Combinations of AIMs, e.g. initialisation of the overall computational model, instantiation-removal-configuration of AIMs, manual, automatic, dynamic and adaptive configuration of interfaces with Components.
  4. Management and Control shall support
    1. Architectures that allow application-scenario dependent hierarchical execution of workflows, i.e. a combination of AIMs into computational graphs
    2. Supervised, unsupervised and reinforcement-based learning paradigms
    3. Computational graphs, such as Directed Acyclic Graphs (DAG) as a minimum
    4. Initialisation of signalling patterns, communication and security policies between AIMs
  5. Storage shall support protocols to specify application-dependent requirements such as access time, retention, read/write throughput
  6. Access shall provide
    1. Static or slowly changing data with standard formats
    2. Data with proprietary formats
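
As an informative illustration of the life-cycle operations in requirement 3.1, the sketch below models instantiation, start-suspend-stop, internal state dumping and removal; it is a toy model with hypothetical names, not the normative Management and Control API.

```python
class ManagementAndControl:
    """Toy life-cycle manager echoing requirement 3.1 (illustrative only)."""

    def __init__(self):
        self.aims = {}

    def instantiate(self, name, aim, config=None):
        self.aims[name] = {"aim": aim, "config": config or {},
                           "state": "instantiated"}

    def start(self, name):
        self.aims[name]["state"] = "running"

    def suspend(self, name):
        self.aims[name]["state"] = "suspended"

    def stop(self, name):
        self.aims[name]["state"] = "stopped"

    def dump_state(self, name):
        return dict(self.aims[name])  # internal state dumping/retrieval

    def remove(self, name):
        del self.aims[name]
```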

4.2       Systems requirements

The following requirements are not intended to apply to the MPAI-AIF standard, but should be used for assessing technologies:

  1. Management and Control shall support asynchronous and time-based synchronous operation depending on the application
  2. The Architecture shall support dynamic update of the ML models with seamless or minimal impact on its operation
  3. ML-based AIMs shall support time-sharing operation, enabling use of the same ML-based AIM in multiple concurrent applications
  4. AIMs may be aggregations of AIMs exposing new interfaces
  5. Complexity and performance shall be scalable to cope with different scenarios, e.g. from small MCUs to complex distributed systems
  6. The Architecture shall support workflows mixing AI/ML-based and DP technology-based AIMs.

4.3       General requirements

The MPAI-AIF standard may include profiles for specific (sets of) requirements.

5        Conclusions

When the definition of the MPAI-AIF Framework Licence is completed, MPAI will issue a Call for Technologies that support the AI Framework with the requirements given in this document.

Respondents will be requested to state in their submissions their intention to adhere to the Framework Licence developed for MPAI-AIF when licensing their technologies, if these are included in the MPAI-AIF standard.

The MPAI-AIF Framework Licence will be developed, as for all other MPAI Framework Licences, in compliance with the generally accepted principles of competition law.

6        References

[1] MPAI Application Note#4 – MPAI-AIF Artificial Intelligence Framework

[2] MPAI Application Note#1 R1 – MPAI-CAE Context-based Audio Enhancement

[3] MPAI Application Note#2 R1 – MPAI-GSA Integrative Genomic/Sensor Analysis

[4] MPAI Application Note#3 R1 – MPAI-EVC AI-Enhanced Video Coding

[5] MPAI Application Note#5 R1 – MPAI-SPG Server-based Predictive Multiplayer Gaming

[6] MPAI Application Note#6 R1 – MPAI-MMC Multi-Modal Conversation

[7] MPAI-CAE Functional Requirements work programme

[8] MPAI-GSA Functional Requirements work programme

[9] MPAI-MMC Functional Requirements work programme

[10] MPAI-EVC Use Cases and Requirements

[11] Collaborative Evidence Conditions for MPAI-EVC Evidence Project R1

[12] Operational Guidelines for MPAI-EVC Evidence Project