Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI: where it is, where it is going

Some 50 days after having been announced, MPAI was established as a not-for-profit organisation with the mission to develop data coding standards with an associated mechanism designed to facilitate the creation of licences.

Some 150 days have passed since its establishment. Where is MPAI in its journey to accomplish its mission?

Creating in 50 days an organisation able to execute that mission was an achievement. The next goal – giving the organisation the means to accomplish its mission – was more difficult, but it too has been achieved.

MPAI has defined five pillars on which the organisation rests.

Pillar #1: The standard development process.

MPAI is an open organisation not in words, but in practice.

Anybody can bring proposals – Interest Collection – and help merge their proposal with others into a Use Case. All can participate in the development of the Functional Requirements. Once the Functional Requirements are defined, MPAI Principal Members develop the Commercial Requirements; all MPAI Members develop the Call for Technologies, review the submissions and start the Standard Development. Finally, MPAI Principal Members approve the MPAI standard.

The progression of a proposal from one stage to the next is approved by the General Assembly.

Pillar #2: AI Modules.

This pillar is technical in nature but has far-reaching implications. MPAI defines basic units called AI Modules (AIM) that perform a significant task and develops standards for them. The AIM in the figure processes the input video signal (a human face) to provide the emotion expressed by the face and the meaning (question, affirmation etc.) of what the human is saying.

MPAI confines the scope of standardisation to the format of the input and output data of an AIM. It is silent on the inside of the AIM (the green box), which can use ML, AI or data processing technologies and can be implemented in hardware or software. In the figure, the Emotion and Meaning Knowledge Base is required when the AIM is implemented with legacy technologies.
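
To make the AIM boundary concrete, here is a minimal sketch of what a standardised AIM interface could look like. It is illustrative only: all names and types are invented, not taken from an MPAI specification. The standard would fix the input and output data formats, while everything inside `process()` remains implementation-specific.

```python
# Illustrative sketch only: MPAI standardises the formats of an AIM's
# input and output data, not its internals. Names/types are hypothetical.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Emotion:
    label: str          # e.g. "anger", "neutral"
    confidence: float   # 0.0 .. 1.0

@dataclass
class Meaning:
    category: str       # e.g. "question", "affirmation"

class VideoAnalysisAIM(ABC):
    """Only this boundary would be standard; the inside (the green box)
    may be a neural network, legacy data processing, hardware or software."""

    @abstractmethod
    def process(self, video_frame: bytes) -> tuple[Emotion, Meaning]:
        ...
```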

Pillar #3: AI Framework

It is clear that, just by themselves, AIMs will have limited use. Practical applications are more complex and require more technologies. If each of these technologies is implemented as an AIM, how can they be connected and executed?

The figure depicts how the MPAI AI Framework (AIF) solves the problem. The MPAI AI Framework has the function of managing the life cycle of the individual AIMs and of creating and executing the workflows. The Communication and Storage functionality allows possibly distributed AIMs to implement different forms of communication.
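
A minimal sketch of what those functions could amount to, assuming invented names (this is not the MPAI-AIF API):

```python
# Hypothetical sketch of the AI Framework's role: manage AIM life cycles
# and execute a workflow. A real framework would also handle distribution,
# Communication and Storage between possibly remote AIMs.
class AIFramework:
    def __init__(self):
        self.aims = {}                 # name -> AIM instance (life cycle)

    def register(self, name, aim):
        """Instantiate and track an AIM (life-cycle management)."""
        self.aims[name] = aim

    def run(self, data):
        """Execute the workflow; here AIMs simply run in registration
        order, standing in for a real scheduler."""
        for aim in self.aims.values():
            data = aim.process(data)
        return data
```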

Pillar #4: Framework Licence

The process of Pillar #1 mentions the development of “Commercial Requirements”. Actually, MPAI Principal Members develop a specific document called the Framework Licence: the patent holders’ business model for monetising their patents in a standard, but without values – no dollars, percentages, dates etc. It is developed and adopted by Active Principal Members and is attached to the Call for Technologies. All submissions must contain a statement that the submitter agrees to licence their patents according to the Framework Licence.

MPAI Members have already developed 4 Framework Licences. Some of their interesting elements are:

  1. The License will be free of charge to the extent it is only used to evaluate or demo solutions or for technical trials.
  2. The License may be granted free of charge for particular uses if so decided by the licensors.
  3. A preference will be expressed on the entity that should administer the patent pool of patent holders.

Pillar #5: Conformance

For a user, a guarantee that an implementation of an MPAI standard performs well is a necessity. For an implementer, measurable performance is a desirable characteristic, because users seek assurance about performance before buying or using an implementation.

Testing an MPAI Implementation for conformance means making available the tools, procedures, data sets etc. specific to an AIM and to a complete MPAI Implementation. MPAI will not itself test MPAI Implementations for conformance; it will only provide testing tools that enable a third party to test the conformance of an AIM and of the set of AIMs that make up a Use Case.
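
As a sketch of what such third-party testing might look like (the checking tool and test-vector format below are invented for illustration; MPAI would supply the real ones):

```python
# Hypothetical conformance harness: MPAI provides test vectors and a
# checking procedure; a third party runs them against an implementation.
def conforms(output, expected):
    # Stand-in for an MPAI-supplied checking tool. Conformance targets
    # data formats, so this toy version only compares output types.
    return type(output) is type(expected)

def test_aim_conformance(aim, test_vectors):
    """Feed standard inputs to an AIM and check every output against its
    reference; True means the AIM passed this (toy) conformance suite."""
    return all(conforms(aim.process(x), ref) for x, ref in test_vectors)
```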

The MPAI standards

MPAI has issued

  1. A call for technologies to enable the development of the AI Framework (MPAI-AIF) standard. Responses were received on the 17th of February and MPAI is busy developing the standard.
  2. Two calls for the Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) standards. Responses are due on the 12th of April.
  3. A call for the Compression and Understanding of Industrial Data (MPAI-CUI) standard. Responses are due on the 10th of May.

Calls for technologies for a total of 9 Use Cases have been issued. One standard is being developed. The Use Cases described above are depicted in the figure below.

MPAI continues the development of Functional Requirements for the Server-based Multiplayer Gaming (MPAI-SPG) standard, the Integrated Genome/Sensor Analysis (MPAI-GSA) standard and the AI-based Enhanced Video Coding (MPAI-EVC).

In the next months MPAI will be busy producing its first standards. The first one – MPAI-AIF – is planned to be released on the 19th of July.


MPAI tackles AI-based risk analysis standard in a new Call for Technologies

Geneva, Switzerland – 17 March 2021. At its 6th General Assembly, the international, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards association has promoted its 4th standard project, Compression and Understanding of Industrial Data (MPAI-CUI), to the Call for Technologies stage. The standard aims to enable Artificial Intelligence (AI)-based filtering and extraction of information from a company’s governance, financial and risk data, enabling prediction of company performance, e.g., organisational adequacy or default probability.

All parties, including non-MPAI members, who believe they have relevant technologies satisfying all or most of the MPAI-CUI Functional Requirements are invited to submit proposals for consideration by MPAI. The MPAI-CUI Call for Technologies requests that technologies proposed, if accepted for inclusion in the standard, be released according to the MPAI-CUI Framework Licence to facilitate eventual definition of the final licence by patent holders.

The content of the Call for Technologies will be introduced at two online conferences. Interested parties are welcome to attend.

MPAI is continuing the development of its AI Framework standard, nicknamed MPAI-AIF. The goal of the standard is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware, or mixed software and hardware. A major MPAI-AIF feature is to offer enhanced explainability to applications conforming to MPAI standards. MPAI retains its intention to release the standard in July 2021.

At its previous General Assembly (MPAI-5), MPAI has issued two Calls for Technologies supporting two new standards:

  1. Context-based Audio Enhancement (MPAI-CAE), covering four instances: adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. Multimodal Conversation (MPAI-MMC), covering three instances: an audio-visual conversation with a machine impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, and a human sentence translated using a synthetic voice that preserves the human speech features.

The MPAI web site provides information about the other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity who supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.


New age – new video coding technologies

The last 30 years of video coding have been productive: over the years, the compression rate has steadily improved, enabling digital video to take over all existing video applications and extend to new ones. The coding algorithm has been tweaked over and over, but it is still based on the same original scheme.

In the last decade, the “social contract” that allowed inventors to innovate, innovation to be brought into standards, standards to be used and inventors to be remunerated has stalled. The HEVC standard is used but the entire IP landscape is clouded by uncertainty.

The EVC standard, approved in 2020, 7 years after HEVC was approved, has shown that even an inflow of technologies from a reduced number of sources can provide outstanding results, as shown in the figure:

The EVC Baseline, a profile that uses 20+ year-old technologies, reaches the performance of HEVC. The Main profile offers a bitrate reduction of 39% over HEVC, a whisker away from the performance of the latest VVC standard.

In 1997 the match between IBM Deep Blue and the (human) chess champion of the time made headlines: IBM Deep Blue beat Garry Kasparov. It was easy to herald the age when machines would overtake humans not just in keeping accounts, but also in one of the noblest intellectual activities: chess.

This was achieved by writing a computer program that explored more alternatives than a human could reasonably do, even though a human’s intuition can look far into the future. In that sense, Deep Blue operated much like MPEG video coding.

Google DeepMind’s AlphaGo did the same in 2016 by beating the Go champion Lee Sedol. The Go rules are simpler than chess rules, but the alternatives in Go are way more numerous. There is only one type of piece (the stone) instead of six (king, queen, bishop, knight, rook and pawn), but the Go board has 19×19 positions instead of the 8×8 of chess. While Deep Blue made a chess move by brute-force exploration of future moves, AlphaGo made Go moves relying on neural networks which had learned moves.

That victory signalled a renewed interest in a technology that is three quarters of a century old – neural networks.

In neural networks, data are processed by successive layers that extract essential information until a compressed representation of the input data is achieved (left-hand side). At the decoder, the inverse process takes place.
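
The idea can be sketched in a few lines of Python with NumPy. The layer sizes and random weights below are purely illustrative (a real network would be trained):

```python
# Toy autoencoder: encoder layers shrink the input to a compact code,
# decoder layers expand it back. Sizes and weights are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W_enc = [rng.standard_normal((64, 32)), rng.standard_normal((32, 8))]
W_dec = [rng.standard_normal((8, 32)), rng.standard_normal((32, 64))]

def forward(x, weights):
    for W in weights:
        x = np.tanh(x @ W)          # each layer extracts denser information
    return x

x = rng.standard_normal(64)         # stand-in for a block of pixel data
code = forward(x, W_enc)            # 8-value compressed representation
x_hat = forward(code, W_dec)        # the decoder runs the inverse process
```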

MPAI has established the AI-Enhanced Video Coding (MPAI-EVC) standard project. This is based on an MPAI study collecting published results where individual HEVC coding tools have been replaced by neural networks (in the following we call them AI tools). By summing up all the published gains, an improvement of 29% is obtained.
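
A caveat on that arithmetic, with invented per-tool numbers (not MPAI’s data): simply summing per-tool gains overstates the total, because each successive gain applies to an already reduced bitrate.

```python
# Worked sketch with hypothetical per-tool bitrate gains.
tool_gains = [0.08, 0.06, 0.05, 0.05, 0.05]

additive = sum(tool_gains)               # 0.29 -> the "summed" 29%
remaining = 1.0
for g in tool_gains:
    remaining *= (1.0 - g)               # each gain compounds on the rest
multiplicative = 1.0 - remaining         # ~0.26 when compounded

print(f"summed: {additive:.0%}, compounded: {multiplicative:.0%}")
```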

This is an interesting result, but far from a scientifically acceptable one, because the different tools were trained differently. Therefore, MPAI is currently engaged in the MPAI-EVC Evidence Project, which can be exemplified by the following figure:

Here all coding tools have been replaced by AI tools. We intend to train these new tools with the same source material (a lot of it) and assess the improvement obtained.

We expect to obtain an objectively measured improvement of at least 25%.

After this MPAI will engage in the actual development of MPAI-EVC. We expect to obtain an objectively measured improvement of at least 35%. Our experience suggests that the subjectively measured improvement will be around 50%.

As in Deep Blue, in traditional tools a priori statistical knowledge is modelled and hardwired into the tools; in AI tools, knowledge is acquired by learning the statistics from the data.


This is the reason why AI tools are more promising than traditional data processing tools.

For a new age you need new tools and a new organisation tuned to use those new tools.



MPAI receives technologies for its AI framework standard and calls for technologies supporting audio and human-machine conversation

Geneva, Switzerland – 17 February 2021. At its 5th General Assembly, Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international, unaffiliated standards association

  1. Has kicked off work on its AI Framework (MPAI-AIF) standard after receiving substantial proposed technologies.
  2. Is calling for technologies to develop two standards related to audio (MPAI-CAE) and multimodal conversation (MPAI-MMC).
  3. Will soon be developing the Framework Licence for the next maturing project “Compression and Understanding of Industrial Data” (MPAI-CUI).

MPAI has reviewed responses to the call issued 2 months ago for technologies supporting its AI Framework (MPAI-AIF) standard. The goal is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware or mixed software and hardware. MPAI-AIF will offer extended explainability to applications conforming to MPAI standards. The submissions received are enabling MPAI to develop the intended standard, whose publication is planned for July 2021.

MPAI has issued two Calls for Technologies supporting two new standards:

  1. The Context-based Audio Enhancement (MPAI-CAE) standard will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples of use are adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. The Multimodal Conversation (MPAI-MMC) standard will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples of use are an audio-visual conversation with a machine where the machine is impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, and a human question to a machine translated using a voice that preserves the speech features of the human.

The content of the two Calls for Technologies will be introduced at two online conferences. Attendance is open to interested parties.

MPAI has developed the functional requirements for the Compression and Understanding of Industrial Data (MPAI-CUI) standard and, by decision of the General Assembly, MPAI Active Members may now develop the Framework Licence to facilitate the actual licences – to be developed outside of MPAI. The standard will be used to assess the risks faced by a company using information from the flow of data it produces.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity who supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.



Having a conversation with a machine

Like it or not, we do what the title says regularly, with mixed results. Searching for that file in your computer should be easy but often is not, finding that email is a hassle, especially when retrieving it is important, and talking with an information service is often a challenge to your nervous system.

I have no intention to criticise the – difficult – work that others have done, but the results of human-machine conversation are far from satisfactory.

Artificial Intelligence promises to do a better job.

A few months ago, MPAI started working on an area called Multimodal Conversation (MMC). The intention is to use a plurality of communication means to improve the ability of humans to talk to machines. Currently, the area includes 3 Use Cases:

  1. Conversation with Emotion (CWE)
  2. Multimodal Question Answering (MQA)
  3. Personalized Automatic Speech Translation (PST).

In this article I would like to present the first, CWE.

The driver of this use case is that, to improve the ability of a machine to have a conversation with a human, it is important for the machine to have information not only about the words used by the human but also about other entities such as the emotional state of the human. If a human is asking a question about the quality of a telephone line, it is important for the machine to know whether the emotion of the speaker is neutral – the question is likely purely informational – or altered – the question likely implies dissatisfaction with the telephone service.

In CWE the machine uses speech, text from a keyboard and video to make the best assessment of the conversation partner’s emotional state. This is shown in Figure 1, where you can see that three blocks are dedicated to extracting emotion in 3 modes. The 3 estimated emotions are fed to another block which is tasked with making a final decision.

Figure 1 – Emotion extraction from text, speech, and video
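
The final-decision block could be as simple as a confidence-weighted vote, though in practice it would itself be a trained AIM. The following toy sketch (all names invented) only illustrates the data flow:

```python
# Toy stand-in for the final-decision block of Figure 1: fuse three
# per-mode (text, speech, video) emotion estimates into one.
from collections import defaultdict

def fuse_emotions(estimates):
    """estimates: (label, confidence) pairs from the three extractors."""
    scores = defaultdict(float)
    for label, confidence in estimates:
        scores[label] += confidence
    return max(scores, key=scores.get)

final = fuse_emotions([("anger", 0.7), ("anger", 0.5), ("neutral", 0.6)])
# -> "anger": two modes agree, with the highest combined confidence
```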

The first three blocks starting from the left-hand side can be implemented as neural networks. But MPAI does not wish to disenfranchise those who have invested for years in traditional data processing solutions and have produced state-of-the-art technologies. MPAI seeks to define standards that are, as far as possible, technology independent. Therefore, in a legacy context Figure 1 morphs into Figure 2.

Figure 2 – Emotion extraction using legacy technologies

In the system of Figure 2, each of the 3 initial blocks extracts features from the input data and uses a vector of features to query an appropriate Knowledge Base, which responds with one or more candidate emotions.
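
A toy sketch of that query path, with an invented KB and distance metric (the actual query formats are precisely what MPAI intends to standardise):

```python
# Hypothetical legacy path of Figure 2: a feature vector queries an
# Emotion Knowledge Base, which returns candidate emotions.
import math

emotion_kb = {                 # feature prototype -> emotion label
    (0.9, 0.1): "anger",
    (0.1, 0.2): "neutral",
    (0.3, 0.9): "surprise",
}

def query_emotion_kb(features, k=2):
    """Return the k candidate emotions with the closest prototypes."""
    nearest = sorted(emotion_kb, key=lambda p: math.dist(p, features))[:k]
    return [emotion_kb[p] for p in nearest]

candidates = query_emotion_kb((0.8, 0.2))   # -> ["anger", "neutral"]
```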

Actually, Video analysis and Language understanding do more than just provide emotion information. This is seen in Figure 3, where the two blocks additionally provide Meaning, i.e., information extracted from the text and video such as question, statement, exclamation, expression of doubt, request, invitation etc.


Figure 3 – Emotion and meaning enter Dialogue processing


Meaning and emotion are fed into the Dialogue processing component. Note that in a legacy implementation Dialogue processing, too, needs access to a Dialogue Knowledge Base. From now on, however, we will assume a fully AI-based implementation.

Dialogue processing produces two streams of data, as depicted in Figure 4 (see the sketch after the figure). The result is composed of:

  1. a data stream to drive speech synthesis, expressed either as “Text with emotion” or as “Concept with emotion”;
  2. a data stream to drive face animation in tune with the speech.

Figure 4 – End-to-end multimodal conversation with emotion
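
A hypothetical sketch of those two streams as data structures (field names invented for illustration):

```python
# The two Dialogue processing outputs of Figure 4: one directive for the
# speech synthesiser, one for the face animator, kept in tune with each
# other. All names and fields below are invented.
from dataclasses import dataclass

@dataclass
class SpeechDirective:
    kind: str        # "text" or "concept"
    content: str
    emotion: str     # the emotion the synthesiser should render

@dataclass
class AnimationDirective:
    emotion: str     # the expression for the animated face
    duration: float  # seconds, to keep the face in tune with the speech

reply = (SpeechDirective("text", "Sorry about the line quality.", "empathy"),
         AnimationDirective("empathy", 2.4))
```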


The last element needed to move from theory to practice is an environment where you can place the blocks (which MPAI calls AI Modules – AIM), establish all connections, activate all timings, and execute the chain. One may even want to train or retrain the individual neural networks.

The technology that makes this possible is the MPAI AI Framework (MPAI-AIF), for which a Call for Technologies was published on 2020/12/16 with responses due on 2021/02/15. The full scheme of Multimodal conversation with emotion in the MPAI AI Framework is represented in Figure 5.

Figure 5 – Multimodal conversation with emotion in the MPAI AI Framework

The six components of MPAI-AIF are:

  • Management and Control, in charge of the workflow;
  • Execution, the environment where the workflow is executed;
  • AI Modules (AIM), the basic blocks of the system;
  • Communication, to handle both internal and external communication (e.g., Dialogue processing could be located on the cloud);
  • Storage, for exchanging data between AIMs;
  • Access, to access static or slowly changing external data.
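
Purely as an illustration of how these six components relate, a use case might be handed to Management and Control as a declarative topology like the one below. The keys and values are invented, not taken from the MPAI-AIF specification:

```python
# Hypothetical declarative description of the Figure 5 workflow.
use_case = {
    "aims": ["speech_recognition", "speech_analysis", "video_analysis",
             "language_understanding", "dialogue_processing",
             "speech_synthesis"],
    "connections": [
        ("speech_recognition", "language_understanding"),
        ("speech_analysis", "dialogue_processing"),
        ("video_analysis", "dialogue_processing"),
        ("language_understanding", "dialogue_processing"),
        ("dialogue_processing", "speech_synthesis"),
    ],
    "communication": {"dialogue_processing": "cloud"},  # possibly remote
    "storage": "shared",      # where AIMs exchange intermediate data
    "access": ["emotion_kb"]  # static or slowly changing external data
}
```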

What does MPAI intend to standardise in Conversation with Emotion? In a fully AI-based implementation:

  1. The format of the Text leaving Speech recognition.
  2. Representation of Emotion
  3. Representation of Meaning
  4. Format of Reply: Text with Emotion or Concept with Emotion
  5. Format of Video animation

In case of a legacy implementation, in addition to the above we need:

  1. Emotion KB (video) query format with Video features
  2. Emotion KB (speech) query format with Speech features
  3. Emotion KB (text) query format with Text features
  4. Dialogue KB query format
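
To make the above concrete, here is a hypothetical example of what a standardised interchange message could look like on the wire. Every field name is invented; defining the actual formats is exactly the object of the MPAI-MMC work:

```python
# Illustrative only: a JSON rendering of Emotion, Meaning and Reply that
# any conforming AIM could produce or parse. Field names are invented.
import json

message = {
    "emotion": {"label": "anger", "intensity": 0.7},
    "meaning": {"type": "question", "topic": "line quality"},
    "reply":   {"format": "text_with_emotion",
                "text": "Let me check your line.", "emotion": "empathy"},
}
wire = json.dumps(message)    # parsed back with json.loads(wire)
```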

As you can see, MPAI standardisation is minimal, in tune with the requirement that a standard specify the minimum necessary for interoperability. In the MPAI case the minimum is what is required to assemble a working system using AIMs from independent sources.


Opacity or transparency?

By publishing its Manifesto, MPAI has made clear in a few sentences its strategic analysis of the AI industry, what will be its action points and why that will provide benefits.

#1 Applications using AI are extending in scope and performance. The AI industry is one of the fastest growing industries.

#2 The industry is not developing as fast as it could because there are hurdles. The first is the fact that the AI application development model is based on frameworks that tend to create the well-known walled-garden effect. Importing applications is easy but exporting applications is difficult. We are missing the seamless interactions that could propel the industry to new heights.

The second hurdle is the fact that most AI applications are monolithic and opaque. If these two adjectives apply to an application that changes a punctured tyre, you may not care, but if it is an application that selects relevant news, I do care.

#3 MPAI believes that AI interoperability standards will have the same beneficial effects that digital media standards had on the media industry starting 30 years ago and continuing today. AI and media may very well be different beasts, but the MPAI notion of coding as the transformation of data from one representation to an equivalent one more suited to a specific application shows that there are more commonalities than differences.

#4 The video and audio coding and decoding chips – minuscule entities compared to the size of the digital media services they enabled – are mirrored by the MPAI notion of AI Module. As much as the digital media standards defined the syntax and semantics of the data coming into the decoder, the AI Modules are defined by the syntax and semantics of the data coming into the AI Module (AIM). However, because an AIM is typically connected to other AIMs, the syntax and semantics of the output data of an AIM are also standardised. As much as the “digital media decoder” became a basic component available on the open market, MPAI expects that the “AI Modules” will become basic components available on the open market.

#5 The fact that AIMs are meant to be connected in a variety of topologies shows that MPAI has an additional problem to solve. The MPAI “AI Framework” (AIF), for which MPAI has issued a Call for Technologies due 15 February 2021, will enable creation, execution, composition, and update of AIM-based workflows.

#6 The elements described above show that MPAI standards will benefit all actors involved.

  • Technology providers will be able to offer their conforming AIMs with different technologies – AI, ML, legacy data processing – and different levels of performance to an open market.
  • Application developers will find on the open market the AIMs needed by their applications and will thus be able to develop more ambitious applications than they could otherwise develop.
  • The fact that there will be a race on a level playing field among providers of standard AIMs will fuel innovation.
  • Consumers will have a wider choice of better AI applications from a competitive market of application providers who will be able to draw state-of-the-art technologies (AIMs) from the open market.
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications, because atomic AIMs will reveal much of the logic that runs an application.

#7 The very fact that so much investment is being made in AI by Academia and Industry – e.g., representation learning, transfer learning, edge AI, and reproducibility of performance today, and more in the future – means that innovation will characterise the AI on which MPAI standards are based for years to come.

#8 One of the founding drivers of MPAI is the realisation that Fair, Reasonable and Non-Discriminatory (FRAND) declarations are no longer a match for today’s technology and business complexity. Society is deprived of the use of valuable technologies and IP holders are deprived of a fair remuneration of their investment. By setting out IP holders’ IPR guidelines in advance, MPAI’s Framework Licences will facilitate the task of users of MPAI standards. A first instance of an MPAI Framework Licence, for MPAI-AIF, has already been developed.

#9 MPAI is proud of being a technical body, but even more proud of being aware of the revolutionary impact AI will have on the future of human society and that technology and society should not be antagonists in addressing the revolution. MPAI will create opportunities for its standards developers to interact with high-level thinkers. Instead of addressing expected problems from AI in abstract terms, MPAI will propose actual cases and seek advice on the impact its standards will have on society.

#10 This is not wishful thinking. Cutting monolithic AI applications into smaller pieces is not a dream but a concrete reality underpinned by the MPAI AI Framework and the MPAI AI Modules. It is debatable whether an AIM/AIF-based solution will be more efficient than a monolithic AI solution. It is not debatable that an AIM/AIF-based solution will be less onerous to design and make, and that it will be more transparent to users.


MPAI addresses new standards for Context-based Audio Enhancement and Multimodal Conversation

Geneva, Switzerland – 20 January 2021. Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international unaffiliated standards association, is engaging in two new areas for standardisation: Context-based Audio Enhancement and Multimodal Conversation.

Standards in the Context-based Audio Enhancement (MPAI-CAE) work area will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples are:

  1. adding emotions to a speech without emotion
  2. preserving old audio tapes
  3. improving the audioconference experience
  4. removing unwanted sounds while keeping relevant sounds to a user walking in the street.

Standards in the Multimodal Conversation (MPAI-MMC) work area will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples are focused on machines:

  1. producing speech and animated face that have a level of emotion consistent with the emotion contained in text, speech and face of the human who is talking to the machines;
  2. responding to a question asked by a human who is showing an object;
  3. translating to another language a question asked by a human using a voice with features similar to those of the human.

MPAI plans on publishing Calls for Technologies for these two standards at its next General Assembly on 2021/02/17. The current drafts are available here and here.

At its last General Assembly (MPAI-3), MPAI issued a Call for Technologies for its planned AI Framework (MPAI-AIF) standard. Submissions are due 2021/02/15 for review and action at its next General Assembly (MPAI-5) on 2021/02/17.

The MPAI web site provides information about other MPAI standards being developed: MPAI-CUI uses AI to compress and understand industrial data, MPAI-EVC to improve the performance of existing video codecs, MPAI-GSA to understand and compress the results of combining genomic experiments with those produced by related devices/sensors, e.g. video, motion, location, weather, medical sensors, and MPAI-SPG to improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.



A vision for AI-based data coding standards

Use of technologies based on Artificial Intelligence (AI) is extending to more and more applications, yielding one of the fastest-growing markets in the data analysis and service sector.

However, industry must overcome hurdles for stakeholders to fully exploit this historical opportunity: the current framework-based development model that makes application redeployment difficult, and monolithic and opaque AI applications that generate mistrust in users.

MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – believes that universally accessible standards can have the same positive effects on AI as digital media standards and has identified data coding as the area where standards can foster development of AI technologies, promote use of AI applications and contribute to the solution of existing problems.

MPAI defines data coding as the transformation of data from a given representation to an equivalent one more suited to a specific application. Examples are compression and semantics extraction.

MPAI considers the AI Module (AIM) and its interfaces as the AI building block. The syntax and semantics of the interfaces determine what AIMs should perform, not how. AIMs can be implemented in hardware or software, with AI, Machine Learning or legacy Data Processing.

MPAI’s AI Framework enabling creation, execution, composition and update of AIM-based workflows (MPAI-AIF) is the cornerstone of MPAI standardisation, because it enables building high-complexity AI solutions by interconnecting multi-vendor AIMs trained for specific tasks, operating in the standard AI Framework and exchanging data in standard formats.

MPAI standards will address many of the problems mentioned above and benefit various actors:

  • Technology providers will be able to offer their conforming AIMs to an open market
  • Application developers will find on the open market the AIMs their applications need
  • Innovation will be fuelled by the demand for novel and more performing AIMs
  • Consumers will be offered a wider choice of better AI applications by a competitive market
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications.

Focusing on AI-based data coding will also allow MPAI to take advantage of the results of emerging and future research in representation learning, transfer learning, edge AI, and reproducibility of performance.

MPAI is mindful of the IPR-related problems which have accompanied high-tech standardisation. Unlike standards developed by other bodies, which are based on vague and contention-prone Fair, Reasonable and Non-Discriminatory (FRAND) declarations, MPAI standards are based on Framework Licences where IPR holders set out IPR guidelines in advance.

Finally, although it is a technical body, MPAI is aware of the revolutionary impact AI will have on the future of human society. MPAI pledges to address the ethical questions raised by its technical work with the involvement of high-profile external thinkers. The initial significant step is to enable the understanding of the inner workings of complex AI systems.


FRAND forever? Or are there other business models possible?

What is the FrameWork License (FWL)?

In the context of the development of new technology standards, Intellectual Property Rights (IP Rights) are the engine that ensures and sustains technology innovation. The FWL intends to move past FRAND assurances, which have not reduced friction between innovators and implementers.

In fact, the question of the implementation of FRAND assurances has created a diversity of interpretations, including different decisions taken by the courts. So much so that a recent judgment of the UK Supreme Court affirms a comprehensive principle of the meaning of FRAND: “This is a single, composite obligation, not three distinct obligations that the license terms should be fair, and separately, reasonable, and separately, non-discriminatory”.

So, the FRAND assurance, made during the standardization process of a new technology, has become a “headache” not only for the courts, but also for those who have to operate on the basis of what happened during the standardization process.

This is one reason why a new international and unaffiliated standards association called MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) has been established. MPAI has adopted a new management model for the IP Rights associated with the work done within a standardization body. This new industrial property management model is called the FrameWork License (FWL). This model intends to overcome all the uncertainties generated by the FRAND declaration, because guidelines on how the future licenses relating to the Standard Essential Patents (SEPs) should be applied are established at the outset of the standardization work.

With these more precise guidelines decided in the course of the standardization process, MPAI plans to help both the holders of Standard Essential Patents (SEPs) and the implementers of the newly standardized technologies find an agreement for the use of SEPs, avoiding the frictions that we have sometimes seen.

As a consequence of the standardization work, a Call for Technologies supporting the MPAI AI Framework (AIF) standard was recently issued, along with the AIF Framework License for its potentially essential IPRs.

The technical goal of MPAI-AIF is to enable the set-up and execution of mixed processing and inference workflows made of Machine Learning, Artificial Intelligence and legacy Data Processing components called AI Modules (AIM).

The MPAI AI Framework standard will facilitate integration of AI and legacy data processing components through standard interfaces and methods. MPAI experts have already validated MPAI’s innovative approach in a sample microcontroller-based implementation that is synergistic with MPAI-AIF standard development.

The Framework License

Access to the standard will be granted in a non-discriminatory fashion in compliance with the generally accepted principles of competition law and agreed upon before a standard is developed.

MPAI has replaced FRAND assurances with FWLs, defined as the set of voluntary terms to use in a license, without monetary values. FWLs are developed by a committee (the IPR Support Advisory Committee) of MPAI members who are experts in the field of IP.

Practically, the FWL is the business model to remunerate IPRs in the standard, but it does not bear values: no $, no %, no dates etc. At most, the FWL could provide that in individual cases there is a cap on the royalties to be paid, or an initial grace period during which no royalties are paid, to foster the adoption of the technology by the market, and so on. Furthermore, the FWL states that the total cost of the licenses issued by IPR holders will be in line with the total cost of the licenses for similar standardized technologies and will take into account the value on the market of the specific standardized technology.

Only when the future standards developed by MPAI are adopted by the market and the FWLs operate as guidelines for licensing the technologies compliant with the standard will it be possible to really understand whether the FWL is useful in helping close the gap between licensors and implementers. At that point, we might simply put the current FRAND declaration concept in the attic.

The full text of the FWL associated with the MPAI-AIF standard can be found at this link.

The guidelines for the licenses that will follow the AIF FWL are listed below:

Conditions of use of the License

  1. The License will be in compliance with generally accepted principles of competition law and the MPAI Statutes
  2. The License will cover all of Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-AIF standard.
  3. The License will cover Development Rights and Implementation Rights
  4. The License will apply to a baseline MPAI-AIF profile and to other profiles containing additional technologies
  5. Access to Essential IPRs of the MPAI-AIF standard will be granted in a non-discriminatory fashion.
  6. The scope of the License will be subject to legal, bias, ethical and moral limitations
  7. Royalties will apply to Implementations that are based on the MPAI-AIF standard
  8. Royalties will not be based on the computational time nor on the number of API calls
  9. Royalties will apply on a worldwide basis
  10. Royalties will apply to any Implementation
  11. An MPAI-AIF Implementation may use other IPR to extend the MPAI-AIF Implementation or to provide additional functionalities
  12. The License may be granted free of charge for particular uses if so is decided by the licensors
  13. The Licenses will provide:
      • a threshold below which a License will be granted free of charge, and/or
      • a grace period during which a License will be granted free of charge, and/or
      • an annual in-compliance royalty cap applying to total royalties due on worldwide revenues for a single Enterprise
  14. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-AIF standard
  15. The total cost of the Licenses issued by IPR holders will be in line with the total cost of the Licenses for similar technologies standardized in the context of Standard Development Organizations
  16. The total cost of the Licenses will take into account the value on the market of the AI Framework technology standardized by MPAI.

By Roberto Dini, member


What is the state of MPAI work?

Introduction

This article responds to the question: where is MPAI today, on the 1st day of 2021, 3 months after its foundation, with respect to its mission and plans?

Converting a mission into a work plan

Looking back, the MPAI mission “Moving Picture, Audio and Data Coding by Artificial Intelligence” looked very attractive, but the task of converting that nice-looking mission into a work plan was daunting. Is there anything to standardise in Artificial Intelligence (AI)? Thousands of companies use AI but do not need standards. Isn’t it so that AI signals the end of media and data coding standardisation?

The first answer is that we should first agree on a definition of standard. One is “the agreement reached by a group of individuals who recognise the advantage of all doing certain things in an agreed way”. There is, however, an older definition of standard that says “the agreement that permits large production runs of component parts that are readily fitted to other parts without adjustment”.

Everybody knows that implementing an MPEG audio or video codec means following a minutely prescribed procedure implied by definition #1. But what about an MPAI “codec”?

In the AI world, a neural network does the job it has been designed for and the network designer does not have to share with anyone else how his neural network works. This is true for the “simple” AI applications, like using AI to recognise a particular object, and for some of the large-scale AI applications that major OTTs run on the cloud.

The application scope of AI is expanding, however, and application developers do not necessarily have the know-how, capability or resources to develop all the pieces needed to make a complete AI application. Even if they wanted to, they could very well end up with an inferior solution because they would have to spread their resources across multiple technologies instead of concentrating on those they know best and acquire the others from the market.

MPAI has adopted the definition of standard as “the agreement that permits large production runs of component parts that are readily fitted to other parts without adjustment”. Therefore, MPAI standards target components, not systems, not the inside of the components, but the outside of the components. The goal is, indeed, to ensure standard users that the components will be “readily fitted to other parts without adjustment”.

The MPAI definition of standard appeared in an old version of the Encyclopaedia Britannica. Probably the definition was inspired decades earlier, at the dawn of industrial standards spearheaded by the British Standards Institute, the first modern industry standard association, when drilling, reaming and threading were all the rage in the industry of the time.

Drilling, reaming and threading in AI

AI has nothing to do with drilling, reaming and threading (actually, it could, but this is not a story for today). However, MPAI addresses the problem of standards in the same way a car manufacturer addresses the problem of procuring nuts and bolts.

Let us consider an example AI problem: a system that allows a machine to have a more meaningful dialogue with a human than is possible today. Today, with speech recognition and synthesis technologies, it is already possible to have a meaningful man-machine dialogue. However, if you are offering a service and you happen to deal with an angry customer, it is highly desirable for the machine to understand the customer’s state of mind, i.e., her “emotion”, and reconfigure the machine’s answers appropriately, lest the customer get angrier. At yet another level of complexity, if your customer is having an audio-visual conversation with the machine, it would be useful for the machine to extract the person’s emotions from her facial traits.

Sure, some companies can offer complete systems, full of neural networks designed to do the job. There is a problem, though: what control do you, as a user, have over the way AI is used in this big black box? The answer is, unfortunately, none, and this is one of the problems of mass use of AI, where millions and in the future billions of people will deal with machines that show levels of intelligence without knowing how that (artificial) intelligence has been programmed before being injected into a machine or a service.

Offering a full solution to this problem is not in MPAI’s mission, nor could MPAI offer one. However, MPAI standards can offer a path that may lead to a less uncontrolled deployment of AI. This is exemplified by Figure 1 below.

Figure 1 – Human-machine conversation with emotion

Each of the six modules in the figure can be a neural network that has been trained to do a particular job. If the interfaces of the “Speech recognition” module, i.e., the AI equivalent of “mechanical threading”, are respected, the module can be replaced by another having the same interfaces. Eventually you can have a system with the same functionality but, possibly, different performance. Individual modules can be tested in appropriate testing environments to assess how well a module does the job it claims it does.
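
The point can be sketched in a few lines: two modules that respect the same (hypothetical) interface are interchangeable, and the rest of the system never notices.

```python
# Sketch only: the interface below is invented, not an MPAI definition.
class SpeechRecognizerA:
    def recognize(self, audio: bytes) -> str:
        return "text from vendor A's neural network"

class SpeechRecognizerB:
    def recognize(self, audio: bytes) -> str:
        return "text from vendor B's legacy processor"

def converse(recognizer, audio: bytes) -> str:
    # The caller depends only on the interface, never on the inside.
    return recognizer.recognize(audio)

converse(SpeechRecognizerA(), b"...")   # swap one module for the other:
converse(SpeechRecognizerB(), b"...")
```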

It is useful to compare this approach with the way we understand the human brain to operate. Our brain is not simply a network of 100 billion variously connected neurons. It is a system of “modules” whose nature and functions have been researched for more than a century. Each “module” is made of smaller components. All “modules” and their connections are implemented with the same technology: interconnected neurons.

Figure 2, courtesy of Prof. Wen Gao of Pengcheng Lab, Shenzhen, Guangdong, China, shows the processing steps of an image in the human brain until the content of the image is “understood” and the “push a button” action is ordered.

Figure 2 – The path from the retina to finger actuation in a human

A module of the figure is the Lateral Geniculate Nucleus (LGN), which connects the optic nerve to the occipital lobe. The LGN has 6 layers – a kind of sub-module – each of which performs distinct functions. The same holds for the other modules crossed by the path.

Independent modules need an environment

We do not know what “entity” in the human brain controls the thousands of processes that take place in it, but we know that without an infrastructure governing the operation we cannot make the modules of Figure 1 operate and produce the desired results.

The environment where “AI modules” operate is clearly a target for a standard and MPAI has already defined the functional requirements for what it calls AI Framework, depicted in Figure 3. A Call for Technologies has been launched and submissions are due 2021/02/15.

Figure 3 – The MPAI AI Framework model (MPAI-AIF)

The inputs at the left-hand side correspond to the visual information from the retina in Figure 2; the outputs correspond to the activation of the muscle. One AI Module (AIM) could correspond to the LGN and another to the V1 visual cortex, Storage could correspond to the short-term memory, Access to the long-term memory and Communication to the structure of axons connecting the 100 billion neurons. The AI Framework model considers the possibility of having distributed instances of AI Frameworks, something for which we have no correspondence, unless we believe in the possibility for a human to hypnotise another human and control their actions 😉.

The other element of the AI Framework that has no correspondence with the human brain – until proven otherwise, I mean – is the Management and Control component. In MPAI this clearly plays a very important role, as demonstrated by the MPAI-AIF Functional Requirements.

Implementing human-machine conversation with emotion

Figure 1 is a variant of an MPAI Use Case called Conversation with Emotion, one of the 7 Use Cases that have reached the Commercial Requirements stage in MPAI. An implementation using the AI Framework can be depicted as in Figure 4.

Figure 4 – A fully AI-based implementation of human-machine conversation with emotion

If the six AIMs are implemented according to the emerging MPAI-AIF standard, then they can be individually obtained from an open “AIM market” and added to or replaced in Figure 4. Of course, a machine capable of having a conversation with a human can be implemented in many ways. However, a non-standard system must be designed and implemented in all its components, and users have less visibility of how the machine works.

One could ask: why should AI Modules be “AI”? Why can’t they be simply Modules, i.e., implemented with legacy data processing technologies? Indeed, data processing in this and other fields has a decades-long history. While AI technologies are fast maturing, some implementers may wish to re-use some legacy Modules they have in their stores.

The AI Framework is open to this possibility and Figure 5 shows how this can be implemented. AI Modules contain the necessary intelligence in the neural networks inside the AIM, while legacy modules typically need Access to an external Knowledge Base.

Figure 5 – A mixed AI-legacy implementation of human-machine conversation with emotion

Conclusions

This article has described how MPAI is implementing its mission of developing standards in the Moving Picture, Audio and Data Coding by Artificial Intelligence domain. The method described blends the need for a common reference (the “agreement” mentioned above) with the need to leave ample room for competition among actual implementations of MPAI standards.

The subdivision of a possibly complex AI system into elementary blocks – AI Modules – not only promotes the establishment of a competitive market of AI Modules, but gives users an insight into how the components of the AI system operate, hence giving back to humans more control over AI systems. It also lowers the threshold for the introduction of AI, spreading its benefits to a larger number of people.