Moving Picture, Audio and Data Coding
by Artificial Intelligence

New age – new video coding technologies

The last 30 years of video coding have been productive: compression performance has steadily improved, enabling digital video to take over all existing video applications and to extend to new ones. The coding algorithm has been tweaked over and over, but it is still based on the same original scheme.

In the last decade, the “social contract” that allowed inventors to innovate, innovation to be brought into standards, standards to be used and inventors to be remunerated has stalled. The HEVC standard is used but the entire IP landscape is clouded by uncertainty.

The EVC standard, approved in 2020, 7 years after HEVC, has shown that even an inflow of technologies from a reduced number of sources can provide outstanding results, as shown in the figure:

The EVC Baseline, a profile that uses technologies more than 20 years old, approaches the performance of HEVC. The Main profile offers a bitrate reduction of 39% over HEVC, a whisker away from the performance of the latest VVC standard.

In 1997 the match between IBM Deep Blue and Garry Kasparov, the (human) chess champion of the time, made headlines: Deep Blue won. It was easy to herald the age when machines would overtake humans not just in keeping accounts, but also in one of the noblest intellectual activities: chess.

This was achieved by writing a computer program that explored more alternatives than a human could reasonably do, even though a human’s intuition can look farther into the future. In that sense, Deep Blue operated much like MPEG video coding.

Google DeepMind’s AlphaGo did the same in 2016 by beating the Go champion Lee Sedol. The Go rules are simpler than the chess rules, but the alternatives in Go are far more numerous: there is only one type of piece (the stone) instead of six (king, queen, bishop, knight, rook and pawn), but the Go board has 19×19 intersections instead of the 8×8 squares of chess. While Deep Blue made a chess move by brute-force exploration of future moves, AlphaGo made Go moves relying on neural networks which had learned the game.

That victory signalled a renewed interest in a technology that is three quarters of a century old: neural networks.

In a neural network, data are processed by successive layers that extract essential information until a compressed representation of the input data is achieved (left-hand side of the figure). At the decoder, the inverse process takes place.
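
The encoder/decoder pair just described is, in essence, an autoencoder. The sketch below is a minimal illustration of the idea in PyTorch; the layer sizes, dimensions and training details are illustrative assumptions, not part of any MPAI design.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: successive layers distil the input down to a
        # compressed (latent) representation.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: the inverse process reconstructs the input.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed representation
        return self.decoder(z)   # reconstruction

model = AutoEncoder()
x = torch.rand(8, 784)                      # a toy batch of inputs
loss = nn.functional.mse_loss(model(x), x)  # train by minimising reconstruction error
```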

MPAI has established the AI-Enhanced Video Coding (MPAI-EVC) standard project. This is based on an MPAI study collecting published results where individual HEVC coding tools have been replaced by neural networks (in the following we call them AI tools). By summing up all the published gains, an improvement of 29% is obtained.

This is an interesting result, but far from a scientifically acceptable one, because the different tools used were trained differently. Therefore, MPAI is currently engaged in the MPAI-EVC Evidence Project, which can be exemplified by the following figure:

Here all coding tools have been replaced by AI tools. We intend to train these new tools with the same source material (a lot of it) and assess the improvement obtained.
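
Objective assessment of that improvement typically rests on rate-distortion measurements. The toy function below is a sketch of the PSNR metric such comparisons commonly start from, not the MPAI-EVC evaluation procedure itself; a real evaluation would use standard test sequences and BD-rate statistics.

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between an original and a decoded frame."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```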

We expect to obtain an objectively measured improvement of at least 25%.

After this MPAI will engage in the actual development of MPAI-EVC. We expect to obtain an objectively measured improvement of at least 35%. Our experience suggests that the subjectively measured improvement will be around 50%.

As in Deep Blue, in traditional coding tools a priori statistical knowledge is modelled and hardwired into the tools; in AI tools, knowledge is acquired by learning the statistics.


This is the reason why AI tools are more promising than traditional data processing tools.

For a new age you need new tools and a new organisation tuned to use those new tools.



MPAI receives technologies for its AI framework standard and calls for technologies supporting audio and human-machine conversation

Geneva, Switzerland – 17 February 2021. At its 5th General Assembly, Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international, unaffiliated standards association

  1. Has kicked off work on its AI Framework (MPAI-AIF) standard after receiving substantial proposed technologies.
  2. Is calling for technologies to develop two standards related to audio (MPAI-CAE) and multimodal conversation (MPAI-MMC).
  3. Will soon be developing the Framework Licence for the next maturing project “Compression and Understanding of Industrial Data” (MPAI-CUI).

MPAI has reviewed responses to the call issued 2 months ago for technologies supporting its AI Framework (MPAI-AIF) standard. The goal is to enable creation and automation of mixed Machine Learning (ML), Artificial Intelligence (AI) and Data Processing (DP) inference workflows, implemented as software, hardware or mixed software and hardware. MPAI-AIF will offer extended explainability to applications conforming to MPAI standards. The submissions received are enabling MPAI to develop the intended standard, whose publication is planned for July 2021.

MPAI has issued two Calls for Technologies supporting two new standards:

  1. The Context-based Audio Enhancement (MPAI-CAE) standard will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples of use are adding a desired emotion to emotionless speech, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. The Multimodal Conversation (MPAI-MMC) standard will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples of use are an audio-visual conversation with a machine where the machine is impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, and a human question to a machine translated using a voice that preserves the speech features of the human.

The content of the two Calls for Technologies will be introduced at two online conferences. Attendance is open to interested parties.

MPAI has developed the functional requirements for the Compression and Understanding of Industrial Data (MPAI-CUI) standard and, by decision of the General Assembly, MPAI Active Members may now develop the Framework Licence that will facilitate the actual licences, to be developed outside of MPAI. The standard will be used to assess the risks faced by a company using information extracted from the flow of data it produces.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer game players.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.



Having a conversation with a machine

Like it or not, we do what the title says regularly, with mixed results. Searching for a file in your computer should be easy but often is not, finding an email is a hassle, especially when retrieving it really matters, and talking with an information service is often a challenge to your nervous system.

I have no intention to criticise the – difficult – work that others have done, but the results of human-machine conversation are far from satisfactory.

Artificial Intelligence promises to do a better job.

A few months ago, MPAI started working on an area called Multimodal Conversation (MMC). The intention is to use a plurality of communication means to improve the ability of humans to talk to machines. Currently, the area includes 3 Use Cases:

  1. Conversation with Emotion (CWE)
  2. Multimodal Question Answering (MQA)
  3. Personalized Automatic Speech Translation (PST).

In this article I would like to present the first, CWE.

The driver of this use case is that, to improve a machine’s ability to have a conversation with a human, the machine needs information not only about the words used by the human but also about other cues, such as the emotional state of the human. If a human is asking a question about the quality of a telephone line, it is important for the machine to know whether the emotion of the speaker is neutral – the question is likely purely informational – or altered – the question likely implies dissatisfaction with the telephone service.

In CWE the machine uses speech, text from a keyboard and video to make the best assessment of the conversation partner’s emotional state. This is shown in Figure 1, where three blocks are dedicated to extracting emotion in the 3 modes. The 3 estimated emotions are fed to another block tasked with making the final decision.

Figure 1 – Emotion extraction from text, speech, and video
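
The final-decision block can be as simple as a confidence-weighted vote over the three per-modality estimates. Below is a minimal sketch of that idea; the labels, confidence values and fusion rule are illustrative assumptions, not what the MPAI-MMC standard prescribes.

```python
from collections import defaultdict

def fuse_emotions(estimates):
    """estimates: (emotion, confidence) pairs from the text, speech and
    video blocks. Returns the emotion with the highest weighted vote."""
    scores = defaultdict(float)
    for emotion, confidence in estimates:
        scores[emotion] += confidence
    return max(scores, key=scores.get)

# Text and speech suggest anger, video is less sure:
print(fuse_emotions([("angry", 0.7), ("angry", 0.6), ("neutral", 0.4)]))  # -> angry
```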

The first three blocks starting from the left-hand side can be implemented as neural networks. But MPAI does not wish to disenfranchise those who have invested for years in traditional data processing solutions and have produced state-of-the-art technologies. MPAI seeks to define standards that are, as far as possible, technology independent. Therefore, in a legacy context, Figure 1 morphs into Figure 2.

Figure 2 – Emotion extraction using legacy technologies

In the system of Figure 2, each of the 3 initial blocks extracts features from the input data and uses a vector of features to query an appropriate Knowledge Base, which responds with one or more candidate emotions.

Actually, Video analysis and Language understanding do more than just provide emotion information. This is seen in Figure 3, where the two blocks additionally provide Meaning, i.e., information extracted from the text and video such as question, statement, exclamation, expression of doubt, request, invitation etc.


Figure 3 – Emotion and meaning enter Dialogue processing


Meaning and emotion are fed into the Dialogue processing component. Note that in a legacy implementation Dialogue processing, too, needs access to a Dialogue Knowledge Base. From now on, however, we will assume a fully AI-based implementation.

Dialogue processing produces two streams of data, as depicted in Figure 4. The result is composed of:

  1. a data stream to drive speech synthesis, expressed either as “Text with emotion” or as “Concept with emotion”.
  2. a data stream to drive face animation in tune with the speech.

Figure 4 – End-to-end multimodal conversation with emotion
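
To make the two streams concrete, here is one hypothetical way they could be represented as data structures. All field names are assumptions made for illustration; the actual formats are precisely among the items MPAI intends to standardise, as listed further below.

```python
from dataclasses import dataclass

@dataclass
class TextWithEmotion:
    """Stream 1: drives speech synthesis."""
    text: str         # the reply to be synthesised
    emotion: str      # e.g. "neutral", "empathetic"

@dataclass
class FaceAnimation:
    """Stream 2: drives face animation in tune with the speech."""
    visemes: list     # time-stamped mouth shapes
    expression: str   # overall facial expression
```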


The last element needed to move from theory to practice is an environment where you can place the blocks (which MPAI calls AI Modules – AIMs), establish all connections, activate all timings, and execute the chain. One could even want to train or retrain the individual neural networks.

The technology that makes this possible is the MPAI AI Framework (MPAI-AIF), for which a Call for Technologies was published on 2020/12/16, with responses due on 2021/02/15. The full scheme of Multimodal conversation with emotion in the MPAI AI Framework is represented in Figure 5.

Figure 5 – Multimodal conversation with emotion in the MPAI AI Framework

The six components of MPAI-AIF are:

  • Management and Control, in charge of the workflow
  • Execution, the environment where the workflow is executed
  • AI Modules (AIM), the basic blocks of the system
  • Communication, to handle both internal and external communication (e.g., Dialogue processing could be located on the cloud)
  • Storage, for exchanging data between AIMs
  • Access, to access static or slowly changing external data.
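
As a rough illustration of how these six components relate, here is a minimal sketch in code. Every name and method in it is an assumption made for this article; MPAI-AIF specifies interfaces and behaviour, not an implementation.

```python
class AIM:
    """An AI Module: a basic block with inputs and outputs."""
    def process(self, data: dict) -> dict:
        raise NotImplementedError

class Storage:
    """For exchanging data between AIMs."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value

class ManagementAndControl:
    """In charge of the workflow, run inside the Execution environment."""
    def __init__(self, workflow: list, storage: Storage, access=None, communication=None):
        self.workflow = workflow            # ordered AIMs to execute
        self.storage = storage              # data exchange between AIMs
        self.access = access                # static/slowly changing external data
        self.communication = communication  # internal and external messaging

    def run(self, data: dict) -> dict:
        for i, aim in enumerate(self.workflow):
            data = aim.process(data)
            self.storage.put(i, data)       # make intermediate results available
        return data
```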

What does MPAI intend to standardise in Conversation with emotion? In a full AI-based implementation:

  1. The format of the Text leaving Speech recognition.
  2. Representation of Emotion
  3. Representation of Meaning
  4. Format of Reply: Text with Emotion or Concept with Emotion
  5. Format of Video animation

In case of a legacy implementation, in addition to the above we need:

  1. Emotion KB (video) query format with Video features
  2. Emotion KB (speech) query format with Speech features
  3. Emotion KB (text) query format with Text features
  4. Dialogue KB query format

As you see, MPAI standardisation is minimal, in tune with the requirement that a standard specify the minimum necessary for interoperability. In the MPAI case, the minimum is what is required to assemble a working system using AIMs from independent sources.


Opacity or transparency?

By publishing its Manifesto, MPAI has made clear in a few sentences its strategic analysis of the AI industry, what will be its action points and why that will provide benefits.

#1 Applications using AI are extending in scope and performance. The AI industry is one of the fastest growing industries.

#2 The industry is not developing as fast as it could because there are hurdles. The first is the fact that the AI application development model is based on frameworks that tend to create the well-known walled garden effect. Importing applications is easy but exporting applications is difficult. We are missing the seamless interactions that could propel the industry to new heights.

The second hurdle is the fact that most AI applications are monolithic and opaque. If these two adjectives apply to an application that changes a punctured tyre, you may not care, but if it is an application that selects relevant news, I do care.

#3 MPAI believes that AI interoperability standards will have the same beneficial effects that digital media standards had on the media industry, starting 30 years ago and continuing today. AI and media may very well be different beasts, but the MPAI notion of coding as the transformation of data from one representation to an equivalent one more suited to a specific application shows that there are more commonalities than differences.

#4 The video and audio coding and decoding chips – minuscule entities compared to the size of the digital media services they enabled – are mirrored by the MPAI notion of AI Module. As much as the digital media standards defined the syntax and semantics of the data coming into the decoder, AI Modules (AIM) are defined by the syntax and semantics of the data coming into them. However, because an AIM is typically connected to other AIMs, the syntax and semantics of the output data of an AIM are also standardised. As much as the “digital media decoder” became a basic component available on the open market, MPAI expects that AI Modules will become basic components available on the open market.

#5 The fact that AIMs are meant to be connected in a variety of topologies shows that MPAI has an additional problem to solve. The MPAI “AI Framework” (AIF), for which MPAI has issued a Call for Technologies due 15 February 2021, will enable creation, execution, composition, and update of AIM-based workflows.

#6 The elements described above show that MPAI standards will benefit all actors involved.

  • Technology providers will be able to offer their conforming AIMs with different technologies – AI, ML, legacy data processing – and different levels of performance to an open market.
  • Application developers will find on the open market the AIMs needed by their applications and will thus be able to develop more ambitious applications than they could otherwise develop.
  • The race on a level playing field among providers of standard AIMs will fuel innovation.
  • Consumers will have a wider choice of better AI applications from a competitive market of application providers who will be able to draw state-of-the-art technologies (AIMs) from the open market.
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications because atomic AIMs will reveal much of the logic that runs the application.

#7 The very fact that so much investment is being made in AI by Academia and Industry – e.g., representation learning, transfer learning, edge AI, and reproducibility of performance today, and more in the future – means that the AI technologies on which MPAI standards are based will keep being renewed by innovation for years to come.

#8 One of the founding drivers of MPAI is the realisation that Fair, Reasonable and Non-Discriminatory (FRAND) declarations are no longer a match for today’s technology and business complexity. Society is deprived of the use of valuable technologies and IP holders are deprived of a fair remuneration of their investment. By setting out IP holders’ IPR guidelines in advance, MPAI’s Framework Licences will ease the life of users of MPAI standards. A first instance of an MPAI Framework Licence, for MPAI-AIF, has already been developed.

#9 MPAI is proud of being a technical body, but even more proud of being aware of the revolutionary impact AI will have on the future of human society and that technology and society should not be antagonists in addressing the revolution. MPAI will create opportunities for its standards developers to interact with high-level thinkers. Instead of addressing expected problems from AI in abstract terms, MPAI will propose actual cases and seek advice on the impact its standards will have on society.

#10 This is not wishful thinking. Cutting monolithic AI applications into smaller pieces is not a dream but a concrete reality underpinned by the MPAI AI Framework and the MPAI AI Modules. It is debatable whether an AIM/AIF-based solution will be more efficient than a monolithic AI solution. It is not debatable that an AIM/AIF-based solution will be less onerous to design and make, and that it will be more transparent to users.


MPAI addresses new standards for Context-based Audio Enhancement and Multimodal Conversation

Geneva, Switzerland – 20 January 2021. Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international unaffiliated standards association, is engaging in two new areas for standardisation: Context-based Audio Enhancement and Multimodal Conversation.

Standards in the Context-based Audio Enhancement (MPAI-CAE) work area will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples are:

  1. adding emotions to a speech without emotion
  2. preserving old audio tapes
  3. improving the audioconference experience
  4. removing unwanted sounds while keeping relevant sounds to a user walking in the street.

Standards in the Multimodal Conversation (MPAI-MMC) work area will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples focus on machines

  1. producing speech and animated face that have a level of emotion consistent with the emotion contained in text, speech and face of the human who is talking to the machines;
  2. responding to a question asked by a human who is showing an object;
  3. translating to another language a question asked by a human using a voice with features similar to those of the human.

MPAI plans on publishing Calls for Technologies for these two standards at its next General Assembly on 2021/02/17. The current drafts are available here and here.

At its last General Assembly (MPAI-3), MPAI issued a Call for Technologies for its planned AI Framework (MPAI-AIF) standard. Submissions are due 2021/02/15 for review and action at its next General Assembly (MPAI-5) on 2021/02/17.

The MPAI web site provides information about other MPAI standards being developed: MPAI-CUI uses AI to compress and understand industrial data, MPAI-EVC to improve the performance of existing video codecs, MPAI-GSA to understand and compress the results of combining genomic experiments with those produced by related devices/sensors, e.g. video, motion, location, weather, medical sensors, and MPAI-SPG to improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.



A vision for AI-based data coding standards

Use of technologies based on Artificial Intelligence (AI) is extending to more and more applications, yielding one of the fastest-growing markets in the data analysis and service sector.

However, industry must overcome hurdles for stakeholders to fully exploit this historical opportunity: the current framework-based development model that makes application redeployment difficult, and monolithic and opaque AI applications that generate mistrust in users.

MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – believes that universally accessible standards can have the same positive effects on AI as digital media standards and has identified data coding as the area where standards can foster development of AI technologies, promote use of AI applications and contribute to the solution of existing problems.

MPAI defines data coding as the transformation of data from a given representation to an equivalent one more suited to a specific application. Examples are compression and semantics extraction.

MPAI considers the AI Module (AIM) and its interfaces as the AI building block. The syntax and semantics of the interfaces determine what an AIM should perform, not how. AIMs can be implemented in hardware or software, with AI, Machine Learning or legacy Data Processing.

MPAI’s AI framework enabling creation, execution, composition and update of AIM-based workflows (MPAI-AIF) is the cornerstone of MPAI standardisation because it enables building high-complexity AI solutions by interconnecting multi-vendor AIMs trained to specific tasks, operating in the standard AI framework and exchanging data in standard formats.

MPAI standards will address many of the problems mentioned above and benefit various actors:

  • Technology providers will be able to offer their conforming AIMs to an open market
  • Application developers will find on the open market the AIMs their applications need
  • Innovation will be fuelled by the demand for novel and more performing AIMs
  • Consumers will be offered a wider choice of better AI applications by a competitive market
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications.

Focusing on AI-based data coding will also allow MPAI to take advantage of the results of emerging and future research in representation learning, transfer learning, edge AI, and reproducibility of performance.

MPAI is mindful of IPR-related problems which have accompanied high-tech standardisation. Unlike standards developed by other bodies, which are based on vague and contention-prone Fair, Reasonable and Non-Discriminatory (FRAND) declarations, MPAI standards are based on Framework Licences where IPR holders set out IPR guidelines in advance.

Finally, although it is a technical body, MPAI is aware of the revolutionary impact AI will have on the future of human society. MPAI pledges to address ethical questions raised by its technical work with the involvement of high-profile external thinkers. The initial significant step is to enable the understanding of the inner workings of complex AI systems.


FRAND forever? Or are there other business models possible?

What is the Framework License (FWL)?

In the context of the development of new technology standards, Intellectual Property Rights (IP Rights) are the engine that ensures and sustains technology innovation. The FWL intends to move past FRAND assurances, which have not reduced friction between innovators and implementers.

In fact, the question of the implementation of FRAND assurances has created a diversity of interpretations, including different decisions taken by the courts. So much so that a recent judgment of the UK Supreme Court affirms a comprehensive principle of the meaning of FRAND: “This is a single, composite obligation, not three distinct obligations that the license terms should be fair, and separately, reasonable, and separately, non-discriminatory”.

So, the FRAND assurance, made during the standardization process of a new technology, has become a “headache” not only for the courts, but also for those who have to operate on the basis of what happened during the standardization process.

This is one reason why a new international and unaffiliated standards association called MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) has been established. MPAI has adopted a new management model for the IP Rights associated with the work done within a standardization body. This new industrial property management model is called the Framework License (FWL). This model intends to overcome all the uncertainties generated by the FRAND declaration, because guidelines on how the future licenses relating to the Standard Essential Patents (SEPs) should be applied are already established at the outset of the standardization work.

With these more precise guidelines already decided in the course of the standardization process, MPAI plans to help both the holders of Standard Essential Patents (SEP) and the implementers of the new standardized technologies find an agreement for the use of SEPs, avoiding the frictions that we have sometimes seen.

As a consequence of the standardization work, a Call for Technologies supporting the MPAI AI Framework (AIF) standard was recently issued, along with the AIF Framework License for its potentially essential IPRs.

The technical goal of MPAI-AIF is to enable the set up and the execution of mixed processing and inference workflows made of Machine Learning, Artificial Intelligence and legacy Data Processing components called AI Modules (AIM).

The MPAI AI Framework standard will facilitate integration of AI and legacy data processing components through standard interfaces and methods. MPAI experts have already validated MPAI’s innovative approach in a sample micro controller-based implementation that is synergistic with MPAI-AIF standard development.

The Framework License

Access to the standard will be granted in a non-discriminatory fashion in compliance with the generally accepted principles of competition law and agreed upon before a standard is developed.

MPAI has replaced FRAND assurances with FWLs, defined as the set of voluntary terms to use in a license, without monetary values. FWLs are developed by a committee (the IPR Support Advisory Committee) of MPAI members who are experts in the field of IP.

Practically, the FWL is the business model to remunerate IPRs in the standard, but it bears no values: no $, no %, no dates etc. At most, the FWL could provide that in individual cases there is a cap on the royalties to be paid, or an initial grace period where no royalties are paid, to foster the adoption of the technology by the market, and so on. Furthermore, the FWL states that the total cost of the licenses issued by IPR holders will be in line with the total cost of the licenses for similar standardized technologies and will take into account the value on the market of the specific standardized technology.

Only when the future standards developed by MPAI are adopted by the market and the FWLs operate as guidelines for licensing the technologies compliant with the standard will it be possible to really understand whether the FWL helps close the gap between licensors and implementers. At that point, we might simply put the current FRAND declaration concept in the attic.

The full text of the FWL associated with the MPAI-AIF standard can be found at this link.

The guidelines for the licenses subsequent to the AIF-FWL are listed in the following:

Conditions of use of the License

  1. The License will be in compliance with generally accepted principles of competition law and the MPAI Statutes
  2. The License will cover all of Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-AIF standard.
  3. The License will cover Development Rights and Implementation Rights
  4. The License will apply to a baseline MPAI-AIF profile and to other profiles containing additional technologies
  5. Access to Essential IPRs of the MPAI-AIF standard will be granted in a non-discriminatory fashion.
  6. The scope of the License will be subject to legal, bias, ethical and moral limitations
  7. Royalties will apply to Implementations that are based on the MPAI-AIF standard
  8. Royalties will not be based on the computational time nor on the number of API calls
  9. Royalties will apply on a worldwide basis
  10. Royalties will apply to any Implementation
  11. An MPAI-AIF Implementation may use other IPR to extend the MPAI-AIF Implementation or to provide additional functionalities
  12. The License may be granted free of charge for particular uses if so is decided by the licensors
  13. The Licenses will provide:
    1. a threshold below which a License will be granted free of charge and/or
    2. a grace period during which a License will be granted free of charge and/or
    3. an annual in-compliance royalty cap applying to total royalties due on worldwide revenues for a single Enterprise
  14. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-AIF standard
  15. The total cost of the Licenses issued by IPR holders will be in line with the total cost of the Licenses for similar technologies standardized in the context of Standard Development Organizations
  16. The total cost of the Licenses will take into account the value on the market of the AI Framework technology standardized by MPAI.

By Roberto Dini, member


What is the state of MPAI work?

Introduction

This article responds to the question: where is MPAI today, on the 1st day of 2021, 3 months after its foundation, with respect to its mission and plans?

Converting a mission into a work plan

Looking back, the MPAI mission “Moving Picture, Audio and Data Coding by Artificial Intelligence” looked very attractive, but the task of converting that nice-looking mission into a work plan was daunting. Is there anything to standardise in Artificial Intelligence (AI)? Thousands of companies use AI but do not need standards. Isn’t it so that AI signals the end of media and data coding standardisation?

The first answer is that we should first agree on a definition of standard. One is “the agreement reached by a group of individuals who recognise the advantage of all doing certain things in an agreed way”. There is, however, an older definition of standard that says “the agreement that permits large production runs of component parts that are readily fitted to other parts without adjustment”.

Everybody knows that implementing an MPEG audio or video codec means following a minutely prescribed procedure implied by definition #1. But what about an MPAI “codec”?

In the AI world, a neural network does the job it has been designed for and the network designer does not have to share with anyone else how his neural network works. This is true for the “simple” AI applications, like using AI to recognise a particular object, and for some of the large-scale AI applications that major OTTs run on the cloud.

The application scope of AI is expanding, however, and application developers do not necessarily have the know-how, capability or resources to develop all the pieces needed to make a complete AI application. Even if they wanted to, they could very well end up with an inferior solution because they would have to spread their resources across multiple technologies instead of concentrating on those they know best and acquire the others from the market.

MPAI has adopted the definition of standard as “the agreement that permits large production runs of component parts that are readily fitted to other parts without adjustment”. Therefore, MPAI standards target components, not systems, not the inside of the components, but the outside of the components. The goal is, indeed, to ensure standard users that the components will be “readily fitted to other parts without adjustment”.

The MPAI definition of standard appeared in an old version of the Encyclopaedia Britannica. Probably the definition was inspired decades earlier, at the dawn of industrial standards spearheaded by the British Standards Institution, the first modern industry standards association, when drilling, reaming and threading were all the rage in the industry of the time.

Drilling, reaming and threading in AI

AI has nothing to do with drilling, reaming and threading (actually, it could, but this is not a story for today). However, MPAI addresses the problem of standards in the same way a car manufacturer addresses the problem of procuring nuts and bolts.

Let us consider an example AI problem: a system that allows a machine to have a more meaningful dialogue with a human than is possible today. Today, with speech recognition and synthesis technologies, it is already possible to have a meaningful man-machine dialogue. However, if you are offering a service and you happen to deal with an angry customer, it is highly desirable for the machine to understand the customer’s state of mind, i.e., her “emotion”, and reconfigure the machine’s answers appropriately, lest the customer get angrier. At yet another level of complexity, if your customer is having an audio-visual conversation with the machine, it would be useful for the machine to extract the person’s emotions from her facial traits.

Sure, some companies can offer complete systems, full of neural networks designed to do the job. There is a problem, though: what control do you, as a user, have over the way AI is used in this big black box? The answer is unfortunately none, and this is one of the problems of the mass use of AI, where millions and, in the future, billions of people will deal with machines that show levels of intelligence without knowing how that (artificial) intelligence has been programmed before being injected into a machine or a service.

It is not in MPAI’s mission, nor in its power, to offer a full solution to this problem. However, MPAI standards can offer a path that may lead to a less uncontrolled deployment of AI. This is exemplified by Figure 1 below.

Figure 1 – Human-machine conversation with emotion

Each of the six modules in the figure can be a neural network trained to do a particular job. If the interfaces of the “Speech recognition” module, i.e., the AI equivalent of “mechanical threading”, are respected, the module can be replaced by another having the same interfaces. You end up with a system with the same functionality but, possibly, different performance. Individual modules can be tested in appropriate testing environments to assess how well a module does the job it claims to do.
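
In software terms, the “same interfaces” condition is what makes a module swappable. The sketch below illustrates the idea with a hypothetical SpeechRecognition interface; the names and signatures are assumptions made for this article, not MPAI-defined APIs.

```python
from typing import Protocol

class SpeechRecognition(Protocol):
    """The 'mechanical threading': any module exposing this interface fits."""
    def transcribe(self, audio: bytes) -> str: ...

class NeuralASR:
    def transcribe(self, audio: bytes) -> str:
        return "<text from a trained network>"     # network inference omitted

class LegacyASR:
    def transcribe(self, audio: bytes) -> str:
        return "<text from a classic DP pipeline>" # legacy processing omitted

def conversation_step(asr: SpeechRecognition, audio: bytes) -> str:
    # Downstream modules depend only on the interface, so NeuralASR and
    # LegacyASR are interchangeable without touching the rest of the chain.
    return asr.transcribe(audio)
```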

It is useful to compare this approach with the way we understand the human brain to operate. Our brain is not an undifferentiated network of 100 billion variously connected neurons. It is a system of “modules” whose nature and functions have been researched for more than a century. Each “module” is made of smaller components. All “modules” and their connections are implemented with the same technology: interconnected neurons.

Figure 2, courtesy of Prof. Wen Gao of Pengcheng Lab, Shenzhen, Guangdong, China, shows the processing steps of an image in the human brain until the content of the image is “understood” and the “push a button” action is ordered.

Figure 2 – The path from the retina to finger actuation in a human

A module in the figure is the Lateral Geniculate Nucleus (LGN), which connects the optic nerve to the occipital lobe. The LGN has 6 layers, kinds of sub-modules, each of which performs distinct functions. Likewise for the other modules crossed by the path.

Independent modules need an environment

We do not know what “entity” in the human brain controls the thousands of processes that take place in it, but we know that without an infrastructure governing their operation we cannot make the modules of Figure 1 operate and produce the desired results.

The environment where “AI modules” operate is clearly a target for a standard and MPAI has already defined the functional requirements for what it calls AI Framework, depicted in Figure 3. A Call for Technologies has been launched and submissions are due 2021/02/15.

Figure 3 – The MPAI AI Framework model (MPAI-AIF)

The inputs at the left-hand side correspond to the visual information from the retina in Figure 2, and the outputs correspond to the activation of the muscle. One AI Module (AIM) could correspond to the LGN and another to the V1 visual cortex; Storage could correspond to the short-term memory, Access to the long-term memory and Communication to the structure of axons connecting the 100 billion neurons. The AI Framework model also considers the possibility of having distributed instances of AI Frameworks, something for which we have no correspondence, unless we believe in the possibility for a human to hypnotise another human and control their actions 😉.

The other element of the AI Framework that has no correspondence with the human brain – until proven otherwise, I mean – is the Management and Control component. In MPAI this clearly plays a very important role, as demonstrated by the MPAI-AIF Functional Requirements.

Implementing human-machine conversation with emotion

Figure 1 is a variant of an MPAI Use Case called Conversation with Emotion, one of the 7 Use Cases that have reached the Commercial Requirements stage in MPAI. An implementation using the AI Framework can be depicted as in Figure 4.

Figure 4 – A fully AI-based implementation of human-machine conversation with emotion

If the six AIMs are implemented according to the emerging MPAI-AIF standard, then they can be individually obtained from an open “AIM market” and added to or replaced in Figure 4. Of course, a machine capable of having a conversation with a human can be implemented in many ways. However, a non-standard system must be designed and implemented in all its components, and users have less visibility of how the machine works.

One could ask: why should AI Modules be “AI”? Why can’t they simply be Modules, i.e., implemented with legacy data processing technologies? Indeed, data processing in this and other fields has a decades-long history. While AI technologies are fast maturing, some implementers may wish to re-use some legacy Modules they have in their stores.

The AI Framework is open to this possibility, and Figure 5 shows how it can be implemented. AI Modules contain the necessary intelligence in the neural networks inside the AIM, while legacy modules typically need Access to an external Knowledge Base.

Figure 5 – A mixed AI-legacy implementation of human-machine conversation with emotion
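
The contrast can be sketched in a few lines of code: an AI module carries its knowledge in trained weights, while a legacy module fetches it from an external Knowledge Base through the Access component. All names below are hypothetical illustrations, not MPAI-specified interfaces.

```python
class AIEmotionModule:
    """Knowledge lives inside the AIM, in the trained network."""
    def __init__(self, model):
        self.model = model

    def estimate(self, features):
        return self.model(features)     # inference on learned weights

class LegacyEmotionModule:
    """Knowledge lives outside, in an Emotion Knowledge Base."""
    def __init__(self, knowledge_base):
        self.kb = knowledge_base

    def estimate(self, features):
        # Query the KB with a feature vector; it answers with one or
        # more candidate emotions, as in Figure 5.
        return self.kb.query(features)
```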

Conclusions

This article has described how MPAI is implementing its mission of developing standards in the Moving Picture, Audio and Data Coding by Artificial Intelligence domain. The method described blends the need for a common reference (the “agreement” mentioned above) with the need to leave ample room for competition between actual implementations of MPAI standards.

The subdivision of a possibly complex AI system into elementary blocks – AI Modules – not only promotes the establishment of a competitive market of AI Modules, but also gives users an insight into how the components of the AI system operate, hence giving humans back more control over AI systems. It also lowers the threshold to the introduction of AI, spreading its benefits to a larger number of people.


An introduction to the MPAI-AIF Call for Technologies

On 2020/12/21 MPAI held a teleconference to illustrate the MPAI-AIF Call for Technologies (CfT) and the associated Framework Licence (FWL). This article summarises the main points illustrated at the teleconference: why and what MPAI is, the MPAI-AIF Functional Requirements, the MPAI-AIF Call for Technologies and the MPAI-AIF Framework Licence.

Miran Choi, an MPAI Director and Chair of the Communication Advisory Committee, recalled the reasons that led to the establishment of MPAI.

Over the past 3 decades, media compression standards have allowed manufacturing and services to boom. However, the technology momentum is progressively slowing, while AI technologies are taking centre stage by offering more capabilities than traditional technologies, by being applicable to data other than audio/video and by being supported by a global research effort. In addition, industry has recently suffered from the inadequacy of the Fair, Reasonable and Non-Discriminatory (FRAND) model to deal with the tectonic changes of technology-intensive standards.

Miran then summarised the main characteristics of MPAI: a non-profit, unaffiliated and international association that develops

  • Standards for
    • AI-enabled data coding
    • Technologies that facilitate integration of data coding components in ICT systems and
  • Associated clear IPR licensing frameworks.

MPAI is the only standards organisation that has set AI as the key enabling technology for data coding standards. MPAI members come from industry, research and academia of 15 countries, representing a broad spectrum of technologies and applications.

The development of standards must obey rules of openness and due process, and MPAI has a rigorous process to develop standards in 6 steps:

Use cases – Collect/aggregate use cases in cohesive projects applicable across industries
Functional Requirements – Identify the functional requirements the standard should satisfy
Commercial Requirements – Develop and approve the framework licence of the standard
Call for Technologies – Publish a call for technologies supporting the functional and commercial requirements
Standard development – Develop the standard in an especially established Development Committee (DC)
MPAI standard – Complete the standard and obtain declarations from all Members

The transitions from one stage to the next are approved by the General Assembly.

The MPAI-AIF standard project is at the Call for Technologies stage.

Andrea Basso, Chair of the MPAI-AIF Development Committee (AIF-DC) in charge of the development of the MPAI-AIF standard introduced the motivations and functional requirements of the MPAI-AIF standard.

MPAI has developed several Use Cases for disparate applications, coming to the conclusion that they can all be implemented with a combination of AI-based modules concurring to the achievement of the intended result. The history of media standards has shown the benefits of standardisation. Therefore, to avoid the danger of incompatible implementations of modules put on the market, where costs multiply at all levels and mass adoption of AI technology is delayed, MPAI seeks to standardise AI Modules (AIM) with standard interfaces, combined and executed within an MPAI-specified AI Framework. AIMs with standard interfaces will reduce overall design costs and increase component reusability, create favourable conditions leading to horizontal markets of competing implementations, and promote adoption and progress of AI technologies.

AIMs need an environment where they can be combined and executed. This is what MPAI-AIF – where AIF stands for AI Framework – is about. The AI Framework is depicted in the figure.

The AI Framework has 6 components: Management and Control, Execution, AI Modules, Communication, Storage and Access.

The MPAI-AIF functional requirements are:

  1. Possibility to establish general Machine Learning and/or Data Processing life cycles
    1. for single AIMs to
      1. instantiate-configure-remove
      2. dump/retrieve internal state
      3. start-suspend-stop
      4. train-retrain-update
      5. enforce resource limits
      6. implement auto-configuration/reconfiguration of ML-based computational models
    2. for multiple AIMs to
      1. initialise the overall computational model
      2. instantiate-remove-configure AIMs
      3. manually, automatically, dynamically and adaptively configure interfaces with Com­ponents
      4. one- and two-way signalling for computational workflow initialisation and control of combinations of AIMs
  2. Application-scenario dependent hierarchical execution of workflows
  3. Topology of networked AIMs that can be synchronised according to a given time base and full ML life cycles
  4. Supervised, unsupervised and reinforcement-based learning paradigms
  5. Computational graphs, such as Directed Acyclic Graphs (DAG) as a minimum
  6. Initialisation of signalling patterns, communication and security policies between AIMs
  7. Protocols to specify storage access time, retention, read/write throughput etc.
  8. Storage of Components’ data
  9. Access to
    1. Static or slowly changing data with standard formats
    2. Data with proprietary formats
  10. Possibility to implement AI Frameworks featuring
    1. Asynchronous and time-based synchronous operation depending on application
    2. Dynamic update of the ML models with seamless or minimal impact on its operation
    3. Time-sharing operation of ML-based AIMs shall enable use of the same ML-based AIM in multiple concurrent applications
    4. AIMs which are aggregations of AIMs exposing new interfaces
    5. Workflows that are a mixture of AI/ML-based and DP technology-based AIMs.
    6. Scalability of complexity and performance to cope with different scenarios, e.g. from small MCUs to complex distributed systems
  11. Possibility to create MPAI-AIF profiles
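
To give a feel for requirement 1, here is a minimal sketch of a single-AIM life cycle as a state machine. The states, method names and checks are assumptions made for illustration; the requirement lists the operations, not this design.

```python
from enum import Enum, auto

class AIMState(Enum):
    INSTANTIATED = auto()
    CONFIGURED = auto()
    RUNNING = auto()
    SUSPENDED = auto()
    STOPPED = auto()

class ManagedAIM:
    """A single AIM driven through instantiate/configure/start/suspend/stop."""
    def __init__(self):
        self.state = AIMState.INSTANTIATED
        self.params = {}

    def configure(self, params: dict):
        self.params = params
        self.state = AIMState.CONFIGURED

    def start(self):
        assert self.state in (AIMState.CONFIGURED, AIMState.SUSPENDED)
        self.state = AIMState.RUNNING

    def suspend(self):
        self.state = AIMState.SUSPENDED

    def stop(self):
        self.state = AIMState.STOPPED

    def dump_state(self) -> dict:
        # "dump/retrieve internal state" from the life-cycle requirement
        return {"state": self.state.name, "params": dict(self.params)}
```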

Panos Kudumakis, MPAI member, explained the MPAI-AIF Call for Technologies:

  1. Who can submit
    1. All parties, including non-members, who believe they have relevant technologies
    2. Responses are submitted to the secretariat, which acknowledges them via email
    3. Technologies submitted must
      1. Support the requirements of N74
      2. Be released according to the MPAI-AIF Framework Licence (N101) – if selected by MPAI for inclusion in MPAI-AIF
    4. MPAI will select the most suitable technologies on the basis of their technical merits for inclusion in MPAI-AIF.
    5. MPAI is not obligated to select a particular technology or to select any technology if those submitted are found inadequate.
  2. A submission shall contain
    1. Detailed documentation describing the proposed technologies.
    2. Annex A: Information Form (contact info, proposal summary).
    3. Annex B: Evaluation Sheet, to be taken into consideration for self-evaluation (quantitative & qualitative) of the submission; it will be filled out during the peer-to-peer evaluation phase.
    4. Annex C: Requirements Check List (N74) to be duly filled out indicating (using a table) which requirements identified are satisfied. If a requirement is not satisfied, the submission shall indicate the reason.
    5. Annex D: Mandatory text in responses
  3. A submission may contain
    1. Comments on the completeness and appropriateness of the MPAI-AIF requirements and any motivated suggestion to extend those requirements.
    2. A preliminary demonstration, with a detailed document describing it.
    3. Any other additional relevant information that may help evaluate the submission, such as additional use cases.
  4. Assessment
    1. Respondents must present their submission, otherwise the proposal is discarded.
    2. If a submission is accepted in whole or in part, the submitter shall make available a working implementation, including source code (for use in the MPAI-AIF Reference Software), before the technology is accepted for the MPAI-AIF standard.
    3. Software can be written in compiled or interpreted programming languages and in hardware description languages.
    4. A submitter who is not an MPAI member shall immediately join MPAI, otherwise the submission is discarded.
    5. An assessment guidelines form to aid the peer-to-peer evaluation phase is being finalised.
  5. Calendar
    1. Call for Technologies 16 Dec (MPAI-3)
    2. Presentation Conference Calls 21 Dec/07 Jan
    3. Notification of intention to submit 15 Jan
    4. Assessment form 20 Jan (MPAI-4)
    5. Submission deadline 15 Feb
    6. Calendar of evaluation of responses 17 Feb (MPAI-5)
    7. Approval of MPAI-AIF standard 19 July (estimate)

Davide Ferri, MPAI Director and Chair of AIF-FWL, the committee that developed the MPAI-AIF Framework Licence (FWL), explained that the FWL covers the MPAI-AIF technology, which specifies a generic execution environment, possibly integrating Machine Learning, Artificial Intelligence and legacy Data Processing components, implementing application areas such as:

  1. Context-based Audio Enhancement (MPAI-CAE)
  2. Integrative Genomic/Sensor Analysis (MPAI-GSA)
  3. AI-Enhanced Video Coding (MPAI-EVC)
  4. Server-based Predictive Multiplayer Gaming (MPAI-SPG)
  5. Multi-Modal Conversation (MPAI-MMC)
  6. Compression and Understanding of Industrial data (MPAI-CUI)

These six application areas are expected to become MPAI standards.

The FWL includes a set of definitions that are omitted here, in particular the definition of Licence, namely, the Framework Licence to which values, e.g., currency, percentages, dates etc., related to a specific Intellectual Property will be added.

The FWL is expressed in concise form below:

  1. The Licence will:
    1. be in compliance with generally accepted principles of competition law and the MPAI Statutes
    2. cover all of Licensor’s claims to Essential IPR practiced by a Licensee of the MPAI-AIF standard
    3. cover Development Rights and Implementation Rights
    4. apply to a baseline MPAI-AIF profile and to other profiles containing additional technologies
  2. Access to Essential IPRs of the MPAI-AIF standard will be granted in a non-discriminatory fashion.
  3. The scope of the Licence will be subject to legal, bias, ethical and moral limitations
  4. Royalties will:
    1. apply to Implementations that are based on the MPAI-AIF standard
    2. not be based on the computational time nor on the number of API calls
    3. apply on a worldwide basis
    4. apply to any Implementation
  5. An MPAI-AIF Implementation may use other IPR to extend the MPAI-AIF Implementation or to provide additional functionalities
  6. The Licence may be granted free of charge for particular uses if so decided by the licensors
  7. The Licences will specify
    1. a threshold below which a Licence will be granted free of charge and/or
    2. a grace period during which a Licence will be granted free of charge and/or
    3. an annual in-compliance royalty cap applying to total royalties due on worldwide revenues for a single Enterprise
  8. A preference will be expressed on the entity that should administer the patent pool of holders of Patents Essential to the MPAI-AIF standard
  9. The total cost of the Licences issued by IPR holders will be in line with the total cost of the licences for similar technologies standardised in the context of Standard Development Organisations
  10. The total cost of the Licences will take into account the value on the market of the AI Framework technology Standardised by MPAI.

Miran then reminded the audience how easily legal entities, or individuals representing the technical department of a university, that support the MPAI mission and are able to contribute to the development of MPAI standards can join MPAI. They should:

  1. Choose one of the two classes of membership (until 2021/12/31):
    1. Principal Members, with the right to vote (2400 €)
    2. Associate Members, without the right to vote (480 €)
  2. Send to secretariat@mpai.community:
    1. a signed copy of Template for MPAI Membership applications
    2. a signed copy of the MPAI Statutes. Each page should be signed and initialled
    3. a copy of the bank transfer

MPAI issues a Call for Technologies supporting its AI Framework standard

Geneva, Switzerland – 16 December 2020. Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international unaffiliated standards association, has approved a Call for Technologies (CfT) for publication at its 3rd General Assembly (MPAI-3). The CfT concerns technologies for MPAI-AIF, the acronym of the MPAI AI Framework standard.

The goal of MPAI-AIF is to enable set-up and execution of mixed processing and inference workflows made of Machine Learning, Artificial Intelligence and legacy Data Processing components called AI Modules (AIM).

The MPAI AI Framework standard will facilitate integration of AI and legacy data processing components through standard interfaces and methods. MPAI experts have already validated MPAI’s innovative approach in a sample micro controller-based implementation that is synergistic with MPAI-AIF standard development.

In line with its statutes, MPAI has developed the Framework Licence associated with the MPAI-AIF standard. Responses to the CfT shall be in line with the requirements laid down in the CfT and shall be supported by a statement that the respondent will licence their technologies, if adopted in the standard, according to the framework licence.

MPAI is also working on a range of standards for AIM input/output interfaces used in several application areas. Two candidate standards have completed the definition of Functional Requirements and have been promoted to the Commercial Requirements stage.

The two candidates are

  1. MPAI-CAE – Context-based Audio Enhancement uses AI to improve the user experience for a variety of uses such as entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in the contexts of the home, the car, on-the-go, the studio etc., allowing a dynamically optimised user experience.
  2. MPAI-MMC – Multi-Modal Conversation uses AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity.

MPAI adopts a light approach to the standardisation of AIMs. Different implementors can produce AIMs of different performance that still expose the same standard interfaces. MPAI AIMs with different features from a variety of sources will promote horizontal markets of AI solutions that tap from and further promote AI innovation.

The MPAI web site provides more information about other MPAI standards: MPAI-CUI uses AI to compress and understand industrial data, MPAI-EVC to improve the performance of existing video codecs, MPAI-GSA to understand and compress the result of combining genomic experiments with those produced by related devices, e.g. video, motion, location, weather, medical sensors, and MPAI-SPG to improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.