Moving Picture, Audio and Data Coding
by Artificial Intelligence

Three minutes to know what you need to know about MPAI

If media have become so pervasive, it is because smart use of data processing has reduced the amount of data generated by audio and video. Digital media can give more, but the spirit that has produced MP3, digital television, audio and video on the internet, DASH and so much more has waned.

The new organisation MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence (AI) – has the necessary propulsive thrust. MPAI uses AI as the enabling technology to code data.

The term AI is in the MPAI title because AI is the enabling technology that extends coding from compression (i.e., fewer bits for a similar result) to understanding (i.e., what the bits mean), and the use of coding from media to many more data types.

Data processing remains a valid alternative to AI in many domains, though…

MPAI has defined 5 pillars on which it bases its operation. The formulation of the process benefits from 30+ years of standardisation where a huge organisation – MPEG – was created from nothing and processes have been fine-tuned through day-to-day real-world experience.

Pillar #1 – the process

MPAI likes to call itself, as the domain extension in mpai.community implies, a “community”. The development of MPAI standards is divided into 4 phases: Preparation, Framework Licence, Standard Development and Standard approval. The phases differ in who is allowed to participate in each of them. In total there are 7 stages, as can be seen in Figure 1.

Figure 1 – The MPAI standard development stages

Phase 1 – Preparation.

Stage 0 – Interest Collection (IC): Members as well as non-members may submit proposals. These are collected and harmonised. Some proposals get merged with other similar proposals. Some get split because the harmonisation process so demands. The goal is to identify proposals for standards that reflect the proponent’s wishes while making sense in terms of specification and use across different environments.

Stage 1 – Use Case (UC): Use Cases are fully characterised and a description of the work program that will produce the Functional Requirements is developed.

Stage 2 – Functional Requirements (FR): detailed functional requirements of the Use Case are developed.

In the three stages above, MPAI is “open” in the sense that anybody interested may participate. However, if an MPAI Member wants to discuss a confidential proposal, only MPAI members may attend. From the Commercial Requirements stage onward, non-members are not allowed to participate (but they may become members at any time).

Phase 2 – Framework Licence

Stage 3 – Commercial Requirements (CR): in a supply contract, both the characteristics (Functional Requirements) and the conditions (Commercial Requirements) are described. Antitrust laws do not permit sellers (technology providers) and buyers (standard users) to sit together and agree on values such as numbers, percentages or dates. However, sellers (technology providers) may indicate supply conditions without values. Therefore, the embodiment of the Commercial Requirements, i.e. the Framework Licence, will not contain such details. Only Principal Members who declare they will make technical contributions to the standard can participate in the drafting of the Framework Licence.

Phase 3 – Standard Development

Stage 4 – Call for Technologies (CT): Once both the Functional and the Commercial Requirements are available, MPAI is in a position to draft the Call for Technologies (CfT). Anybody may respond to a CfT. However, if a technology proposed by a non-member is accepted, the responder must join MPAI.

Stage 5 – Standard Development (SD): the Development Committee in charge reviews the responses and develops the standards.

Phase 4 – Standard approval

Stage 6 – MPAI Standard (MS): Only Principal Members may vote to approve the standard, hence trigger its publication. Associate Members, however, may become Principal Members at any time.

For each standard project, the transition to each of the 7 stages of Figure 1 must be approved by a resolution of the General Assembly.
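
The stages and participation rules described above can be summarised in a small data model. The Python sketch below is only an illustration: the names `Stage`, `Role`, `may_participate` and `advance` are hypothetical, and the rules are a simplified reading of the text (confidential proposals, responses to a Call from non-members and the declaration required for drafting the Framework Licence are not modelled).

```python
from enum import Enum


class Stage(Enum):
    # (stage number, title, phase) as listed in the text
    IC = (0, "Interest Collection", "Preparation")
    UC = (1, "Use Case", "Preparation")
    FR = (2, "Functional Requirements", "Preparation")
    CR = (3, "Commercial Requirements", "Framework Licence")
    CT = (4, "Call for Technologies", "Standard Development")
    SD = (5, "Standard Development", "Standard Development")
    MS = (6, "MPAI Standard", "Standard approval")

    def __init__(self, number, title, phase):
        self.number = number
        self.title = title
        self.phase = phase


class Role(Enum):
    NON_MEMBER = 0
    ASSOCIATE = 1
    PRINCIPAL = 2


def may_participate(role: Role, stage: Stage) -> bool:
    """Simplified participation rule (an assumption, not MPAI's statutes)."""
    if stage.number <= Stage.FR.number:
        return True                        # the first three stages are open to all
    if stage in (Stage.CR, Stage.MS):
        return role is Role.PRINCIPAL      # Framework Licence drafting and approval vote
    return role is not Role.NON_MEMBER     # remaining stages: members only


def advance(stage: Stage, ga_resolution: bool) -> Stage:
    """Move to the next stage only if the General Assembly has approved it."""
    if not ga_resolution or stage is Stage.MS:
        return stage
    return next(s for s in Stage if s.number == stage.number + 1)


print(may_participate(Role.NON_MEMBER, Stage.FR))   # True
print(may_participate(Role.ASSOCIATE, Stage.CR))    # False
print(advance(Stage.FR, ga_resolution=True).title)  # Commercial Requirements
```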

Pillars #2 & #3 – AI Modules and Framework

MPAI makes assumptions about the internal structure of an AI system to provide levels of guarantee about the “ethical performance” of an AI system that implements an MPAI standard.

  1. An implementation of an MPAI-specified Use Case is subdivided into functional components called AI Modules (AIM) that use Artificial Intelligence (AI) or Machine Learning (ML) or traditional Data Processing (DP) or a combination of these and are implemented in software or hardware or mixed hardware and software.
  2. An AI system implementing a Use Case is an aggregation of the AIMs specified by the Use Case, interconnected in the topology specified in the standard and executed inside an AI Framework (AIF).

The 2 basic elements of MPAI standardisation are represented in Figure 2 and Figure 3.

Figure 2 – The MPAI AI Module (AIM)

Figure 3 – The MPAI AI Framework (AIF)

Figure 2 depicts a video coming from a camera shooting a human face. The function of the AIM (green block) is to detect the emotion on the face and the meaning of the sentence the human is uttering. The AIM can be implemented with a neural network or with DP technologies. In the latter case, the AIM accesses a knowledge base external to the AIM.

The input data enter the Execution area of the AIF (Figure 3) where the workflow is executed under the supervision of Management and Control. AIMs communicate via the AIF’s Communication and Storage infrastructure and may access static or slowly changing data sources (e.g., those of Figure 2) called Access. The result of the execution of the workflow is provided as output data.
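
To make the AIM and AIF roles more concrete, here is a minimal Python sketch. It is not the MPAI-AIF API: the class `AIFramework`, its methods, the two toy AIMs and the `EMOTION_KB` dictionary are all hypothetical names chosen for illustration. Each AIM is a plain function from input data to output data, the framework executes the AIMs in a given topology, a shared dictionary stands in for Communication and Storage, and a trace records which AIM produced which data.

```python
from typing import Any, Callable, Dict, List, Tuple

# An AIM is modelled as a function: only its input and output data matter here.
AIM = Callable[[Dict[str, Any]], Dict[str, Any]]


class AIFramework:
    """Toy framework that registers AIMs and executes them in a fixed order."""

    def __init__(self) -> None:
        self.aims: Dict[str, AIM] = {}
        self.storage: Dict[str, Any] = {}              # stand-in for Communication and Storage
        self.trace: List[Tuple[str, List[str]]] = []   # (AIM name, keys it produced)

    def register(self, name: str, aim: AIM) -> None:
        self.aims[name] = aim

    def execute(self, topology: List[str], input_data: Dict[str, Any]) -> Dict[str, Any]:
        self.storage = dict(input_data)
        self.trace = []
        for name in topology:
            outputs = self.aims[name](self.storage)
            self.storage.update(outputs)
            self.trace.append((name, sorted(outputs)))
        return self.storage


# Hypothetical knowledge base, as used by an AIM built with DP technologies.
EMOTION_KB = {"smile": "happy", "frown": "sad"}


def emotion_aim(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"emotion": EMOTION_KB.get(data["face_feature"], "neutral")}


def reply_aim(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"reply": f"You look {data['emotion']}."}


aif = AIFramework()
aif.register("emotion", emotion_aim)
aif.register("reply", reply_aim)
result = aif.execute(["emotion", "reply"], {"face_feature": "smile"})
print(result["reply"])   # You look happy.
print(aif.trace)         # [('emotion', ['emotion']), ('reply', ['reply'])]
```

Because only the data exchanged between modules is fixed, `emotion_aim` could be swapped for a neural-network implementation without touching the rest of the workflow.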

Pillar #4 – IPR Guidelines

Seventy years ago, when ISO was established, it made sense to ask a participant in the standardisation process to declare their availability to license their technology on fair, reasonable and non-discriminatory (FRAND) terms. Any IP item was typically held by one company. Actually, that company was most likely approaching ISO because it already had products on the market, was already licensing its technology and just wanted ISO to ratify the status quo.

Forty years later, when MPEG started releasing its standards, the situation was entirely different. Each participant had IP but most participants were interested in using the standard.

Another 20 years later, the situation had changed beyond recognition. Each participant had IP, but most participants were not interested in using the standard, only in monetising their IP.

MPAI fully endorses IP as the engine of progress but cautions against standards released with FRAND promises. MPAI has not washed its hands of the IP issue, but has developed the notion of Framework Licence. This is the business model that IPR holders adopt to monetise their IP in a standard, stated without values: $, %, dates etc. This is the practice:

  • Before the standard is developed: Active members develop & adopt the Framework Licence.
  • While the standard is developed: Members declare that, after the standard is approved, they will make their licences available according to the Framework Licence for any submissions they make.
  • After the standard is developed: All members declare they will get a licence for other members’ IPRs, if used, within 1 year after publication of IPR holders’ licensing terms.

Non-members must get a licence from IPR holders to use an MPAI standard.

You can see an example of an actual Framework Licence.

Pillar #5 – Ethical AI

MPAI is not alone in being aware of the impact AI will have on humans and society. Instead of just raising concerns about bias in AI, MPAI intends to offer practical solutions.

As an example, MPAI’s AIM-AIF approach can already enhance the explainability of implementations of MPAI standards. By explainability we mean “the ability to trace the output of an AI system back to the inputs that have produced it”.

The solution MPAI is working on will provide the means to test the Performance of MPAI standard implementations. An element of the solution is an identification system that helps users check whether an AI system is a tested implementation of an MPAI standard.

Who benefits from the MPAI approach?

  1. Component providers can offer conforming AIMs to an open competitive market and test or have them tested for performance.
  2. Application developers wishing to develop complex components may have some but not all AIMs they need. However, they can find the AIMs they need on the open competitive market.
  3. Consumers have a wider choice of better AI applications from competing application developers. They can distinguish generic AI systems from AI systems that implement AI standards.
  4. Innovation will be fuelled by the need for novel/more performing AIMs to face competition.
  5. Society can lift the veil of opacity from large and monolithic AI systems.

MPAI has been in operation since the 30th of September 2020. A few months later, it is developing 4 standards and the Functional Requirements for 3 standards, and is honing 2 Use Cases. Figure 4 depicts the situation.

Figure 4 – Snapshot of the MPAI work plan


MPAI starts development of AI-based company performance prediction standard

Geneva, Switzerland – 12 May 2021. At its 8th General Assembly, the international, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards association has received substantial proposals in response to its Call for Technologies on the AI-based Company Performance Prediction Use Case. Meanwhile the development of its foundational AI Framework standard is steadily progressing and the technical review of responses to the Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) Calls for Technologies has been completed.

The goal of the AI Framework standard, nicknamed MPAI-AIF, is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) – inference workflows, implemented as software, hardware, or mixed software and hardware. A major MPAI-AIF feature is enhanced explainability of MPAI standard applications.

Development of two new standards has started after completing the technical review of responses to the Calls for Technologies. Context-based Audio Enhancement (MPAI-CAE) covers four instances: adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street. Multimodal Conversation (MPAI-MMC) covers three instances: audio-visual conversation with a machine impersonated by a synthesised voice and an animated face, request for information about a displayed object, and translation of a sentence using a synthetic voice that preserves the speech features of the human.

Substantial proposals received in response to the MPAI-CUI Call for Technologies have allowed starting the work on a fourth standard, AI-based Company Performance Prediction, part of the Compression and Understanding of Industrial Data standard. The standard will enable prediction of performance, e.g., organisational adequacy or default probability, by extracting information from governance, financial and risk data of a given company.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs using AI, Server-based Predictive Multiplayer Gaming (MPAI-SPG) will compensate for the loss of data and detect false data in online multiplayer gaming and Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand data from combined genomic and other experiments produced by related devices/sensors.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity who supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.


MPAI consolidates the development of three AI-based data coding standards

Geneva, Switzerland – 14 April 2021. At its 7th General Assembly, the international, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards association has received substantial proposals in response to its two Calls for Technologies on Enhanced Audio and Multimodal Conversation that closed on the 12th of April. Meanwhile the development of its foundational AI Framework standard is steadily progressing, targeting July 2021 for delivery of the standard.

The goal of the AI Framework standard, nicknamed MPAI-AIF, is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) – inference workflows, implemented as software, hardware, or mixed software and hardware. A major MPAI-AIF feature is enhanced explainability to applications conforming to MPAI standards.

Work on the two new Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) standards has started after receiving substantial technologies in response to the Calls for Technologies. MPAI-CAE covers four instances: adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street. MPAI-MMC covers three instances: audio-visual conversation with a machine impersonated by a synthesised voice and an animated face, request for information about a displayed object, and translation of a sentence using a synthetic voice that preserves the speech features of the human.

Work on a fourth standard is scheduled to start at the next General Assembly (12th of May) after receiving responses – from both MPAI members and non-members – to the currently open MPAI-CUI Call for Technologies. The standard will enable prediction of performance, e.g., organisational adequacy or default probability, using Artificial Intelligence (AI)-based filtering and extraction of information from a company’s governance, financial and risk data.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) that improves the performance of existing video codecs, Server-based Predictive Multiplayer Gaming (MPAI-SPG) that compensates for the loss of data in online multiplayer gaming and Integrative Genomic/Sensor Analysis (MPAI-GSA) that compresses and understands data from combined genomic and other experiments produced by related devices/sensors.



Why the MPAI way is the only way

Research, business and society are gradually coming to realise that Artificial Intelligence (AI) is not just another generation of data processing technologies. It is a powerful set of pervasive technologies impacting the way individuals and society have behaved in creating the set of customs, rules and laws that govern their life and organisation.

The article Artificial Intelligence Beyond Deep Neural Networks by Naga Rayapati, Forbes Councils Member, illustrates the level of awareness achieved:

Neural networks act as black boxes and are often not well-suited for applications that require explainability. Areas like employment, lending, education, health care and household assistants require explainability. In finance, machines predicting price changes is a significant win for companies, but without the explainability factor, it may be hard to convince regulators that it has not violated any regulations. Similarly, in transactions involving trust, such as credit card applications, one has to explain the reason for approval or rejection. In business applications, building the trust of a customer is critical, and decisions need to be explainable.

There is no better introduction than this quote to the standard that MPAI has set out to develop: Compression and Understanding of Industrial Data (MPAI-CUI).

The standard intends to enable prediction of a company’s performance by filtering and extracting information from its governance, financial and risk data. Artificial Intelligence is a candidate technology to achieve that, but MPAI is open to other technologies. We live in a transition age and, while there is no doubt about the ultimate supremacy of AI technologies, at this point in time some traditional technologies may perform very well.

These are some of the needs MPAI-CUI can cover:

  1. Assess and monitor a company’s financial and organisational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.).
  2. Identify clues to a crisis or bankruptcy years in advance.
  3. Support financial institutions when deciding on a loan to a troubled company.

Referencing the article again, MPAI is not looking for a black box that uses neural networks. MPAI seeks technologies fitting an architecture that can be used without having to convince regulators that their regulations have not been violated. That this is possible can be seen from the MPAI-CUI architecture:

These are some of the technologies that MPAI has identified in the MPAI-CUI Use Cases and Functional Requirements document and requested in the MPAI-CUI Call for Technologies document:

Data Conversion: Gathers the data needed for the assessment from internal and external sources, in different formats, and converts them to a single format (e.g., JSON).
Financial assessment: Analyses company data (i.e., financial statements) to assess the preliminary financial performance in the form of indexes. Builds and extracts the financial features for the Decision tree and Prediction AIMs.
Governance assessment: Builds and extracts the features related to the adequacy of the governance asset for the Decision tree and Prediction AIMs.
Risk matrix: Builds the risk matrix to assess the impact of vertical risks (i.e., in this Use Case cyber and seismic).
Decision: Creates the decision trees for making decisions.
Prediction: Predicts the company default probability within 36 months and the adequacy of the organisational model.
Perturbation: Perturbs the company crisis probability computed by Prediction, considering the impact of vertical risks on company performance.
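
As an illustration of the Data Conversion step only, the Python sketch below merges two sources in different formats (a CSV extract and an in-memory record) into one JSON document. Everything in it is assumed for the example: the function name `convert`, the field names and the figures are invented placeholders and do not reflect any MPAI-CUI data format.

```python
import csv
import io
import json

# Hypothetical internal source: financial statements exported as CSV.
internal_csv = "year,revenue,debt\n2019,120.0,40.0\n2020,95.5,55.0\n"

# Hypothetical external source: a record obtained from a third party.
external_record = {"company": "ExampleCo", "seismic_zone": 2}


def convert(csv_text: str, external: dict) -> str:
    """Merge both sources into a single JSON document with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "year": int(row["year"]),
            "revenue": float(row["revenue"]),
            "debt": float(row["debt"]),
        })
    document = {
        "company": external["company"],
        "risk": {"seismic_zone": external["seismic_zone"]},
        "financial_statements": rows,
    }
    return json.dumps(document, indent=2, sort_keys=True)


unified = convert(internal_csv, external_record)
print(unified)
```

Downstream modules such as the financial or governance assessment would then read one format instead of handling each source separately.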

Interested? Join the Zoom conference on 2021/03/31T15:00UTC. You will learn about

  1. MPAI’s approach to standardisation
  2. Presentation of Use Case and technologies requested
  3. How to submit a proposal
  4. The MPAI-CUI Framework Licence

MPAI: where it is, where it is going

Some 50 days after having been announced, MPAI was established as a not-for-profit organisation with the mission to develop data coding standards with an associated mechanism designed to facilitate the creation of licences.

Some 150 days have passed since its establishment. Where is MPAI in its journey to accomplish its mission?

Creating, in 50 days, an organisation that would execute the mission was an achievement. The next goal, giving the organisation the means to accomplish its mission, was difficult but was successfully achieved.

MPAI has defined five pillars on which the organisation rests.

Pillar #1: The standard development process.

MPAI is an open organisation not in words, but in practice.

Anybody can bring proposals – Interest Collection – and help merge their proposal with others into a Use Case. All can participate in the development of the Functional Requirements. Once the Functional Requirements are defined, MPAI Principal Members develop the Commercial Requirements, all MPAI Members develop the Call for Technologies, review the submissions and start the Standard Development. Finally, MPAI Principal Members approve the MPAI standard.

The progression of a proposal from a stage to the next is approved by the General Assembly.

Pillar #2: AI Modules.

This pillar is technical in nature, but has far reaching implications. MPAI defines basic units called AI Modules (AIM) that perform a significant task and develops standards for them. The AIM in the figure processes the input video signal (a human face) to provide the emotion expressed by the face and the meaning (question, affirmation etc.) of what the human is saying.

MPAI confines the scope of standardisation to the format of input and output data of an AIM. It is silent on the inside of the AIM (the green box) which can use ML, AI or data processing technologies and can be implemented in hardware or software. In the figure the Emotion and Meaning Knowledge Base is required when the AIM is implemented with legacy technologies.

Pillar #3: AI Framework

It is clear that, just by themselves, AIMs will have limited use. Practical applications are more complex and require more technologies. If these technologies are implemented as AIMs, how can they be connected and executed?

The figure depicts how the MPAI AI Framework (AIF) solves the problem. The MPAI AI Framework has the function of managing the life cycle of the individual AIMs, and of creating and executing the workflows. The Communication and Storage functionality allows possibly distributed AIMs to implement different forms of communication.

Pillar #4: Framework Licence

The process of Pillar #1 mentions the development of “Commercial Requirements”. Actually, MPAI Principal Members develop a specific document called Framework Licence, the patent holders’ business model to monetise their patents in a standard without values: dollars, percentage, dates etc. It is developed and adopted by Active Principal Members and is attached to the Call for Technologies. All submissions must contain a statement that the submitter agrees to license their patents according to the Framework Licence.

MPAI Members have already developed 4 Framework Licences. Some of their interesting elements are:

  1. The License will be free of charge to the extent it is only used to evaluate or demo solutions or for technical trials.
  2. The License may be granted free of charge for particular uses if so decided by the licensors.
  3. A preference will be expressed on the entity that should administer the patent pool of patent holders.

Pillar #5: Conformance

Guarantee of a good performance of an MPAI standard implementation is a necessity for a user. From an implementer’s viewpoint, a measurable performance is a desirable characteristic because users seek assurance about performance before buying or using an implementation.

Testing an MPAI Implementation for conformance requires making available the tools, procedures, data sets etc. specific to an AIM and to a complete MPAI Implementation. MPAI will not itself test an MPAI Implementation for conformance, but will only provide testing tools that enable a third party to test the conformance of an AIM and of the set of AIMs that make up a Use Case.

The MPAI standards

MPAI has issued

  1. A call for technologies to enable the development of the AI Framework (MPAI-AIF) standard. Responses were received on the 17th of February and MPAI is busy developing the standard.
  2. Two calls for the Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) standards. Responses are due on the 12th of April.
  3. A call for the Compression and Understanding of Industrial Data (MPAI-CUI) standard. Responses are due on the 10th of May.

Calls for technologies for a total of 9 Use Cases have been issued. One is being developed. The Use Cases described above are depicted in the figure below.

MPAI continues the development of Functional Requirements for the Server-based Predictive Multiplayer Gaming (MPAI-SPG) standard, the Integrative Genomic/Sensor Analysis (MPAI-GSA) standard and the AI-Enhanced Video Coding (MPAI-EVC) standard.

In the next months MPAI will be busy producing its first standards. The first one – MPAI-AIF – is planned to be released on the 19th of July.


MPAI tackles AI-based risk analysis standard in a new Call for Technologies

Geneva, Switzerland – 17 March 2021. At its 6th General Assembly, the international, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards association has promoted its 4th standard project, Compression and Understanding of Industrial Data (MPAI-CUI), to the Call for Technologies stage. The standard aims to enable Artificial Intelligence (AI)-based filtering and extraction of information from a company’s governance, financial and risk data, enabling prediction of company performance, e.g., organisational adequacy or default probability.

All parties, including non-MPAI members, who believe they have relevant technologies satisfying all or most of the MPAI-CUI Functional Requirements are invited to submit proposals for consideration by MPAI. The MPAI-CUI Call for Technologies requests that technologies proposed, if accepted for inclusion in the standard, be released according to the MPAI-CUI Framework Licence to facilitate eventual definition of the final licence by patent holders.

The content of the Call for Technologies will be introduced at two online conferences. Interested parties are welcome to attend.

MPAI is continuing the development of its AI Framework standard, nicknamed MPAI-AIF. The goal of the standard is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware, or mixed software and hardware. A major MPAI-AIF feature is to offer enhanced explainability to applications conforming to MPAI standards. MPAI retains its intention to release the standard in July 2021.

At its previous General Assembly (MPAI-5), MPAI has issued two Calls for Technologies supporting two new standards:

  1. Context-based Audio Enhancement (MPAI-CAE) covering four instances: adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. The Multimodal Conversation (MPAI-MMC) covering three instances: an audio-visual conversation with a machine impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, a human sentence translated using a synthetic voice that preserves the human speech features.

The MPAI web site provides information about the other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer games.



New age – new video coding technologies

The last 30 years of video coding have been productive because, over the years, the compression rate has improved, enabling digital video to take over all existing video applications and extend to new ones. The coding algorithm has been tweaked over and over, but it is still based on the same original scheme.

In the last decade, the “social contract” that allowed inventors to innovate, innovation to be brought into standards, standards to be used and inventors to be remunerated has stalled. The HEVC standard is used but the entire IP landscape is clouded by uncertainty.

The EVC standard, approved in 2020, 7 years after HEVC was approved, has shown that even an inflow of technologies from a reduced number of sources can provide outstanding results, as shown in the figure:

The EVC baseline, a profile that uses 20+ year old technologies, reaches the performance of HEVC. The main profile offers a bitrate reduction of 39% over HEVC, a whisker away from the performance of the latest VVC standard.

In 1997 the match between IBM Deep Blue and the (human) chess champion of the time made headlines: IBM Deep Blue beat Garry Kasparov. It was easy to herald the age when machines would overtake humans not just in keeping accounts, but also in one of the noblest intellectual activities: chess.

This was achieved by writing a computer program that explored more alternatives than a human could reasonably do, although a human’s intuition can look far into the future. In that sense, Deep Blue operated much like MPEG video coding.

Google DeepMind’s AlphaGo did the same in 2016 by beating the Go champion Sedol Lee. The Go rules are simpler than chess rules, but the alternatives in Go are way more numerous. There is only one type of piece (the stone) instead of six (king, queen, bishop, knight, rook and pawn), but the Go board has 19×19 boxes instead of the 8×8 of chess. While Deep Blue made a chess move by brute-force exploration of future moves, AlphaGo made Go moves relying on neural networks which had learned moves.

That victory signalled a renewed interest in a technology that is three quarters of a century old: neural networks.

In neural networks data are processed by different layers that extract essential information until a compressed representation of the input data is achieved (left-hand side). At the decoder, the inverse process takes place.
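
The shape of such an encoder and decoder can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not an MPAI-EVC tool: the layer sizes (16, 8, 4), the helper names `make_layer` and `apply_layer` and the random, untrained weights are all chosen for illustration, so the reconstruction will not resemble the input. The point is only that each encoder layer narrows the representation and the decoder mirrors it.

```python
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix with small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, vector):
    """Matrix-vector product followed by a ReLU non-linearity."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


rng = random.Random(0)
encoder = [make_layer(16, 8, rng), make_layer(8, 4, rng)]   # 16 -> 8 -> 4
decoder = [make_layer(4, 8, rng), make_layer(8, 16, rng)]   # 4 -> 8 -> 16

block = [rng.random() for _ in range(16)]   # stand-in for 16 pixel values

# Encoder: successive layers shrink the data to a compact code.
code = block
for layer in encoder:
    code = apply_layer(layer, code)

# Decoder: the inverse path expands the code back to the input size.
reconstruction = code
for layer in decoder:
    reconstruction = apply_layer(layer, reconstruction)

print(len(block), len(code), len(reconstruction))   # 16 4 16
```

In a real codec the weights would be learned from training material so that the reconstruction approximates the input as closely as the size of the code allows.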

MPAI has established the AI-Enhanced Video Coding (MPAI-EVC) standard project. This is based on an MPAI study collecting published results where individual HEVC coding tools have been replaced by neural networks (in the following we call them AI tools). By summing up all the published gains, an improvement of 29% is obtained.

This is an interesting result, but far from a scientifically acceptable one, because the different tools used were differently trained. Therefore, MPAI is currently engaged in the MPAI-EVC Evidence Project that can be exemplified by the following figure:

Here all coding tools have been replaced by AI tools. We intend to train these new tools with the same source material (a lot of it) and assess the improvement obtained.

We expect to obtain an objectively measured improvement of at least 25%.

After this MPAI will engage in the actual development of MPAI-EVC. We expect to obtain an objectively measured improvement of at least 35%. Our experience suggests that the subjectively measured improvement will be around 50%.

As in Deep Blue, in the old tools a priori statistical knowledge is modelled and hardwired, whereas in AI knowledge is acquired by learning the statistics.

This is the reason why AI tools are more promising than traditional data processing tools.

For a new age you need new tools and a new organisation tuned to use those new tools.


MPAI receives technologies for its AI framework standard and calls for technologies supporting audio and human-machine conversation

Geneva, Switzerland – 17 February 2021. At its 5th General Assembly, Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international, unaffiliated standards association:

  1. Has kicked off work on its AI Framework (MPAI-AIF) standard after receiving substantial proposed technologies.
  2. Is calling for technologies to develop two standards related to audio (MPAI-CAE) and multimodal conversation (MPAI-MMC).
  3. Will soon be developing the Framework Licence for the next maturing project “Compression and Understanding of Industrial Data” (MPAI-CUI).

MPAI has reviewed responses to the call issued 2 months ago for technologies supporting its AI Framework (MPAI-AIF) standard. The goal is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware or mixed software and hardware. MPAI-AIF will offer extended explainability to applications conforming to MPAI standards. The submissions received are enabling MPAI to develop the intended standard whose publication is planned for July 2021.

MPAI has issued two Calls for Technologies supporting two new standards:

  1. The Context-based Audio Enhancement (MPAI-CAE) standard will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples of use are adding a desired emotion to emotionless speech, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. The Multimodal Conversation (MPAI-MMC) standard will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples of use are an audio-visual conversation with a machine where the machine is impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, and a human question to a machine translated using a voice that preserves the speech features of the human.

The content of the two Calls for Technologies will be introduced at two online conferences. Attendance is open to interested parties.

MPAI has developed the functional requirements for the Compression and Understanding of Industrial Data (MPAI-CUI) and, by decision of the General Assembly, MPAI Active Members may now develop the Framework Licence to facilitate the actual licences – to be developed outside of MPAI. The standard will be used to assess the risks faced by a company by using information from the flow of data produced.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer game players.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.

 


Having a conversation with a machine

Like it or not, we do what the title says regularly, with mixed results. Searching for that file in your computer should be easy but often is not, finding that email is a hassle, especially when retrieving it is so important, and talking with an information service is often a challenge to your nervous system.

I have no intention to criticise the – difficult – work that others have done, but the results of human-machine conversation are far from satisfactory.

Artificial Intelligence promises to do a better job.

A few months ago, MPAI started working on an area called Multimodal Conversation (MMC). The intention is to use a plurality of communication means to improve the ability of humans to talk to machines. Currently, the area includes 3 Use Cases:

  1. Conversation with Emotion (CWE)
  2. Multimodal Question Answering (MQA)
  3. Personalized Automatic Speech Translation (PST).

In this article I would like to present the first, CWE.

The driver of this use case is that, to improve the ability of a machine to have a conversation with a human, it is important for the machine to have information not only about the words used by the human but also about other factors such as the emotional state of the human. If a human is asking a question about the quality of a telephone line, it is important for the machine to know if the emotion of the speaker is neutral – the question is likely purely informational – or altered – the question likely implies dissatisfaction with the telephone service.

In CWE the machine uses speech, text from a keyboard and video to make the best assessment of the conversation partner’s emotional state. This is shown by Figure 1 where you can see that three blocks are dedicated to extracting emotion in 3 modes. The 3 estimated emotions are fed to another block which is tasked with making a final decision.

Figure 1 – Emotion extraction from text, speech, and video
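The final decision block can be pictured with a minimal sketch. The source does not say how the three estimates are combined, so the confidence-weighted vote below is only one possible assumption; the function name, the modality names and the emotion labels are all hypothetical.

```python
from collections import defaultdict


def fuse_emotions(estimates, weights=None):
    """Pick a final emotion from per-modality (label, confidence) estimates.

    estimates: dict mapping modality name -> (emotion label, confidence in [0, 1]).
    weights:   optional dict mapping modality name -> relative trust (default 1.0).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for modality, (label, confidence) in estimates.items():
        scores[label] += weights.get(modality, 1.0) * confidence
    # The label with the highest accumulated score wins.
    return max(scores, key=scores.get)


# Hypothetical outputs of the three emotion extraction blocks.
estimates = {
    "text": ("neutral", 0.6),
    "speech": ("angry", 0.7),
    "video": ("angry", 0.5),
}
print(fuse_emotions(estimates))
```

Raising the weight of one modality, for example `fuse_emotions(estimates, {"text": 3.0})`, lets that modality override the other two, which shows how much the final decision depends on the chosen fusion rule.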

The first three blocks starting from the left-hand side can be implemented as neural networks. But MPAI does not wish to disenfranchise those who have invested for years in traditional data processing solutions and have produced state-of-the-art technologies. MPAI seeks to define standards that are, as far as possible, technology independent. Therefore, in a legacy context Figure 1 morphs to Figure 2.

Figure 2 – Emotion extraction using legacy technologies

 In the system of Figure 2, each of the 3 initial blocks extracts features from the input data and uses a vector of features to query an appropriate Knowledge Base that responds with one or more candidate emotions.
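A minimal sketch of such a query follows. The knowledge base contents, the two-dimensional feature vectors and the nearest-neighbour matching are assumptions made for illustration; the source only states that a feature vector is used to query a Knowledge Base that returns candidate emotions.

```python
import math

# Hypothetical knowledge base: reference feature vectors labelled with an emotion.
SPEECH_EMOTION_KB = [
    ((0.2, 0.1), "neutral"),
    ((0.9, 0.8), "angry"),
    ((0.7, 0.2), "happy"),
]


def query_emotion_kb(kb, features, max_candidates=2):
    """Return the emotions of the KB entries closest to the feature vector."""
    ranked = sorted(kb, key=lambda entry: math.dist(entry[0], features))
    return [emotion for _, emotion in ranked[:max_candidates]]


# Features extracted from the input speech (placeholder values).
print(query_emotion_kb(SPEECH_EMOTION_KB, (0.85, 0.7)))
```

Returning more than one candidate leaves room for the downstream decision block to weigh the alternatives coming from the three modalities.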

Actually, Video analysis and Language understanding do more than just provide emotion information. This is seen in the following Figure 3 where the two blocks additionally provide Meaning, i.e., information extracted from the text and video such as question, statement, exclamation, expression of doubt, request, invitation etc.

 

Figure 3 – Emotion and meaning enter Dialogue processing

 

Meaning and emotion are fed into the Dialogue processing component. Note that in a legacy implementation Dialogue processing, too, needs access to a Dialogue Knowledge Base. From now on, however, we will assume to deal with a full AI-based implementation.

Dialogue processing produces two streams of data, as depicted in Figure 4. The result is composed of:

  1. a data stream to drive speech synthesis expressed either as “Text with emotion” or “Concept with emotion”.
  2. a data stream to drive face animation in tune with the speech.

Figure 4 – End-to-end multimodal conversation with emotion
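The two output streams can be sketched as two small data structures produced together. This is a toy, rule-based stand-in and not the AI-based Dialogue processing the article assumes; the class names, the emotion labels and the reply texts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpeechDirective:
    """Stream 1: what to say and with which emotion ("Text with emotion")."""
    text: str
    emotion: str


@dataclass
class FaceDirective:
    """Stream 2: how the animated face should look while speaking."""
    expression: str


def dialogue_processing(meaning, emotion):
    """Toy rule-based stand-in for the Dialogue processing block."""
    if meaning == "question" and emotion != "neutral":
        reply = SpeechDirective("I am sorry about the trouble. Let me check the line.", "empathetic")
    else:
        reply = SpeechDirective("Here is the information you asked for.", "neutral")
    # The face stream is derived from the reply emotion so both stay in tune.
    return reply, FaceDirective(expression=reply.emotion)


speech, face = dialogue_processing("question", "angry")
print(speech.text, "|", face.expression)
```

Deriving the face stream from the same reply object is one simple way to keep the animation in tune with the speech.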

 

The last element, to move from theory to practice, is that you need an environment where you can place the blocks (that MPAI calls AI Modules – AIM), establish all connections, activate all timings, and execute the chain. One could even want to train or retrain the individual neural networks.

The technology that makes this possible is the MPAI AI Framework (MPAI-AIF), for which a Call for Technologies was published on 2020/12/16, with responses due on 2021/02/15. The full scheme of Multimodal conversation with emotion in the MPAI AI Framework is represented by Figure 5.

Figure 5 – Multimodal conversation with emotion in the MPAI AI Framework

The six components of MPAI-AIF are:

  • Management and Control, in charge of the workflow
  • Execution, the environment where the workflow is executed.
  • AI Modules (AIM), the basic blocks of the system
  • Communication, to handle both internal and external communication (e.g., Dialogue processing could be located on the cloud)
  • Storage, for exchanging data between AIMs.
  • Access, to access static or slowly changing external data.
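How some of these components fit together can be pictured with a minimal sketch. It is not the MPAI-AIF design, which was still being defined at the time of the Call: the `Workflow` class, its methods and the two placeholder AIMs are hypothetical, and Communication and Access are not modelled.

```python
class Workflow:
    """Toy stand-in for Management and Control plus Execution."""

    def __init__(self):
        self.steps = []    # (AIM name, callable, input keys, output key)
        self.storage = {}  # shared Storage used to pass data between AIMs

    def add_aim(self, name, func, inputs, output):
        """Register an AIM together with the storage keys it reads and writes."""
        self.steps.append((name, func, inputs, output))

    def run(self, **external_inputs):
        """Execute the AIMs in the order they were added."""
        self.storage.update(external_inputs)
        for _name, func, inputs, output in self.steps:
            args = [self.storage[key] for key in inputs]
            self.storage[output] = func(*args)
        return self.storage


wf = Workflow()
# Placeholder AIMs: real ones would wrap neural networks or legacy processing.
wf.add_aim("speech_recognition", lambda audio: audio.lower(), ["audio"], "text")
wf.add_aim("word_count", lambda text: len(text.split()), ["text"], "n_words")
result = wf.run(audio="HELLO MACHINE")
print(result["text"], result["n_words"])
```

Because each AIM only declares which data it reads and writes, one AIM can be replaced by another from a different source without touching the rest of the chain.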

What does MPAI intend to standardise in Conversation with emotion? In a full AI-based implementation:

  1. The format of the Text leaving Speech recognition.
  2. Representation of Emotion
  3. Representation of Meaning
  4. Format of Reply: Text with Emotion or Concept with Emotion
  5. Format of Video animation

In case of a legacy implementation, in addition to the above we need:

  1. Emotion KB (video) query format with Video features
  2. Emotion KB (speech) query format with Speech features
  3. Emotion KB (text) query format with Text features
  4. Dialogue KB query format

As you see, MPAI standardisation is minimal, in tune with the requirement that standardisation specify the minimum that is necessary for interoperability. In the MPAI case the minimum is what is required to assemble a working system using AIMs from independent sources.


Opacity or transparency?

By publishing its Manifesto, MPAI has made clear in a few sentences its strategic analysis of the AI industry, what will be its action points and why that will provide benefits.

#1 Applications using AI are extending in scope and performance. The AI industry is one of the fastest growing industries.

#2 The industry is not developing as fast as it could because there are hurdles. The first is the fact that the AI application development model is based on frameworks that tend to create the well-known walled garden effect. Importing applications is easy but exporting applications is difficult. We are missing the seamless interactions that could propel the industry to new heights.

The second hurdle is the fact that most AI applications are monolithic and opaque. If these two adjectives apply to an application that changes a punctured tyre, you may not care, but if it is an application that selects relevant news, I do care.

#3 MPAI believes that AI interoperability standards will have the same beneficial effects that digital media standards had on the media industry starting 30 years ago and continuing today. AI and media may very well be different beasts, but the MPAI notion of coding as the transformation of data from one representation to an equivalent one more suited to a specific application shows that there are more commonalities than differences.

#4 The video and audio coding and decoding chips – minuscule entities compared to the size of the digital media services they enabled – are mirrored by the MPAI notion of AI Module. As much as the digital media standards defined the syntax and semantics of the data coming into the decoder, the AI Modules are defined by the syntax and semantics of the data coming into the AI Module (AIM). However, because an AIM is typically connected to other AIMs, the syntax and semantics of the output data of an AIM are also standardised. As much as the “digital media decoder” became a basic component available on the open market, MPAI expects that the “AI Modules” will become basic components available on the open market.
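A minimal sketch of what defining an AIM by its input and output data makes possible: checking whether two AIMs from different providers can be connected. The `AIMSpec` class, the function name and the format identifiers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIMSpec:
    """Hypothetical description of an AIM by its input and output data formats."""
    name: str
    inputs: tuple   # format identifiers the AIM accepts
    outputs: tuple  # format identifiers the AIM produces


def shared_formats(producer, consumer):
    """Return the formats produced by one AIM that the other AIM accepts."""
    return sorted(set(producer.outputs) & set(consumer.inputs))


speech_recognition = AIMSpec("SpeechRecognition", ("Speech",), ("Text",))
language_understanding = AIMSpec("LanguageUnderstanding", ("Text",), ("Emotion", "Meaning"))

print(shared_formats(speech_recognition, language_understanding))
```

An empty result means the two AIMs cannot be chained in that direction, which is the kind of check a standardised output format makes possible before anything is executed.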

#5 The fact that AIMs are meant to be connected in a variety of topologies shows that MPAI has an additional problem to solve. The MPAI “AI Framework” (AIF), for which MPAI has issued a Call for Technologies due 15 February 2021, will enable creation, execution, composition, and update of AIM-based workflows.

#6 The elements described above show that MPAI standards will benefit all actors involved.

  • Technology providers will be able to offer their conforming AIMs with different technologies – AI, ML, legacy data processing – and different levels of performance to an open market.
  • Application developers will find on the open market the AIMs needed by their applications and will thus be able to develop more ambitious applications than they could otherwise develop.
  • The fact that there will be a race on a level playing field among providers of standard AIMs will fuel innovation.
  • Consumers will have a wider choice of better AI applications from a competitive market of application providers who will be able to draw state-of-the-art technologies (AIMs) from the open market.
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications because atomic AIMs will reveal a lot about the logic that runs the application.

#7 The very fact that so much investment is being made on AI by Academia and Industry – e.g., representation learning, transfer learning, edge AI, and reproducibility of performance today and more in the future – means that innovations will characterise AI on which MPAI standards are based for the years to come.

#8 One of the founding drivers of MPAI is the realisation that Fair, Reasonable and Non-Discriminatory (FRAND) declarations are no longer a match for today’s technology and business complexity. Society is deprived of the use of valuable technologies and IP holders are deprived of a fair remuneration of their investment. By setting out IP holders’ IPR guidelines in advance, MPAI’s Framework Licences will facilitate the use of MPAI standards. A first instance of an MPAI Framework Licence for MPAI-AIF has already been developed.

#9 MPAI is proud of being a technical body, but even more proud of being aware of the revolutionary impact AI will have on the future of human society and that technology and society should not be antagonists in addressing the revolution. MPAI will create opportunities for its standards developers to interact with high-level thinkers. Instead of addressing expected problems from AI in abstract terms, MPAI will propose actual cases and seek advice on the impact its standards will have on society.

#10 This is not wishful thinking. Cutting monolithic AI applications in smaller pieces is not a dream but a concrete reality underpinned by the MPAI AI Framework and the MPAI AI Modules. It is debatable whether an AIM/AIF-based solution will be more efficient than a monolithic AI solution. It is not debatable that an AIM/AIF-based solution will be less onerous to design and make and that it will be more transparent to users.