Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI consolidates the development of three AI-based data coding standards

Geneva, Switzerland – 14 April 2021. At its 7th General Assembly, the international, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards association received substantial proposals in response to its two Calls for Technologies on Enhanced Audio and Multimodal Conversation, which closed on the 12th of April. Meanwhile, the development of its foundational AI Framework standard is progressing steadily, with delivery targeted for July 2021.

The goal of the AI Framework standard, nicknamed MPAI-AIF, is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware, or mixed software and hardware. A major MPAI-AIF feature is to offer enhanced explainability to applications conforming to MPAI standards.

Work on the two new standards, Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC), has started after receiving substantial technologies in response to the Calls for Technologies. MPAI-CAE covers four instances: adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street. MPAI-MMC covers three instances: audio-visual conversation with a machine impersonated by a synthesised voice and an animated face, request for information about a displayed object, and translation of a sentence using a synthetic voice that preserves the speech features of the human.

Work on a fourth standard is scheduled to start at the next General Assembly (12th of May) after receiving responses, from both MPAI members and non-members, to the currently open MPAI-CUI Call for Technologies. The standard will enable prediction of performance, e.g., organisational adequacy or default probability, using Artificial Intelligence (AI)-based filtering and extraction of information from a company’s governance, financial and risk data.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) that improves the performance of existing video codecs, Server-based Predictive Multiplayer Gaming (MPAI-SPG) that compensates for the loss of data in online multiplayer gaming and Integrative Genomic/Sensor Analysis (MPAI-GSA) that compresses and understands data from combined genomic and other experiments produced by related devices/sensors.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.


Why the MPAI way is the only way

Research, business and society are gradually coming to realise that Artificial Intelligence (AI) is not just another generation of data processing technologies. It is a powerful set of pervasive technologies that affects the way individuals and society have behaved in creating the customs, rules and laws that govern their life and organisation.

The article Artificial Intelligence Beyond Deep Neural Networks by Naga Rayapati, Forbes Councils Member, illustrates the level of awareness achieved:

Neural networks act as black boxes and are often not well-suited for applications that require explainability. Areas like employment, lending, education, health care and household assistants require explainability. In finance, machines predicting price changes is a significant win for companies, but without the explainability factor, it may be hard to convince regulators that it has not violated any regulations. Similarly, in transactions involving trust, such as credit card applications, one has to explain the reason for approval or rejection. In business applications, building the trust of a customer is critical, and decisions need to be explainable.

There is no better introduction than this quote to the standard that MPAI has set out to develop: Compression and Understanding of Industrial Data (MPAI-CUI).

The standard intends to enable prediction of a company’s performance by filtering and extracting information from its governance, financial and risk data. Artificial Intelligence is a candidate technology to achieve that, but MPAI is open to other technologies. We live in a transition age and, while there is no doubt about the ultimate supremacy of AI technologies, at this point in time some traditional technologies may perform very well.

These are some of the needs MPAI-CUI can cover:

  1. Assess and monitor a company’s financial and organisational performance, as well as the impact of vertical risks (e.g., cyber, seismic, etc.).
  2. Identify clues to a crisis or bankruptcy years in advance.
  3. Support financial institutions when deciding on a loan to a troubled company.

Referencing the article again, MPAI is not looking for a black box that uses neural networks. MPAI seeks technologies fitting an architecture that can be used without having to convince regulators that their regulations have not been violated. That this is possible can be seen from the MPAI-CUI architecture:

These are some of the technologies that MPAI has identified in the MPAI-CUI Use Cases and Functional Requirements document and requested in the MPAI-CUI Call for Technologies document:

  • Data Conversion: gathers the data needed for the assessment from internal and external sources, in different formats, and converts it to a single format (e.g., JSON).
  • Financial assessment: analyses company data (i.e., financial statements) to assess the preliminary financial performance in the form of indexes. Builds and extracts the financial features for the Decision tree and Prediction AIMs.
  • Governance assessment: builds and extracts the features related to the adequacy of the governance asset for the Decision tree and Prediction AIMs.
  • Risk matrix: builds the risk matrix to assess the impact of vertical risks (in this Use Case, cyber and seismic).
  • Decision: creates the decision trees for making decisions.
  • Prediction: predicts the company’s default probability within 36 months and the adequacy of the organisational model.
  • Perturbation: perturbs the company crisis probability computed by Prediction, considering the impact of vertical risks on company performance.
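
The chain of modules listed above can be pictured as a small pipeline. The Python sketch below is only an illustration under stated assumptions: the field names, the ratios used as features, the weights in the score and the perturbation rule are all hypothetical placeholders and are not taken from the MPAI-CUI documents. The Decision module is left out to keep the sketch short.

```python
import json

def data_conversion(raw_sources):
    """Merge records from several sources into one JSON document (hypothetical layout)."""
    merged = {}
    for source in raw_sources:
        merged.update(source)
    return json.dumps(merged, sort_keys=True)

def financial_assessment(company_json):
    """Extract a toy financial feature: the debt-to-assets ratio (placeholder index)."""
    data = json.loads(company_json)
    return {"debt_ratio": data["total_debt"] / data["total_assets"]}

def governance_assessment(company_json):
    """Extract a toy governance feature: the share of independent directors."""
    data = json.loads(company_json)
    return {"independent_share": data["independent_directors"] / data["board_size"]}

def risk_matrix(company_json):
    """Collect the vertical risks named in the Use Case (cyber and seismic)."""
    data = json.loads(company_json)
    return {"cyber": data["cyber_risk"], "seismic": data["seismic_risk"]}

def prediction(financial, governance):
    """Toy default-probability score clamped to [0, 1]; the weights are invented."""
    score = 0.8 * financial["debt_ratio"] - 0.3 * governance["independent_share"]
    return min(1.0, max(0.0, score))

def perturbation(probability, risks):
    """Raise the probability according to the largest vertical risk (invented rule)."""
    worst = max(risks.values())
    return min(1.0, probability + (1.0 - probability) * worst)

# Hypothetical input data from three different sources.
sources = [
    {"total_debt": 600.0, "total_assets": 1000.0},
    {"independent_directors": 2, "board_size": 8},
    {"cyber_risk": 0.10, "seismic_risk": 0.05},
]
document = data_conversion(sources)
p = prediction(financial_assessment(document), governance_assessment(document))
p_perturbed = perturbation(p, risk_matrix(document))
```

Each function only sees the common JSON document or the features produced by an earlier step, which is the property that would let one module be replaced without touching the others.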

Interested? Join the Zoom conference on 2021/03/31T15:00UTC. You will learn about:

  1. MPAI’s approach to standardisation
  2. Presentation of Use Case and technologies requested
  3. How to submit a proposal
  4. The MPAI-CUI Framework Licence

MPAI: where it is, where it is going

Some 50 days after having been announced, MPAI was established as a not-for-profit organisation with the mission to develop data coding standards with an associated mechanism designed to facilitate the creation of licences.

Some 150 days have passed since its establishment. Where is MPAI in its journey to accomplish its mission?

Creating an organisation able to execute the mission in 50 days was an achievement. The next goal, giving the organisation the means to accomplish its mission, was difficult but was also achieved.

MPAI has defined five pillars on which the organisation rests.

Pillar #1: The standard development process.

MPAI is an open organisation not in words, but in practice.

Anybody can bring proposals – Interest Collection – and help merge their proposal with others into a Use Case. All can participate in the development of Functional Requirements. Once the Functional Requirements are defined, MPAI Principal Members develop the Commercial Requirements, and all MPAI Members develop the Call for Technologies, review the submissions and start the Standard Development. Finally, MPAI Principal Members approve the MPAI standard.

The progression of a proposal from a stage to the next is approved by the General Assembly.

Pillar #2: AI Modules.

This pillar is technical in nature, but has far-reaching implications. MPAI defines basic units called AI Modules (AIM) that perform a significant task and develops standards for them. The AIM in the figure processes the input video signal (a human face) to provide the emotion expressed by the face and the meaning (question, affirmation etc.) of what the human is saying.

MPAI confines the scope of standardisation to the format of input and output data of an AIM. It is silent on the inside of the AIM (the green box) which can use ML, AI or data processing technologies and can be implemented in hardware or software. In the figure the Emotion and Meaning Knowledge Base is required when the AIM is implemented with legacy technologies.
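
One way to picture this separation between a standardised interface and a free implementation is sketched below in Python. The class names, the fields of `FaceVideo` and `EmotionMeaning`, and the toy rules are hypothetical; the actual data formats are defined by the MPAI standards, not by this sketch.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class FaceVideo:
    """Stand-in for the standardised input (hypothetical fields)."""
    smile: float          # smile intensity, 0.0 to 1.0
    brows_raised: bool    # True if the eyebrows are raised

@dataclass(frozen=True)
class EmotionMeaning:
    """Stand-in for the standardised output (hypothetical fields)."""
    emotion: str
    meaning: str

class VideoAnalysisAIM(Protocol):
    """Only the input and output types are fixed; the inside is left open."""
    def process(self, video: FaceVideo) -> EmotionMeaning: ...

class KnowledgeBaseAIM:
    """Legacy-style implementation that looks the answer up in a small knowledge base."""
    KNOWLEDGE_BASE = {
        (True, True): EmotionMeaning("happy", "question"),
        (True, False): EmotionMeaning("happy", "affirmation"),
        (False, True): EmotionMeaning("neutral", "question"),
        (False, False): EmotionMeaning("neutral", "affirmation"),
    }

    def process(self, video: FaceVideo) -> EmotionMeaning:
        return self.KNOWLEDGE_BASE[(video.smile >= 0.5, video.brows_raised)]

class LearnedAIM:
    """Stand-in for an ML implementation: a fixed linear score replaces a trained network."""
    def process(self, video: FaceVideo) -> EmotionMeaning:
        emotion = "happy" if 2.0 * video.smile - 1.0 >= 0.0 else "neutral"
        meaning = "question" if video.brows_raised else "affirmation"
        return EmotionMeaning(emotion, meaning)

def run(aim: VideoAnalysisAIM, video: FaceVideo) -> EmotionMeaning:
    """The caller depends only on the interface, never on the implementation."""
    return aim.process(video)

sample = FaceVideo(smile=0.9, brows_raised=True)
legacy_result = run(KnowledgeBaseAIM(), sample)
learned_result = run(LearnedAIM(), sample)
```

Because `run` relies only on the input and output types, either implementation can be swapped in without changing the caller.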

Pillar #3: AI Framework

It is clear that, by themselves, AIMs will have limited use. Practical applications are more complex and require more technologies. If each of these technologies is implemented as an AIM, how can they be connected and executed?

The figure depicts how the MPAI AI Framework (AIF) solves the problem. The MPAI AI Framework has the function of managing the life cycle of the individual AIMs, and of creating and executing the workflows. The Communication and Storage functionality allows possibly distributed AIMs to implement different forms of communication.

Pillar #4: Framework Licence

The process of Pillar #1 mentions the development of “Commercial Requirements”. In fact, MPAI Principal Members develop a specific document called the Framework Licence: the business model that patent holders will use to monetise their patents in a standard, stated without values such as dollar amounts, percentages or dates. It is developed and adopted by Active Principal Members and is attached to the Call for Technologies. All submissions must contain a statement that the submitter agrees to license their patents according to the Framework Licence.

MPAI Members have already developed 4 Framework Licences. Some of their interesting elements are:

  1. The License will be free of charge to the extent it is only used to evaluate or demo solutions or for technical trials.
  2. The License may be granted free of charge for particular uses if so decided by the licensors.
  3. A preference will be expressed on the entity that should administer the patent pool of patent holders.

Pillar #5: Conformance

Guarantee of a good performance of an MPAI standard implementation is a necessity for a user. From an implementer’s viewpoint, a measurable performance is a desirable characteristic because users seek assurance about performance before buying or using an implementation.

Testing an MPAI Implementation for conformance requires the tools, procedures, data sets etc. specific to an AIM and to a complete MPAI Implementation. MPAI will not itself test MPAI Implementations for conformance, but will only provide testing tools that enable a third party to test the conformance of an AIM and of the set of AIMs that make up a Use Case.

The MPAI standards

MPAI has issued

  1. A call for technologies to enable the development of the AI Framework (MPAI-AIF) standard. Responses were received on the 17th of February and MPAI is busy developing the standard.
  2. Two calls for the Context-based Audio Enhancement (MPAI-CAE) and Multimodal Conversation (MPAI-MMC) standards. Responses are due on the 12th of April.
  3. A call for the Compression and Understanding of Industrial Data (MPAI-CUI) standard. Responses are due on the 10th of May.

Calls for technologies for a total of 9 Use Cases have been issued. One is being developed. The Use Cases described above are depicted in the figure below.

MPAI continues the development of Functional Requirements for the Server-based Predictive Multiplayer Gaming (MPAI-SPG) standard, the Integrative Genomic/Sensor Analysis (MPAI-GSA) standard and the AI-Enhanced Video Coding (MPAI-EVC) standard.

In the next months MPAI will be busy producing its first standards. The first one – MPAI-AIF – is planned to be released on the 19th of July.


MPAI tackles AI-based risk analysis standard in a new Call for Technologies

Geneva, Switzerland – 17 March 2021. At its 6th General Assembly, the international, unaffiliated Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) standards association promoted its 4th standard project, Compression and Understanding of Industrial Data (MPAI-CUI), to the Call for Technologies stage. The standard aims to enable Artificial Intelligence (AI)-based filtering and extraction of information from a company’s governance, financial and risk data, enabling prediction of company performance, e.g., organisational adequacy or default probability.

All parties, including non-MPAI members, who believe they have relevant technologies satisfying all or most of the MPAI-CUI Functional Requirements are invited to submit proposals for consideration by MPAI. The MPAI-CUI Call for Technologies requests that technologies proposed, if accepted for inclusion in the standard, be released according to the MPAI-CUI Framework Licence to facilitate eventual definition of the final licence by patent holders.

The content of the Call for Technologies will be introduced at two online conferences. Interested parties are welcome to attend.

MPAI is continuing the development of its AI Framework standard, nicknamed MPAI-AIF. The goal of the standard is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware, or mixed software and hardware. A major MPAI-AIF feature is to offer enhanced explainability to applications conforming to MPAI standards. MPAI retains its intention to release the standard in July 2021.

At its previous General Assembly (MPAI-5), MPAI issued two Calls for Technologies supporting two new standards:

  1. Context-based Audio Enhancement (MPAI-CAE) covering four instances: adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. Multimodal Conversation (MPAI-MMC) covering three instances: an audio-visual conversation with a machine impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, and a human sentence translated using a synthetic voice that preserves the human speech features.

The MPAI web site provides information about the other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.


New age – new video coding technologies

The last 30 years of video coding have been productive because, over the years, the compression rate has improved, enabling digital video to take over all existing video applications and extend to new ones. The coding algorithm has been tweaked over and over, but it is still based on the same original scheme.

In the last decade, the “social contract” that allowed inventors to innovate, innovation to be brought into standards, standards to be used and inventors to be remunerated has stalled. The HEVC standard is used but the entire IP landscape is clouded by uncertainty.

The EVC standard, approved in 2020, 7 years after HEVC was approved, has shown that even an inflow of technologies from a reduced number of sources can provide outstanding results, as shown in the figure:

The EVC baseline, a profile that uses technologies more than 20 years old, reaches the performance of HEVC. The main profile offers a bitrate reduction of 39% over HEVC, a whisker away from the performance of the latest VVC standard.

In 1997 the match between IBM Deep Blue and the (human) chess champion of the time made headlines: IBM Deep Blue beat Garry Kasparov. It was easy to herald the age when machines would overtake humans not just in keeping accounts, but also in one of the noblest intellectual activities: chess.

This was achieved by writing a computer program that explored more alternatives than a human could reasonably do, although a human’s intuition can look far into the future. In that sense, Deep Blue operated much like MPEG video coding.

Google DeepMind’s AlphaGo did the same in 2016 by beating the Go champion Sedol Lee. The Go rules are simpler than chess rules, but the alternatives in Go are way more numerous. There is only one type of piece (the stone) instead of six (king, queen, bishop, knight, rook and pawn), but the Go board has 19×19 boxes instead of the 8×8 of chess. While Deep Blue made a chess move by brute-force exploring future moves, AlphaGo made Go moves relying on neural networks which had learned moves.

That victory signalled a renewed interest in a technology that is three quarters of a century old: neural networks.

In neural networks data are processed by different layers that extract essential information until a compressed representation of the input data is achieved (left-hand side). At the decoder, the inverse process takes place.

MPAI has established the AI-Enhanced Video Coding (MPAI-EVC) standard project. This is based on an MPAI study collecting published results where individual HEVC coding tools have been replaced by neural networks (in the following we call them AI tools). By summing up all the published gains, an improvement of 29% is obtained.

This is an interesting result, but far from a scientifically acceptable one, because the different tools used were differently trained. Therefore, MPAI is currently engaged in the MPAI-EVC Evidence Project that can be exemplified by the following figure:

Here all coding tools have been replaced by AI tools. We intend to train these new tools with the same source material (a lot of it) and assess the improvement obtained.

We expect to obtain an objectively measured improvement of at least 25%.

After this MPAI will engage in the actual development of MPAI-EVC. We expect to obtain an objectively measured improvement of at least 35%. Our experience suggests that the subjectively measured improvement will be around 50%.

As in Deep Blue, in the old tools a priori statistical knowledge is modelled and hardwired, whereas in AI tools knowledge is acquired by learning the statistics.

This is the reason why AI tools are more promising than traditional data processing tools.

For a new age you need new tools and a new organisation tuned to use those new tools.


MPAI receives technologies for its AI framework standard and calls for technologies supporting audio and human-machine conversation

Geneva, Switzerland – 17 February 2021. At its 5th General Assembly, Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international, unaffiliated standards association

  1. Has kicked off work on its AI Framework (MPAI-AIF) standard after receiving substantial proposed technologies.
  2. Is calling for technologies to develop two standards related to audio (MPAI-CAE) and multimodal conversation (MPAI-MMC).
  3. Will soon be developing the Framework Licence for the next maturing project “Compression and Understanding of Industrial Data” (MPAI-CUI).

MPAI has reviewed responses to the call issued 2 months ago for technologies supporting its AI Framework (MPAI-AIF) standard. The goal is to enable creation and automation of mixed Machine Learning (ML) – Artificial Intelligence (AI) – Data Processing (DP) and inference workflows, implemented as software, hardware or mixed software and hardware. MPAI-AIF will offer extended explainability to applications conforming to MPAI standards. The submissions received are enabling MPAI to develop the intended standard whose publication is planned for July 2021.

MPAI has issued two Calls for Technologies supporting two new standards:

  1. The Context-based Audio Enhancement (MPAI-CAE) standard will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples of use are adding a desired emotion to a speech without emotion, preserving old audio tapes, improving the audioconference experience and removing unwanted sounds while keeping the relevant ones for a user walking in the street.
  2. The Multimodal Conversation (MPAI-MMC) standard will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples of use are an audio-visual conversation with a machine where the machine is impersonated by a synthesised voice and an animated face, a request for information about an object while displaying it, and a human question to a machine translated using a voice preserving the speech features of the human.

The content of the two Calls for Technologies will be introduced at two online conferences. Attendance is open to interested parties.

MPAI has developed the functional requirements for the Compression and Understanding of Industrial Data (MPAI-CUI) standard and, by decision of the General Assembly, MPAI Active Members may now develop the Framework Licence to facilitate the actual licences, which are to be developed outside of MPAI. The standard will be used to assess the risks faced by a company by using information from the flow of data produced.

The MPAI web site provides information about other AI-based standards being developed: AI-Enhanced Video Coding (MPAI-EVC) will improve the performance of existing video codecs, Integrative Genomic/Sensor Analysis (MPAI-GSA) will compress and understand the results of combining genomic experiments with those produced by related devices/sensors, and Server-based Predictive Multiplayer Gaming (MPAI-SPG) will improve the user experience of online multiplayer game players.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI web site and contact the MPAI secretariat for specific information.


Having a conversation with a machine

Like it or not, we do what the title says regularly, with mixed results. Searching for that file in your computer should be easy but often is not, finding that email is a hassle, especially when retrieving it is so important, and talking with an information service is often a challenge to your nervous system.

I have no intention to criticise the – difficult – work that others have done, but the results of human-machine conversation are far from satisfactory.

Artificial Intelligence promises to do a better job.

A few months ago, MPAI started working on an area called Multimodal Conversation (MMC). The intention is to use a plurality of communication means to improve the ability of humans to talk to machines. Currently, the area includes 3 Use Cases:

  1. Conversation with Emotion (CWE)
  2. Multimodal Question Answering (MQA)
  3. Personalized Automatic Speech Translation (PST).

In this article I would like to present the first, CWE.

The driver of this use case is that, to improve the ability of a machine to have a conversation with a human, it is important for the machine to have information not only about the words used by the human but also about other aspects such as the emotional state of the human. If a human is asking a question about the quality of a telephone line, it is important for the machine to know if the emotion of the speaker is neutral – the question is likely purely informational – or altered – the question likely implies a dissatisfaction with the telephone service.

In CWE the machine uses speech, text from a keyboard and video to make the best assessment of the conversation partner’s emotional state. This is shown in Figure 1, where three blocks are dedicated to extracting emotion from the three modes. The three estimated emotions are fed to another block, which is tasked with making a final decision.

Figure 1 – Emotion extraction from text, speech, and video
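
The final decision block could, for instance, combine the three estimates with a confidence-weighted vote. The Python sketch below assumes that each block reports a label and a confidence value; that representation, the function name and the voting rule are hypothetical and are not specified by MPAI.

```python
from collections import defaultdict

def fuse_emotions(estimates):
    """Combine (emotion_label, confidence) pairs from the text, speech and video blocks.

    The label with the highest accumulated confidence wins; ties are resolved
    alphabetically so that the result is deterministic.
    """
    totals = defaultdict(float)
    for label, confidence in estimates:
        totals[label] += confidence
    return max(sorted(totals), key=lambda label: totals[label])

# Hypothetical outputs of the three blocks: text, speech, video.
estimates = [("neutral", 0.6), ("angry", 0.5), ("angry", 0.4)]
fused = fuse_emotions(estimates)
```

In this example the text block alone would say "neutral", but speech and video together outweigh it, so the fused emotion is "angry".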

The first three blocks starting from the left-hand side can be implemented as neural networks. But MPAI does not wish to disenfranchise those who have invested for years in traditional data processing solutions and have produced state-of-the-art technologies. MPAI seeks to define standards that are, as far as possible, technology independent. Therefore, in a legacy context Figure 1 morphs into Figure 2.

Figure 2 – Emotion extraction using legacy technologies

In the system of Figure 2, each of the 3 initial blocks extracts features from the input data and uses a vector of features to query an appropriate Knowledge Base that responds with one or more candidate emotions.
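
A Knowledge Base query of this kind can be pictured as a nearest-neighbour lookup over stored feature vectors. The Python sketch below is an assumption made for illustration: the two-dimensional features, the stored entries and the distance-based ranking are invented, and MPAI does not prescribe how a Knowledge Base answers a query.

```python
import math

# Hypothetical Knowledge Base: (feature vector, emotion label) pairs.
EMOTION_KB = [
    ((0.9, 0.1), "happy"),
    ((0.1, 0.9), "angry"),
    ((0.5, 0.5), "neutral"),
]

def query_kb(features, kb=EMOTION_KB, max_candidates=2):
    """Return the labels of the stored entries closest to the query features."""
    ranked = sorted(kb, key=lambda entry: math.dist(features, entry[0]))
    return [label for _, label in ranked[:max_candidates]]

# Feature vector extracted from the input data by one of the three blocks.
candidates = query_kb((0.8, 0.2))
```

Here the query returns the candidate emotions "happy" and "neutral", in that order, which the next block can then weigh against the candidates from the other modes.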

In fact, Video analysis and Language understanding do more than just provide emotion information. This is seen in Figure 3, where the two blocks additionally provide Meaning, i.e., information extracted from the text and video such as question, statement, exclamation, expression of doubt, request, invitation etc.

Figure 3 – Emotion and meaning enter Dialogue processing

Meaning and emotion are fed into the Dialogue processing component. Note that in a legacy implementation Dialogue processing, too, needs access to a Dialogue Knowledge Base. From now on, however, we will assume a fully AI-based implementation.

Dialogue processing produces two streams of data, as depicted in Figure 4:

  1. a data stream to drive speech synthesis, expressed either as “Text with emotion” or as “Concept with emotion”.
  2. a data stream to drive face animation in tune with the speech.

Figure 4 – End-to-end multimodal conversation with emotion

The last element needed to move from theory to practice is an environment where you can place the blocks (which MPAI calls AI Modules, or AIMs), establish all connections, activate all timings, and execute the chain. One may even want to train or retrain the individual neural networks.

The technology that makes this possible is the MPAI AI Framework (MPAI-AIF), for which a Call for Technologies was published on 2020/12/16 and for which responses are due on 2021/02/15. The full scheme of Multimodal conversation with emotion in the MPAI AI Framework is represented by Figure 5.

Figure 5 – Multimodal conversation with emotion in the MPAI AI Framework

The six components of MPAI-AIF are:

  • Management and Control, in charge of the workflow.
  • Execution, the environment where the workflow is executed.
  • AI Modules (AIM), the basic blocks of the system.
  • Communication, to handle both internal and external communication (e.g., Dialogue processing could be located on the cloud).
  • Storage, for exchanging data between AIMs.
  • Access, to access static or slowly changing external data.
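
How some of these components could cooperate is sketched below as a toy, single-process Python example. It is not the MPAI-AIF API: the `Storage` and `Workflow` classes, their methods and the three toy AIMs are hypothetical, and the Communication and Access components are not modelled.

```python
class Storage:
    """Storage: a shared key-value store through which AIMs exchange data."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

class Workflow:
    """Management and Control: keeps the ordered list of AIMs and their wiring."""
    def __init__(self):
        self._steps = []

    def add(self, name, aim, inputs, output):
        self._steps.append((name, aim, inputs, output))
        return self

    def execute(self, storage):
        """Execution: run each AIM in order, reading inputs from and writing outputs to Storage."""
        for _name, aim, inputs, output in self._steps:
            arguments = [storage.get(key) for key in inputs]
            storage.put(output, aim(*arguments))
        return storage

# Toy AIMs, written as plain functions for brevity.
def speech_recognition(audio):
    return audio.lower()

def language_understanding(text):
    return "question" if text.endswith("?") else "statement"

def dialogue_processing(text, meaning):
    return f"Reply to {meaning}: {text}"

storage = Storage()
storage.put("audio", "IS THE LINE WORKING?")
workflow = (
    Workflow()
    .add("asr", speech_recognition, ["audio"], "text")
    .add("nlu", language_understanding, ["text"], "meaning")
    .add("dialogue", dialogue_processing, ["text", "meaning"], "reply")
)
workflow.execute(storage)
reply = storage.get("reply")
```

The AIMs never call each other directly; they are connected only through the keys declared in the workflow, so a module can be replaced or moved without editing its neighbours.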

What does MPAI intend to standardise in Conversation with emotion? In a full AI-based implementation:

  1. Format of the Text leaving Speech recognition
  2. Representation of Emotion
  3. Representation of Meaning
  4. Format of Reply: Text with Emotion or Concept with Emotion
  5. Format of Video animation

In case of a legacy implementation, in addition to the above we need:

  1. Emotion KB (video) query format with Video features
  2. Emotion KB (speech) query format with Speech features
  3. Emotion KB (text) query format with Text features
  4. Dialogue KB query format

As you can see, MPAI standardisation is minimal, in tune with the principle that a standard should specify the minimum that is necessary for interoperability. In the MPAI case the minimum is what is required to assemble a working system using AIMs from independent sources.


Opacity or transparency?

By publishing its Manifesto, MPAI has made clear in a few sentences its strategic analysis of the AI industry, what will be its action points and why that will provide benefits.

#1 Applications using AI are extending in scope and performance. The AI industry is one of the fastest growing industries.

#2 The industry is not developing as fast as it could because there are hurdles. The first is the fact that the AI application development model is based on frameworks that tend to create the well-known walled garden effect. Importing applications is easy but exporting applications is difficult. We are missing the seamless interactions that could propel the industry to new heights.

The second hurdle is the fact that most AI applications are monolithic and opaque. If these two adjectives apply to an application that changes a punctured tyre, you may not care, but if it is an application that selects relevant news, I do care.

#3 MPAI believes that AI interoperability standards will have the same beneficial effects that digital media standards have had on the media industry, starting 30 years ago and continuing today. AI and media may very well be different beasts, but the MPAI notion of coding as the transformation of data from one representation to an equivalent one more suited to a specific application shows that there are more commonalities than differences.

#4 The video and audio coding and decoding chips – minuscule entities compared to the size of digital media services they enabled – are mirrored by the MPAI notion of AI module. As much as the digital media standards defined the syntax and semantics of the data coming into the decoder, the AI modules are defined by the syntax and semantics of the data coming into the AI Module (AIM). However, because an AIM is typically connected to other AIMs, the syntax and semantics of the output data of an AIM are also standardised. As much as the “digital media decoder” became a basic component available on the open market, MPAI expects that the “AI Modules” will become basic components available on the open market.

#5 The fact that AIMs are meant to be connected in a variety of topologies shows that MPAI has an additional problem to solve. The MPAI “AI Framework” (AIF), for which MPAI has issued a Call for Technologies due 15 February 2021, will enable creation, execution, composition, and update of AIM-based workflows.

#6 The elements described above show that MPAI standards will benefit all actors involved.

  • Technology providers will be able to offer their conforming AIMs with different technologies – AI, ML, legacy data processing – and different levels of performance to an open market.
  • Application developers will find on the open market the AIMs needed by their applications and will thus be able to develop more ambitious applications than they could otherwise develop.
  • The fact that there will be a race on a level playing field among providers of standard AIMs will fuel innovation.
  • Consumers will have a wider choice of better AI applications from a competitive market of application providers who will be able to draw state-of-the-art technologies (AIMs) from the open market.
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications because atomic AIMs will reveal much of the logic that runs the application.

#7 The very fact that so much investment is being made on AI by Academia and Industry – e.g., representation learning, transfer learning, edge AI, and reproducibility of performance today and more in the future – means that innovations will characterise AI on which MPAI standards are based for the years to come.

#8 One of the founding drivers of MPAI is the realisation that Fair, Reasonable and Non-Discriminatory (FRAND) declarations are no longer a match for today’s technology and business complexity. Society is deprived of the use of valuable technologies and IP holders are deprived of a fair remuneration of their investment. By setting out IP holders’ IPR guidelines in advance, MPAI’s Framework Licences will facilitate the use of MPAI standards. A first instance of an MPAI Framework Licence for MPAI-AIF has already been developed.

#9 MPAI is proud of being a technical body, but even more proud of being aware of the revolutionary impact AI will have on the future of human society and that technology and society should not be antagonists in addressing the revolution. MPAI will create opportunities for its standards developers to interact with high-level thinkers. Instead of addressing expected problems from AI in abstract terms, MPAI will propose actual cases and seek advice on the impact its standards will have on society.

#10 This is not wishful thinking. Cutting monolithic AI applications into smaller pieces is not a dream but a concrete reality underpinned by the MPAI AI Framework and the MPAI AI Modules. It is debatable whether an AIM/AIF-based solution will be more efficient than a monolithic AI solution. It is not debatable that an AIM/AIF-based solution will be less onerous to design and make and that it will be more transparent to users.
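
The ideas in points #4, #5 and #10 can be illustrated with a minimal Python sketch. It is not the MPAI-AIF specification, which was still under development at the time of writing. The names AIM, Workflow, add, replace, run and describe, and the format labels "text", "tokens" and "number", are all hypothetical and chosen only for illustration. The sketch assumes that a module is described by the format of its input and output data, and that a framework refuses to connect modules whose formats do not match.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AIM:
    """Hypothetical AI Module: a named step with declared input/output formats."""
    name: str
    input_format: str    # illustrative label for the data format accepted
    output_format: str   # illustrative label for the data format produced
    process: Callable[[object], object]  # internals are opaque to the workflow


class Workflow:
    """Hypothetical stand-in for a framework that chains AIMs in sequence."""

    def __init__(self) -> None:
        self.aims: List[AIM] = []

    def add(self, aim: AIM) -> "Workflow":
        # Composition: refuse a connection whose formats do not match.
        if self.aims and self.aims[-1].output_format != aim.input_format:
            raise ValueError(
                f"{self.aims[-1].name} outputs {self.aims[-1].output_format}, "
                f"but {aim.name} expects {aim.input_format}"
            )
        self.aims.append(aim)
        return self

    def replace(self, name: str, new_aim: AIM) -> None:
        # Update: swap one module for another that keeps the same interface.
        for i, aim in enumerate(self.aims):
            if aim.name == name:
                old = (aim.input_format, aim.output_format)
                new = (new_aim.input_format, new_aim.output_format)
                if old != new:
                    raise ValueError("replacement must keep the same interface")
                self.aims[i] = new_aim
                return
        raise KeyError(name)

    def run(self, data: object) -> object:
        # Execution: pass the data through each module in order.
        for aim in self.aims:
            data = aim.process(data)
        return data

    def describe(self) -> List[str]:
        # Transparency: the chain of interfaces can be inspected.
        return [f"{a.name}: {a.input_format} -> {a.output_format}" for a in self.aims]


# Toy modules; a real AIM would wrap an ML model or legacy data processing.
tokenise = AIM("tokenise", "text", "tokens", lambda s: s.split())
count = AIM("count", "tokens", "number", lambda toks: len(toks))

wf = Workflow().add(tokenise).add(count)
result = wf.run("speech without emotion")
```

In this toy setting, describe() lists the interface of every step, which is the kind of visibility a monolithic application does not offer, and replace() lets one provider's module be exchanged for another's without touching the rest of the chain.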


MPAI addresses new standards for Context-based Audio Enhancement and Multimodal Conversation

Geneva, Switzerland – 20 January 2021. Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an international unaffiliated standards association, is engaging in two new areas for standardisation: Context-based Audio Enhancement and Multimodal Conversation.

Standards in the Context-based Audio Enhancement (MPAI-CAE) work area will improve the user experience for several audio-related applications in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc. Examples are:

  1. adding emotions to a speech without emotion
  2. preserving old audio tapes
  3. improving the audioconference experience
  4. removing unwanted sounds while keeping relevant sounds to a user walking in the street.

Standards in the Multimodal Conversation (MPAI-MMC) work area will enable a human-machine conversation that emulates human-human conversation in completeness and intensity by using AI. Examples are focused on machines

  1. producing speech and animated face that have a level of emotion consistent with the emotion contained in text, speech and face of the human who is talking to the machines;
  2. responding to a question asked by a human who is showing an object;
  3. translating to another language a question asked by a human using a voice with features similar to those of the human.

MPAI plans on publishing Calls for Technologies for these two standards at its next General Assembly on 2021/02/17. The current drafts are available here and here.

At its last General Assembly (MPAI-3), MPAI issued a Call for Technologies for its planned AI Framework (MPAI-AIF) standard. Submissions are due 2021/02/15 for review and action at its next General Assembly (MPAI-5) on 2021/02/17.

The MPAI web site provides information about other MPAI standards being developed: MPAI-CUI uses AI to compress and understand industrial data, MPAI-EVC to improve the performance of existing video codecs, MPAI-GSA to understand and compress the results of combining genomic experiments with those produced by related devices/sensors, e.g. video, motion, location, weather, medical sensors, and MPAI-SPG to improve the user experience of online multiplayer games.

MPAI develops data coding standards for applications that have AI as the core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of standards for the efficient use of data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.


A vision for AI-based data coding standards

Use of technologies based on Artificial Intelligence (AI) is extending to more and more applications, yielding one of the fastest-growing markets in the data analysis and service sector.

However, industry must overcome two hurdles before stakeholders can fully exploit this historic opportunity: the current framework-based development model, which makes application redeployment difficult, and monolithic, opaque AI applications, which generate mistrust in users.

MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – believes that universally accessible standards can have the same positive effects on AI as digital media standards and has identified data coding as the area where standards can foster development of AI technologies, promote use of AI applications and contribute to the solution of existing problems.

MPAI defines data coding as the transformation of data from a given representation to an equivalent one more suited to a specific application. Examples are compression and semantics extraction.

MPAI considers the AI Module (AIM) and its interfaces as the AI building block. The syntax and semantics of the interfaces determine what AIMs should perform, not how. AIMs can be implemented in hardware or software, with AI, Machine Learning, or legacy Data Processing.
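
The "what, not how" principle can be sketched in Python under loudly stated assumptions: the NoiseGate interface, the two classes and the clean function below are hypothetical names invented for this illustration and are not taken from any MPAI specification. The sketch shows two interchangeable implementations behind one interface, one in a legacy data-processing style and one standing in for a learned module.

```python
from typing import List, Protocol


class NoiseGate(Protocol):
    """Hypothetical AIM interface: audio samples in, audio samples out."""

    def process(self, samples: List[float]) -> List[float]: ...


class ThresholdGate:
    """Legacy data-processing style: a fixed, hand-chosen threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def process(self, samples: List[float]) -> List[float]:
        # Silence every sample whose magnitude is below the threshold.
        return [s if abs(s) >= self.threshold else 0.0 for s in samples]


class LearnedGate:
    """Stand-in for a learned module: the threshold is estimated from data."""

    def __init__(self, training_samples: List[float]) -> None:
        # "Training" here is only the mean absolute amplitude;
        # a real module would use a trained model instead.
        total = sum(abs(s) for s in training_samples)
        self.threshold = total / len(training_samples)

    def process(self, samples: List[float]) -> List[float]:
        return [s if abs(s) >= self.threshold else 0.0 for s in samples]


def clean(gate: NoiseGate, samples: List[float]) -> List[float]:
    # The application depends only on the interface, not on the implementation.
    return gate.process(samples)


signal = [0.1, -0.7, 0.4, 0.9]
legacy_out = clean(ThresholdGate(0.5), signal)
learned_out = clean(LearnedGate([0.25, 0.75]), signal)
```

The clean function never inspects how a gate works internally, so either implementation can be supplied as long as it honours the same input and output.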

MPAI’s AI framework enabling creation, execution, composition and update of AIM-based workflows (MPAI-AIF) is the cornerstone of MPAI standardisation because it enables building high-complexity AI solutions by interconnecting multi-vendor AIMs trained for specific tasks, operating in the standard AI framework and exchanging data in standard formats.

MPAI standards will address many of the problems mentioned above and benefit various actors:

  • Technology providers will be able to offer their conforming AIMs to an open market
  • Application developers will find on the open market the AIMs their applications need
  • Innovation will be fuelled by the demand for novel and better-performing AIMs
  • Consumers will be offered a wider choice of better AI applications by a competitive market
  • Society will be able to lift the veil of opacity from large, monolithic AI-based applications.

Focusing on AI-based data coding will also allow MPAI to take advantage of the results of emerging and future research in representation learning, transfer learning, edge AI, and reproducibility of performance.

MPAI is mindful of IPR-related problems which have accompanied high-tech standardisation. Unlike standards developed by other bodies, which are based on vague and contention-prone Fair, Reasonable and Non-Discriminatory (FRAND) declarations, MPAI standards are based on Framework Licences where IPR holders set out in advance IPR guidelines.

Finally, although it is a technical body, MPAI is aware of the revolutionary impact AI will have on the future of human society. MPAI pledges to address ethical questions raised by its technical work with the involvement of high-profile external thinkers. The initial significant step is to enable the understanding of the inner workings of complex AI systems.