Moving Picture, Audio and Data Coding
by Artificial Intelligence

All posts

MPAI commences development of the Framework Licence for the MPAI AI Framework

Geneva, Switzerland – 18 November 2020. The Geneva-based international standards organisation Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) has concluded its second General Assembly, making a major step toward the development of its first standard, called MPAI AI Framework, acronym MPAI-AIF.

MPAI-AIF has been designed to enable the creation and automation of mixed processing and inference workflows made of Machine Learning, Artificial Intelligence and traditional Data Processing components.

MPAI wishes to give as much information as possible to users of its standards. After approving the Functional Requirements, MPAI is now developing the Commercial Requirements, to be embodied in the MPAI-AIF Framework Licence. This will collect the set of conditions of use of the eventual licence(s), without values such as currency, percentages or dates.

An optimal implementation of the MPAI use cases requires a coordinated combination of processing modules. MPAI has assessed that, by standardising the interfaces of Processing Modules to be executed in the MPAI AI Framework, horizontal markets of competing standard implementations of processing modules will emerge.

The MPAI-AIF standard, which MPAI plans to deliver in July 2021, will reduce costs, promote adoption and spur progress of AI technologies. If, instead, the market develops incompatible implementations, costs will multiply and the adoption of AI technologies will be delayed.

MPAI-AIF is the first of a series of standards MPAI has in its development pipeline. The following three work areas, promoted to Functional Requirements stage, will build on top of MPAI-AIF:

  1. MPAI-CAE – Context-based Audio Enhancement uses AI to improve the user experience for a variety of uses such as entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in contexts such as the home, the car, on-the-go, the studio etc., allowing a dynamically optimised user experience.
  2. MPAI-GSA – Integrative Genomic/Sensor Analysis uses AI to understand and compress the results of high-throughput experiments combining genomic/proteomic and other data – for instance from video, motion, location, weather, medical sensors. The target use cases range from personalised medicine to smart farming.
  3. MPAI-MMC – Multi-Modal Conversation uses AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity.

The MPAI web site provides more information about other MPAI standards: MPAI-EVC uses AI to improve the performance of existing video codecs, MPAI-SPG to improve the user experience of online multiplayer games and MPAI-CUI to compress and understand industrial data.

MPAI seeks the involvement of companies that can benefit from international data coding standards and calls for proposals of standards. In an arrangement unique among standards organisations, MPAI gives the opportunity, even to non-members, to accompany a proposal through the definition of its goals and the development of functional requirements. More details here.

MPAI develops data coding standards for a range of applications with Artificial Intelligence (AI) as its core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of Technical Specifications for the efficient use of Data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.

 


A new way to develop useful standards

Communication standards, at least so far, are handled in an odd way. They are meant to serve the needs of millions, if not billions, of people, yet the decisions about what the standards should do are left in the hands of people who, no matter how many, are not billions, not millions, not even thousands.

This is the end point of the unilateral approach adopted by inventors starting, one can say, from Gutenberg’s movable type and continuing with Niépce and Daguerre’s photography, Morse’s telegraph, Bell and Meucci’s telephone, Marconi’s radio and tens more.

In retrospect, those were “easy” times because each invention satisfied a basic need. Today, the situation is quite different: basic needs are more than satisfied (at least for a significant part of humankind), while the “other needs” can hardly be addressed by the aforementioned unilateral approach to technology use. This is even more true today when we are dealing with a technology – Artificial Intelligence – that will likely be the most pervasive technology ever seen.

This is the reason why MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – likes to call itself, as its domain extension says, a “community”. Indeed, MPAI opens its doors to those who have a need, or a wish, to propose the development of a new standard. As the “MPAI doors” are virtual – all MPAI activities are carried out online – access to MPAI is all the more immediate.

This MPAI openness should not be mistaken for a mere “suggestion box”, because MPAI does more than just ask for ideas.

To understand how MPAI is “a community”, I need to explain the MPAI process for developing standards, depicted in Figure 1.

Figure 1 The MPAI standard development stages

Let’s start from the bottom of the process. Members as well as non-members submit proposals. These are collected and harmonised: some proposals are merged with other, similar proposals; some are split because the harmonisation process so demands. The goal is to identify proposals for standards that reflect the proponents’ wishes while making sense in terms of specification and use across different environments. Non-members can fully participate in this process on a par with members. The result of this process is the definition of one, or possibly more than one, homogeneous area of work called a “Use Case”. Each Use Case is described in an Application Note.

The 1st stage of the process entails a full characterisation of the Use Case and the description of the work program that will produce the Functional Requirements.

The 2nd stage is the actual development of the Functional Requirements of the area of work represented by the Use Case.

The “MPAI openness” lies in the fact that anybody may participate in the three stages of Interest Collection, Use Case and Functional Requirements. With one exception, though: a Member may request that a proposal they make be exposed to members only.

The next stage is Commercial Requirements. A standard is like any other supply contract: you describe what you supply, with its characteristics (Functional Requirements), and under what conditions (Commercial Requirements).

It should be noted that, from this stage on, non-members are not allowed to participate (but they can become members at any time), because their role of proposing and describing what a standard should do is over. Antitrust laws do not permit sellers (technology providers) and buyers (users of the standard) to sit together and agree on values such as amounts, percentages or dates, but they do permit sellers to indicate the conditions of a licence without values. Therefore, the Framework Licence, the embodiment of the Commercial Requirements, will refrain from including such details.

Once both sets of Requirements are available, MPAI is in a position to draft the Call for Technologies (stage 4), review the proposals and develop the standard (stage 5). This is where the role of Associate Members in MPAI ends: only Principal Members may vote to approve the standard and hence trigger its publication. But an Associate Member may become a Principal Member at any time.
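
The stage progression described in this post can be summarised in a few lines of code; the names below are illustrative only and are not taken from any MPAI document:

```python
from enum import Enum

class Stage(Enum):
    """The MPAI standard development stages, in order."""
    INTEREST_COLLECTION = 1
    USE_CASE = 2
    FUNCTIONAL_REQUIREMENTS = 3
    COMMERCIAL_REQUIREMENTS = 4
    CALL_FOR_TECHNOLOGIES = 5
    STANDARD_DEVELOPMENT = 6

def open_to_non_members(stage: Stage) -> bool:
    """Non-members may participate up to and including Functional Requirements."""
    return stage.value <= Stage.FUNCTIONAL_REQUIREMENTS.value

assert open_to_non_members(Stage.USE_CASE)
assert not open_to_non_members(Stage.COMMERCIAL_REQUIREMENTS)
```

The two assertions encode the openness rule stated above: anybody may take part through Functional Requirements, while from Commercial Requirements onward participation is restricted to members.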


A new channel between industry and standards

The challenges of MPAI standardisation

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is a standards organisation with the mission to develop data coding standards that have Artificial Intelligence as their core enabling technology.

MPAI faces two main challenges in achieving its mission. The first comes from MPAI’s definition of “data”: any digital representation of a real or computer-generated entity. As any living being or human organisation generates data, and data are more and more pervasive, the scope of MPAI standards is very broad indeed. The second comes from MPAI’s definition of data coding as the transformation of data from one representation into another, more convenient one. As convenience is in the eye of the beholder, the number of transformations, hence of standards, is potentially large.

Unlike audio and video coding, whose needs were clear from day one and whose application domains have incrementally extended over the years, data coding is a much more articulated domain where the simple one-dimensional (compression) world of audio and video coding morphs to a world where the dimensions are potentially many.

This is not a late discovery. The challenges were clear when the MPAI Statutes were drafted. The standard development workflow described in Annex 1 to the Statutes envisages the involvement of any interested party in the process of identifying standard projects. This is very different from what happens, e.g., in ISO, where the definition of a standard happens in watertight compartments and only committee members or National Bodies can propose standard projects. The logic behind this is that standard projects are weapons in the hands of those who control the committees, and you do not give up your weapons easily.

The MPAI standardisation process

MPAI has no weapons because its mission is to serve the industry. It implements that mission as follows:

  1. MPAI solicits proposals of new projects in the form of use cases from anybody, collects and harmonises use cases in a structured proposal of a project and defines a comprehensive use case that is likely to be usable across industries.
  2. Then MPAI develops functional requirements. Note that in the first two stages of the MPAI standards workflow – Use Cases (UC) and Functional Requirements (FR) – participation in the relevant meetings is open to anybody (strictly speaking, however, the member who has proposed a use case may request that only MPAI members participate in the UC and FR stages).
  3. The following stages are: Commercial Requirements, Call for Technologies, Standard development and MPAI standard.

The AI Framework case

In the spirit of collecting input from anybody, I will report on the case of AI Framework, which is likely to become the first MPAI standard. If you have any comments please send an email to leonardo@chiariglione.org.

MPAI has built and analysed six use cases where Artificial Intelligence (AI) technologies can offer significant benefits compared to traditional technologies. The use cases cover widely different application areas:

  1. improving the audio experience when audio is consumed in non-ideal conditions
  2. processing DNA information jointly with consequent physical effects on living organisms
  3. replacing components of a traditional video codec with AI-based components
  4. making up for missing information from online gaming clients
  5. multimodal conversation, and
  6. compression and understanding of industrial data.

Even though the use cases are disparate, each one of them can be implemented as a combination of processing modules performing functions that concur in achieving the intended result.

MPAI has assessed that leaving it to the market to develop individual implementations would multiply costs and delay adoption of AI technologies, while a suitable level of standardisation can reduce overall design costs and increase component reusability. Eventually a horizontal market may emerge where proprietary and competing implementations of components exposing standard interfaces will reduce cost and promote adoption and progress of AI technologies.

The MPAI-AIF standard

MPAI has determined that a standard for a processing framework satisfying the requirements derived from the six use cases will achieve this goal. MPAI calls the future standard AI Framework (MPAI-AIF). As AI is a fast-moving field, MPAI expects that MPAI-AIF will be extended as new use cases bring new requirements and new technologies reach maturity.

To avoid the deadlock experienced in other high-technology fields, MPAI will develop a Framework Licence (FWL) associated with the defined MPAI-AIF Requirements. The FWL – essentially the business model that SEP holders apply to monetise their Intellectual Property (IP), but without values such as royalty amounts, percentages or due dates – will act as the Commercial Requirements for the standard and provide a clear IPR licensing framework.

MPAI-AIF enables the creation and automation of mixed ML-AI-DP processing and inference workflows at scale for the use cases mentioned above. The key components of the framework should address different modalities of operation (AI, ML and DP), data pipeline jungles and computing resource allocations including constrained hardware scenarios of edge AI devices.

The MPAI-AIF reference model

The reference diagram of MPAI-AIF is given in the following figure.

Figure 1 – Normative MPAI-AIF Architecture

  1. Management and Control: acts on PMs so that they execute in the correct order and at the time they are needed, handling both simple orchestration tasks (i.e. a script) and much more complex tasks with a topology of networked PMs that need to be synchronised according to a given time base.
  2. Execution: the environment in which PMs operate. It is interfaced with M&C and with Communication and Storage. It receives external inputs and produces the requested outputs, both of which are application specific.
  3. Processing Modules (PM): composed of a processing element (ML or traditional Data Processor), an interface to Communication and Storage, and processing-specific input and output interfaces.
  4. Communication: required in several cases and can be implemented accordingly, e.g. by means of a service bus.
  5. Storage: stores the inputs and outputs of the individual PMs, data from the PMs’ states and intermediary results, data shared among PMs, and information used by M&C and its procedures.
  6. Access: represents the access to static or slowly changing data required by the application, such as domain knowledge data, data models, etc.
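
As a way to visualise how these components relate, here is a deliberately minimal sketch in Python; all class and method names are hypothetical illustrations and do not come from the MPAI-AIF specification:

```python
from typing import Any, Callable, Dict, List

class Storage:
    """Holds PM inputs/outputs, state and intermediary results."""
    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
    def get(self, key: str) -> Any:
        return self._data[key]

class ProcessingModule:
    """A PM wraps one processing element (ML, AI or traditional DP)."""
    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name, self.fn = name, fn
    def run(self, data: Any) -> Any:
        return self.fn(data)

class ManagementAndControl:
    """Executes PMs in the required order (here: a simple linear script)."""
    def __init__(self, storage: Storage) -> None:
        self.storage = storage
        self.pipeline: List[ProcessingModule] = []
    def add(self, pm: ProcessingModule) -> None:
        self.pipeline.append(pm)
    def execute(self, external_input: Any) -> Any:
        data = external_input
        for pm in self.pipeline:
            data = pm.run(data)
            self.storage.put(pm.name, data)  # keep intermediary results
        return data

# Example: a two-PM workflow with toy stand-ins for real processing.
mc = ManagementAndControl(Storage())
mc.add(ProcessingModule("denoise", lambda x: [v for v in x if abs(v) > 0.1]))
mc.add(ProcessingModule("gain", lambda x: [2 * v for v in x]))
print(mc.execute([0.05, 0.5, -0.3]))  # -> [1.0, -0.6]
```

In a real AI Framework, Management and Control would handle full workflow topologies and a time base rather than a linear pipeline, and Communication and Access would be separate components rather than plain method calls.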

Requirements

Component requirements

  1. The MPAI-AIF standard shall include specifications of 6 Components
    1. Management and Control
    2. Execution
    3. Processing Modules (PM)
    4. Communication
    5. Storage
    6. Access
  2. Management and Control shall enable operations on the life cycle of
    1. Single PMs: instantiation/removal, reconfiguration, dump/retrieve internal state, start-suspend-stop, train-retrain-update, enforce resource limits
    2. Combinations of PMs: initialisation of the computational model, instructions (e.g. manual or automatic) to computational nodes to communicate between themselves and with output and storage, auto-configuration based on machine learning, reconfiguration of computational models, and instantiation-removal-reconfiguration of PMs.
  3. Management and Control shall support
    1. An architecture that allows hierarchical execution of workflows, i.e. computational graphs of PMs, possibly structured in hierarchies, for the identified application scenarios
    2. Supervised and unsupervised learning, and reinforcement-based learning paradigms
    3. Directed Acyclic Graph (DAG) topology of PMs
  4. Execution shall
    1. Support distributed deployment of PMs in the cloud and at the edge
    2. Be scalable in complexity and performance to cope with different scenarios, e.g. from small MCUs to complex distributed systems
  5. PMs shall support protocols for
    1. Autoconfiguration (e.g. peer-to-peer)
    2. Manual configuration
    3. Advertising and Discovery
  6. PMs
    1. May be a mixture of AI/ML or DP technologies
    2. Shall be directly connected to the ML life cycle without interruption
  7. Communication shall enable direct and mediated interconnections of PMs
  8. Storage shall support protocols to specify application dependent non-functional requirements such as access time, retention, read/write throughput
  9. Access shall support access to static or slowly changing data of standard formats. Access to private data should also be possible
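
To illustrate requirements 2 (life cycle operations) and 3.3 (DAG topology of PMs) together, here is a toy executor using Python's standard topological sorter; the PM names and lifecycle functions are invented for the example and are not MPAI-AIF API:

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# A workflow as a DAG of PM names: each key lists its predecessor PMs.
dag = {
    "speech_recognition": set(),
    "emotion_recognition": set(),
    "fusion": {"speech_recognition", "emotion_recognition"},
    "response_generation": {"fusion"},
}

# A minimal subset of the life cycle operations of requirement 2.1.
state = {}

def instantiate(pm): state[pm] = "instantiated"
def start(pm): state[pm] = "running"
def stop(pm): state[pm] = "stopped"

# Management and Control runs PMs in a DAG-consistent order.
order = list(TopologicalSorter(dag).static_order())
for pm in order:
    instantiate(pm)
    start(pm)
    stop(pm)

assert order[-1] == "response_generation"  # depends on everything else
```

A real implementation would keep PMs running and stream data along the edges; the topological order only guarantees that no PM starts before its inputs can exist.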

Systems requirements

The following requirements are not intended to be applied to the MPAI-AIF standard, but should be used for assessing technologies.

  1. Management and Control shall support time-based synchronous and asynchronous operation depending on application
  2. Execution shall allow seamless or minimal-impact operation of its PMs while algorithms or ML models are updated as a result of new training or retraining
  3. Task sharing for ML-based PMs shall be supported
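
Requirement 2 above (updating an ML model with seamless or minimal-impact operation) can be illustrated, in a single-process toy setting, by swapping the model reference under a lock; this is a sketch of the idea, not an MPAI-AIF mechanism:

```python
import threading

class HotSwappablePM:
    """A PM whose ML model can be replaced while inference keeps running."""
    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def update_model(self, new_model):
        # Swap under the lock so concurrent calls see a consistent reference.
        with self._lock:
            self._model = new_model

    def infer(self, x):
        with self._lock:
            model = self._model  # snapshot the current model
        return model(x)

pm = HotSwappablePM(lambda x: x + 1)   # "old" model
assert pm.infer(1) == 2
pm.update_model(lambda x: x * 10)      # retrained model deployed
assert pm.infer(1) == 10
```

In a distributed Execution environment the same effect would typically be achieved with blue-green or rolling deployment of PM instances rather than an in-process swap.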

General requirements

  1. The MPAI-AIF standard may include profiles for specific (sets of) requirements

For comments on MPAI requirements send an email to leonardo@chiariglione.org


MPAI launches 6 standard projects on audio, genomics, video, AI framework, multiuser online gaming and multimodal conversation

Geneva, Switzerland – 21 October 2020. The Geneva-based international Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) has concluded its first operational General Assembly adopting 6 areas of work, due to become standardisation projects.

MPAI-CAE – Context-based Audio Enhancement is an area that uses AI to improve the user experience for a variety of uses such as entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in such contexts as in the home, in the car, on-the-go, in the studio etc. allowing a dynamically optimized user experience.

MPAI-GSA – Integrative Genomic/Sensor Analysis is an area that uses AI to understand and compress the results of high-throughput experiments combining genomic/proteomic and other data – for instance from video, motion, location, weather, medical sensors. The target use cases range from personalised medicine to smart farming.

MPAI-SPG – Server-based Predictive Multiplayer Gaming uses AI to minimise the audio-visual and gameplay disruptions during an online real-time game caused by missing information at the server or at the client because of high latency and packet losses.

MPAI-EVC – AI-Enhanced Video Coding plans on using AI to further reduce the bitrate required to store and transmit video information for a variety of consumer and professional applications. One user of the MPAI-EVC standard is likely to be MPAI-SPG for improved compression and higher quality of cloud-gaming content.

MPAI-MMC – Multi-Modal Conversation aims to use AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity

MPAI-AIF – Artificial Intelligence Framework is an area based on the notion of a framework populated by AI-based or traditional Processing Modules. As this is a foundational standard on which other planned MPAI standards, such as MPAI-CAE, MPAI-GSA and MPAI-MMC, will be built, MPAI intends to move at an accelerated pace: Functional Requirements ready in November 2020, Commercial Requirements ready in December 2020 and Call for Technologies issued in January 2021. The MPAI-AIF standard is planned to be ready before the summer holidays of 2021.

More information about MPAI standards can be found on the MPAI web site.

MPAI covers its Commercial Requirements needs with Framework Licences (FWL). These are the sets of conditions of use of a licence of a specific MPAI standard without the values, e.g. currency, percentages, dates, etc. MPAI expects that FWLs will accelerate the practical use of its standards.

MPAI develops data coding standards for a range of applications with Artificial Intelligence (AI) as its core enabling technology. Any legal entity that supports the MPAI mission may join MPAI if it is able to contribute to the development of Technical Specifications for the efficient use of Data.

Visit the MPAI home page and contact the MPAI secretariat for specific information.


What is MPAI going to do?

Moving Picture, Audio and Data Coding by Artificial Intelligence – MPAI – has been devised as an international non-profit organisation with the mission to carry the baton from old-style compression to new, AI-based compression. This will take compression performance to new levels, extend the benefits of compression to all industries beset by huge amounts of data and give them the possibility not only to save costs through compression, but to get more out of their data.

Now that MPAI has been officially constituted on Wednesday 30 September 2020 (see Press Release), what will MPAI do?

This is a reasonable question to ask, but a better question would be: what has MPAI been doing? This is because, some two months before its actual establishment, a group of highly motivated experts had already developed a number of use cases, aggregated into areas where MPAI standards can make a difference.

Thanks to the efforts of many, MPAI already has its road mapped out, with several activities at different levels of maturity. The list below gives the more mature areas of the many that have been explored (see the list of use cases). The list order is a personal assessment of maturity.

  1. Context-based Audio Enhancement (MPAI-CAE) is the most mature area. By using AI, MPAI-CAE can improve the user experience in a variety of instances such as entertainment, communication, teleconferencing, gaming, post-production, restoration etc. in a variety of contexts such as in the home, in the car, on-the-go, in the studio etc.
  2. Integrative AI-based analysis of multi-source genomic/sensor experiments aims to define a framework where free and commercial AI-based processing components made available in a horizontal market can be combined to make application-specific “processing apps”.
  3. Multi-modal conversation aims to define an AI-based framework of processing components such as fusion of multimodal input, natural language understanding and generation, speech recognition and synthesis, emotion recognition, intention understanding, gesture recognition and knowledge fusion.
  4. Compression and understanding of financial data aims to enable AI-based filtering and extraction of key information from the flow of data that companies receive from the outside, generate inside or issue because of regulatory compliance.
  5. Server-based Predictive Distributed Multiplayer Online Gaming aims to minimise the visual discontinuities experienced by game players by feeding the data collected from the clients involved in a particular game to an AI-based system that can predict each individual participant’s moves when that information is missing.
  6. AI-Enhanced Traditional Video Coding aims to develop a video compression standard that will substantially enhance the performance of an existing video codec by enhancing or replacing traditional tools with AI-based tools.

MPAI signals a discontinuity with the past not only in the technology it uses to address known industry needs, but also in the way it overcomes the limitations of the Fair, Reasonable and Non-Discriminatory (FRAND) licensing declarations, a burning issue for many standard developing organisations and their industries. MPAI plans to develop and make known, for each MPAI standard, a “framework licence”, i.e. the business model, without values, dates and percentages, that standard essential patent holders intend to use to monetise their patents adopted in the standard.

Companies, academic institutions and individuals representing departments of academic institutions may apply for MPAI membership, provided that they can contribute to the development of technical specifications for the efficient use of data.

The MPAI website provides additional information. To join MPAI, please contact the secretariat.


A new organisation dedicated to data compression standards based on Artificial Intelligence

Geneva, 30 September 2020. Today, Moving Picture, Audio and Data Coding by Artificial Intelligence – MPAI – has been established as an international non-profit organisation in Geneva, Switzerland, at a conference call attended by 33 members from 15 countries.

One driving force behind MPAI is the need to have an organisation responsive to industry needs that develops data coding standards for a range of applications with Artificial Intelligence (AI) as its core enabling technology. In the past, the sheer reduction of the amount of data – i.e. compression – has been the success factor for a variety of businesses that range from broadcasting to telecommunication, information technology and related manufacturing industries.

In response to the demand for more compression, MPAI intends to develop AI-enabled standards that further improve the coding efficiency of data types that have already benefited from compression and bring the benefits of coding to new data types. An example of AI-enabled coding is to “bring out” aspects of the data semantics relevant to an application.

MPAI believes that, to ensure the success of its standards in the fast-evolving AI field, it must leverage its connection with academia and industrial research – some 40% of the current MPAI members are academic and research institutions.

Another motivation to create MPAI is to overcome the limitations of the Fair, Reasonable and Non-Discriminatory (FRAND) licensing declarations, a burning issue for many standard developing organisations and their industries. MPAI plans to develop, for each MPAI standard, a “framework licence”, i.e. the business model, without values, dates and percentages, that standard essential patent holders intend to use to monetise their patents eventually adopted in the standard.

MPAI has been moving fast. In the past two months, a large group of interested people have collaborated to create a set of use cases, some of which will become standard projects soon, now that MPAI is established. A project that is quickly taking shape is Context-based Audio Enhancement (MPAI-CAE), to improve the user experience in a variety of contexts of practical interest. Multiuser games, AI-assisted driving, and typical “big data” fields such as finance and genomics are also fast-maturing use cases.

Experts from industry, science and academia are invited to join MPAI and help promote data-driven applications through AI-enabled standards.

About MPAI

MPAI is a non-profit, unaffiliated association whose goal is to develop AI-enabled digital data compression specifications with clear IPR licensing frameworks.

Any entity, such as a corporation, individual firm, partnership, university, governmental body or international organisation, supporting the mission of MPAI may apply for membership, provided that it is able to contribute to the development of technical specifications for the efficient use of data.

Contact

For further information, please see https://mpai.community for openly accessible documents or contact leonardo@chiariglione.org. Information on MPAI-CAE can be found here. The list of use cases being considered can be found here.


MPAI Application Note #1 Context-based Audio Enhancement (MPAI-CAE)

MPAI Application Note #1

Context-based Audio Enhancement (MPAI-CAE)

Proponents: Michelangelo Guarise, Andrea Basso (VOLUMIO)

Description: The overall user experience quality is highly dependent on the context in which audio is used, e.g.

  1. Entertainment audio can be consumed in the home, in the car, on public transport, on-the-go (e.g. while doing sports, running, biking) etc.
  2. Voice communications can take place in the office, in the car, at home, on-the-go etc.
  3. Audio and video conferencing can be done in the office, in the car, at home, on-the-go etc.
  4. (Serious) gaming can be done in the office, at home, on-the-go etc.
  5. Audio (post-)production is typically done in the studio
  6. Audio restoration is typically done in the studio

By using context information to act on the content using AI, it is possible to substantially improve the user experience.

Comments: Currently, there are solutions that adapt the conditions in which the user experiences content or services for some of the contexts mentioned above. However, they tend to be vertical in nature, making it difficult to re-use possibly valuable AI-based components of the solutions for different applications.

MPAI-CAE aims to create a horizontal market of re-usable and possibly context-dependent components that expose standard interfaces. Such a market would be more receptive to innovation and hence more competitive. Industry and consumers alike will benefit from the MPAI-CAE standard.

Examples

The following examples describe how MPAI-CAE can make a difference.

  1. Enhanced audio experience in a conference call

Often, the user experience of a video/audio conference can be poor. Too much background noise or undesired sounds can prevent participants from understanding what others are saying. By using AI-based adaptive noise cancellation and sound enhancement, MPAI-CAE can virtually eliminate those kinds of noise without resorting to complex microphone systems that capture the characteristics of the environment.
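As an illustration only (not part of any MPAI-CAE specification), noise suppression of this kind is often built on spectral gating: estimate the noise spectrum from a noise-only segment, then subtract it from each audio frame. A minimal sketch, with the over-subtraction factor `alpha` as a hypothetical tuning parameter:

```python
import numpy as np

def spectral_gate(frame, noise_mag, alpha=2.0):
    """Suppress stationary background noise in one audio frame.

    frame:     1-D float array, one windowed frame of samples
    noise_mag: magnitude spectrum estimated from a noise-only segment
    alpha:     over-subtraction factor (illustrative tuning parameter)
    """
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the scaled noise estimate; floor at zero to avoid
    # negative magnitudes
    clean_mag = np.maximum(mag - alpha * noise_mag, 0.0)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))
```

A real implementation would update the noise estimate continuously and smooth gains across frames to avoid "musical noise" artifacts; an AI-based component could instead learn the gain function from data.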

  2. Pleasant and safe music listening while biking

While biking in the middle of city traffic, AI can process the signals from the environment captured by the microphones available in many earphones and earbuds (for active noise cancellation), adapt the sound rendition to the acoustic environment, provide an enhanced audio experience (e.g. by performing dynamic signal equalization), improve battery life, and selectively recognize and let through relevant environment sounds (e.g. a car horn). The user enjoys a satisfactory listening experience without losing contact with the acoustic surroundings.

  3. Emotion-enhanced synthesized voice

Speech synthesis is constantly improving and finding several applications that are part of our daily life (e.g. intelligent assistants). In addition to improving the natural sound of the voice, MPAI-CAE can implement expressive models of primary emotions such as fear, happiness, sadness and anger.

  4. Efficient 3D sound

MPAI-CAE can reduce the number of channels (e.g. MPEG-H 3D Audio supports up to 64 loudspeaker channels and 128 codec core channels) in an automatic (unsupervised) way, e.g. by mapping a 9.1 layout to 5.1 or to stereo (for radio broadcasting or DVD), while preserving the musical intent of the composer.

  5. Speech/audio restoration

Audio restoration is often a time-consuming process that requires skilled audio engineers with specific experience in music and recording techniques to manually go over old audio tapes. MPAI-CAE can automatically remove anomalies from recordings through broadband denoising, declicking and decrackling, as well as remove buzzes and hums and perform spectrographic 'retouching' to eliminate discrete unwanted sounds.
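One classical building block of such restoration is click detection and repair. A deliberately simple sketch (a production declicker would use autoregressive interpolation rather than a plain median, and the threshold logic below is illustrative only):

```python
import numpy as np

def declick(x, threshold=5.0, win=5):
    """Replace impulsive clicks with a local median (illustrative only).

    A sample is flagged as a click when it deviates from the local
    median by more than `threshold` times the deviations' spread.
    """
    x = np.array(x, dtype=float)     # copy; do not modify the caller
    pad = win // 2
    padded = np.pad(x, pad, mode="edge")
    # Local medians over a centred sliding window
    med = np.array([np.median(padded[i:i + win]) for i in range(len(x))])
    dev = np.abs(x - med)
    scale = np.std(dev) + 1e-12      # crude robust-scale stand-in
    clicks = dev > threshold * scale
    x[clicks] = med[clicks]          # repair flagged samples
    return x
```

The median repair preserves the surrounding waveform, so the repaired sample lands close to the value the undamaged signal would have had.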

  6. Normalization of volume across channels/streams

Eighty-five years after TV was first introduced as a public service, viewers are still struggling to adapt to their needs the different average audio levels of different broadcasters and, within a program, the different audio levels of different scenes.

MPAI-CAE can learn from the user's reactions via the remote control, e.g. to a loud advertising spot, and adjust the sound level accordingly.
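The underlying operation can be illustrated with a simple gain-to-target normalisation. Note that broadcast loudness measures (EBU R 128 / ITU-R BS.1770) are gated and frequency-weighted, so the plain RMS used below is only a stand-in:

```python
import numpy as np

def normalise_level(x, target_dbfs=-23.0):
    """Apply a single gain so the signal's RMS matches a target level.

    target_dbfs: desired RMS level in dB relative to full scale;
    -23 is used as a stand-in for a broadcast loudness target.
    """
    rms = np.sqrt(np.mean(np.square(x)))
    if rms == 0.0:
        return x                     # silence: nothing to normalise
    gain = 10.0 ** (target_dbfs / 20.0) / rms
    return x * gain
```

An AI-based component would go further than a single static gain, e.g. by predicting per-scene gains from the content and from learned user preferences.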

  7. Automotive

Audio systems in cars have steadily improved in quality over the years and continue to be integrated into more critical applications. Today, a buyer takes it for granted that a car has a good sound system. In addition, a car usually has at least one, and sometimes two, microphones to handle the voice-response system and the hands-free cell-phone capability. If the vehicle uses any noise cancellation, several more microphones are involved.

MPAI-CAE can be used to improve the user experience and deliver the full quality of current audio systems by reducing the effects of the noisy automotive environment on the signals.

  8. Audio mastering

Audio mastering is still considered an 'art' and the prerogative of professional audio engineers. With MPAI-CAE, normal users can upload an example track of their liking (possibly similar musical content); MPAI-CAE analyzes it, extracts key features and, starting from the non-mastered track, generates a master track that 'sounds like' the example. It is also possible to specify the desired style without an example, and the original track will be adjusted accordingly.
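The "extract key features and match" step can be illustrated by comparing per-band average spectra of the example track and the non-mastered track. The function name and the uniform band split below are illustrative only, not part of any MPAI-CAE specification:

```python
import numpy as np

def match_spectrum_gains(track, reference, n_bands=8):
    """Per-band gains that move `track` toward `reference` spectrally.

    Compares the average magnitude spectrum of the two signals in
    n_bands uniform frequency bands and returns the gain ratio per
    band (a crude stand-in for a mastering EQ-matching step).
    """
    t = np.abs(np.fft.rfft(track))
    r = np.abs(np.fft.rfft(reference))
    bands_t = np.array_split(t, n_bands)
    bands_r = np.array_split(r, n_bands)
    # Epsilon guards against division by zero in empty bands
    return np.array([b_r.mean() / (b_t.mean() + 1e-12)
                     for b_t, b_r in zip(bands_t, bands_r)])
```

A real mastering chain would add dynamics processing (compression, limiting) and would learn the mapping from features to processing parameters rather than apply a direct spectral ratio.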

Requirements: The following is an initial set of MPAI-CAE functional requirements, to be further developed in the next few weeks. Once the full set of requirements has been developed, the MPAI General Assembly will decide whether an MPAI-CAE standard should be developed.

  1. The standard shall specify the following natural input signals
    1. Microphone signals
    2. Inertial measurement signals (Acceleration, Gyroscope, Compass, …)
    3. Vibration signals
    4. Environmental signals (Proximity, temperature, pressure, light, …)
    5. Environment properties (geometry, reverberation, reflectivity, …)
  2. The standard shall specify
    1. User settings (equalization, signal compression/expansion, volume, …)
    2. User profile (auditory profile, hearing aids, …)
  3. The standard shall support the retrieval of pre-computed environment models (audio scene, home automation scene, …)
  4. The standard shall reference the user authentication standards/methods required by the specific MPAI-CAE context
  5. The standard shall specify means to authenticate the components and pipelines of an MPAI-CAE instance
  6. The standard shall reference the methods used to encrypt the streams processed by MPAI-CAE and service-related metadata
  7. The standard shall specify the adaptation layer of MPAI-CAE streams to delivery protocols of common use (e.g. Bluetooth, Chromecast, DLNA, …)

Object of standard: Currently, three areas of standardization are identified:

  1. Context type interfaces: a first set of input and output signals, with corresponding syntax and semantics, for audio usage contexts considered of sufficient interest (e.g. audioconferencing and audio consumption on-the-go). They have the following features:
    1. Input and output signals are context-specific, but with a significant degree of commonality across contexts
    2. The operation of the framework is implementation-dependent, giving implementors the flexibility to produce the set of output signals that best fits the usage context
  2. Processing component interfaces: with the following features
    1. Interfaces of a set of updatable and extensible processing modules (both traditional and AI-based)
    2. Possibility to create processing pipelines and the associated control (including the needed side information) required to manage them
    3. The processing pipeline may be a combination of local and in-cloud processing
  3. Delivery protocol interfaces
    1. Interfaces of the processed audio signal to a variety of delivery protocols

Benefits: MPAI-CAE will bring benefits to a range of players:

  1. Technology providers need not develop full applications to put their technologies to good use. They can concentrate on improving the AI technologies that enhance the user experience. Further, their technologies can find much broader use in application domains beyond those they are accustomed to dealing with.
  2. Equipment manufacturers and application vendors can tap into the set of technologies made available according to the MPAI-CAE standard from different competing sources, integrate them, and satisfy their specific needs
  3. Service providers can deliver complex optimizations, and thus a superior user experience, with minimal time to market, as the MPAI-CAE framework enables easy combination of third-party components from both a technical and a licensing perspective. Their services can deliver a high-quality, consistent user audio experience with minimal dependency on the source by selecting the optimal delivery method
  4. End users enjoy a competitive market that provides constantly improved user experiences and controlled cost of AI-based audio endpoints.

Bottlenecks: the full potential of AI in MPAI-CAE will be unleashed only by the emergence of a market of AI-friendly processing units and by the introduction of the vast amount of available AI technologies into products and services.

Social aspects: MPAI-CAE would free users from dependency on the context in which they operate; make the content experience more personal; make the collective service experience less dependent on events affecting individual participants; and raise the level of past content to today's expectations.

Success criteria: MPAI-CAE should create a competitive market of AI-based components exposing standard interfaces, make processing units available to manufacturers of a variety of end-user devices, and trigger the implicit need felt by users to have the best experience whatever the context.


MPAI launches Context-based Audio Enhancement standard project

Geneva, 2020/09/12

Formation of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) was announced in July 2020. It is planned to be established as a non-profit organisation by the end of September 2020. It will develop technical specifications of data coding, especially using Artificial Intelligence, and their integration in Information and Communication Technology systems, bridging the gap between its technical specifications and their practical use through Intellectual Property Rights Guidelines, such as Framework Licences.

Today MPAI announces that one use case – Context-based Audio Enhancement (MPAI-CAE) – has reached sufficient maturity to warrant the start of the next stage where detailed functional requirements are identified.

MPAI-CAE addresses a variety of consumer-oriented use cases, e.g. entertainment, voice communication, audio conferencing, gaming etc., relevant to different contexts – e.g., at home, in the car and on the go – that may greatly influence the audio experience. MPAI-CAE also addresses professional applications such as audio (post-)production and restoration.

The MPAI-CAE standard will specify

  1. Input and output interfaces for a set of contexts
  2. Interfaces of updatable and extensible processing modules, both traditional and AI-based, to create processing pipelines for possibly partly local and partly on-the-cloud execution
  3. Interfaces of the processed audio signals to a variety of delivery protocols.

MPAI envisages that technology providers will benefit from a wider usage of their technologies beyond their specific domains; application vendors adopting the emerging MPAI-CAE standard will be able to tap into the common set of technologies to support their specific needs; service providers will benefit from accelerated delivery by being able to integrate third parties' components from both a technical and a licensing perspective; and end users will benefit from a competitive market providing constantly improved user experiences and AI-based audio endpoints.

MPAI is investigating several other draft projects in the area of coding of still and moving pictures, event sequences and other data, such as interferometric data for gravitational-wave detection and genomic data. They are expected to become standard development projects as they mature.

About MPAI

MPAI is a non-profit, unaffiliated association whose goal is to establish a set of standards for advanced audio, video and data coding using artificial intelligence and to establish procedures that facilitate the timely and effective use of the standards it develops.

Any entity, such as a corporation, individual firm, partnership, university, governmental body or international organisation supporting the mission of MPAI, may apply for membership, provided that it is able to contribute to the development of technical specifications for the efficient use of data.

For further information, please contact leonardo@chiariglione.org and see https://mpai.community for MPAI and https://mpai.community/2020/09/12/mpai-cae/ for more details on MPAI-CAE.

