Moving Picture, Audio and Data Coding
by Artificial Intelligence

MPAI Application Note #1 Context-based Audio Enhancement (MPAI-CAE)

Proponents: Michelangelo Guarise, Andrea Basso (VOLUMIO)

Description: The quality of the overall user experience depends heavily on the context in which audio is used, e.g.

  1. Entertainment audio can be consumed in the home, in the car, on public transport, on-the-go (e.g. while doing sports, running, biking) etc.
  2. Voice communications can take place in the office, in the car, at home, on-the-go etc.
  3. Audio and video conferencing can be done in the office, in the car, at home, on-the-go etc.
  4. (Serious) gaming can be done in the office, at home, on-the-go etc.
  5. Audio (post-)production is typically done in the studio
  6. Audio restoration is typically done in the studio

By using context information to act on the content using AI, it is possible to substantially improve the user experience.

Comments: Currently, there are solutions that adapt the conditions in which the user experiences content or service for some of the contexts mentioned above. However, they tend to be vertical in nature, making it difficult to re-use possibly valuable AI-based components of the solutions for different applications.

MPAI-CAE aims to create a horizontal market of re-usable and possibly context-dependent components that expose standard interfaces. The market would become more receptive to innovation and hence more competitive. Industry and consumers alike will benefit from the MPAI-CAE standard.

Examples

The following examples describe how MPAI-CAE can make the difference.

  1. Enhanced audio experience in a conference call

Often, the user experience of a video/audio conference can be marginal. Too much background noise or undesired sounds can prevent participants from understanding what others are saying. By using AI-based adaptive noise-cancellation and sound enhancement, MPAI-CAE can virtually eliminate those kinds of noise without using complex microphone systems to capture environment characteristics.

  2. Pleasant and safe music listening while biking

While biking in the middle of city traffic, AI can process the signals from the environment captured by the microphones available in many earphones and earbuds (for active noise cancellation), adapt the sound rendition to the acoustic environment, provide an enhanced audio experience (e.g. by performing dynamic signal equalization), improve battery life and selectively recognize and let through relevant environment sounds (e.g. the horn of a car). The user enjoys a satisfactory listening experience without losing contact with the acoustic surroundings.

  3. Emotion-enhanced synthesized voice

Speech synthesis is constantly improving and finding many applications that are part of our daily life (e.g. intelligent assistants). In addition to improving the naturalness of the voice, MPAI-CAE can implement expressive models of primary emotions such as fear, happiness, sadness, and anger.

  4. Efficient 3D sound

MPAI-CAE can reduce the number of channels (e.g. MPEG-H 3D Audio can support up to 64 loudspeaker channels and 128 codec core channels) in an automatic (unsupervised) way, e.g. by mapping a 9.1 configuration to 5.1 or stereo (radio broadcasting or DVD), while maintaining the musical touch of the composer.

  5. Speech/audio restoration

Audio restoration is often a time-consuming process that requires skilled audio engineers with specific experience in music and recording techniques to manually go over old audio tapes. MPAI-CAE can automatically remove anomalies from recordings through broadband denoising, declicking and decrackling, as well as removing buzzes and hums and performing spectrographic ‘retouching’ to remove discrete unwanted sounds.

  6. Normalization of volume across channels/streams

Eighty-five years after TV was first introduced as a public service, TV viewers are still struggling to adapt to their needs the different average audio levels of different broadcasters and, within a program, the different audio levels of different scenes.

MPAI-CAE can learn from the user’s reactions via the remote control, e.g. to a loud spot, and control the sound level accordingly.

  7. Automotive

Audio systems in cars have steadily improved in quality over the years and continue to be integrated into more critical applications. Today, a buyer takes it for granted that a car has a good sound system. In addition, a car usually has at least one and sometimes two microphones to handle the voice-response system and the hands-free cell-phone capability. If the vehicle uses any noise cancellation, several other microphones are involved.

MPAI-CAE can be used to improve the user experience and enable the full quality of current audio systems by reducing the effects of the noisy automotive environment on the signals.

  8. Audio mastering

Audio mastering is still considered an ‘art’ and the prerogative of professional audio engineers. With MPAI-CAE, normal users can upload an example track of their liking (possibly obtained from similar musical content); MPAI-CAE analyzes it, extracts key features and generates a master track that ‘sounds like’ the example track, starting from the non-mastered track. It is also possible to specify the desired style without an example, and the original track will be adjusted accordingly.

Requirements: The following is an initial set of MPAI-CAE functional requirements to be further developed in the next few weeks. Once the full set of requirements has been developed, the MPAI General Assembly will decide whether an MPAI-CAE standard should be developed.

  1. The standard shall specify the following natural input signals
    1. Microphone signals
    2. Inertial measurement signals (Acceleration, Gyroscope, Compass, …)
    3. Vibration signals
    4. Environmental signals (Proximity, temperature, pressure, light, …)
    5. Environment properties (geometry, reverberation, reflectivity, …)
  2. The standard shall specify
    1. User settings (equalization, signal compression/expansion, volume, …)
    2. User profile (auditory profile, hearing aids, …)
  3. The standard shall support the retrieval of pre-computed environment models (audio scene, home automation scene, …)
  4. The standard shall reference the user authentication standards/methods required by the specific MPAI-CAE context
  5. The standard shall specify means to authenticate the components and pipelines of an MPAI-CAE instance
  6. The standard shall reference the methods used to encrypt the streams processed by MPAI-CAE and service-related metadata
  7. The standard shall specify the adaptation layer of MPAI-CAE streams to delivery protocols of common use (e.g. Bluetooth, Chromecast, DLNA, …)

 Object of standard: Currently, three areas of standardization are identified:

  1. Context type interfaces: a first set of input and output signals, with corresponding syntax and semantics, for audio usage contexts considered of sufficient interest (e.g. audioconferencing and audio consumption on-the-go). They have the following features
    1. Input and output signals are context specific, but with a significant degree of commonality across contexts
    2. The operation of the framework is implementation-dependent, giving implementors the means to produce the set of output signals that best fit the usage context
  2. Processing component interfaces: with the following features
    1. Interfaces of a set of updatable and extensible processing modules (both traditional and AI-based)
    2. Possibility to create processing pipelines and the associated control (including the needed side information) required to manage them
    3. The processing pipeline may be a combination of local and in-cloud processing
  3. Delivery protocol interfaces
    1. Interfaces of the processed audio signal to a variety of delivery protocols
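
The idea of processing modules behind a common interface, chained into a pipeline, can be illustrated with a minimal sketch. Everything named here (`AudioFrame`, `ProcessingModule`, `NoiseGate`, `Gain`, `Pipeline`, the `noise_floor` context key) is hypothetical and chosen only for illustration; none of it is defined by MPAI-CAE, whose interfaces are still to be specified.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class AudioFrame:
    """Hypothetical container: audio samples plus context side information."""
    samples: list[float]
    context: dict[str, float] = field(default_factory=dict)


class ProcessingModule(Protocol):
    """Hypothetical common interface that every module would expose."""

    def process(self, frame: AudioFrame) -> AudioFrame: ...


class NoiseGate:
    """Stand-in for a context-driven module: mutes samples at or below the noise floor."""

    def process(self, frame: AudioFrame) -> AudioFrame:
        floor = frame.context.get("noise_floor", 0.0)
        gated = [s if abs(s) > floor else 0.0 for s in frame.samples]
        return AudioFrame(gated, frame.context)


class Gain:
    """Traditional (non-AI) module: applies a fixed gain factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def process(self, frame: AudioFrame) -> AudioFrame:
        return AudioFrame([s * self.factor for s in frame.samples], frame.context)


class Pipeline:
    """Runs modules in order; modules can be swapped without touching the others."""

    def __init__(self, modules: list[ProcessingModule]) -> None:
        self.modules = list(modules)

    def process(self, frame: AudioFrame) -> AudioFrame:
        for module in self.modules:
            frame = module.process(frame)
        return frame


frame = AudioFrame([0.01, 0.5, -0.02, -0.8], {"noise_floor": 0.05})
out = Pipeline([NoiseGate(), Gain(0.5)]).process(frame)
print(out.samples)
```

Because both modules satisfy the same interface, a vendor could in principle replace `NoiseGate` with a learned denoiser from another source without changing the rest of the pipeline, which is the kind of re-use the text describes.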

Benefits: MPAI-CAE will benefit the following stakeholders

  1. Technology providers need not develop full applications to put their technologies to good use. They can concentrate on improving the AI technologies that enhance the user experience. Further, their technologies can find a much broader use in application domains beyond those they are accustomed to dealing with.
  2. Equipment manufacturers and application vendors can tap into the set of technologies made available according to the MPAI-CAE standard by different competing sources, integrate them and satisfy their specific needs
  3. Service providers can deliver complex optimizations, and thus a superior user experience, with minimal time to market because the MPAI-CAE framework enables easy combination of third-party components from both a technical and a licensing perspective. Their services can deliver a high-quality, consistent user audio experience with minimal dependency on the source by selecting the optimal delivery method
  4. End users enjoy a competitive market that provides constantly improved user experiences and controlled cost of AI-based audio endpoints.

Bottlenecks: the full potential of AI in MPAI-CAE would be unleashed by a market of AI-friendly processing units and by the introduction of the vast number of AI technologies into products and services.

Social aspects: MPAI-CAE would free users from the dependency on the context in which they operate; make the content experience more personal; make the collective service experience less dependent on events affecting the individual participant; and raise the level of past content to today’s expectations.

Success criteria: MPAI-CAE should create a competitive market of AI-based components exposing standard interfaces, of processing units available to manufacturers and of a variety of end user devices, and should trigger the implicit need felt by a user to have the best experience whatever the context.


MPAI launches Context-based Audio Enhancement standard project

Geneva, 2020/09/12

Formation of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) was announced in July 2020. It is planned to be established as a non-profit organisation by the end of September 2020. It will develop technical specifications of data coding, especially using Artificial Intelligence, and their integration in Information and Communication Technology systems, bridging the gap between its technical specifications and their practical use through Intellectual Property Rights Guidelines, such as Framework Licences.

Today MPAI announces that one use case – Context-based Audio Enhancement (MPAI-CAE) – has reached sufficient maturity to warrant the start of the next stage where detailed functional requirements are identified.

MPAI-CAE addresses a variety of consumer-oriented use cases, e.g. entertainment, voice communication, audio conferencing, gaming etc. relevant to different contexts – e.g., at home, in the car and on the go – that may greatly influence the audio experience. MPAI-CAE also addresses professional applications such as audio (post-)production and restoration.

The MPAI-CAE standard will specify

  1. Input and output interfaces for a set of contexts
  2. Interfaces of updatable and extensible processing modules, both traditional and AI-based, to create processing pipelines for possibly partly local and partly on-the-cloud execution
  3. Interfaces of the processed audio signals to a variety of delivery protocols.

MPAI envisages that technology providers will benefit from a wider usage of their technologies beyond their specific domains; application vendors adopting the emerging MPAI-CAE standard will be able to tap into the common set of technologies to support their specific needs; service providers will benefit from an accelerated delivery by being able to integrate third parties’ components from both a technical and a licensing perspective; and end users will be able to tap into a competitive market providing constantly improved user experiences and AI-based audio endpoints.

MPAI is investigating several other draft projects in the area of coding of still and moving pictures, event sequences and other data such as interferometric data for gravitational-wave detection and genomic data. They are expected to become standard development projects as they mature.

About MPAI

MPAI is a non-profit, unaffiliated association whose goal is to establish a set of standards for advanced audio, video and data coding using artificial intelligence and to establish procedures that facilitate the timely and effective use of the standards it develops.

Any entity, such as a corporation, individual firm, partnership, university, governmental body or international organisation, that supports the mission of MPAI may apply for membership, provided that it is able to contribute to the development of technical specifications for the efficient use of data.

For further information, please contact leonardo@chiariglione.org and see https://mpai.community for MPAI and https://mpai.community/2020/09/12/mpai-cae/ for more details on MPAI-CAE.


MPAI at a technology and business watershed

For 30+ years, digital media have been the powerful driver that has fostered research, industry and commerce. The engine that has sustained the development could expand its coverage and provide new standards to a growing group of client industries. Academia and research, all facets of industry, and billions of users have benefited from this bonanza.

Unfortunately, the engine has run out of steam – technology-wise and business-wise.

Thirty years of practical data compression show the importance of the business that is built on data compression standards. Old technology has had its day. To renew it, we need two things: fresh new technologies, but also a fresh new approach to the field.

A new engine is coming to the rescue. There is a vast group of technologies – going under the general name of Artificial Intelligence – that provide alternative and more promising approaches than statistical correlation. They go deeper in understanding the physical phenomena we are trying to represent.

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is the vehicle designed to implement the plan. It is a win-win proposal because Digital Media gets better-performing technologies and Artificial Intelligence extends the range where its technologies are applied – not just digital media, but also other data types whose use can be more effective if converted to a more efficient representation.

The MPAI Statutes define data coding as the transformation of data from one representation into another representation that is more convenient for a particular purpose. Reducing the amount of data, a.k.a. compression, is one purpose that has proved to be very important to billions of people, but there are many other purposes. Having AI as the underlying technology layer will ensure that AI technologies for data coding will have wider applications, practical deployment will be accelerated and interoperability improved.

This is the grand plan, but we should not forget that the devil is in the details. MPEG has shown that technically excellent standards are no guarantee that their access will be easy and their use possible. Therefore, MPAI abandons the old FRAND approach because it does not guarantee that a licence for a supposed FRAND standard will be available. It embraces instead the Framework Licence approach where IPR holders agree to a business model, and possibly a cap to the total cost of a licence, _before_ the work on the standard starts.

MPAI attacks the main issue of the digital world – data representation, i.e. coding – and leverages AI to get the best results achievable in the current time frame. However, it has learnt the lesson: industry is no longer willing to wait until the standard is done to learn the terms. Companies want to know more before starting the work.


MPAI kicks off 5 areas of study

MPAI held its 3rd preparatory meeting on 2020/08/24T14:00-16:00 UTC and decided to initiate or continue the following five areas of activity

  1. Development of a bibliography of papers and patents on “Video coding by AI”
  2. Development of an annotated list of entities – associations and institutes – who have AI in their mission
  3. Development of use cases of AI-enabled data compression (so far some 30 use cases have been collected). Each use case is and will be described according to the following format
    1. Proponent: name of the member who proposed the use case
    2. Application description: a detailed description of the use case
    3. Comments: any comment that may clarify the use case
    4. What is standardised: as each use case is a candidate to enter the MPAI work plan, we need to understand what interface, format etc. requires standardisation
    5. Characterization table: a collection of relevant characteristics
      1. Expected benefits
      2. Volume
      3. Quality Criteria
      4. Maturity
      5. Relevance of AI
      6. Users
      7. Players
      8. Level of interest
  4. Identification and description, including structure, of all data types that are potentially the target of MPAI standardisation (the compression, not the data)
  5. Identification of conditions for MPAI standardisation. Standards for compressing data by AI are likely to be driven by a different logic than the one applied so far in a non-AI context. Three conditions have already been identified.
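
The use case format listed under item 3 maps naturally onto a small record type. The sketch below is only an illustration of that format: the class and field names are hypothetical renderings of the headings above, and the example values are taken from the MPAI-CAE application note rather than from any official MPAI template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Characterization:
    """The eight entries of the characterization table, left empty until filled in."""
    expected_benefits: str = ""
    volume: str = ""
    quality_criteria: str = ""
    maturity: str = ""
    relevance_of_ai: str = ""
    users: str = ""
    players: str = ""
    level_of_interest: str = ""


@dataclass
class UseCase:
    """One use case description following the five-part format."""
    proponent: str
    application_description: str
    comments: str = ""
    what_is_standardised: str = ""
    characterization: Characterization = field(default_factory=Characterization)

    def missing(self) -> list[str]:
        """Names of characterization entries that are still empty."""
        table = self.characterization
        return [f.name for f in fields(table) if not getattr(table, f.name)]


cae = UseCase(
    proponent="VOLUMIO",
    application_description="Context-based Audio Enhancement (MPAI-CAE)",
    what_is_standardised="Context type, processing component and delivery protocol interfaces",
)
print(cae.missing())
```

A helper such as `missing()` makes it easy to see which characteristics a proponent still has to supply before a use case can be compared with the others.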

A series of meetings has been planned (all meetings at 14:00 UTC)

| Work area | Confcalls |
|---|---|
| Use cases of AI-enabled data compression (audio) | 2020/09/01 |
| Use cases of AI-enabled data compression (video) | 2020/09/02 |
| 4th MPAI preparatory meeting (Statutes) | 2020/08/25 |
| 5th MPAI preparatory meeting | 2020/09/09 |
| 6th MPAI preparatory meeting | 2020/09/23 |
| Constitutive General Assembly | 2020/09/?? |

For more details please contact Leonardo

 


The two main MPAI purposes

One reason for creating MPAI is to respond to the needs of the MPEG constituency, ill-served by ISO’s self-imposed “FRAND” constraints and by its lack of reaction to the changes of the industry induced by MPEG standards and the effects wrought by industry changes on MPEG itself. MPAI intends to reverse the trend that has made it progressively harder, especially for some industries, to use MPEG standards. MPAI does not believe that the alternative of offering “royalty free” standards to industries is sustainable even in the short term.

MPAI takes an antipodal attitude to MPEG with respect to the nature of the requirements that drive the work on a standard. In MPEG, functional requirements, made widely known to industry, drove the development of a standard. Users were left to “discover” the commercial terms when the standard was done, possibly 4 years after the start of the work, but actually much later than that because of the time it usually took to develop licence(s), and in some cases never.

MPAI would love to make both functional and commercial requirements available to users. However, providing a full set of commercial requirements may not be permitted by antitrust regulations. Therefore, MPAI comes as close as possible to that by making known to users the business model, which MPAI calls Framework Licence (FWL), that IPR holders will eventually apply in their licence(s). The FWL does not contain the monetary values and other data that would be frowned upon by antitrust authorities.

These are the main features of the MPAI FWL

  1. As a minimum, the FWL will state that the total cost of the licence(s) will be in line with the total cost of the licences for similar data coding/decoding technologies, considering the market value of the specific technology. While this is the minimum, the FWL may go as far as to provide a cap on the total licence cost.
  2. The FWL will also state that access to the standard shall be granted in a non-discriminatory fashion.
  3. The FWL may envisage that IPR holders make their patents available without requiring a licence, if all IPR holders agree to do so. Of course, if certain events specified in the FWL happen, IPR holders may decide to withdraw their offer. Therefore, the FWL specifies the terms of the licence, but not the values, that IPR holders will make available in case such events happen.
  4. Documents submitted by MPAI members that relate to a standard shall contain a declaration that the proponent will make available the terms of the Licence related to their patents according to the FWL, alone or jointly with other IPR holders after the standard is approved and not after commercial implementations of the standard become available on the market.
  5. Each member will declare it will take a Licence for the patents held by other members, if used, within one year from the publication by patent holders of their licence terms. Non-members remain obligated to acquire licences to use MPAI standards as mandated by the legislation of the territories in which they use MPAI standards.
  6. Each MPAI member shall inform the Secretariat of the result of its best effort and transparent identification of IP that it believes is infringed by a standard that is being or has already been developed by MPAI.
  7. Finally, when the MPAI standard is approved, IPR holders express their preference on the entity that should administer the patent pool of the standard.

So far, we have talked about how MPAI intends to work, but that is not the only driver. MPAI intends to work differently also on the content of the standards.

After decades of hardly visible work by researchers, Artificial Intelligence (AI) is arousing the attention of the public at large. Various AI technologies have been and are being investigated with the goal of providing more efficient and more intelligent compression. MPAI retains the proposal made by the Italian Standards Organisation UNI in 2018 to consider coding as a single field whose instances are images, moving pictures, audio, 3D Graphics, other data such as those generated in manufacturing, automotive, health and other fields, and generic data.

Even though MPAI has not been formally incorporated, experts are busy collecting use cases where AI-enabled coding can provide new solutions that enhance industry performance while benefitting end users.


Leaving FRAND for good

Fair, reasonable and non-discriminatory (FRAND) is the combination of adjectives that has been commonly used to indicate the way patent holders intended to license their technologies in standards produced by many standardisation bodies and industry fora.

Decades ago, these adjectives were easily applicable to a standard. When JVC submitted their VHS cassette recording system to IEC for standardisation and IEC produced IEC 60774 – Helical-scan video tape cassette system using 12,65 mm (0,5 in) magnetic tape on type VHS, JVC submitted a FRAND declaration. The companies manufacturing VHS recorders knew exactly whom to call to get a licence. It was a small world with a limited number of players who had known each other for decades.

The same FRAND declaration was used in the MPEG-2 case. But the world at that time (1995) had already changed. As MPEG adopted the policy “any technology that proves its worth and is accompanied by a FRAND declaration”, not only did the number of licensors skyrocket to about 30, but licensors also included consumer electronics companies, telcos, telco manufacturers, IT companies, some Non-Practicing Entities (NPE) and a university.

MPEGLA created an MPEG-2 patent pool that gathered most patent holders and contributed to making MPEG-2 the first big success in a converged world.

Today the situation is again different from that of 25 years ago. As I have already said a few times, the HEVC standard has almost twice as many patent holders as MPEG-2, three patent pools and several patent holders who did not join any patent pool.

Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is a non-profit organisation to be established soon in Geneva. MPAI has identified the standardisation process based on accepting FRAND declarations as unable to cope with the tectonic changes that have affected the industry during the 30 years since work on the successful MPEG-2 standard started.

MPAI acknowledges the value of the unconstrained collaboration adopted by MPEG. This should be retained. However, leaving to the market the task of sorting out all commercial aspects related to a standard may lead to confusion, as in the HEVC and MPEG-H 3D Audio cases.

Let’s consider the main steps of the MPEG process

  1. Develop the functional requirements of a standard
  2. Accept the organisation’s commercial requirements in the form of FRAND declarations
  3. Publish a Call for Proposals asking for technologies that satisfy the functional and commercial requirements
  4. Develop a standard that satisfies the functional and commercial requirements, i.e. using technologies that proponents have declared they will license on FRAND terms.

The MPEG process assumed that the market would find a way to remunerate patent holders. This did happen, until patent pools stopped working, as the HEVC and MPEG-H 3D Audio cases demonstrate. The MPEG-style commercial requirements are no match for the needs of the industry today.

MPAI believes that it should be possible to move the development of some commercial aspects, those that do not conflict with competition law, to a phase that precedes the technical work. This does not mean that commercial aspects will be mixed with the development of standards.

In other words, MPAI adopts the following modifications of the steps of the MPEG standardisation process (the MPAI modifications are in steps 2 and 4)

  1. Develop the functional requirements of a standard
  2. Develop commercial requirements, in the form of a “Framework Licence” (FWL)
  3. Publish a Call for Proposals requesting technologies that satisfy the functional and commercial requirements
  4. Develop a standard that satisfies the functional and commercial requirements, i.e. using technologies the proponents have declared they will license in line with the FWL.

The FWL is the business model for remunerating IPR in the standard. It does not contain values: no $, no %, no dates etc. An FWL is obviously standard-specific.

Thus, the MPAI policy becomes “any technology provided it proves its worth and is accompanied by a Framework Licence declaration”.

The table compares the two processes:

| # | MPEG process | MPAI process |
|---|---|---|
| 1 | Develop the functional requirements of a standard | Develop the functional requirements of a standard |
| 2 | Adopt ISO’s commercial requirements in the form of FRAND declarations | Develop commercial requirements, in the form of a “Framework Licence” (FWL) |
| 3 | Publish a Call for Proposals requesting technologies that satisfy the functional and commercial requirements | Publish a Call for Proposals requesting technologies that satisfy the functional and commercial requirements |
| 4 | Develop a standard that satisfies the functional and commercial requirements, i.e. using technologies that proponents have declared they will license at FRAND terms. | Develop a standard that satisfies the functional and commercial requirements, i.e. using technologies that proponents have declared they will license at FWL terms. |

The MPAI process is a major improvement on the MPEG process because the big problem of simultaneously defining the business model and the price of the patented technologies is split in two: the business model is defined before work on the standard starts (doable because the functional requirements are known), and the monetary values are set after the work on the standard is finished (doable because at that time every patent holder can assess the worth of their technology).

There are advantages for users of the standard as well because they know the business model of the standard before work on the standard even starts. Actually, the FWL can even set a cap to the total cost of the standard.

In a multi-polar world where there are multiple sources of coding standards, users can only be attracted if they know what they are committing to before work on the standard starts. It is no longer possible to promise technical wonders, today, and discover a business disaster, years later.

MPAI offers a process that accelerates the practical use of standardised technologies benefitting industry and end users alike.


Better information from data

What is data

Data can be defined as the digital representation of an entity. The entity can be physical, virtual, logical or other, and can have different attributes.

A river may be represented by its length, its average width, its maximum, minimum and average flow, the coordinates of its bed from the source to its mouth and so on. Typically, different data of an entity are captured depending on the intended use of the data. If the river data are to be used for agriculture, the depth of the river, the flow during the seasons, the width at a certain point, the nature of the soil etc. are likely to be important.

Video and audio intended for consumption by humans are data types characterised by a large amount of data, typically samples of the visual and audio information: tens/hundreds/thousands/millions of Mbit/s for video, and tens/hundreds/thousands of kbit/s for audio. If we exclude niche cases, this amount of data is unsuited to storage and transmission.
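To give a feeling for these orders of magnitude, here is a minimal sketch of how the raw bit rate of sampled video and audio can be computed. The two formats used (1920x1080 video at 25 frames/s with 8-bit 4:2:0 sampling, and 44.1 kHz 16-bit stereo audio) are my own illustrative assumptions, not figures taken from the text above.

```python
def raw_video_bitrate(width, height, fps, bits_per_sample, samples_per_pixel):
    """Uncompressed video bit rate in bit/s."""
    return width * height * fps * bits_per_sample * samples_per_pixel


def raw_audio_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio bit rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed example formats (hypothetical choices for illustration only).
# 4:2:0 sampling carries 1 luma + 0.5 chroma samples per pixel on average.
hd_video = raw_video_bitrate(1920, 1080, 25, 8, 1.5)
cd_audio = raw_audio_bitrate(44_100, 16, 2)

print(f"video: {hd_video / 1e6:.1f} Mbit/s")  # 622.1 Mbit/s
print(f"audio: {cd_audio / 1e3:.1f} kbit/s")  # 1411.2 kbit/s
```

Under these assumptions the video needs hundreds of Mbit/s and the audio more than a thousand kbit/s, which is consistent with the ranges quoted above.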

High-speed sequencing machines produce snapshots of randomly taken segments with unknown coordinates of a DNA sample. As the “reading” process is noisy, the value of each nucleotide is assigned a “quality value” typically expressed by an integer. As a nucleotide must be read several tens of times to achieve a sufficient degree of confidence, the size of whole genome sequencing files may reach terabytes. This digital representation of a DNA sample made of unordered and unaligned reads is not only unsuited to storage and transmission but also unsuited to extracting vital information for a medical doctor to make a diagnosis.

Data and information

Data become information when their representation makes them suitable to a particular use.

Tens of thousands of researcher-years were invested in studying the problem and finding practical ways to represent audio and visual data conveniently.

For several decades, facsimile compression was the engine that drove efficiency in the office by offering high quality prints in 1/6 of the time taken by the early analogue facsimile machines.

Reducing the audio data rate by a factor of 10-20, as offered by MP3 and AAC, changed the world of music forever. Reducing the video data rate by a factor of 1,000, as achieved by the latest MPEG-I VVC standard, multiplies the ways humans can have visual experiences. Surveillance applications developed alternative ways to represent audio and video that allowed, for instance, event detection, object counting etc.

The MPEG-G standard, developed to convert DNA reads into information that needs fewer bytes to be represented, also gives easier access to information that is of immediate interest to a medical doctor.

These examples of transformation of data from a “dull” into a “smart” collection of numbers have largely been achieved by using the statistical properties of the data or their transformations. Although quite dated, the method used to compress facsimile information is emblematic. The Group 3 facsimile algorithm defines two statistical variables: the lengths of “white” and “black” runs, i.e. the number of white/black points from and including the first white/black point after a black/white point until a black/white point is encountered. A machine that had been trained to read billions of pages could develop an understanding of how business documents are typically structured and probably be able to use fewer bits (probably more meaningful to an application) to represent a page of a business document.
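The notion of white and black runs can be illustrated with a minimal sketch. It only extracts the run lengths from one scan line; the mapping of each run length to a variable-length code word, which is where the statistics are actually exploited, is left out. The function name and the sample line are my own illustrative choices.

```python
from itertools import groupby


def run_lengths(scan_line):
    """Return (colour, length) pairs for a line of 0 (white) / 1 (black) points."""
    return [(colour, len(list(group))) for colour, group in groupby(scan_line)]


# A hypothetical 40-point scan line: mostly white with two short black runs.
line = [0] * 12 + [1] * 3 + [0] * 20 + [1] * 1 + [0] * 4
runs = run_lengths(line)

print(runs)  # [(0, 12), (1, 3), (0, 20), (1, 1), (0, 4)]
```

The 40 points of the line are described by just 5 runs. Since some run lengths occur far more often than others in typical documents, giving them shorter code words saves bits.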

CABAC is another, more sophisticated example of data compression using statistical methods. CA in CABAC stands for “Context-Adaptive”, i.e. the code set is adapted to the local statistics, and B stands for Binary, because all variables are converted to and handled in binary form. CABAC can be applied to statistical variables which do not have uniform statistical characteristics. A machine that had been trained by observing how the statistical variable changes depending on other parameters should probably be able to use fewer and more meaningful bits to represent the data.
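The benefit of adapting to a context can be shown with a deliberately simplified sketch. This is not CABAC: there is no binarisation step, no arithmetic coder and none of the state tables of the standard. It only keeps adaptive counts of zeros and ones per context and adds up the ideal code length, -log2(p), of each binary symbol. The function name, the choice of "previous bit" as the context and the test sequence are my own assumptions.

```python
import math


def ideal_code_length(bits, context_of):
    """Total ideal code length in bits, using adaptive counts kept per context."""
    counts = {}  # context -> [number of 0s seen, number of 1s seen], both start at 1
    total = 0.0
    for i, bit in enumerate(bits):
        ctx = context_of(bits, i)
        c = counts.setdefault(ctx, [1, 1])
        p = c[bit] / (c[0] + c[1])  # current probability estimate for this symbol
        total += -math.log2(p)      # ideal cost of coding the symbol
        c[bit] += 1                 # adapt the statistics of this context
    return total


# Hypothetical source: 128 binary symbols arranged in long runs.
bits = ([0] * 8 + [1] * 8) * 8

# One model for all symbols versus one model per value of the previous bit.
no_context = ideal_code_length(bits, lambda b, i: None)
prev_bit_context = ideal_code_length(bits, lambda b, i: b[i - 1] if i > 0 else 0)

print(f"single model:     {no_context:.1f} bits")
print(f"previous-bit ctx: {prev_bit_context:.1f} bits")
```

With a single model the zeros and ones look equally likely, so each symbol costs about one bit. Once the statistics are kept separately for each value of the previous bit, most symbols become highly predictable and the total cost drops markedly.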

The end of data coding as we know it?

Many approaches to data coding were developed with the aim of exploiting some statistical properties of some data transformations. The two examples given above show that Machine Learning can probably be used to provide more efficient solutions to data coding than traditional methods could provide.

It should be clear that there is no reason to stay confined to the variables exploited in the “statistical” phase of data coding. There are probably better ways to use machines’ learning capabilities on data processed by different methods.

The MPAI work plan is not set yet, but one proposal is to see, on the one hand, how far Artificial Intelligence can be applied to the “old” variables and, on the other, how far fresh approaches can transform data into better usable information.

Smart application of Artificial Intelligence promises to do a better job in converting data into information than statistical approaches have done so far. Delivering on the promise is MPAI’s mission.


The MPAI framework licence

Introduction

The main features of MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – are the focus on efficient representation of moving pictures, audio and data in general using Artificial Intelligence technologies and the will to find a point where the interest of holders of IPR in technologies essential to a standard and the interest of users of the standard are equally upheld.

In this article I will analyse the principal tool devised by MPAI to achieve the latter goal, the framework licence.

The MPEG process

The process used by MPEG to develop its standards used to be simple and effective. As there were just too many IPRs in video coding back in the late 1980s, MPEG did not even consider the possibility of developing a royalty-free standard. Instead it assessed any technology proposed and added it to the standard if the technology provided measurable benefits. MPEG did so because it expected that the interest of industry users and IPR holders would converge to a point satisfactory to both. This did indeed happen by reviving the institution of patent pools.

Therefore, the MPEG process can be summarised by this sequence of steps:

Define requirements – Call for technologies – Receive patent declarations – Develop the standard – (Develop licence)

The brackets surrounding “Develop licence” indicate that MPEG has no business in that step. On the other hand, the entire process relied on the expectation that patent holders could be remunerated.

Lately many, including myself, have pointed out that the last step of the process has stalled. The fact that all patent holders individually declare that they are willing to license their patents at FRAND terms does not automatically translate into the only thing users need – a licence to use the standard.

A tour of existing licences

Conceptually a licence can be separated into two components. The first describes the business model that the patent holders apply to obtain their remuneration. The second determines the levels of remuneration.

Let’s take two relevant examples: the HEVC summary licences published by the MPEG LA and HEVC Advance patent pools on their web sites, where I have hidden the values in dollars, percentages and dates and replaced them with variables. In the following you will find my summary. My wording is deliberately incomplete because my intention is to convey the essence of the licences, and probably imperfect as I am not a lawyer. If you think I made a serious mistake or an important omission please send an email to Leonardo.

MPEG LA

  • Licence has worldwide coverage and includes right to make, use and sell
  • Royalty paid for products includes right to use encoders/decoders for content
  • Products are sold to end users by a licensee
  • Vendors of products that contain an encoder/decoder may pay royalties on behalf of their customers
  • Royalties of R$/unit start from a date and apply if licensee sells more than N units/year or the royalties paid are below a cap of C$
  • The royalty program is divided in terms, the first of which ends on a date
  • The percent increase from one term to another is less than x%

HEVC Advance

  • Licence applies to
    • Encoder/decoders in consumer products with royalty rates that depend on the type of device
    • Commercial content distribution (optical discs, video packs etc.)
  • Commercial products used to create or distribute content and streaming are not licensed
  • Licence covers all of licensor(s)’ essential claims of the standard practiced by a licensee
  • Royalty rates
    • Rates depend on territory in which consumer product/content is first sold (y% less in less developed countries)
    • Rates include separate and non-additive discounts of z% for being in-compliance and standard rates if licence is not in-compliance
    • Base rates for baseline profiles and extended rates for advanced profiles
    • Optional features (e.g. SEI messages) have a separate royalty structure
    • Rates and caps will not increase more than z% for any renewal term
    • Multiple cap categories (different devices and content) and single enterprise cap
    • All caps apply to total royalties due on worldwide sales for a single enterprise
    • Standard rates are not capped
    • Annual Credit of E$ applies to all enterprises that owe royalties and are in-compliance, provided in four equal quarterly installments of (E/4)$ each
  • Licences
    • Licences are for n-year non-terminable increments, under the same n-year term structure
    • The initial n-year term ends yyyy/01/01 and the first n-year renewal term ends yyyy+n/01/01

What is a framework licence?

A framework licence is the business model part of a licence. The MPEG LA and HEVC Advance licences in the form summarised above can be taken as examples of framework licences.

Therefore, a framework licence does not, and actually shall not (for antitrust reasons), contain any values in dollars, percentages or dates.

How does MPAI intend to use framework licences?

MPAI moves the definition of the business model part of a licence (which used to be done after an MPEG standard was developed) to the point in the process between the definition of requirements and the call for technologies. In other words, the MPAI process becomes

Define requirements – Define framework licence – Call for technologies – Receive patent declarations – Develop the standard – (Develop licence)

As was true in MPEG, MPAI does not have any business in the last step in brackets.

Let’s have a closer look at how a framework licence is developed and used. First of all, active MPAI members, i.e. those who will participate in the technical development, are identified. Active members jointly develop the framework licence (FWL) and adopt it by a qualified majority.

Members who make a technical contribution to the standard must make a two-fold declaration that they will:

  1. make available the terms of the licence related to their essential patents according to the FWL, alone or jointly with other IPR holders (i.e. in a patent pool), after the approval of the standard by MPAI and in no event after the commercial implementation of the standard.
  2. take a licence for the essential patents held by other MPAI members, if used, within the term specified in the FWL from the publication by IPR holders of their licence terms. Evaluation of essentiality shall be made by an independent chartered patent attorney who never worked for the owner of such essential patent or by a chartered patent attorney selected by a patent pool.

What problem a framework licence solves

The framework licence approach is not a complete solution to the problem of providing a timely licence to data representation standards; it is a tool that facilitates reaching that goal.

When MPAI decides to develop a standard, it must know what purpose the standard serves, in other words it must have precise requirements. These are used to call for technologies but can also be used by IPR holders to define in a timely fashion how they intend to monetise their IP, in other words to define their business model.

Of course, the values of royalties, caps, dates etc. are important and IPR holders in a patent pool will need a significant amount of discussion to achieve a common view. However, unlike the HEVC case above, the potentially very significant business model differences no longer influence the discussions.

Users of the standard can know in advance how the standard can be used. The two HEVC cases presented above show that the licences can have very different business models and that some users may be discouraged from using – and therefore not wait for – the standard, if they know the business model. Indeed, a user is not only interested in the functional requirements but also in the commercial requirements. The framework licence states the usage conditions, not the cost.

However, some legal experts think that the framework licence could include a minimum and maximum value of the licence without violating regulatory constraints. Again, this would not tell a user the actual cost, but a bracket.

Further readings

More information on the framework licence can be found on the MPAI Operation page where the complete MPAI workflow is described, or from the MPAI Statutes.


MPAI for the future of media

The four reasons to create MPAI

The first reason

Just as VC1, VP8/VP9 and AV1 were developed because MPEG and/or its ecosystem were not providing the solutions that the market demanded, MPAI responds to the need to have usable standards that allow industry and consumers to benefit from technological progress.

The second reason

The body producing standards of such an industrial and social importance should be credible. MPEG is no more, and its unknown SC 29 replacement operates in ISO, an organisation that is lacking governance.

MPAI offers a solid and clear governance that ensures that decisions affecting members are made by the members.

The third reason

The standards produced by the body should be usable. For many years MPEG has struggled with the problem of excellent standards with a complicated IP landscape. Two of the main components of MPEG-H, the latest integrated MPEG project – part 2 video (HEVC) and part 3 audio (3D Audio) – have failed exactly because of that. The hope of seeing a licence for Part 3 video (VVC) of the next integrated MPEG project – MPEG-I – is in the mists of an unknown future and may well tread the same path.

MPAI’s framework licences, developed and adopted by those who will develop the technical specification, significantly reduce the uncertainties that have plagued the definition of MPEG standard licences.

The fourth reason

We need a North Star guiding the industry in the years to come. Thirty-two years ago, the start of MPEG was a watershed. Digital technologies promised to provide more attractive moving pictures and audio, more conveniently and with more features, to the many different and independent industries that were used to handling a host of incompatible audio-visual services. MPEG has delivered much more than promised. By following the MPEG North Star, industry has got a unified technology platform on which different industries can build and extend their business.

MPAI is the new watershed, probably bigger than MPEG’s 32 years ago. Artificial Intelligence technologies already demonstrate that it is possible to do better and more than traditional digital technologies. But there is a difference. In the last 32 years digital audio and video have offered wonders, still they kept the two information streams isolated from the rest of the information reaching the user. With artificial intelligence, audio and video have the potential to seamlessly integrate with the many other information types handled by a device on a unified technology platform. How? Leave it to digital media and artificial intelligence experts, who have started to become an integrated community, to open their respective domains to other technologies.

Forget the past 

It would be nice – and many, I for one, would be thankful for it – if someone undertook to solve the open problems in the use of past digital media standards. I am afraid this is an intricate problem without a unified point from which one can attempt to find a solution.

But is that a worthwhile effort? One way or another, industry has interoperable audio-visual technologies for its current needs, some even say more than it needs.

Look to the future

Let’s look to the future, because we can still give it the shape we want. The MPAI statutes suggest exactly that when they define the MPAI purpose as developing technical specifications of coded representation of moving pictures, audio and data, especially using artificial intelligence.

The task for MPAI is to call the large community of researchers from industry and academia to reach the goal of developing standards that provide a quantum leap in user experience by doing better and offering more than done so far, and by achieving a deeper integration of information sources reaching the user. The goal can only be achieved if there is a new organisation that has the spirit, the enthusiasm and the effectiveness of the old one to deliver on the new promises.

That is the ideal reason to create MPAI. A more prosaic but vital reason to do it is that standards should also be usable – and MPAI promises to make that possible.