Moving Picture, Audio and Data Coding
by Artificial Intelligence

All posts

Better information from data

What is data

Data can be defined as the digital representation of an entity. The entity can be physical, virtual, logical or other, and can have different attributes.

A river may be represented by its length, its average width, its maximum, minimum and average flow, the coordinates of its bed from the source to the mouth and so on. Typically, different data about an entity are captured depending on the intended use of the data. If river data are used for agriculture, the depth of the river, the flow during the seasons, the width at a certain point, the nature of the soil etc. are likely to be important.

Video and audio intended for consumption by humans are data types characterised by a large amount of data, typically samples of the visual and audio information: tens/hundreds/thousands/millions of Mbit/s for video, and tens/hundreds/thousands of kbit/s for audio. If we exclude niche cases, this amount of data is unsuited to storage and transmission.
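As a rough illustration of the orders of magnitude involved, here is a minimal back-of-the-envelope sketch in Python; the resolution, frame rate and audio sampling parameters are example values assumed for illustration, not figures taken from MPAI documents.

```python
# Back-of-the-envelope data rates for uncompressed video and audio.
# All parameter values are illustrative assumptions.

# Video: 1920x1080 pixels, 4:2:0 sampling (~1.5 samples per pixel),
# 8 bits per sample, 50 frames per second
video_bit_per_s = 1920 * 1080 * 1.5 * 8 * 50
print(f"Raw HD video:     {video_bit_per_s / 1e6:.0f} Mbit/s")   # ~1244 Mbit/s

# Audio: 2 channels, 48 kHz sampling, 16 bits per sample
audio_bit_per_s = 2 * 48_000 * 16
print(f"Raw stereo audio: {audio_bit_per_s / 1e3:.0f} kbit/s")   # 1536 kbit/s
```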

High-speed sequencing machines produce “reads”, i.e. snapshots of randomly taken segments of a DNA sample whose coordinates are unknown. As the reading process is noisy, each nucleotide is assigned a “quality value”, typically expressed by an integer. As a nucleotide must be read several tens of times to achieve a sufficient degree of confidence, whole genome sequencing files may reach terabytes. This digital representation of a DNA sample, made of unordered and unaligned reads, is unsuited to storage and transmission, but it is also unsuited to extracting the vital information a medical doctor needs to make a diagnosis.
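To give a feel for the numbers, here is a minimal size estimate under stated assumptions: a human genome of roughly 3.2 billion base pairs, 30x coverage, and a FASTQ-like representation storing about one character for each base and one for its quality value. The figures are illustrative assumptions, not values taken from MPEG-G.

```python
# Rough size estimate of an uncompressed whole genome sequencing file.
# All figures are illustrative assumptions.

genome_length_bp = 3.2e9      # approximate human genome length in base pairs
coverage = 30                 # each position is read ~30 times for confidence
bytes_per_sequenced_base = 2  # ~1 byte for the base + ~1 byte for its quality value

total_bytes = genome_length_bp * coverage * bytes_per_sequenced_base
print(f"~{total_bytes / 1e9:.0f} GB before compression")  # ~192 GB
# Deeper sequencing or multiple samples scale this linearly into the terabyte range.
```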

Data and information

Data become information when their representation makes them suitable for a particular use.

Tens of thousands of researcher-years were invested in studying the problem and finding practical ways to represent audio and visual data conveniently.

For several decades, facsimile compression was the engine that drove efficiency in the office by offering high quality prints in a fraction of the time required by the early analogue facsimile machines.

Reducing the audio data rate by a factor of 10-20, as offered by MP3 and AAC, changed the world of music forever. Reducing the video data rate by a factor of 1,000, as achieved by the latest MPEG-I VVC standard, multiplies the ways humans can have visual experiences. Surveillance applications developed alternative ways to represent audio and video that allow, for instance, event detection, object counting etc.

The MPEG-G standard, developed to convert DNA reads into information that needs fewer bytes to be represented, also gives easier access to the information that is of immediate interest to a medical doctor.

These examples of transformation of data from “dull” into “smart” collections of numbers have largely been achieved by using the statistical properties of the data or of their transformations. Although quite dated, the method used to compress facsimile information is emblematic. The Group 3 facsimile algorithm defines two statistical variables: the lengths of “white” and “black” runs, i.e. the number of consecutive white/black points until a point of the opposite colour is encountered. A machine that had been trained by reading billions of pages could develop an understanding of how business documents are typically structured and probably be able to use fewer (and probably more meaningful to an application) bits to represent a page of a business document.
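To make the run-length idea concrete, here is a minimal sketch (not the actual Group 3 codec): it only extracts the run lengths from a scan line represented, by my own convention, as 0 for white and 1 for black; Group 3 then maps these runs to variable-length codewords, with shorter codes for the more frequent run lengths.

```python
def run_lengths(line):
    """Turn a scan line of 0 (white) / 1 (black) points into (colour, length) pairs,
    the statistical variables that Group 3 encodes with variable-length codewords."""
    runs = []
    if not line:
        return runs
    current, count = line[0], 1
    for point in line[1:]:
        if point == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = point, 1
    runs.append((current, count))
    return runs

# A mostly white line with one short black run collapses into three pairs
print(run_lengths([0] * 20 + [1] * 3 + [0] * 10))  # [(0, 20), (1, 3), (0, 10)]
```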

CABAC is another, more sophisticated, example of data compression using statistical methods. CA in CABAC stands for “Context-Adaptive”, i.e. the code set is adapted to the local statistics, and B stands for “Binary”, because all variables are converted to and handled in binary form. CABAC can be applied to statistical variables that do not have uniform statistical characteristics. A machine that had been trained by observing how a statistical variable changes depending on other parameters should probably be able to use fewer and more meaningful bits to represent the data.
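The toy sketch below illustrates only the “context-adaptive” idea: a per-context estimate of the probability of a bit being 1, updated as bits are observed, with the ideal code length -log2(p) accumulated instead of performing actual arithmetic coding. The counting update rule and the class name are my own illustrative assumptions, not the actual CABAC state machine.

```python
from collections import defaultdict
from math import log2

class AdaptiveBinaryModel:
    """Per-context estimate of P(bit = 1), updated after every observed bit.
    A real CABAC engine couples such estimates with binary arithmetic coding;
    here we only accumulate the ideal code length -log2(p)."""

    def __init__(self):
        # counts[context] = [number of 0s seen, number of 1s seen], starting at 1/1
        self.counts = defaultdict(lambda: [1, 1])

    def code_length(self, context, bit):
        zeros, ones = self.counts[context]
        p_one = ones / (zeros + ones)
        p = p_one if bit == 1 else 1.0 - p_one
        self.counts[context][bit] += 1  # adapt the estimate to the local statistics
        return -log2(p)

model = AdaptiveBinaryModel()
bits = [1, 1, 1, 0, 1, 1, 1, 1]  # skewed source: mostly ones
total = sum(model.code_length("ctx0", b) for b in bits)
print(f"{total:.2f} bits vs {len(bits)} bits uncompressed")  # ~6.2 vs 8
```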

The end of data coding as we know it?

Many approaches to data coding were developed with the aim of exploiting statistical properties of the data or of some transformations of the data. The two examples given above show that Machine Learning can probably be used to provide more efficient solutions to data coding than traditional methods could provide.

It should be clear that there is no reason to stay confined to the variables exploited in the “statistical” phase of data coding. There are probably better ways to use machines’ learning capabilities on data processed by different methods.

The MPAI work plan is not set yet, but one proposal is to see, on the one hand, how far Artificial Intelligence can be applied to the “old” variables and, on the other, how far fresh approaches can transform data into more usable information.

Smart application of Artificial Intelligence promises to do a better job in converting data into information than statistical approaches have done so far. Delivering on the promise is MPAI’s mission.


The MPAI framework licence

Introduction

The main features of MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – are the focus on efficient representation of moving pictures, audio and data in general using Artificial Intelligence technologies and the will to find a point where the interest of holders of IPR in technologies essential to a standard and the interest of users of the standard are equally upheld.

In this article I will analyse the principal tool devised by MPAI to achieve the latter goal, the framework licence.

The MPEG process

The process used by MPEG to develop its standards used to be simple and effective. As there were just too many IPRs in video coding back in the late 1980s, MPEG did not even consider the possibility of developing a royalty-free standard. Instead, it assessed any technology proposed and added it to the standard if the technology provided measurable benefits. MPEG did so because it expected that the interests of industry users and IPR holders would converge to a point satisfactory to both. This did indeed happen, by reviving the institution of patent pools.

Therefore, the MPEG process can be summarised by this sequence of steps:

Define requirements – Call for technologies – Receive patent declarations – Develop the standard – (Develop licence)

The brackets surrounding “Develop licence” indicate that MPEG has no business in that step. On the other hand, the entire process relied on the expectation that patent holders could be remunerated.

Lately, many, including myself, have pointed out that the last step of the process has stalled. The fact that all patent holders individually declare that they are willing to license their patents on FRAND terms does not automatically translate into the only thing users need – a licence to use the standard.

A tour of existing licences

Conceptually, a licence can be separated into two components. The first describes the business model that the patent holders apply to obtain their remuneration. The second determines the levels of remuneration.

Let’s take two relevant examples: the HEVC summary licences published by the MPEG LA and HEVC Advance patent pools on their web sites, where I have hidden the values of dollars, percentages and dates and replaced them with variables. In the following you will find my summary. My wording is deliberately incomplete, because my intention is to convey the essence of the licences, and probably imperfect, as I am not a lawyer. If you think I made a serious mistake or an important omission, please send an email to Leonardo.

MPEG LA

  • Licence has worldwide coverage and includes right to make, use and sell
  • Royalty paid for products includes right to use encoders/decoders for content
  • Products are sold to end users by a licensee
  • Vendors of products that contain an encoder/decoder may pay royalties on behalf of their customers
  • Royalties of R$/unit start from a date and apply if licensee sells more than N units/year or the royalties paid are below a cap of C$
  • The royalty program is divided in terms, the first of which ends on a date
  • The percent increase from one term to another is less than x%

HEVC Advance

  • Licence applies to
    • Encoder/decoders in consumer products with royalty rates that depend on the type of device
    • Commercial content distribution (optical discs, video packs etc.)
  • Commercial products used to create or distribute content and streaming are not licensed
  • Licence covers all of licensor(s)’ essential claims of the standard practiced by a licensee
  • Royalty rates
    • Rates depend on territory in which consumer product/content is first sold (y% less in less developed countries)
    • Rates include separate and non-additive discounts of z% for being in-compliance; standard rates apply if the licence is not in-compliance
    • Base rates for baseline profiles and extended rates for advanced profiles
    • Optional features (e.g. SEI messages) have a separate royalty structure
    • Rates and caps will not increase more than z% for any renewal term
    • Multiple cap categories (different devices and content) and single enterprise cap
    • All caps apply to total royalties due on worldwide sales for a single enterprise
    • Standard rates are not capped
    • Annual Credit of E$ applies to all enterprises that owe royalties and are in-compliance, provided in four equal quarterly installments of 0.25E$ each
  • Licences
    • Licenses are for n year non-terminable increments, under the same n year term structure
    • The initial n year term ends yyyy/01/01 and the first n-year renewal term ends yyyy+n/01/01

What is a framework licence?

A framework licence is the business model part of a licence. The MPEG LA and HEVC Advance licences in the form summarised above can be taken as examples of framework licences.

Therefore, a framework licence does not, indeed shall not (for antitrust reasons), contain any values of dollars, percentages or dates.

How does MPAI intend to use framework licences?

MPAI brings the definition of the business model part of a licence (which used to be done after an MPEG standard was developed) to the point in the process between the definition of requirements and the call for technologies. In other words, the MPAI process becomes

Define requirements – Define framework licence – Call for technologies – Receive patent declarations – Develop the standard – (Develop licence)

As was true in MPEG, MPAI does not have any business in the last step in brackets.

Let’s have a closer look at how a framework licence is developed and used. First of all, active MPAI members, i.e. those who will participate in the technical development, are identified. Active Members jointly develop the framework licence (FWL) and adopt it by a qualified majority.

Members who make a technical contribution to the standard must make a two-fold declaration that they will:

  1. Make available the terms of the licence related to their essential patents according to the FWL, alone or jointly with other IPR holders (i.e. in a patent pool), after the approval of the standard by MPAI and in no event later than the commercial implementation of the standard.
  2. Take a licence for the essential patents held by other MPAI members, if used, within the term specified in the FWL from the publication by the IPR holders of their licence terms. Evaluation of essentiality shall be made by an independent chartered patent attorney who has never worked for the owner of such essential patents, or by a chartered patent attorney selected by a patent pool.

What problem a framework licence solves

The framework licence approach is not a complete solution to the problem of providing a timely licence to data representation standards; it is a tool that facilitates reaching that goal.

When MPAI decides to develop a standard, it must know what purpose the standard serves; in other words, it must have precise requirements. These are used to call for technologies, but they can also be used by IPR holders to define in a timely fashion how they intend to monetise their IP, in other words to define their business model.

Of course, the values of royalties, caps, dates etc. are important, and IPR holders in a patent pool will need a significant amount of discussion to achieve a common view. However, unlike the HEVC case above, potentially very significant business model differences no longer influence those discussions.

Users of the standard can know in advance how the standard can be used. The two HEVC cases presented above show that licences can have very different business models and that some users, once they know the business model, may be discouraged from using – and therefore from waiting for – the standard. Indeed, a user is not only interested in the functional requirements but also in the commercial requirements. The framework licence tells the usage conditions, not the cost.

However, some legal experts think that the framework licence could include a minimum and maximum value of the licence without violating regulatory constraints. Again, this would not tell a user the actual cost, but a bracket.

Further reading

More information on the framework licence can be found on the MPAI Operation page, where the complete MPAI workflow is described, or in the MPAI Statutes.


MPAI for the future of media

The four reasons to create MPAI

The first reason

Just as VC-1, VP8/VP9 and AV1 were developed because MPEG and/or its ecosystem were not providing the solutions that the market demanded, MPAI responds to the need for usable standards that allow industry and consumers to benefit from technological progress.

The second reason

A body producing standards of such industrial and social importance should be credible. MPEG is no more, and its unknown SC 29 replacement operates in ISO, an organisation that lacks governance.

MPAI offers a solid and clear governance that ensures that decisions affecting members are made by the members.

The third reason

The standards produced by the body should be usable. For many years MPEG has struggled with the problem of excellent standards with a complicated IP landscape. Two of the main components of MPEG-H, the latest integrated MPEG project – part 2 video (HEVC) and part 3 audio (3D Audio) – have failed exactly because of that. The hope of seeing a licence for Part 3 video (VVC) of the next integrated MPEG project – MPEG-I – lies in the mists of an unknown future, and VVC may well tread the same path.

MPAI’s framework licences, developed and adopted by those who will develop the technical specification, significantly reduce the uncertainties that have plagued the definition of MPEG standard licences.

The fourth reason

We need a North Star guiding the industry in the years to come. Thirty-two years ago, the start of MPEG was a watershed. Digital technologies promised to provide more attractive moving pictures and audio, more conveniently and with more features, to the many different and independent industries that were used to handling a host of incompatible audio-visual services. MPEG has delivered much more than promised. By following the MPEG North Star, industry has got a unified technology platform on which different industries can build and extend their business.

MPAI is the new watershed, probably bigger than MPEG’s 32 years ago. Artificial Intelligence technologies already demonstrate that it is possible to do better and more than traditional digital technologies. But there is a difference. In the last 32 years digital audio and video have offered wonders, yet they kept the two information streams isolated from the rest of the information reaching the user. With artificial intelligence, audio and video have the potential to seamlessly integrate with the many other information types handled by a device on a unified technology platform. How? Leave it to digital media and artificial intelligence experts, who have started to become an integrated community, to open their respective domains to other technologies.

Forget the past 

It would be nice – and many, I for one, would be thankful for it – if someone undertook to solve the open problems in the use of digital media standards of the past. I am afraid this is an intricate problem without a unified point from which one can attempt to find a solution.

But is that a worthwhile effort? One way or another, industry has interoperable audio-visual technologies for its current needs, some even say more than it needs.

Look to the future

Let’s look to the future, because we can still give it the shape we want. The MPAI statutes suggest exactly that when they define the MPAI purpose as developing technical specifications of coded representation of moving pictures, audio and data, especially using artificial intelligence.

The task for MPAI is to call on the large community of researchers from industry and academia to reach the goal of developing standards that provide a quantum leap in user experience, by doing better and offering more than has been done so far, and by achieving a deeper integration of the information sources reaching the user. The goal can only be achieved if there is a new organisation that has the spirit, the enthusiasm and the effectiveness of the old one to deliver on the new promises.

That is the ideal reason to create MPAI. A more prosaic but vital reason to do it is that standards should also be usable – and MPAI promises to make that possible.


A guided tour in MPAI

The Statutes give all the needed details, but reading 15 dense pages is not a suggestion most will accept. Therefore, the purpose of this post is to provide a summary tour of how MPAI operates.

MPAI has two classes of Members: Principal and Associate. Principal Members can vote. Associate Members cannot vote. Both can participate with equal rights in the development of MPAI specifications.

The figure below depicts the structure of MPAI.

The General Assembly (GA), chaired by the President of the Board of Directors (BD), develops and keeps updated the Work Plan (WP). This describes the areas in which MPAI is developing, and intends to develop, specifications, and their expected timing. The WP is developed using inputs from Members and by issuing Calls for Interest (CfI). Anybody, including non-Members, can respond to CfIs.

When the GA decides to develop a Technical Specification (TS), it requests the Standing Committee called Requirements (Reqs) to develop requirements. Reqs does so by using input contributions from Members. If there is no consensus that technologies satisfying the requirements exist, it will issue a Call for Evidence (CfE). If there is consensus or if the CfE confirms the existence of technologies, Reqs transfers the requirements developed to the GA.

The GA updates the WP and develops the Terms of Reference (ToR) of the Development Committee (DC) that will develop the TS. Then the GA sends the requirements and the ToR to the BD. The Secretariat collects the names of the Principal Members who intend to make technical contributions to the development of the TS (Active Members).

The BD requests the IPR Support Advisory Committee (IPR SAC) to develop a Framework Licence (FWL) for the TS by a certain date.

The IPR SAC is chaired by a Director and each Principal Member may appoint one member to the SAC. An FWL contains the elements of a future licence that can be discussed and agreed on the basis of generally accepted antitrust and competition principles, but without “values” such as royalty levels, percentages, dates etc.

The IPR SAC establishes a Subgroup, composed of one representative per Active Member, tasked to develop the FWL. The chair is appointed by the BD. The Subgroup may request the help of a counsel to progress its work. This is to be approved by the BD.

When the Subgroup has completed the work, it approves the FWL by a 2/3 majority. The IPR SAC sends the FWL to the BD, which reviews and approves the FWL, establishes the DC, appoints its chair and communicates its decisions to the GA.

The GA requests the Secretariat to collect declarations of Principal and Associate Members that they will license their essential patents according to the FWL, kicks off the work of the DC and requests Reqs to develop a Call for Technologies (CfT). Reqs does so jointly with the DC. The GA approves and issues the CfT.

Responses to the CfT are assessed jointly by Reqs and the DC. The results are used by the DC to start its work. Representatives of Principal and Associate Members in appropriate numbers contribute to the work of the DC. Decisions are made by consensus, as detected by the chair. An issue that cannot be resolved by consensus is brought to the attention of the GA, which decides by a 2/3 majority (of Principal Members).

The task of the Membership and Nominating Committee is to assess membership requests on the basis of objective and published criteria and to nominate candidate directors. The task of the Finance and Audit Committee is to review the accounts and prepare the audit report.