
MPAI produces AI-based data coding standards. But what is a “standard”? Certainly there is a technical document specifying how things should be done, but MPAI adds to it a reference software implementation that is normatively equivalent to the technical specification. Then there is a specification to test whether an implementation is technically correct. Finally, MPAI adds a specification to assess how well an implementation “performs”, a multi-dimensional notion meaning, for instance, that the implementation is unbiased.

Therefore, MPAI defines a standard as a collection of four documents with associated software and data sets.

14.1 Technical Specification
14.2 Reference Software
14.3 Conformance Testing
14.4 Performance Assessment

14.1 Technical Specification

The first document is the Technical Specification (TS), the document that contains the normative clauses to be strictly followed by a user wishing to develop a conforming implementation. There are two types of TS: system-oriented and application-oriented. The former concerns support for AI operation, as in the MPAI-AIF standard, while the latter concerns the application of AI to specific domains, as in MPAI-CAE, MPAI-MMC and MPAI-CUI.

An MPAI application standard is typically a container of applications called use cases. For instance, the Multimodal Conversation TS contains, as of today, Conversation with Emotion (CWE), Multimodal Question Answering (MQA), Unidirectional Speech Translation (UST), Bidirectional Speech Translation (BST) and One-to-Many Speech Translation (MST). Each TS is identified by 3 characters (e.g., MMC) and each use case is also identified by 3 characters, e.g., CWE.

For each use case, the TS specifies the AI Workflow (AIW) that implements the use case with:

  1. The function executed by the AIW.

  2. The syntax and semantics of the AIW’s input and output data.

  3. The topology of the AIMs composing the AIW.

  4. For each AIM:

    1. The function executed by the AIM.

    2. The syntax and semantics of the AIM’s input and output data.

The TS includes the syntax and semantics of all data formats used by all use cases in a single chapter. This is done because quite a few of the data formats are shared across AIMs. Some are also shared by different standards, and MPAI plans to develop a standard collecting all AIMs used by more than one standard.
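As a non-normative illustration of how such an AIW description and its topology might be captured, the following Python sketch models a much-reduced Conversation with Emotion workflow. All field names, AIM names and data names are assumptions made for illustration, not the syntax defined by MPAI-AIF or MPAI-MMC.

```python
# Illustrative (non-normative) sketch of a TS-style AIW description held as
# plain data, plus a check that its topology is internally consistent.
# Field names and contents are hypothetical, not the normative MPAI syntax.

cwe_aiw = {
    "identifier": "MMC-CWE",        # 3-char standard + 3-char use case
    "function": "Converse with a user, taking the user's emotion into account",
    "inputs": ["InputSpeech", "InputText"],
    "outputs": ["OutputSpeech", "OutputText"],
    "aims": {
        "SpeechRecognition": {
            "function": "Produce RecognisedText and Emotion from InputSpeech",
            "inputs": ["InputSpeech"],
            "outputs": ["RecognisedText", "SpeechEmotion"],
        },
        "DialogueProcessing": {
            "function": "Generate a reply from RecognisedText and SpeechEmotion",
            "inputs": ["RecognisedText", "SpeechEmotion", "InputText"],
            "outputs": ["OutputText"],
        },
        "SpeechSynthesis": {
            "function": "Render OutputText as OutputSpeech",
            "inputs": ["OutputText"],
            "outputs": ["OutputSpeech"],
        },
    },
}


def topology_is_consistent(aiw: dict) -> bool:
    """Every AIM input must be an AIW input or the output of some AIM,
    and every AIW output must be produced by some AIM."""
    produced = set(aiw["inputs"])
    for aim in aiw["aims"].values():
        produced.update(aim["outputs"])
    aim_inputs_ok = all(
        inp in produced for aim in aiw["aims"].values() for inp in aim["inputs"]
    )
    outputs_ok = all(out in produced for out in aiw["outputs"])
    return aim_inputs_ok and outputs_ok


assert topology_is_consistent(cwe_aiw)
```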

14.2 Reference Software

The second component of an MPAI standard is the Reference Software (RS). Ideally, the RS is the expression of the TS in a computer language, as opposed to the natural language in which the TS is written. It is a technically correct implementation of the TS in the sense that its AIW and AIMs perform as the TS specifies.

The RS is composed of:

  1. Software implementing the TS, released as a source code implementation of the AIF and the AIWs that exposes all AIM interfaces, together with the full set of:

    1. High-performance source code AIMs, or

    2. Limited-performance source code AIMs, or

    3. Sufficiently high-performance compiled AIMs, not to be used for commercial implementations unless the AIM provider agrees.

  2. Sample input data or a data generating environment or endpoint for trialling the RS.

  3. A knowledge base conforming to the standard, in case the RS requires one, made accessible to those using the RS.

The RS is distributed under the MPAI-modified Berkeley Software Distribution (BSD) licence.

14.3 Conformance Testing

If a TS can be considered as the “law”, i.e., the set of rules that implementers have to follow to develop correct implementations, Conformance Testing (CT) can be considered as the “tribunal” determining whether an implementation is indeed technically correct.

Conformance testing is not unknown in standardisation; indeed, MPEG has always developed conformance testing for its standards. With digital media, however, there is typically an “encoder” producing data that a “decoder” can decode. Conformance testing could therefore be formulated as “provide bitstreams and check that the decoder under test can correctly decode them” and “feed the bitstreams produced by the encoder under test and check that the reference software decoder can correctly decode them”. Although digital media are, well, digital, two digital media decoders may in general not decode the same bitstream in the same way: the two decoders may have different initial states, and different precision levels may have been used in the many computations a decoder performs. While their outputs differ, both may very well pass conformance testing.

In the MPAI world, differences between the outputs of different implementations are the norm rather than the exception, because most AIMs contain neural networks of unspecified architectures, trained on unspecified data sets. The MPAI CT specifications define the procedure, the tools, the process, and the data to be used to Test the Conformance of an implementation, and specify the tolerance on the output of an AIM given the input data used for the Test.

An example of MPAI CT is the following. In Conversation with Emotion (MMC-CWE) there is an AIM whose function is to take speech as input and produce as output the text corresponding to that speech and the emotion it contains (Figure 23).

Figure 23 – An example of MPAI Conformance Testing

In this case Conformance is defined as the ability of an Implementation to produce Text expressed as Unicode and Emotion expressed as one of the MPAI standard Emotions. This is important because, to be able to interconnect and do something useful with the data, an AIM must receive them in the right format. For a user of the system, however, knowing that the data are syntactically and semantically correct is not particularly useful if the Recognised Text bears little resemblance to what was said in the Input Speech and the Emotion is declared as Angry when the Input Speech was Happy.
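As a rough illustration of the format-level part of such a Conformance check, the following Python sketch verifies that an AIM’s Text output is well-formed Unicode (checked here as decodable UTF-8) and that its Emotion output belongs to the set of standard Emotions. The emotion list below is a small invented subset, not the normative MPAI list.

```python
# Non-normative sketch of a format-level Conformance check for the speech
# recognition AIM of MMC-CWE: the Text must be valid Unicode and the Emotion
# must belong to the set of Emotions standardised by MPAI.
# STANDARD_EMOTIONS is an illustrative subset, not the normative list.

STANDARD_EMOTIONS = {"Angry", "Happy", "Sad", "Fearful", "Surprised", "Neutral"}


def check_aim_output(text_bytes: bytes, emotion: str) -> bool:
    """Return True if the AIM output is syntactically conformant."""
    try:
        text_bytes.decode("utf-8")          # Text must be well-formed Unicode
    except UnicodeDecodeError:
        return False
    return emotion in STANDARD_EMOTIONS     # Emotion must be a standard Emotion


# Example: a conformant output pair
assert check_aim_output("I am very pleased".encode("utf-8"), "Happy")
```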

Requiring that an AIM implementation be Conforming only when the AIM does a perfect job is not realistic either, because no implementation can be perfect in all cases. Grading a speech recogniser is a well-understood problem that amounts to setting a word error rate below which an implementation is accepted. For Emotion, however, the story is different because there is no established practice. MPAI is therefore considering three possibilities to measure the “emotion error rate”: 1) use human testers, 2) train a network to measure the distance between Emotions, or 3) define an emotion space with suitable metrics.
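To make the third possibility concrete, the following sketch assumes a two-dimensional valence-arousal space with Euclidean distance as the metric. Both the space and the coordinates are invented for illustration; MPAI has not standardised such a space or such values.

```python
# Sketch of the third option only: place Emotions in an assumed
# two-dimensional valence-arousal space and use Euclidean distance as the
# metric. Coordinates are invented for illustration.
import math

EMOTION_COORDINATES = {            # (valence, arousal), both in [-1, 1]
    "Happy":   ( 0.8,  0.5),
    "Angry":   (-0.7,  0.7),
    "Sad":     (-0.7, -0.5),
    "Neutral": ( 0.0,  0.0),
}


def emotion_distance(a: str, b: str) -> float:
    """Euclidean distance between two Emotions in the assumed space."""
    (va, aa), (vb, ab) = EMOTION_COORDINATES[a], EMOTION_COORDINATES[b]
    return math.hypot(va - vb, aa - ab)


# Declaring Angry when the Input Speech was Happy yields a large distance,
# which a tolerance threshold in the CT specification could then penalise.
print(emotion_distance("Happy", "Angry"))    # ~1.51
print(emotion_distance("Happy", "Neutral"))  # ~0.94
```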

This shows that even a seemingly “boring” topic like Conformance Testing can become an attractive field of investigation.

14.4 Performance Assessment

In the Introduction, we mentioned that AI has a subtle way of appearing reliable when it is not, whether by design or not. This thorny topic is the subject of active research in several affected fields.

MPAI addresses this issue by defining Performance of an MPAI standard implementation as the set of the following attributes: Reliability, Robustness, Fairness and Replicability. MPAI gives the following meanings to the four words:

  1. Reliability: the implementation performs as specified by the standard, e.g., within the application scope, with the stated limitations, and for the period of time specified by the Implementer.

  2. Robustness: the implementation can cope with data outside of the stated application scope with an estimated degree of confidence.

  3. Replicability: the assessment made by an entity can be replicated, within an agreed level, by another entity.

  4. Fairness: the training set and/or network is open to testing for bias and unanticipated results so that the extent of applicability of the system can be assessed.
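To make the Fairness attribute above more concrete, the following sketch shows one possible form of bias test: comparing an implementation’s error rate across subgroups of a test set. It is not an MPAI-specified procedure; the groups, data and interpretation are invented for illustration.

```python
# Non-normative illustration of a simple bias test for the Fairness
# attribute: compare an AIM's error rate across two subgroups of a test set.
# Group labels and data are invented; MPAI Performance Assessment
# specifications define the actual procedures and grades.

def error_rate(predictions, references):
    """Fraction of items on which the AIM output differs from the reference."""
    wrong = sum(p != r for p, r in zip(predictions, references))
    return wrong / len(references)


def group_error_gap(results_by_group):
    """Largest difference in error rate between any two groups."""
    rates = [error_rate(p, r) for p, r in results_by_group.values()]
    return max(rates) - min(rates)


# Hypothetical per-group (predictions, references) pairs for an emotion AIM
results = {
    "group_a": (["Happy", "Angry", "Sad", "Happy"], ["Happy", "Angry", "Sad", "Sad"]),
    "group_b": (["Happy", "Happy", "Sad", "Sad"],   ["Happy", "Angry", "Sad", "Angry"]),
}

# group_b's error rate (0.50) exceeds group_a's (0.25) by 0.25; an Assessment
# could grade the implementation according to the size of such gaps.
print(f"error-rate gap between groups: {group_error_gap(results):.2f}")
```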

It should also be clear that Performance is not a yes/no attribute but can have “grades”, possibly depending on the specific domain to which AI is applied.

Performance Assessment, the fourth component of an MPAI standard, is the specification that defines the data sets or their characteristics, the tools, the procedures, and the grades used to Assess the Performance of an implementation.
