From the moment a human built the first machine, there was a need to “communicate” with it. As more complex machines were built, the need for more sophisticated communication methods arose. Today, as personal devices become more pervasive and the use of information and other online services becomes ubiquitous, human-machine communication is often more direct and even “personal”. Humans first communicated with primitive machines by touch, later by characters, and then by speech and even visual means.

The ability of Artificial Intelligence to learn from interactions with humans gives machines the ability to improve their “conversational” capabilities by better understanding the meaning of what a human types or says and by providing more pertinent responses. If properly trained, machines can also learn to understand additional or hidden meanings of a sentence by analysing a human’s text, speech, or gestures. Machines can also be made to develop and rely on “internal statuses” comparable to those driving the attitudes of conversing humans. Thus, they can provide responses – in text, speech, and gestures – that are more human-like and richer in content.

Technical Specification: Multimodal Conversation (MPAI-MMC) V2 has been developed by MPAI – Moving Picture, Audio, and Data Coding by Artificial Intelligence, the international, unaffiliated, non-profit organisation developing standards for Artificial Intelligence (AI)-based data coding with clear Intellectual Property Rights licensing frameworks. The specification has been developed in compliance with the rigorous MPAI Process [16] and in pursuit of the following policies:

  1. Be friendly to the AI context but, to the extent possible, agnostic to the technology – AI or Data Processing – used in an implementation.
  2. Be attractive to different industries, end users, and regulators.
  3. Address three levels of standardisation any of which an implementer can freely decide to adopt:
    1. Data types, i.e., the data exchanged by systems.
    2. Components (called AI Modules – AIM).
    3. Connections of components (called AI Workflows – AIW).
  4. Specify the data exchanged by components with semantics that are as clear as possible.

Technical Specification: AI Framework (MPAI-AIF) V2 [2] enables dynamic configuration, initialisation, and control of AIWs in a standard environment called AI Framework (AIF). Figure 1 depicts the AI Framework.

Figure 1 – The AI Framework (MPAI-AIF) V2 Reference Model

AIWs and AIMs have standard interfaces. AIMs can execute data processing or Artificial Intelligence algorithms and can be implemented in hardware, software, or hybrid hardware/software. An AI Module is Composite if it includes connected AI Modules.
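
The relationship between AIMs and Composite AIMs can be illustrated with a minimal Python sketch. It is not taken from MPAI-AIF or MPAI-MMC: the class names `AIM` and `CompositeAIM`, the port names, and the toy processing functions are all hypothetical, and the Composite AIM is assumed to run its connected AIMs in a fixed sequence over a shared pool of named data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Data = Dict[str, object]


@dataclass
class AIM:
    """Hypothetical AI Module: named input/output ports and a processing function."""
    name: str
    inputs: List[str]
    outputs: List[str]
    process: Callable[[Data], Data]

    def run(self, data: Data) -> Data:
        # Refuse to run unless every declared input is present.
        missing = [port for port in self.inputs if port not in data]
        if missing:
            raise ValueError(f"{self.name}: missing inputs {missing}")
        result = self.process({port: data[port] for port in self.inputs})
        # Expose only the declared outputs.
        return {port: result[port] for port in self.outputs}


@dataclass
class CompositeAIM:
    """Hypothetical Composite AIM: same external interface, built from connected AIMs."""
    name: str
    inputs: List[str]
    outputs: List[str]
    children: List[AIM]

    def run(self, data: Data) -> Data:
        # Assumption: children run in list order and share a pool of named data.
        pool = {port: data[port] for port in self.inputs}
        for child in self.children:
            pool.update(child.run(pool))
        return {port: pool[port] for port in self.outputs}


# Toy AIMs standing in for real data processing or AI algorithms.
normalise = AIM("Normalise", ["Text"], ["NormalisedText"],
                lambda d: {"NormalisedText": d["Text"].strip().lower()})
count = AIM("CountWords", ["NormalisedText"], ["WordCount"],
            lambda d: {"WordCount": len(d["NormalisedText"].split())})

analysis = CompositeAIM("TextAnalysis", ["Text"],
                        ["NormalisedText", "WordCount"], [normalise, count])
result = analysis.run({"Text": "  Hello World "})
```

Because the Composite AIM exposes the same kind of interface as a single AIM, a caller does not need to know whether a component is implemented as one module or as several connected ones.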

The MPAI-AIF-specified AIF environment enables the secure execution of AIWs constituted by AIMs. Thus, users can have machines implementing AIMs whose internal operation they understand to some degree, rather than machines that are just “black boxes” resulting from unknown training with unknown data. AIM developers can provide components with standard interfaces that can have improved performance compared to other implementations.

An AIW and its AIMs may have three interoperability levels, any of which implementers can freely adopt:

  1. Level 1 – Implementer-specific and satisfying the MPAI-AIF Standard.
  2. Level 2 – Specified by an MPAI Application Standard.
  3. Level 3 – Specified by an MPAI Application Standard and certified by a Performance Assessor.
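
As an informal illustration of how the three levels relate, the following Python sketch maps two properties of an implementation to a level. The enum, the function name, and its parameters are hypothetical; the sketch also assumes that the implementation already satisfies the MPAI-AIF Standard.

```python
from enum import IntEnum


class InteroperabilityLevel(IntEnum):
    LEVEL_1 = 1  # implementer-specific, satisfying the MPAI-AIF Standard
    LEVEL_2 = 2  # specified by an MPAI Application Standard
    LEVEL_3 = 3  # specified by an MPAI Application Standard and certified


def interoperability_level(specified_by_application_standard: bool,
                           certified_by_performance_assessor: bool) -> InteroperabilityLevel:
    """Return the level implied by the two flags (illustrative only)."""
    if certified_by_performance_assessor and not specified_by_application_standard:
        # Level 3 requires both conditions, so this combination has no level here.
        raise ValueError("certification presupposes an MPAI Application Standard")
    if certified_by_performance_assessor:
        return InteroperabilityLevel.LEVEL_3
    if specified_by_application_standard:
        return InteroperabilityLevel.LEVEL_2
    return InteroperabilityLevel.LEVEL_1
```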

As manager of the MPAI Ecosystem specified by Governance of MPAI Ecosystem (MPAI-GME) [1], MPAI ensures that a user can:

  1. Operate a reference implementation of the Technical Specification, by providing a Reference Software Specification with annexed software.
  2. Test the conformance of an implementation with the Technical Specification, by providing a Conformance Testing Specification.
  3. Assess the performance of an implementation of a Technical Specification, by providing a Performance Assessment Specification.
  4. Get conforming implementations, possibly with a performance assessment report, from a trusted source through the MPAI Store.

The MPAI-MMC V2 Technical Specification will be accompanied by the Reference Software, Conformance Testing, and Performance Assessment Specifications. Conformance Testing specifies methods enabling users to ascertain whether a data type generated by an AIM, an AIM, or an AIW conforms with this Technical Specification.

The MPAI-MMC V2 Technical Specification provides the technologies supporting the implementation of a subset or the totality of the possibilities envisaged by this Introduction:

  1. It is organised in Use Cases collected in Chapter 5, such as Conversation with Personal Status, Multimodal Question Answering, and Unidirectional Speech Translation, corresponding to AI Workflows.
  2. Each Use Case provides:
    1. The functions.
    2. The Input/Output Data of the AIW implementing it.
    3. The Reference Model specifying the AIM topology.
    4. The AIMs specified in terms of functions performed and Input/Output Data.
  3. A single chapter (Chapter 7) collects all data formats referenced in the specification.
  4. Annexes provide the JSON metadata of the AIWs, Composite AIMs, and AIMs.
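
To give a feel for what a machine-readable description of an AIM topology can look like, the Python sketch below builds a small workflow description, checks that every connection joins a declared output to a declared input, and serialises it to JSON. The field names (`aims`, `connections`, `from`, `to`), the AIM names `AIM_A` and `AIM_B`, and the data names are invented for this illustration; they are not the metadata schema used in the Annexes.

```python
import json

# Hypothetical workflow description; field and AIM names are placeholders,
# not the MPAI metadata schema.
aiw = {
    "name": "ExampleWorkflow",
    "aims": {
        "AIM_A": {"inputs": ["InputText"], "outputs": ["Meaning"]},
        "AIM_B": {"inputs": ["Meaning"], "outputs": ["OutputText"]},
    },
    "connections": [
        {"from": ["AIM_A", "Meaning"], "to": ["AIM_B", "Meaning"]},
    ],
}


def dangling_connections(workflow):
    """Return the connections whose endpoints are not declared ports."""
    bad = []
    for connection in workflow["connections"]:
        src_aim, src_port = connection["from"]
        dst_aim, dst_port = connection["to"]
        src_ok = src_port in workflow["aims"].get(src_aim, {}).get("outputs", [])
        dst_ok = dst_port in workflow["aims"].get(dst_aim, {}).get("inputs", [])
        if not (src_ok and dst_ok):
            bad.append(connection)
    return bad


# Serialise the description so that it can be stored or exchanged as JSON.
metadata_json = json.dumps(aiw, indent=2)
```

A check of this kind catches a topology that refers to an AIM or port that was never declared before the workflow is instantiated.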

In this Introduction and in the following Chapters, Terms beginning with a capital letter are defined in Table 1 if they are specific to this Technical Specification and in Table 62 if they are common to all MPAI Technical Specifications.

Chapters, Sections, and Annexes are Normative unless they are explicitly identified as Informative.