(Informative)
From the moment humans built the first machine, they have needed to “communicate” with it. Early machines were operated by touch, later ones by typed characters, and then by speech and even visual means. As machines grew more complex, the need for more sophisticated communication methods arose. Today, as personal devices become more pervasive and the use of information and other online services becomes ubiquitous, human-machine communication is often more direct and even “personal”.
Because Artificial Intelligence can learn from interactions with humans, machines can improve their “conversational” capabilities by better understanding the meaning of what a human types or says and by providing more pertinent responses. If properly trained, machines can also learn to understand additional or hidden meanings of a sentence by analysing a human’s text, speech, or gestures. Machines can also be made to develop and rely on “internal statuses” comparable to those driving the attitudes of conversing humans. Thus, they can provide responses in text, speech, and gestures that are more human-like and richer in content.
Technical Specification: Multimodal Conversation (MPAI-MMC) V2.2 has been developed by MPAI in pursuit of the following policies:
- Be friendly to the AI context but, to the extent possible, agnostic to the technology – AI or Data Processing – used in an implementation.
- Be attractive to different industries, end users, and regulators.
- Address three levels of standardisation, any of which an implementer can freely decide to adopt:
  - Data types, i.e., the data exchanged by systems.
  - Components (called AI Modules – AIM).
  - Connections of components (called AI Workflows – AIW).
- Specify the data exchanged by components with semantics that are as clear as possible.
The MPAI-MMC V2.2 Technical Specification will be accompanied by the Reference Software, Conformance Testing, and Performance Assessment Specifications. Conformance Testing specifies methods enabling users to ascertain whether a data type generated by an AIM, an AIM, or an AIW conforms with this Technical Specification.
The MPAI-MMC V2.2 Technical Specification provides the technologies supporting the implementation of some or all of the possibilities envisaged in this Introduction:
- It is organised by Use Cases, such as Conversation with Personal Status, Multimodal Question Answering, and Unidirectional Speech Translation, corresponding to AI Workflows.
- Each Use Case provides:
  - The functions.
  - The Input/Output Data of the AIW implementing it.
  - The Reference Model specifying the AIM topology.
  - The AIMs specified in terms of functions performed and Input/Output Data.
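
As an informal illustration only, the relationship between data types, AIMs, and an AIW described above can be pictured with a small Python sketch. Nothing here is taken from this Technical Specification: the class names `AIM` and `AIW`, the method `unresolved_inputs`, and the module and data names (`SpeechRecognition`, `TextUnderstanding`, `InputSpeech`, `RecognisedText`, `Meaning`) are all hypothetical and chosen only to show how a Use Case could list its AIMs with their Input/Output Data and check that the topology is consistent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AIM:
    """Hypothetical AI Module: a function plus named Input/Output Data."""
    name: str
    function: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class AIW:
    """Hypothetical AI Workflow: AIW-level Input/Output Data and its AIMs."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    aims: list[AIM] = field(default_factory=list)

    def unresolved_inputs(self) -> list[tuple[str, str]]:
        """Return (AIM name, data name) pairs that neither an AIW input
        nor any AIM output supplies."""
        available = set(self.inputs)
        for aim in self.aims:
            available.update(aim.outputs)
        return [
            (aim.name, data)
            for aim in self.aims
            for data in aim.inputs
            if data not in available
        ]


# Illustrative names only; they are not defined by this document.
asr = AIM("SpeechRecognition", "Converts speech to text",
          ("InputSpeech",), ("RecognisedText",))
nlu = AIM("TextUnderstanding", "Extracts meaning from text",
          ("RecognisedText",), ("Meaning",))
workflow = AIW("ExampleConversation", ("InputSpeech",), ("Meaning",), [asr, nlu])

print(workflow.unresolved_inputs())  # []
```

In this sketch, connections are implied by matching data names rather than listed explicitly; an actual implementation would follow the Reference Model of the relevant Use Case.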
In all Chapters and Sections, Terms beginning with a capital letter are defined in Table 1 if they are specific to this Technical Specification and in Table 2 if they are common to all MPAI Technical Specifications. All Chapters, Sections, and Annexes are Normative unless they are labelled as Informative.