Technical Specification: Multimodal Conversation (MPAI-MMC) V2.2 specifies:

  1. AI Workflows implementing Use Cases that use AI Modules from MPAI-MMC and other MPAI Technical Specifications to provide recognised applications in the Multimodal Conversation domain.
  2. AI Modules to analyse text, speech, and other non-verbal components used in human-machine and machine-machine conversation applications.
  3. Data Types used by MPAI-MMC V2.2.

AI Modules and Data Types are defined for use in other MPAI Technical Specifications.

This Technical Specification includes the following Use Cases:

  1. Conversation with Personal Status (MMC-CPS), enabling conversation and question answering with a machine able to extract the inner state of the entity it is conversing with and to present itself as a speaking digital human expressing a Personal Status. By adding or removing minor components of this general Use Case, five further Use Cases are spawned:
  2. Conversation About a Scene (MMC-CAS), where a human, pointing at objects scattered in a room, converses with a machine and displays Personal Status in their speech, face, and gestures, while the machine responds displaying its Personal Status in speech, face, and gesture.
  3. Virtual Meeting Secretary (MMC-VSV), where an avatar not representing a human in a virtual avatar-based video conference extracts the Personal Status conveyed by Text, Speech, Face, and Gesture, displays a summary of what the other avatars say, and receives and acts on comments.
  4. Human-Connected Autonomous Vehicle Interaction (MMC-HCI), where humans, after the machine has identified them by their speech and face in outdoor and indoor conditions, converse with the machine displaying their Personal Status, while the machine responds displaying its Personal Status in speech, face, and gesture.
  5. Conversation with Emotion (CAE-CWE), enabling audio-visual conversation with a machine impersonated by a synthetic voice and an animated face.
  6. Multimodal Question Answering (MQA), enabling a request for information about a displayed object.
  7. Three Use Cases supporting text and speech translation applications. In each Use Case, users can specify whether speech or text is used as input and, if it is speech, whether their speech features are preserved in the interpreted speech:
    1. “Unidirectional Speech Translation” (UST).
    2. “Bidirectional Speech Translation” (BST).
    3. “One-to-Many Speech Translation” (MST).
  8. The “Personal Status Extraction” Composite AIM, which estimates the Personal Status conveyed by the Text, Speech, Face, and Gesture of an Entity, i.e., a real or digital human.

Note that:

  1. Each Use Case normatively defines:
    • The Functions of the AIW implementing it and of its AIMs.
    • The Connections between and among the AIMs.
    • The Semantics and the Formats of the input and output data of the AIW and the AIMs.
  2. Each Composite AIM normatively defines:
    • The Functions of the Composite AIM implementing it and of its AIMs.
    • The Connections between and among the AIMs.
    • The Semantics and the Formats of the input and output data of the Composite AIM and the AIMs.
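As an informal aid to reading (not part of this Specification), the normative elements listed above — Functions of the AIMs, the Connections among them, and their input and output data — can be pictured as a small graph-consistency sketch. All class names, AIM names, and data-type names below are hypothetical illustrations, not MPAI-defined structures:

```python
# Hypothetical sketch: an AIW as a set of AIMs plus typed connections.
# Not a normative MPAI data structure; names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class AIM:
    name: str
    inputs: list[str]   # data types the AIM receives
    outputs: list[str]  # data types the AIM produces


@dataclass
class AIW:
    name: str
    aims: list[AIM] = field(default_factory=list)
    # Each connection: (producer AIM name, data type, consumer AIM name)
    connections: list[tuple[str, str, str]] = field(default_factory=list)

    def check_connections(self) -> bool:
        """Check every connection links an existing output to an existing input."""
        by_name = {a.name: a for a in self.aims}
        return all(
            src in by_name and dst in by_name
            and data in by_name[src].outputs
            and data in by_name[dst].inputs
            for src, data, dst in self.connections
        )


# Illustrative topology loosely inspired by Personal Status Extraction
pse = AIW(
    name="PersonalStatusExtraction",
    aims=[
        AIM("TextAnalysis", ["Text"], ["TextDescriptors"]),
        AIM("PSFusion", ["TextDescriptors"], ["PersonalStatus"]),
    ],
    connections=[("TextAnalysis", "TextDescriptors", "PSFusion")],
)
print(pse.check_connections())  # True: the single connection is well-formed
```

The sketch only shows why the Specification defines Connections together with data Formats: a topology is meaningful only if each link carries a data type that the producer actually outputs and the consumer actually accepts.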

The word normatively implies that an Implementation claiming Conformance to:

  1. An AIW, shall:
    1. Perform the AIW function specified in the appropriate Section of Chapter 5.
    2. Ensure that all AIMs, their topology, and their connections conform with the AIW Architecture specified in the appropriate Section of Chapter 5.
    3. Ensure that the AIW and AIM input and output data have the Formats specified in the appropriate Sections of Chapter 7.
  2. An AIM, shall:
    1. Perform the functions specified in the appropriate Section of Chapter 5 or 6.
    2. Receive and produce the data specified in the appropriate Section of Chapter 7.
  3. A data Format, shall ensure that the data has the Format specified in Chapter 7.

Implementers of this Technical Specification should note that:

  1. The Reference Software of this Technical Specification may be used to develop Implementations.
  2. The Conformance Testing specification may be used to test the conformity of an Implementation to this Standard.
  3. The level of Performance of an Implementation may be assessed based on the Performance Assessment specification of this Standard.

All Users should consider Notices and Disclaimers.

The current Version of MPAI-MMC has been developed by the MPAI Multimodal Conversation Development Committee (MM-DC). MPAI expects to produce future MPAI-MMC Versions that extend the scope of existing Use Cases and/or add new Use Cases supported by existing or new AI Modules and Data Types within the scope of Multimodal Conversation.