Multimodal Conversation (MPAI-MMC)
Important Information: Deadline for responses to the Call is 24 October 2022
Multi-modal conversation (MPAI-MMC) is an MPAI standard enabling conversation between humans and machines that emulates human-human conversation in completeness and intensity using AI.
After completing the development of Version 1, MPAI is engaged in the development of MPAI-MMC V2:
- MPAI-MMC V2 Call for Technologies (the call closes on 2022/10/24)
- MPAI-MMC V2 Use Cases and Functional Requirements
- Clarifications about MPAI-MMC V2 CfT data formats
- MPAI-MMC V2 Framework Licence
- MPAI-MMC V2 Template for responses
The MPAI Secretariat shall receive submissions in response to the MPAI-MMC V2 Call for Technologies by 2022/10/24T23:59 UTC.
Multi-modal conversation (MPAI-MMC) Version 2 will specify two Composite AIMs:
- Personal Status Extraction: provides an estimate of the Personal Status (PS) – of a human or an avatar – conveyed by Text, Speech, Face, and Gesture. PS is the ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude.
- Personal Status Display: generates, from Text and PS, an avatar whose uttered speech, face, and gesture all convey the intended PS.
and three new use cases:
- Conversation About a Scene: a human holds a conversation with a machine about objects in a scene. While conversing, the human points a finger to indicate interest in a particular object. The machine's responses are aided by its understanding of the human's Personal Status.
- Human-Connected Autonomous Vehicle (CAV) Interaction: a group of humans converse with a CAV which understands the utterances and the PSs of the humans it converses with and manifests itself as the output of a Personal Status Display.
- Avatar-Based Videoconference: avatars representing humans with a high degree of accuracy participate in a videoconference. A virtual secretary (VS), represented as an avatar displaying PS, creates an online summary of the meeting with a quality enhanced by the VS's ability to understand the PS of the avatars it converses with.
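The Personal Status notion above (an ensemble of Emotion, Cognitive State, and Attitude conveyed by Text, Speech, Face, and Gesture) can be sketched as a simple data structure. This is purely an illustrative assumption for readers new to the concept; the names and types below (`PersonalStatus`, `Modality`, `extract_personal_status`) are hypothetical and are not the normative MPAI-MMC V2 data formats, which are defined in the Call for Technologies documents.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    # The four modalities through which PS is conveyed, per the V2 description.
    TEXT = "text"
    SPEECH = "speech"
    FACE = "face"
    GESTURE = "gesture"

@dataclass
class PersonalStatus:
    # The three PS factors named in the text; string labels are illustrative,
    # not the controlled vocabularies the standard would specify.
    emotion: str
    cognitive_state: str
    attitude: str

def extract_personal_status(inputs: dict) -> PersonalStatus:
    """Placeholder for a Personal Status Extraction composite AIM:
    fuse per-modality observations into one PersonalStatus estimate."""
    # A real implementation would run per-modality analysers and fuse
    # their outputs; here we return a fixed illustrative estimate.
    return PersonalStatus(emotion="neutral",
                          cognitive_state="attentive",
                          attitude="cooperative")

# Hypothetical usage: raw per-modality captures go in, one PS estimate comes out.
ps = extract_personal_status({Modality.SPEECH: b"...", Modality.FACE: b"..."})
print(ps.emotion)  # prints "neutral"
```

A Personal Status Display would run the reverse direction: given Text and a `PersonalStatus`, synthesize speech, face, and gesture that convey it.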
Multi-modal conversation (MPAI-MMC) Version 1 includes 5 Use Cases:
- Conversation with Emotion supports audio-visual conversation with a machine impersonated by a synthetic voice and an animated face.
- Multimodal Question Answering supports requests for information about a displayed object.
- Unidirectional, Bidirectional and One-to-Many Speech Translation support conversational translation using a synthetic voice that preserves the speech features of the human.
MPAI is indebted to the following individuals: Miran Choi (ETRI), Gérard Chollet (IMT), Jisu Kang (KLleon), Mark Seligman (SMI) and Fathy Yassa (SMI) for their efforts in developing the MPAI-MMC Technical Specification V1.
Reference Software, Conformance Testing and Performance Assessment for MPAI-MMC V1 are under development. Read the V1.2-related documents:
- Call for Patent Pool Administrator (Closed)
- Introduction to MPAI-MMC (V1)
- MPAI-MMC Standard (V1)
- Call for Technologies (V1)
- Use Cases and Functional Requirements (V1)
- Framework Licence (V1)
- Application Note
Visit the About MPAI-MMC page