MPAI-MMC V2.1
Technical Specification
Conformance Testing Specification

What MPAI-MMC is about

Description of the standard

Technical Specification: Multimodal Conversation (MPAI-MMC) V2.1 specifies technologies that enable a variety of forms of conversation between humans and machines that are more human-like and richer in content. MPAI-MMC seeks to emulate human-human conversation in completeness and intensity. The technologies are applied to seven use cases in different domains.
MPAI-MMC has also been adopted as IEEE 3300-2022.

Watch the video recording of the online presentation (YouTube, WimTV) and read the PowerPoint presentation.

What the MPAI-MMC standard is about

Version 1 – Version 2.1

Technical Specification: Multimodal Conversation (MPAI-MMC) V2.1 specifies 1) data formats for the analysis of text, speech, and other non-verbal components used in human-machine and machine-machine conversation, and 2) use cases implementing recognised applications using data formats from MPAI-MMC and other MPAI standards.

Material about MPAI-MMC V2.1.

MPAI thanks the following individuals for their valuable contributions to the development of MPAI-MMC V2: Miran Choi (ETRI), Gérard Chollet (IMT), Paolo Ribeca (James Hutton Ltd), Mark Seligman (SMI), Fathy Yassa (SMI), and Jaime Yoon (Hancom).

MPAI appreciates the work carried out by Miran Choi, Gérard Chollet, Mark Seligman, and Jaime Yoon in the development of Conformance Testing Specification: Multimodal Conversation (MPAI-MMC) V2.1.


The MPAI-MMC V2 Working Draft (html, pdf) was published with a request for Community Comments. See also the video recordings (YouTube, WimTV) and the slides of the presentation made on 5 September. Read An overview of Multimodal Conversation (MPAI-MMC) V2. Comments should be sent to the MPAI Secretariat by 23:59 UTC on 25 September 2023. MPAI will use the comments received to develop the final draft, planned for publication at the 36th General Assembly (29 September 2023).

MPAI-MMC Version 2 extends the capabilities of V1 by specifying the data formats of two Composite AIMs:

  1. Personal Status Extraction: provides an estimate of the Personal Status (PS) – of a human or an avatar – conveyed by Text, Speech, Face, and Gesture. PS is the ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude.
  2. Personal Status Display: generates an avatar from Text and PS that utters speech with the intended PS while the face and gesture show the intended PS.

in support of three new use cases:

  1. Conversation About a Scene: a human holds a conversation with a machine about objects in a scene. While conversing, the human points their fingers to indicate their interest in a particular object. The machine is helped by the understanding of the human’s Personal Status.
  2. Human-Connected Autonomous Vehicle (CAV) Interaction: a group of humans converse with a CAV which understands the utterances and the PSs of the humans it converses with and manifests itself as the output of a Personal Status Display.
  3. Virtual Secretary for Videoconference: in the Avatar-Based Videoconference use case, a Virtual Secretary summarises what the avatars utter while understanding and capturing their Personal Status.
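The Personal Status structure underlying these use cases can be pictured as a simple data object holding the three internal factors (Emotion, Cognitive State, Attitude), each estimated per modality (Text, Speech, Face, Gesture). The sketch below is purely illustrative and is not the normative MPAI-MMC data format; all names and shapes are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict

# Illustrative (non-normative) sketch of a Personal Status (PS) record:
# each factor maps a modality ("Text", "Speech", "Face", "Gesture")
# to an estimated label for that modality.

@dataclass
class PersonalStatus:
    # factor -> {modality: label}, e.g. emotion["Face"] == "interested"
    emotion: Dict[str, str] = field(default_factory=dict)
    cognitive_state: Dict[str, str] = field(default_factory=dict)
    attitude: Dict[str, str] = field(default_factory=dict)

    def combined(self) -> Dict[str, Dict[str, str]]:
        """Fuse the per-modality estimates into a single PS record."""
        return {
            "Emotion": dict(self.emotion),
            "CognitiveState": dict(self.cognitive_state),
            "Attitude": dict(self.attitude),
        }

# A Personal Status Extraction step might populate such a record:
ps = PersonalStatus(
    emotion={"Face": "interested", "Speech": "calm"},
    attitude={"Gesture": "attentive"},
)
print(ps.combined()["Emotion"]["Face"])  # interested
```

A Personal Status Display would consume a record like this together with Text to drive the avatar's speech, face, and gesture; the dictionary shape here merely mirrors the factor/modality decomposition described above.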

MPAI has published the following documents to develop MPAI-MMC V2:

  1. MPAI-MMC V2 Call for Technologies (closed)
  2. MPAI-MMC V2 Use Cases and Functional Requirements
  3. Clarifications about MPAI-MMC V2 CfT data formats
  4. MPAI-MMC V2 Framework Licence
  5. MPAI-MMC V2 Template for responses

Read about MPAI-MMC V2 Call for Technologies:

  1. 2-minute video (YouTube) and video (non-YouTube) illustrating MPAI-MMC V2.
  2. Slides presented at the online meeting on 2022/07/12.
  3. Video recording of the online presentation (YouTube, non-YouTube) made on 12 July.
  4. Call for Technologies, Use Cases and Functional Requirements, and Framework Licence.

Version 1 – Version 2

Multimodal Conversation (MPAI-MMC) Version 1 includes five Use Cases:

  1. Conversation with Emotion supports audio-visual conversation with a machine impersonated by a synthetic voice and an animated face.
  2. Multimodal Question Answering supports requests for information about a displayed object.
  3. Unidirectional, Bidirectional, and One-to-Many Speech Translation support conversational translation using a synthetic voice that preserves the speech features of the human.

MPAI is indebted to the following individuals: Miran Choi (ETRI), Gérard Chollet (IMT), Jisu Kang (KLleon), Mark Seligman (SMI), and Fathy Yassa (SMI) for their efforts in developing the MPAI-MMC Technical Specification V1.

MPAI-MMC V1 

Users of MPAI standards should bear in mind the Notices and Disclaimers concerning use of MPAI Standards.

Version 1.1 Technical Specification

The Institute of Electrical and Electronics Engineers (IEEE) has adopted MPAI-MMC with the name IEEE 3300-2022.

Reference Software, Conformance Testing, and Performance Assessment for MPAI-MMC V1 are under development. Read the V1.2-related documents:

  1. MPAI-MMC V1 Standard
  2. Call for Patent Pool Administrator (Closed)
  3. Introduction to MPAI-MMC (V1)
  4. MPAI-MMC Standard (V1)
  5. Call for Technologies (V1)
  6. Use Cases and Functional Requirements (V1)
  7. Framework Licence (V1)
  8. Application Note