About the Multimodal Conversation Standard
The Multimodal Conversation (MPAI‑MMC) V2.5 standard defines a comprehensive and interoperable framework for human‑machine and machine‑machine conversation systems integrating text, speech, vision, and behavioral signals. MPAI-MMC enables the creation of advanced conversational applications that combine AI Modules (AIMs) exchanging standard Data Types and operating within the MPAI Artificial Intelligence Framework (MPAI‑AIF).
Key Use Cases
MPAI‑MMC V2.5 fully specifies a rich set of multimodal conversational applications:
- Answer to Multimodal Question (MMC‑AMQ): Responds to queries combining text, speech, and visual input.
- Conversation About a Scene (MMC‑CAS): Enables interactive dialogue about objects and environments using speech, gestures, and visual cues.
- Conversation with Personal Status (MMC‑CPS): Extracts and expresses internal states (Personal Status) during interaction.
- Conversation with Emotion (MMC‑CWE): Supports emotionally expressive audio‑visual dialogue with synthetic agents.
- Human‑Connected Autonomous Vehicle Interaction (MMC‑HCI): Enables natural interaction between humans and autonomous vehicles using multimodal signals.
- Multimodal Question Answering (MMC‑MQA): Answers queries about displayed objects and scenes.
- Text and Speech Translation (MMC‑TST): Provides flexible multimodal translation with optional preservation of speech characteristics.
- Virtual Meeting Secretary (MMC‑VMS): Summarises meetings, interprets participant signals, and supports interaction in virtual environments.
- Personal Status Extraction (MMC‑PSE): Estimates internal states from text, speech, face, and gestures.
Powered by the MPAI AI Framework
MPAI‑MMC operates within the MPAI Artificial Intelligence Framework (AIF), which provides a standard Execution Environment. Its architecture is composed of components (AIMs) that can be implemented in a platform‑independent manner and dynamically configured and orchestrated.
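The idea of composing AIMs that exchange named data items can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `AIM` class, the `run_workflow` function, and the two toy modules are hypothetical and do not reproduce the MPAI-AIF metadata or APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical types for illustration only; MPAI-AIF defines its own
# metadata and APIs, which are not reproduced here.


@dataclass
class AIM:
    name: str
    inputs: List[str]          # names of the data items consumed
    outputs: List[str]         # names of the data items produced
    run: Callable[..., tuple]  # returns one value per output name


def run_workflow(aims: List[AIM], data: Dict[str, Any]) -> Dict[str, Any]:
    """Execute AIMs in list order, passing named data items between them."""
    data = dict(data)  # do not mutate the caller's dictionary
    for aim in aims:
        missing = [k for k in aim.inputs if k not in data]
        if missing:
            raise KeyError(f"{aim.name} is missing inputs: {missing}")
        results = aim.run(*(data[k] for k in aim.inputs))
        data.update(zip(aim.outputs, results))
    return data


# Toy stand-ins for real AI Modules.
speech_recognition = AIM(
    name="SpeechRecognition",
    inputs=["speech"],
    outputs=["text"],
    run=lambda speech: (speech.lower(),),
)
dialogue = AIM(
    name="Dialogue",
    inputs=["text"],
    outputs=["reply"],
    run=lambda text: (f"You said: {text}",),
)

result = run_workflow([speech_recognition, dialogue], {"speech": "HELLO"})
print(result["reply"])
```

Because each module only declares the data items it consumes and produces, the same module could in principle be placed in a different workflow without modification, which is the property the framework description emphasises.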
Benefits for the Ecosystem
MPAI‑MMC enables a multi‑vendor, interoperable AI ecosystem:
- Technology Providers: Offer standard-compliant AI components to a global market
- Developers & Integrators: Build applications using reusable, interoperable modules
- End Users: Access more powerful, transparent, and trustworthy AI applications
- Society: Benefits from reduced opacity of AI through modular, inspectable systems
A New Paradigm for Conversational AI
MPAI‑MMC promotes a shift from monolithic AI systems to:
- Composable AI architectures
- Reusable multimodal components
- Transparent and explainable workflows
- Shared Data Types and reusable AIMs
MPAI‑MMC enables scalable innovation in the multimodal conversation domain through component reusability: most AI Modules are reused across the MPAI-MMC use cases, which promotes efficiency, consistency, and rapid development.
Conclusion
MPAI‑MMC V2.5 delivers a complete, interoperable framework for building next-generation conversational systems that:
- Understand and generate across modalities
- Capture human behavioural signals
- Operate in standard, secure, and composable environments
Technical Specification: Multimodal Conversation (MPAI-MMC) V2 specifies technologies that, compared to V1, further enhance the capability of a human to converse with a machine in a variety of application environments. In particular, it extends the notion and the data format of Emotion to Personal Status, which additionally includes Cognitive State and Social Attitude. V2 applies Personal Status and other data types to support new use cases.
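As a rough illustration of how Personal Status extends Emotion, the following Python sketch models it as a record with three Factors. The class, the field names, the free-text labels, and the `from_v1_emotion` helper are all hypothetical; the normative data format is defined by the specification and is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PersonalStatus:
    # Illustrative field names and free-text labels, not the normative format.
    emotion: Optional[str] = None
    cognitive_state: Optional[str] = None
    social_attitude: Optional[str] = None


def from_v1_emotion(emotion: str) -> PersonalStatus:
    """Wrap an Emotion label in a Personal Status with only that Factor set."""
    return PersonalStatus(emotion=emotion)


ps = from_v1_emotion("happy")
ps.cognitive_state = "confused"  # a Factor that Emotion alone cannot carry
print(json.dumps(asdict(ps)))
```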
Personal Status Extraction provides an estimate of the Personal Status (PS) of a human or an avatar conveyed by a Modality (Text, Speech, Face, and Gesture). PS is the ensemble of Factors, i.e., information internal to a human or an avatar (Emotion, Cognitive State, and Social Attitude), extracted through the steps of Description Extraction and PS Interpretation.
Figure 1 – Personal Status Extraction (PSE)
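The two steps named for Personal Status Extraction, Description Extraction followed by PS Interpretation for each Modality, can be sketched in Python as below. This is a toy under stated assumptions: the tokenising extractor, the `TOY_RULES` lookup, and the majority-vote `fuse` function are invented for illustration and stand in for trained models; the specification does not prescribe them.

```python
from collections import Counter
from typing import Dict, List

FACTORS = ("emotion", "cognitive_state", "social_attitude")

# Toy lookup standing in for a trained interpreter (hypothetical labels).
TOY_RULES = {
    "cheerful": ("emotion", "happy"),
    "smile": ("emotion", "happy"),
    "frown": ("emotion", "sad"),
    "shrug": ("cognitive_state", "uncertain"),
    "thanks": ("social_attitude", "polite"),
}


def extract_descriptors(modality: str, signal: str) -> List[str]:
    """Description Extraction (toy): split the signal into lower-case tokens.

    A real extractor would be specific to the modality; this one ignores it.
    """
    return signal.lower().split()


def interpret(descriptors: List[str]) -> Dict[str, str]:
    """PS Interpretation (toy): map known descriptors to Factor labels."""
    status: Dict[str, str] = {}
    for descriptor in descriptors:
        if descriptor in TOY_RULES:
            factor, label = TOY_RULES[descriptor]
            status[factor] = label
    return status


def fuse(per_modality: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Combine per-modality estimates by majority vote per Factor (an assumption)."""
    fused: Dict[str, str] = {}
    for factor in FACTORS:
        votes = Counter(ps[factor] for ps in per_modality.values() if factor in ps)
        if votes:
            fused[factor] = votes.most_common(1)[0][0]
    return fused


signals = {
    "text": "thanks so much",
    "speech": "cheerful tone",
    "face": "smile",
    "gesture": "shrug",
}
per_modality = {m: interpret(extract_descriptors(m, s)) for m, s in signals.items()}
personal_status = fuse(per_modality)
print(personal_status)
```

Keeping extraction and interpretation as separate functions mirrors the two-step description, so either step could be replaced independently in this sketch.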
An entity, a real or digital human, converses with a machine, possibly about physical objects in the environment. The machine captures and understands Speech, extracts Personal Status from the Text, Speech, Face, and Gesture Factors, and fuses the Factors into an estimated Personal Status of the entity to achieve a better understanding of the context in which the entity converses. The machine is represented by a Portable Avatar.
Figure 2 – Conversation with Personal Status (MMC-CPS)
A human holds a conversation with a machine about objects around the human. While conversing, the human points their fingers to indicate their interest in a particular object. The machine uses Visual Scene Description to extract the Human Object and the Physical Object, uses PSE to understand the human’s PS, and uses Personal Status Display (PSD) to respond while showing its PS.
Figure 3 – Conversation About a Scene (CAS)
Humans converse with a Connected Autonomous Vehicle (CAV), which understands their utterances and their PSs by means of the PSE and manifests itself as the output of a PSD. The HCI also recognises humans by face and speech, both when they approach the CAV from outside and when they are inside the cabin. The figure also represents the communication of the Ego CAV HCI with Remote HCIs.
Figure 4 – Human-Connected Autonomous Vehicle (CAV) Interaction (HCI)
The Virtual Secretary (VS) is a human-like speaking avatar that does not represent a human. It produces a summary of what is being said at the meeting, including the participants’ PSs. Participating avatars can make comments to the VS, answer questions, etc. The VS manifests itself through a PSD.
Figure 5 – Avatar-Based Videoconference (Virtual Secretary)
MPAI-MMC: Version 1 – Version 2
Version 1: MPAI-MMC V1 uses AI to enable human-machine conversation that emulates human-human conversation in completeness and intensity. It includes five Use Cases: Conversation with Emotion, Multimodal Question Answering, Unidirectional Speech Translation, Bidirectional Speech Translation, and One-to-Many Unidirectional Speech Translation.
The figures below show the reference models of the MPAI-MMC Use Cases. Note that an Implementation is supposed to run in the MPAI-specified AI Framework (MPAI-AIF).
If you wish to participate in this work you have the following options:
- Join MPAI
- Keep an eye on this page.









