At its 40th General Assembly (MPAI-40), MPAI approved one draft standard, one new standard, and three extensions to existing standards. For an organisation that already has nine standards to its name, this may not look like big news. There are, however, two reasons that make this a remarkable moment in MPAI's short but intense life.
The first reason is that the draft standard posted for Community Comments – Human and Machine Communication (MPAI-HMC) – does not specify new technologies but leverages technologies from existing standards: Context-based Audio Enhancement (MPAI-CAE), Multimodal Conversation (MPAI-MMC), the newly approved Object and Scene Description (MPAI-OSD), and Portable Avatar Format (MPAI-PAF).
If not new technologies, what does MPAI-HMC specify, then? To answer this question, let's consider Figure 1.
Figure 1 – The MPAI-HMC communications model
The human labelled #1 is part of a scene with audio and visual attributes and communicates with the Machine by transmitting speech together with the entire audio-visual scene, including him- or herself. The Machine receives this information, processes the speech, and emits internally generated audio-visual scenes in which it utters vocal and displays visual manifestations of its own internal state, generated to interact more naturally with the human. The human may also communicate with the Machine while other humans are in the scene; the Machine can discern the individual humans and identify (i.e., assign a name to) audio and visual objects. However, only one human at a time can communicate with the Machine.
The human is not restricted to being in a real space. The same scenario applies if his or her representation is in a Virtual Space as a Digitised Human, possibly together with other Digitised Humans or with Virtual Humans, i.e., audio-visual representations of processes such as Machines. For this reason, we will use the word Entity to indicate both a human and a Machine.
The Machine can also act as an interpreter between the Entities and Contexts labelled as #1 or #2 and #3 or #4. By Context we mean information surrounding an Entity that provides additional insight into the information communicated by the Entity. The simplest example is language and, more generally, culture.
Communication between #1 and #3 represents the case of a human in one Context communicating with a Machine, e.g., an information service, in another Context. In this case the Machine communicates with the human by sensing and actuating audio-visual information, but the communication between the Machine and #3 may use a different communication protocol. The payload used to communicate is the "Portable Avatar", a Data Type specified by the MPAI-PAF standard that represents an Avatar and its Context.
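To make the idea of a payload bundling an Avatar with its Context more concrete, here is a minimal sketch in Python. All field names (avatar_id, scene_descriptors, and so on) are invented for illustration and do not reflect the normative MPAI-PAF data format.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Context:
    # Information surrounding an Entity that gives insight into what it
    # communicates; language and culture are the simplest examples.
    language: str = "en"
    culture: str = ""

@dataclass
class PortableAvatar:
    # Hypothetical fields, for illustration only.
    avatar_id: str                                          # which Avatar this is
    speech: bytes = b""                                     # speech uttered by the Avatar
    scene_descriptors: dict = field(default_factory=dict)   # audio-visual scene data
    context: Context = field(default_factory=Context)       # the Entity's Context

# A Machine in an Italian-language Context packaging its manifestation:
payload = PortableAvatar(avatar_id="machine-1", context=Context(language="it"))
print(asdict(payload)["context"]["language"])
```

The point of the sketch is only that the payload travels as a single self-describing unit, so a receiving Machine in a different Context can adapt its interpretation accordingly.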
Communication between the human in #1 and the Machine need not use the Portable Avatar Format. However, communication between the Machine and #4, a human in a different Context, will use it.
Read a collection of usage scenarios.
The name of the standard is Human and Machine Communication (MPAI-HMC). It is published as a draft with a request for Community Comments, the last step before publication. Comments are due by 2024/02/19T23:59 UTC to firstname.lastname@example.org.
The second piece of news is that four of the five documents published by MPAI-40 are based on the notion of an AI Workflow (AIW) composed of interconnected AI Modules (AIMs) executed in the AI Framework (AIF) specified by the MPAI-AIF standard. They are published in a new format that retains the traditional Introduction-Scope-Definitions-References chapters but adds Use Cases, AI Modules, and Data Types chapters that refer to a common body of AIMs and Data Types.
Component-based software engineering is a style of software engineering that aims to build software out of modular components. MPAI is applying this notion to the world of standards.
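The essence of that composition can be sketched in a few lines of Python. This is not the normative MPAI-AIF API, just an illustration of an AI Workflow as a pipeline of interchangeable AI Modules, each consuming and producing named data; the module names and data keys are invented.

```python
from typing import Any, Callable, Dict, List

# An AIM, reduced to its essence: a function from the workflow's shared
# data dictionary to the new entries it produces.
AIM = Callable[[Dict[str, Any]], Dict[str, Any]]

def run_workflow(aims: List[AIM], data: Dict[str, Any]) -> Dict[str, Any]:
    # The AI Framework's role, simplified: execute the interconnected AIMs
    # in order, merging each module's outputs into the workflow state.
    for aim in aims:
        data = {**data, **aim(data)}
    return data

# Two toy AIMs: one "recognises" speech, one "interprets" it in a Context.
def speech_recognition(d: Dict[str, Any]) -> Dict[str, Any]:
    return {"text": d["speech"].upper()}

def interpretation(d: Dict[str, Any]) -> Dict[str, Any]:
    return {"meaning": f"{d['text']} ({d['context']})"}

result = run_workflow([speech_recognition, interpretation],
                      {"speech": "hello", "context": "en"})
print(result["meaning"])  # → HELLO (en)
```

Because each AIM only depends on named inputs and outputs, a module can be swapped for a better implementation without touching the rest of the workflow, which is exactly the reuse the new document format aims to enable across standards.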
See the links below and enjoy: