2022/04/22

MPAI alerts industry to upcoming MPAI-MMC V2 Call for Technologies

Multimodal Conversation is one of the first standards approved by MPAI (September 2021). It comprises five Use Cases, all sharing the use of AI to enable forms of human-machine conversation that emulate human-human conversation in completeness and intensity:

  1. Conversation with Emotion (CWE) enables a human to hold a conversation with a machine. The machine understands the human’s emotion as expressed by their speech and face, and responds with a synthetic voice and an animated face, both expressing emotion.
  2. Multimodal Question Answering (MQA), enables a human to ask questions about a displayed object.
  3. Three conversational translation Use Cases enable a human to obtain a translation of their speech that preserves the “colour” of the original speech in the interpreted speech.

The Multimodal Conversation Development Committee has investigated the Multimodal Conversation side of several use cases developed by other MPAI groups and has selected three as candidate use cases for Version 2 of the MPAI-MMC standard:

  1. Conversation About a Scene (CAS): a human holds a conversation with a machine about objects in a scene of which the human is part. While conversing, the human points a finger to indicate their interest in a particular object.
  2. Human-Connected Autonomous Vehicle Interaction (HCI): a group of humans has a conversation on a domain-specific subject (travel by car) with a Connected Autonomous Vehicle. The machine understands the utterances, the emotion in the speech and in the faces, and the expression in their gestures. The machine manifests itself as the head and shoulders of an avatar whose face and head convey emotions congruent with the speech it utters.
  3. Avatar-Based Videoconference (ABV). In this instance of Mixed-reality Collaborative Space (MCS), avatars represent humans participating in a videoconference. Avatars reproduce the movements of the torsos of human participants with a high degree of accuracy.

Several data formats for potential standardisation have been derived from the functions identified below. Each function is followed by one X for each of the three candidate use cases (ABV, CAS, HCI) that needs it; a purely illustrative data-structure sketch follows the list.
  1. Human selects:
     a. The Ambient in which the avatars operate (ABV). X
     b. The avatar model used by the machine to manifest itself (CAS, ABV, HCI). X X X
     c. The Colour (i.e., the speech features) the machine uses to utter speech. X X X
  2. Machine locates the visual and speech components of human(s) in the visual and sound space. X X X
  3. Machine separates:
     a. The visual components of the individual humans from the rest of the visual space (i.e., other visual objects and other visual humans). X X X
     b. The speech components of the individual speaking humans from the rest of the sound space (i.e., other sound objects). X X
  4. The machine extracts descriptors of:
     a. Human face. X X X
     b. Physical gesture (i.e., head, arms, hands and fingers). X X X
     c. Human speech. X X X
  5. The machine uses:
     a. Face descriptors to:
        i. Identify a human belonging to a group composed of a limited number of humans. X X
        ii. Extract the emotion of the face. X X X
        iii. Animate the face of an avatar. X X X
     b. Physical gesture descriptors to:
        i. Extract the Expression of the physical gestures. X X X
        ii. Interpret the sign language conveyed by the physical gesture descriptors. X X
        iii. Animate the torso of an avatar using physical gesture descriptors. X X
     c. Speech descriptors to:
        i. Identify a human belonging to a group composed of a limited number of humans. X X
        ii. Recognise speech (i.e., extract text). X X X
        iii. Extract the emotion in the speech. X X X
  6. Machine holds a conversation with a human or an avatar:
     a. In the context of a specific domain. X
     b. About objects in the visual space. X
     It does so by:
     c. Analysing and interpreting their Expression and Emotion.
     d. Uttering speech with Emotion, possibly spatially located on the lips of an avatar.
     e. Animating:
        i. Eyes, lips and facial muscles of an avatar to display an Emotion. X X
        ii. Lips in sync with an uttered speech. X X X
     f. Expressing/displaying a sequence of Emotions/Expressions that are congruent with:
        i. Text, Expressions and Emotion of the other party. X X X
        ii. The Machine’s response and its associated Expressions and Emotions. X X
     g. Gazing at the other party it is conversing with. X X
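
As a purely illustrative aid, the sketch below models the descriptor chain of items 2 to 5 as a handful of Python data structures, grouping the face, gesture and speech descriptors extracted for each human separated from the scene. None of the class or field names below are MPAI-defined formats; the actual data formats are exactly what the forthcoming Call for Technologies will solicit.

# Purely illustrative: hypothetical data structures, not MPAI-defined formats.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FaceDescriptors:
    landmarks: List[float]            # face geometry extracted from one human (item 4.a)
    emotion: Optional[str] = None     # emotion extracted from the face (item 5.a.ii)

@dataclass
class GestureDescriptors:
    joints: List[float]               # head, arms, hands and fingers (item 4.b)
    expression: Optional[str] = None  # Expression of the physical gesture (item 5.b.i)

@dataclass
class SpeechDescriptors:
    features: List[float]             # speech descriptors (item 4.c)
    text: Optional[str] = None        # recognised speech, i.e. extracted text (item 5.c.ii)
    emotion: Optional[str] = None     # emotion in the speech (item 5.c.iii)

@dataclass
class SeparatedHuman:
    """One human separated from the visual and sound space (items 2 and 3)."""
    face: FaceDescriptors
    gesture: GestureDescriptors
    speech: Optional[SpeechDescriptors] = None  # the human may not be speaking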

 

MPAI-MMC V2 Functional Requirements are being finalised. This is the current plan for developing the new standard:

Functional Requirements    2022/02/23
Commercial Requirements    2022/06/15
Call for Technologies      2022/07/13
Response to Call due       2022/10/10
Standard Development       2022/10/12
Technical Specification    2023/02/08

Watermarking and AI

The term watermarking comprises a family of methodological and application tools used to imperceptibly and persistently insert data into a content item. Watermarking serves different purposes, for instance enabling an entity to claim ownership of a content item or enabling a device to use it.

A neural network is one type of content, and one that may be quite expensive to develop. Is the notion of watermarking applicable to neural networks? MPAI thinks it is and is working to develop requirements for a Neural Network Watermarking (NNW) standard, called MPAI-NNW, that will enable a watermarking technology provider to qualify their products. The standard will provide the means to measure, for a given size of the watermarking payload, the ability of:

  • The watermark inserter to inject a payload without deteriorating the performance of the neural network. For a given application domain, this item requires:
    • A testing dataset to be used for both the watermarked and the unwatermarked neural network.
    • An evaluation methodology to assess any change of performance induced by the watermark (a minimal sketch of such a methodology follows this list).
  • The watermark detector to recognise the presence of the inserted watermark when applied to a watermarked network that has been modified (e.g., by transfer learning or pruning) or to any of the inferences of the modified model. For a given application domain, this item requires:
    • A list of the modification types expected to be applied to the watermarked neural network, as well as their ranges (e.g., random pruning at 25%).
    • Performance criteria for the watermark detector (e.g., relative numbers of missed detections and false alarms).
  • The watermark decoder to successfully retrieve the payload when applied to a watermarked network that has been modified (e.g., by transfer learning or pruning) or to any of the inferences of the modified model. For a given application domain, this item requires:
    • A list of the modification types expected to be applied to the watermarked neural network, as well as their ranges (e.g., random pruning at 25%).
    • Performance criteria for the watermark decoder (e.g., 100% or (100-α)% payload recovery).
  • The watermark inserter to inject a payload at a low computational cost, e.g., measured as execution time on a given processing environment.
  • The watermark detector/decoder to detect/decode a payload from a watermarked model, or from any of its inferences, at a low computational cost, e.g., measured as execution time on a given processing environment.
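
To illustrate how the evaluation criteria above could be exercised, the following Python sketch computes the change of performance induced by a watermark, the detector’s missed-detection and false-alarm rates, and the decoder’s payload-recovery rate. It is a minimal sketch, not part of any MPAI specification: the callables passed in (model predictors, watermark detector and decoder) are hypothetical placeholders that a proponent would supply.

# Hypothetical evaluation harness for the MPAI-NNW criteria listed above;
# nothing here is a normative API.
from typing import Callable, Iterable, List, Sequence, Tuple

def accuracy(predict: Callable[[object], int],
             dataset: Iterable[Tuple[object, int]]) -> float:
    """Fraction of the testing dataset classified correctly."""
    hits = total = 0
    for sample, label in dataset:
        hits += int(predict(sample) == label)
        total += 1
    return hits / max(total, 1)

def watermark_performance_impact(original: Callable[[object], int],
                                 watermarked: Callable[[object], int],
                                 dataset: Sequence[Tuple[object, int]]) -> float:
    """Change of performance induced by the watermark (first criterion above)."""
    return accuracy(original, dataset) - accuracy(watermarked, dataset)

def detector_error_rates(detect: Callable[[object], bool],
                         modified_watermarked_models: List[object],
                         unwatermarked_models: List[object]) -> Tuple[float, float]:
    """Relative numbers of missed detections and false alarms (second criterion above)."""
    missed = sum(not detect(m) for m in modified_watermarked_models) \
             / max(len(modified_watermarked_models), 1)
    false_alarms = sum(bool(detect(m)) for m in unwatermarked_models) \
                   / max(len(unwatermarked_models), 1)
    return missed, false_alarms

def decoder_recovery_rate(decode: Callable[[object], bytes],
                          payload: bytes,
                          modified_watermarked_models: List[object]) -> float:
    """Share of modified models from which the exact payload is retrieved (third criterion above)."""
    recovered = sum(decode(m) == payload for m in modified_watermarked_models)
    return recovered / max(len(modified_watermarked_models), 1)

The remaining criteria, the computational cost of insertion and of detection/decoding, would be measured separately, for instance by timing those operations on the chosen processing environment.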

The work of developing requirements for MPAI-NNW is ongoing. Participation is open to non-members: contact the MPAI Secretariat if you wish to join the MPAI-NNW online meetings.

Activities in the next meeting cycle

Group name                                    Apr 25-29  May 02-06  May 09-13  May 16-20  Time (UTC)
AI Framework                                  25         2          9          16         15
Governance of MPAI Ecosystem                  25         2          9          16         16
Mixed-reality Collaborative Spaces            25         2          9          13         17
Multimodal Conversation                       26         3          10         14         14
Neural Network Watermarking                   26         3          10         14         15
Context-based Audio Enhancement               26         3          10         14         16
Connected Autonomous Vehicles                            4          11         18         12
AI-Enhanced Video Coding                      27                    11                    14
AI-based End-to-End Video Coding                         4          13         17         14
Avatar Representation and Animation           28         5          12                    13:30
Server-based Predictive Multiplayer Gaming    28         5          12                    14:30
AIM Health                                               6
Communication                                 28                    12                    15
Industry and Standards                        29                    13                    16
General Assembly (MPAI-19)                                                     18         15