This is the public page of Mixed-reality Collaborative Spaces (MPAI-MCS), an MPAI standard project developing technologies for scenarios where geographically separated humans, represented by avatars, collaborate in virtual-reality spaces where:

  1. Virtual Twins of humans – embodied in speaking avatars having a high level of similarity, in terms of voice and appearance, with their Human Twins – are directed by their Human Twins to achieve an agreed goal.
  2. Human-like speaking avatars that do not represent a human, possibly without a visual appearance, e.g., a secretary that takes notes of the meeting, answers questions, etc.

The space where the collaboration takes place is called Environment. It can be anything from a fictitious space to a replica of a real space.

MPAI is currently investigating the Use Case called Avatar-Based Videoconference, where each participant is represented by an avatar sitting at a table. The avatars faithfully reproduce the participants' speech, faces, and gestures. This is achieved by using the emotion extracted from the speech, face, and gestures of the participants.

MPAI-MCS seeks to define standard formats for the Environment and for the Avatar so that, by owning an MCS client, a participant can:

  1. Distribute their own avatars reproducing their activity and speech to other participants in the virtual conference.
  2. Assemble the videoconference room using the received avatars and participate in it.

The end-to-end block diagram of the Avatar-Based Videoconference Use Case is given in the figure below, where:

  • Each participant sends:
    • to the server (at start): language preferences, avatar model, and speech and face descriptors for authentication.
    • to the server and the virtual secretary (during the conference): avatar descriptors, and speech and text.
  • The server sends each participant:
    • (at start): the Environment description and the avatar models.
    • (during the conference): participant ID, speech and text in the requested language, and avatar descriptors.
  • The virtual secretary sends each participant its own avatar model (at start), and avatar descriptors, speech, and text (during the conference).
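The message flows above can be sketched as data structures. This is a minimal illustration only: MPAI-MCS has not yet defined the standard formats, so every class, field, and type name below is an assumption made for clarity.

```python
from dataclasses import dataclass

# All names below are illustrative only; MPAI-MCS has not yet
# standardized these message formats.

@dataclass
class StartupToServer:
    """Sent once by each participant when joining."""
    language_preferences: list[str]   # e.g. ["en", "it"]
    avatar_model: bytes               # the participant's avatar model
    speech_descriptors: bytes         # for speech-based authentication
    face_descriptors: bytes           # for face-based authentication

@dataclass
class InConferenceToServer:
    """Sent continuously during the conference."""
    participant_id: str
    avatar_descriptors: bytes         # drive the avatar's animation
    speech: bytes                     # speech signal, sent as is
    text: str = ""                    # optional text

@dataclass
class StartupFromServer:
    """Sent by the server to each participant at the start."""
    environment_description: bytes
    avatar_models: dict[str, bytes]   # participant ID -> avatar model

@dataclass
class InConferenceFromServer:
    """Forwarded by the server during the conference."""
    participant_id: str
    speech: bytes                     # in the requested language
    text: str
    avatar_descriptors: bytes

msg = InConferenceToServer("p1", b"\x00", b"\x01", "hello")
```

The split between a one-time startup exchange and a continuous in-conference stream mirrors the two bullets above for each sender.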

Figure 1 – Reference Model of the Avatar-Based Videoconference

The figures below describe the internals of the four system components with a particular partitioning of functionality: transmitting client (Figure 2), server (Figure 3), virtual secretary (Figure 4), and receiving client (Figure 5). Different partitions can be obtained by moving internal components from one system component (the blue blocks in the figure above) to another.

At the start of the meeting the client sends the language preference and the avatar descriptors to the server. While the conference is on, the client continuously generates audio and visual scene descriptors, the former providing the individual speech sources and their locations, the latter the individual humans in the room and their locations. Part of the visual descriptors are used to enable face-based participant authentication and part to generate the avatar descriptors. Part of the speech descriptors are used to enable speech-based participant authentication and part to provide additional information to Avatar Description to refine the avatar descriptors. The participant's speech is sent to the server as is.
Figure 2 – The Avatar-Based Videoconference client (transmitter)
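The transmitting client's per-frame pipeline described above can be sketched as follows. The helper functions and descriptor fields are invented for illustration (the actual MPAI-MCS descriptor formats are still to be standardized), and the scene-description steps are stubbed:

```python
# Illustrative sketch of the transmitting client's pipeline.
# All helper names and descriptor fields are assumptions; the
# MPAI-MCS descriptors are not yet defined.

def describe_audio_scene(audio_frame: bytes) -> list[dict]:
    """Stub: locate individual speech sources and their positions."""
    return [{"position": (0.0, 1.0), "voiceprint": b"vp"}]

def describe_visual_scene(video_frame: bytes) -> list[dict]:
    """Stub: locate individual humans and their positions."""
    return [{"position": (0.0, 1.0), "face": b"face", "body": b"body"}]

def process_capture(audio_frame: bytes, video_frame: bytes) -> dict:
    speech_sources = describe_audio_scene(audio_frame)
    humans = describe_visual_scene(video_frame)
    return {
        # part of the visual/speech descriptors supports authentication
        "face_auth": [h["face"] for h in humans],
        "speech_auth": [s["voiceprint"] for s in speech_sources],
        # part of the visual descriptors generates the avatar
        # descriptors, refined by information from the speech
        "avatar_descriptors": [h["body"] for h in humans],
        # the participant's speech is sent to the server as is
        "speech": audio_frame,
    }

out = process_capture(b"audio", b"video")
```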
The server performs the functions of:

  1. Distributing the Environment Model to participants.
  2. Authenticating participants using face and speech descriptors.
  3. Uniquely associating speech sources and avatar descriptors.
  4. Forwarding the received and processed information to participants.
Figure 3 – The Avatar-Based Videoconference server
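The four server functions can be sketched as a small class. This is one possible reading of the list above, with authentication and translation stubbed out and all interface names assumed, since MPAI-MCS defines no such interfaces yet:

```python
# Illustrative sketch of the server's four functions; every name is
# an assumption, and authentication/translation are stubs.

def translate(speech: bytes, language: str) -> bytes:
    return speech  # stub: a real server would translate speech/text

class ConferenceServer:
    def __init__(self, environment_model: bytes):
        self.environment_model = environment_model
        self.participants: dict[str, dict] = {}

    def authenticate(self, face_desc: bytes, speech_desc: bytes) -> bool:
        return True  # stub for function 2

    def join(self, pid: str, face_desc: bytes, speech_desc: bytes,
             language: str) -> bytes:
        # Function 2: authenticate using face and speech descriptors.
        if not self.authenticate(face_desc, speech_desc):
            raise PermissionError(pid)
        self.participants[pid] = {"language": language}
        # Function 1: distribute the Environment Model.
        return self.environment_model

    def relay(self, pid: str, speech: bytes, avatar_desc: bytes) -> list:
        # Function 3: speech and avatar descriptors carry the same
        # participant ID, keeping them uniquely associated.
        # Function 4: forward to every other participant, with speech
        # in each one's requested language.
        return [(other, pid, translate(speech, info["language"]), avatar_desc)
                for other, info in self.participants.items() if other != pid]

server = ConferenceServer(b"env")
server.join("a", b"", b"", "en")
server.join("b", b"", b"", "it")
msgs = server.relay("a", b"hi", b"desc")
```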

The virtual secretary produces a summary of the utterances of the avatars, integrated with its understanding of their emotions. The summary can then be forwarded to an external application where participants can edit it.

In a more sophisticated setup, avatars can interact with the virtual secretary via speech and text. The virtual secretary edits the summary taking into account the avatars' utterances and their emotions.

Figure 4 – The Virtual Secretary of the Avatar-Based Videoconference
The participant:

  1. Places at positions of their liking the avatars generated by the clients, with the associated speech.
  2. Selects the point from which to see and hear the videoconference (not necessarily the position of their own avatar).
  3. Participates in the videoconference.
Figure 5 – An MCS client (receiver)
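The receiving client's scene assembly can be sketched as below. Positions, types, and method names are illustrative assumptions, not part of any MPAI-MCS specification:

```python
# Illustrative sketch of the receiving client; all names are
# assumptions made for this example.

class ReceivingClient:
    def __init__(self):
        self.scene: dict[str, dict] = {}
        self.viewpoint = (0.0, 0.0, 0.0)

    def place_avatar(self, pid: str, avatar_model: bytes,
                     position: tuple) -> None:
        # Step 1: place each received avatar, with its associated
        # speech source, at a position of the participant's liking.
        self.scene[pid] = {"model": avatar_model, "position": position}

    def set_viewpoint(self, position: tuple) -> None:
        # Step 2: the point from which to see/hear the conference
        # need not be the position of the participant's own avatar.
        self.viewpoint = position

client = ReceivingClient()
client.place_avatar("p1", b"model", (1.0, 0.0, 2.0))
client.set_viewpoint((0.0, 1.5, 0.0))
```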

This use case is part of the MPAI-MMC Use Cases and Functional Requirements WD1.4. MPAI intends to issue a Call for Technologies on 9 July 2022. Anybody may respond to the Call. If a proposed technology is accepted, the proponent is requested to join MPAI.


MPAI-MCS is at the level of Use Cases and Functional Requirements. If you wish to participate in this work, you have the following options:

  1. Join MPAI.
  2. Participate, until the MPAI-MCS Functional Requirements are approved (after that, only MPAI members can participate), by sending an email to the MPAI Secretariat.
  3. Keep an eye on this page.

Return to the MPAI-MCS page