“Digital humans” are computer-created digital objects that can be rendered with a human appearance and called Avatars. As Avatars have mostly been created, animated, and rendered in closed environments, it is no surprise that there has been very little need for standards.

In a communication context, say, in an interoperable metaverse, digital humans may not be constrained to be in a closed environment. Therefore, if a sender requires that a remote receiving client reproduce a digital human as intended by the sender, standards are needed.

Technical Specification: Portable Avatar Format is a first response to this need, with the following goals:

  • Objective1: To enable a user to reproduce a virtual environment as intended.
  • Objective2: to enable a user to reproduce a sender’s avatar and its animation as intended by the sender.
  • Objective3: to estimate the personal status of a human or avatar.
  • Objective4: to display an avatar with a selected personal status.

Personal Status is a data type standardised by Multimodal Conversation V2 representing the ensemble of the information internal to a person, including Emotion, Cognitive State, and Attitude. See more on Personal Status here.

The MPAI-PAF standard has been designed to provide all the standards that are required to implement the Avatar-Based Videoconference Use Case where Avatars, having the visual appearance and uttering the real voice of human participants, meet in a virtual environment (Figure 1).

Figure 1 – Avatar-Based Videoconference

MPAI-PAF assumes that the system is composed of fours subsystems, as depicted in Figure 2.

Figure 2 – Avatar-Based Videoconference System

This is how the system works:

Remotely located Transmitting Clients sends to Server:

  1. At the beginning:
    1. Avatar Model(s) and Language Preferences.
    2. Speech Object and Face Object for Authentication.
  2. Continuously sends:
    1. Avatar Descriptors and Speech to Server.

The Server:

  1. At the beginning:
    1. Selects an Environment, e.g., a meeting room.
    2. Equips the room with objects, i.e., meeting table and chairs.
    3. Places Avatar Models around the table.
    4. Distributes Environment, Avatars, and their positions to all receiving Clients.
    5. Authenticates Speech and Face Objects
  2. Continuously:
    1. Translates Speech from participants according to Language Preferences.
    2. Sends Avatar Descriptors and Speech to receiving Clients.

The Virtual Secretary

  1. Receives Text, Speech, and Avatar Descriptors of conference participants.
  2. Recognises Speech streams.
  3. Refines Recognised Text and extracts Meaning.
  4. Extracts Avatars’ Personal Status.
  5. Produces a Summary.
  6. Produces Edited Summary using the comments received from participants.
  7. Produces Text and Personal Status.
  8. Creates Speech and Avatar Descriptors from Text and Personal Status.

The Receiving Clients:

  1. At the beginning:
    1. Environment Model
    2. Avatar Models
    3. Spatial Attitudes
  2. Continuously:
    1. Creates Audio and Visual Scene Descriptors.
    2. Renders the Audio-Visual Scene from the Point of View selected by Participant.

Only the Receiving Client of Avatar-Based Videconference is depicted in Figure 3.

Figure 3 – Receiving Client of Avatar-Based Videconference

The data types use by the Avatar-Based Videconference use case are given by Table 1.

Table 1 – Data Types used by PAF-ABV

Name of Data Format Specified by
Environment OSD
Body Model ARA
Body Descriptors ARA
Face Model ARA
Face Descriptors ARA
Avatar Model ARA
Avatar Descriptors ARA
Spatial Attitude OSD
Audio Scene Descriptors CAE
Visual Scene Descriptors OSD
Text MMC
Language identifier MMC
Meaning MMC
Personal Status MMC

We note that MPAI-PAF only specifies Body Model and Descriptors, Face Model and Descriptors, and Avatar Model and Descriptors. Three other MPAI standards provide the needed specifications.

The MPAI-PAF Working Draft (html, pdf) is published with a request for Community Comments. See also the video recordings (YT, WimTV) and the slides of the presentation made on 07 September. Comments should be sent to the MPAI Secretariat by 2023/09/26T23:59 UTC. MPAI will use the Comments received to develop the final draft planned to be published at the 36th General Assembly (29 September 2023).

As we said, this is a first contribution to avatar interoperability. MPAI will continue the development of Reference Software, start the development of Conformance Testing and study extensions of MPAI-PAF (e.g., compression of Avatar Description).