1       Introduction

Technical Specification: Portable Avatar Format (V1.2) enables an implementation of the Avatar-Based Videoconference Use Case. Table 2 lists the Data Types required by the Portable Avatar Data Type and the MPAI Technical Specifications that specify them. Chapter 7 provides the full specification, including references, of the MPAI-PAF Data Formats.

Table 2 – Data Types of Technical Specification: Portable Avatar Format

Data Type                               Technical Specification

Space, Objects and Scenes
  Coordinate Systems                    MPAI-PAF
  Spatial Attitude and Point of View    MPAI-OSD
  Audio Scene                           MPAI-CAE
  Visual Scene                          MPAI-OSD
  Audio-Visual                          MPAI-OSD
  Portable Avatar Format                MPAI-PAF
Body
  Body Model                            MPAI-PAF
  Body Descriptors                      MPAI-PAF
Face
  Face Model                            MPAI-PAF
  Face Descriptors                      MPAI-PAF
Avatar
  Avatar Model                          MPAI-PAF
  Avatar Descriptors                    MPAI-PAF
Other Data Types
  Text                                  MPAI-MMC
  Language identifier                   MPAI-MMC
  Meaning                               MPAI-MMC
  Personal Status                       MPAI-MMC

2       Scope of Use Case

The MPAI-PAF Avatar-Based Videoconference (PAF-ABV) Use Case enables a form of videoconference held in a Virtual Environment populated by Avatars that represent humans, show their visual appearance, and utter their voices. Figure 2 depicts the system, which is composed of four types of subsystems:

  1. Videoconference Client Transmitters
  2. Avatar Videoconference Server
  3. Virtual Meeting Secretary
  4. Videoconference Client Receivers.

Figure 2 – Avatar-Based Videoconference end-to-end diagram

The components of the PAF-ABV system are:

  1. Participant: a human joining an ABV either individually or as a member of a group of humans in the same physical space.
  2. Audio-Visual Scene: a Virtual Audio-Visual Environment equipped with Visual Objects, such as a Table and an appropriate number of chairs, and Audio Objects, described by Audio-Visual Scene Descriptors.
  3. Portable Avatar: digitally represents a human participant as part of a Portable Avatar Format (PAF).
  4. Client Transmitter:
    • At the beginning of the conference:
      • Receives from Participants and sends to the Server Portable Avatars containing the Avatar Models and Language Preferences.
      • Sends to the Server Speech Object and Face Object for Authentication.
    • Continuously sends to the Server Portable Avatars containing Avatar Descriptors and Speech.
  5. Avatar Videoconference Server:
    • At the beginning:
      • Selects the Audio-Visual Descriptors, e.g., a Meeting Room.
      • Equips the Room with Objects, i.e., Table and Chairs.
      • Places Avatar Models around the table with a given Spatial Attitude.
      • Distributes the Environment and the Portable Avatars containing Avatar Models and their Spatial Attitudes to all Receiving Clients.
      • Authenticates Speech and Face Objects and assigns IDs to Avatars.
      • Sets the common conference language.
    • Continuously:
      • Translates Speech to Participants according to their Language Preferences.
      • Sends Portable Avatars containing Avatar Descriptors, Speech, and Spatial Attitude of Participants and Virtual Meeting Secretary to all Receiving Clients and Virtual Meeting Secretary.
  6. Virtual Meeting Secretary: an Avatar not corresponding to any Participant that continuously:
    • Uses a common meeting Language.
    • Understands Avatars’ utterances and extracts their Personal Statuses.
    • Drafts a Summary of its understanding of Avatars’ Text and Personal Status.
    • Displays the Summary either:
      • Outside the Virtual Environment, for Participants to read and edit directly, or
      • In the Visual Space, for Avatars to comment on, e.g., via Text.
    • Refines the Summary.
    • Sends its Portable Avatar containing its Avatar Descriptors to the Server.
  7. Client Receiver:
    • At the beginning:
      • Receives Visual Scene Descriptors and Portable Avatars containing Avatar Models with Spatial Attitudes.
    • Continuously:
      • Receives Portable Avatars with Avatar Descriptors and Speech.
      • Produces Visual and Audio Scene Descriptors.
      • Renders the Audio-Visual Scene by spatially placing each Avatar’s Utterances at the Spatial Attitude of that Avatar’s Mouth. Rendering may be done from a Point of View selected by the Participant, possibly different from the Position assigned to the Participant’s Avatar in the Visual Scene. Participants use a device of their choice (Head Mounted Display or 2D display/earpad) to experience the Audio-Visual Scene.
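
The Server's start-up step of placing Avatar Models around the table, and its continuous step of translating Speech according to Language Preferences, can be illustrated with a minimal Python sketch. It is not part of the specification: the class names, the field names, the circular seating rule, and the "translate only when preferences differ" rule are assumptions made for illustration, and the Spatial Attitude is reduced here to a position and a single yaw angle.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SpatialAttitude:
    """Simplified, hypothetical stand-in: a position plus a facing direction."""
    x: float
    y: float
    z: float
    yaw: float  # radians, measured in the horizontal plane


@dataclass
class PortableAvatar:
    """Hypothetical container for the items a Client Transmitter sends."""
    avatar_id: str
    avatar_model: str          # placeholder for the actual Avatar Model data
    language_preference: str   # illustrative language code, e.g. "en"
    spatial_attitude: Optional[SpatialAttitude] = None


def place_around_table(avatars: List[PortableAvatar],
                       radius: float = 1.5) -> List[PortableAvatar]:
    """Seat avatars evenly on a circle centred on the table, facing inward.

    The circular layout is an assumption; the text only says that Avatar
    Models are placed around the table with a given Spatial Attitude.
    """
    count = len(avatars)
    for index, avatar in enumerate(avatars):
        angle = 2.0 * math.pi * index / count
        avatar.spatial_attitude = SpatialAttitude(
            x=radius * math.cos(angle),
            y=radius * math.sin(angle),
            z=0.0,
            yaw=(angle + math.pi) % (2.0 * math.pi),  # look at the table centre
        )
    return avatars


def needs_translation(speaker: PortableAvatar,
                      listener: PortableAvatar) -> bool:
    """Assumed rule: translate only when the Language Preferences differ."""
    return speaker.language_preference != listener.language_preference
```

In this sketch the Server would call `place_around_table` once at the beginning and then consult `needs_translation` for each speaker and listener pair before forwarding Speech.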

Each component of the Avatar-Based Videoconference Use Case is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs). The specification of each AIW includes the following elements:

1  Functions of the AIW               The functions performed by the AIW implementing the Use Case.
2  Reference Model of the AIW         The Topology of AIMs in the AIW.
3  Input and Output Data of the AIW   Input and Output Data of the AIW.
4  Functions of the AIMs              Functions performed by the AIMs.
5  Input and Output Data of the AIMs  Input and Output Data of the AIMs.
6  AIMs and JSON Metadata             Links to summary specification on the web of the AIMs and corresponding JSON Metadata [2].
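
To make the relationship between an AIW and its AIMs concrete, the following Python sketch models an AIW as a set of AIMs plus the connections between their ports, which together form the Topology. Everything here is hypothetical: the class names, the port-based connection tuples, the placeholder AIM and data names, and the JSON layout are illustrative assumptions and do not reproduce the JSON Metadata format referenced in [2].

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AIM:
    """Hypothetical AI Module: a name plus named input and output data."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass
class AIW:
    """Hypothetical AI Workflow: AIMs and the connections between them."""
    name: str
    aims: Dict[str, AIM] = field(default_factory=dict)
    # Each connection is (source AIM, output port, destination AIM, input port).
    connections: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add_aim(self, aim: AIM) -> None:
        self.aims[aim.name] = aim

    def connect(self, src: str, out_port: str, dst: str, in_port: str) -> None:
        """Add one edge of the topology, rejecting unknown AIMs or ports."""
        if src not in self.aims or dst not in self.aims:
            raise ValueError(f"unknown AIM in connection {src} -> {dst}")
        if out_port not in self.aims[src].outputs:
            raise ValueError(f"{src} has no output named {out_port!r}")
        if in_port not in self.aims[dst].inputs:
            raise ValueError(f"{dst} has no input named {in_port!r}")
        self.connections.append((src, out_port, dst, in_port))

    def to_json(self) -> str:
        """Serialise to an illustrative (non-normative) JSON layout."""
        return json.dumps({
            "name": self.name,
            "aims": [
                {"name": a.name, "inputs": list(a.inputs),
                 "outputs": list(a.outputs)}
                for a in self.aims.values()
            ],
            "topology": [
                {"from": f"{s}.{o}", "to": f"{d}.{i}"}
                for s, o, d, i in self.connections
            ],
        }, indent=2)


# Placeholder names only; real AIMs are defined by the specification.
workflow = AIW("Example-AIW")
workflow.add_aim(AIM("AIM-A", inputs=("Input Data",),
                     outputs=("Intermediate Data",)))
workflow.add_aim(AIM("AIM-B", inputs=("Intermediate Data",),
                     outputs=("Output Data",)))
workflow.connect("AIM-A", "Intermediate Data", "AIM-B", "Intermediate Data")
```

Validating each connection when it is added keeps the topology consistent with the declared Input and Output Data of the AIMs, which mirrors how elements 2, 4, and 5 of the list constrain one another.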

Most MPAI Standards specify Use Cases that are implemented as AI Workflows. This page lists AI Workflows per Technical Specifications.

Acronym   Names and Specifications of AI Workflows
PAF-CTX   Videoconference Client Transmitter
MMC-VMS   Virtual Meeting Secretary
PAF-AVS   Avatar Videoconference Server
PAF-CRX   Videoconference Client Receiver
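
As a small illustration of listing AI Workflows per Technical Specification, the Python sketch below groups the four acronyms above by their prefix. It assumes that the part of an acronym before the hyphen identifies the Technical Specification (for example, `PAF` for MPAI-PAF and `MMC` for MPAI-MMC); that naming rule is inferred from the table, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Acronyms and names copied from the table above.
AI_WORKFLOWS: Dict[str, str] = {
    "PAF-CTX": "Videoconference Client Transmitter",
    "MMC-VMS": "Virtual Meeting Secretary",
    "PAF-AVS": "Avatar Videoconference Server",
    "PAF-CRX": "Videoconference Client Receiver",
}


def group_by_specification(workflows: Dict[str, str]) -> Dict[str, List[str]]:
    """Group workflow acronyms by the assumed specification prefix."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for acronym in workflows:
        prefix, _, _code = acronym.partition("-")
        grouped[f"MPAI-{prefix}"].append(acronym)
    return dict(grouped)


if __name__ == "__main__":
    for spec, acronyms in group_by_specification(AI_WORKFLOWS).items():
        print(spec, [AI_WORKFLOWS[a] for a in acronyms])
```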