
Technical Specification: Portable Avatar Format (MPAI-PAF) V1.3, together with other MPAI Technical Specifications, provides technologies enabling the implementation of the Avatar-Based Videoconference Use Case, a form of videoconference held in Virtual Environments populated by Avatars that represent humans by showing their visual appearance and uttering their voices.

MPAI-PAF V1.3 assumes that implementations will be based on Technical Specification: AI Framework (MPAI-AIF) V2.1.

Table 1 displays the full list of AIWs specified by MPAI-PAF V1.3. Click a listed AIW to access its dedicated page, which includes the Functions of the AIW, its Reference Model, its I/O Data, the Functions of its AIMs, the I/O Data of its AIMs, and a table providing links to the AIW, its AIMs, and the corresponding JSON Metadata.

All previously specified MPAI-PAF AI Workflows are superseded by those specified by V1.3 but may still be used if the version is explicitly mentioned.

Table 1 – AI Workflows specified by MPAI-PAF V1.3

Acronym   Names and Specifications of AI Workflows   JSON
PAF-CTX   Videoconference Client Transmitter          X
MMC-VMS   Virtual Meeting Secretary                   X
PAF-AVS   Avatar Videoconference Server               X
PAF-CRX   Videoconference Client Receiver             X

Figure 1 depicts the system composed of four types of subsystems specified as AI Workflows.

Figure 1 – Avatar-Based Videoconference end-to-end diagram

The components of the PAF-ABV system:

  1. Participant: a human joining an ABV either individually or as a member of a group of humans in the same physical space.
  2. Audio-Visual Scene: a Virtual Audio-Visual Environment equipped with Visual Objects, such as a Table and an appropriate number of Chairs, and with Audio Objects, all described by Audio-Visual Scene Descriptors.
  3. Portable Avatar: a data set specified by MPAI-PAF including data representing a human participant (a minimal illustrative sketch of such a data set follows this list).
  4. Videoconference Client Transmitter (the overall exchange with the Server is sketched after this list):
    • At the beginning of the conference:
      • Receives Portable Avatars containing the Avatar Models and Language Selectors from Participants and sends them to the Server.
      • Sends Speech Objects and Face Objects to the Server for Authentication.
    • Continuously sends Portable Avatars containing Avatar Descriptors and Speech to the Server.
  5. Avatar Videoconference Server:
    • At the beginning of the conference:
      • Selects the Audio-Visual Scene Descriptors, e.g., those of a Meeting Room.
      • Equips the Room with Objects, i.e., Table and Chairs.
      • Places Avatar Models around the Table with a given Spatial Attitude.
      • Distributes Portable Avatars containing Avatar Models, their Speech Objects and Spatial Attitudes, and Audio-Visual Scene Descriptors to all Receiving Clients.
      • Authenticates Speech and Face Objects and assigns IDs to Avatars.
      • Sets the common conference language.
    • Continuously:
      • Translates Speech for Participants according to their Language Selectors.
      • Sends Portable Avatars containing the Avatar Descriptors, Speech, and Spatial Attitudes of Participants and of the Virtual Meeting Secretary to all Receiving Clients and to the Virtual Meeting Secretary.
  6. Virtual Meeting Secretary: an Avatar not corresponding to any Participant that continuously:
    • Uses the common meeting language.
    • Understands Text Objects and Speech Objects of all Avatars and extracts their Personal Statuses.
    • Drafts a Summary of its understanding of Avatars’ Text Objects, Speech Objects, and Personal Status.
    • Displays the Summary either:
      • Outside the Virtual Environment, for Participants to read and edit directly, or
      • In the Visual Space, for Avatars to comment on, e.g., via Text Objects.
    • Refines the Summary.
    • Sends its Portable Avatar containing its Avatar Descriptors to the Server.
  7. Videoconference Client Receiver:
    • At the beginning of the conference:
      • Receives Audio-Visual Scene Descriptors and Portable Avatars containing Avatar Models with their Spatial Attitudes.
    • Continuously:
      • Receives Portable Avatars with Avatar Descriptors and Speech.
      • Produces Visual Scene Descriptors and Audio Scene Descriptors.
      • Renders the Audio-Visual Scene by spatially adding the Avatars’ Speech Objects at the Spatial Attitudes of the respective Avatars’ Mouths. Rendering may be done from a Point of View, possibly different from the Position assigned to their Avatars in the Visual Scene, selected by Participants, who use a device of their choice (Head Mounted Display or 2D display/earpads) to experience the Audio-Visual Scene.
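
The following is a minimal, non-normative sketch (in Python) of the kind of data a Portable Avatar may carry across the two phases described above. All class and field names are illustrative assumptions made for this example only; the normative definition of the Portable Avatar is given by MPAI-PAF and its JSON Metadata.

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpatialAttitude:
    # Position and orientation of an Avatar (or of its Mouth) in the Audio-Visual Scene.
    position: Tuple[float, float, float]       # (x, y, z)
    orientation: Tuple[float, float, float]    # (yaw, pitch, roll)


@dataclass
class PortableAvatar:
    # ID assigned by the Avatar Videoconference Server after Authentication.
    avatar_id: Optional[str] = None
    # Populated in the Portable Avatars sent at the beginning of the conference.
    avatar_model: Optional[bytes] = None
    language_selector: Optional[str] = None    # e.g., "en"
    # Populated in the Portable Avatars sent continuously during the conference.
    avatar_descriptors: Optional[bytes] = None
    speech_object: Optional[bytes] = None
    spatial_attitude: Optional[SpatialAttitude] = None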
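
Similarly, the exchanges between Videoconference Client Transmitters, the Avatar Videoconference Server, and Videoconference Client Receivers can be summarised as in the sketch below, reusing the illustrative PortableAvatar class above. All method names (select_audio_visual_scene, authenticate, distribute_avatar, etc.) are placeholders assumed for this example and are not MPAI-defined APIs.

def conference_setup(server, participants):
    # Beginning of the conference: scene selection, authentication, placement.
    scene = server.select_audio_visual_scene()             # e.g., a Meeting Room
    server.place_objects(scene, ["Table", "Chairs"])
    for p in participants:
        pa = PortableAvatar(avatar_model=p.avatar_model,
                            language_selector=p.language_selector)
        pa.avatar_id = server.authenticate(p.speech_object, p.face_object)
        server.place_avatar(scene, pa)                     # assigns a Spatial Attitude
    server.distribute_scene(scene, [p.receiver for p in participants])


def conference_loop(server, participants, secretary):
    # Continuous phase: capture, translation, and distribution.
    receivers = [p.receiver for p in participants]
    while server.is_active():
        for p in participants:
            pa = p.transmitter.capture()                   # Avatar Descriptors + Speech
            server.translate_speech(pa, [q.language_selector for q in participants])
            server.distribute_avatar(pa, receivers + [secretary])
        # The Virtual Meeting Secretary's own Portable Avatar is also distributed.
        server.distribute_avatar(secretary.portable_avatar(), receivers)

Each Receiver then produces the Visual and Audio Scene Descriptors and renders the Audio-Visual Scene from the Point of View selected by its Participant.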

Each component of the Avatar-Based Videoconference Use Case is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs). The dedicated page of each AIW includes the following elements:

1  Functions of the AIW                The functions performed by the AIW implementing the Use Case.
2  Reference Model of the AIW          The Topology of the AIMs in the AIW.
3  Input and Output Data of the AIW    The Input and Output Data of the AIW.
4  Functions of the AIMs               The functions performed by the AIMs.
5  Input and Output Data of the AIMs   The Input and Output Data of the AIMs.
6  AIW, AIMs, and JSON Metadata        Links to the summary specifications of the AIW and its AIMs on the web and to the corresponding JSON Metadata [2].
