Technical Specification: Portable Avatar Format (MPAI-PAF) V1.1 enables the implementation of the Avatar-Based Videoconference Use Case (PAF-ABV), a form of videoconference held in a Virtual Environment populated by Avatars representing humans. The Avatars show the humans’ visual appearance and utter their speech. Figure 2 depicts the system, composed of four types of subsystems implemented as MPAI-AIF AI Workflows (AIWs), whose specifications are available at the following links:
- Videoconference Client Transmitter
- Avatar Videoconference Server
- Virtual Meeting Secretary
- Videoconference Client Receiver
Figure 2 – Avatar-Based Videoconference end-to-end diagram
The components of the PAF-ABV system are:
- Participant: a human joining an ABV either individually or as a member of a group of humans in the same physical room.
- Audio-Visual Scene: a virtual audio-visual space equipped with Visual Objects, such as a table and an appropriate number of chairs, and with Audio Objects, all described by Audio-Visual Scene Descriptors.
- Portable Avatar: a representation of a human participant expressed in the Portable Avatar Format (PAF).
- Videoconference Client Transmitter (see the sketch after this list):
- At the beginning of the conference:
- Receives from Participants and sends to the Server Portable Avatars containing the Avatar Models and Language Preferences.
- Sends to the Server Speech Object and Face Object for Authentication.
- Continuously sends to the Server Portable Avatars containing Avatar Descriptors and Speech.
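The following Python sketch illustrates the Transmitter’s two phases. All identifiers (PortableAvatar, send_to_server, the capture source) are assumptions made for illustration, not names defined by MPAI-PAF or MPAI-AIF:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# All identifiers below are illustrative assumptions, not normative MPAI names.

@dataclass
class PortableAvatar:
    avatar_model: Optional[bytes] = None          # static Avatar Model
    language_preferences: Optional[list] = None   # e.g., ["en", "it"]
    avatar_descriptors: Optional[bytes] = None    # time-varying face/body descriptors
    speech: Optional[bytes] = None                # coded Speech Object

def send_to_server(payload) -> None:
    """Placeholder transport; a real Client would use MPAI-AIF channels."""
    print("sent:", payload)

def transmitter_setup(model: bytes, languages: list,
                      speech_object: bytes, face_object: bytes) -> None:
    # At the beginning: a Portable Avatar with Avatar Model and Language Preferences...
    send_to_server(PortableAvatar(avatar_model=model,
                                  language_preferences=languages))
    # ...then Speech and Face Objects, so the Server can authenticate the Participant.
    send_to_server({"speech": speech_object, "face": face_object})

def transmitter_loop(capture: Iterable) -> None:
    # Continuously: Portable Avatars carrying Avatar Descriptors and Speech.
    for descriptors, speech in capture:
        send_to_server(PortableAvatar(avatar_descriptors=descriptors, speech=speech))
```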
- Avatar Videoconference Server (see the sketch after this list):
- At the beginning:
- Selects a Visual Environment Model, e.g., a meeting room.
- Equips the room with objects, i.e., meeting table and chairs.
- Places Avatar Models around the table with a given Spatial Attitude.
- Distributes the Visual Environment and Portable Avatars containing Avatar Models and their Spatial Attitudes to all Receiving Clients.
- Authenticates Speech and Face Objects and assigns IDs to Avatars.
- Sets the common conference language.
- Continuously:
- Translates Speech for Participants according to their Language Preferences.
- Sends Portable Avatars containing Avatar Descriptors, Speech, and Spatial Attitude of Participants and Virtual Meeting Secretary to all Receiving Clients and Virtual Meeting Secretary.
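The Server’s setup and continuous phases could be organised as below. This is a sketch under assumed names; scene building, authentication, and translation are reduced to stubs standing in for the corresponding AIMs:

```python
from dataclasses import dataclass, field

# Illustrative stubs; none of these identifiers come from the MPAI specifications.

@dataclass
class SpatialAttitude:
    position: tuple        # (x, y, z) in the Visual Environment
    orientation: tuple     # e.g., Euler angles

@dataclass
class VisualEnvironment:
    model: str             # e.g., a meeting-room model
    objects: list = field(default_factory=list)

def authenticate(speech, face):
    return "assigned-avatar-id"        # stub for the authentication step

def translate(portable_avatar, language):
    return portable_avatar             # stub for speech translation

def server_setup(clients, portable_avatars):
    env = VisualEnvironment("meeting_room")               # select an Environment Model
    env.objects += ["meeting_table", "chairs"]            # equip the room
    # Place the Avatar Models around the table, one Spatial Attitude each.
    attitudes = [SpatialAttitude((i, 0.0, 0.0), (0.0, 0.0, 0.0))
                 for i, _ in enumerate(portable_avatars)]
    for client in clients:                                # distribute to all Receivers
        client.send(env, portable_avatars, attitudes)
    ids = [authenticate(pa.speech, pa.face) for pa in portable_avatars]
    return env, attitudes, ids, "en"                      # "en": common conference language

def server_loop(clients, secretary, incoming, preferences):
    # Continuously: translate Speech per Language Preferences, then forward the
    # Portable Avatars to every Receiving Client and to the Virtual Meeting Secretary.
    for pa in incoming:
        for client, language in zip(clients, preferences):
            client.send(translate(pa, language))
        secretary.send(pa)
```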
- Virtual Meeting Secretary: an Avatar that does not correspond to any Participant and that continuously (see the sketch after this list):
- Uses the common meeting language.
- Understands Avatars’ utterances and extracts their Personal Statuses.
- Drafts a Summary of its understanding of Avatars’ Text and Personal Status.
- Displays the Summary either:
- Outside of the Visual Environment, for Participants to read and edit directly, or
- Inside the Visual Environment, for Avatars to comment on, e.g., via Text.
- Refines the Summary.
- Sends its Portable Avatar containing its Avatar Descriptors to the Server.
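The Secretary’s continuous behaviour might be loosely modelled as the loop below; understand, extract_personal_status, display, and own_portable_avatar are hypothetical stand-ins for the corresponding AIMs, not MPAI-defined functions:

```python
# Hypothetical sketch; the helper functions stand in for actual AIMs.

def understand(portable_avatar) -> str:
    return ""                          # stub: understanding of the Avatar's utterance

def extract_personal_status(portable_avatar):
    return {}                          # stub: Personal Status extraction

def display(summary) -> list:
    return []                          # stub: show Summary outside the Environment or
                                       # in-scene, returning edits/Text comments

def own_portable_avatar(summary):
    return {"avatar_descriptors": "...", "summary": summary}  # stub Portable Avatar

def secretary_loop(incoming, server):
    summary = ""
    for pa in incoming:                # Portable Avatars forwarded by the Server
        text = understand(pa)
        status = extract_personal_status(pa)
        summary += f"\n{text} [{status}]"      # draft the running Summary
        for comment in display(summary):       # gather Participants'/Avatars' comments
            summary += f"\n(edit) {comment}"   # refine the Summary
        server.send(own_portable_avatar(summary))  # Secretary's own Avatar Descriptors
```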
- Videoconference Client Receiver (see the sketch after this list):
- At the beginning:
- Receives the Visual Environment and Portable Avatars containing Avatar Models with Spatial Attitudes.
- Continuously:
- Receives Portable Avatars with Avatar Descriptors and Speech.
- Produces Visual and Audio Scene Descriptors.
- Renders the Audio-Visual Scene by spatially attaching the Participants’ utterances to the Spatial Attitude of the respective Avatars’ mouths. Rendering may be done from a Point of View selected by the Participant, possibly different from the position assigned to their Avatar in the Visual Environment, using a device of their choice (HMD or 2D display/earphones).
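A Receiver’s continuous phase, again as a hedged sketch: the animation, spatialisation, and rendering calls are invented placeholders, but the flow mirrors the steps above, attaching each Speech at the Spatial Attitude of the speaking Avatar’s mouth and rendering from a user-selected Point of View:

```python
# Illustrative placeholders; not a real renderer or an MPAI-defined API.

def animate(avatar_descriptors, attitude):
    return {"descriptors": avatar_descriptors, "attitude": attitude}  # Visual Scene entry

def mouth_attitude(visual):
    return visual["attitude"]          # stub: Spatial Attitude of the Avatar's mouth

def spatialise(speech, attitude):
    return {"speech": speech, "at": attitude}   # Audio Scene entry placed at the mouth

def render(visuals, audios, point_of_view):
    pass                               # stub: HMD or 2D display/earphone rendering

def receiver_loop(incoming, point_of_view):
    for frame in incoming:             # Portable Avatars with Descriptors and Speech
        visuals, audios = [], []
        for pa, attitude in frame:
            visual = animate(pa.avatar_descriptors, attitude)  # Visual Scene Descriptors
            visuals.append(visual)
            audios.append(spatialise(pa.speech, mouth_attitude(visual)))
        render(visuals, audios, point_of_view)  # Point of View chosen by the Participant
```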
Each component of the Avatar-Based Videoconference Use Case is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs) executed in an AI Framework. Basic notions concerning Technical Specification: AI Framework (MPAI-AIF) V2.0 are available here.
The specification of each AIW includes the following elements (an illustrative metadata sketch follows the list):
- Functions of the AIW
- Reference Architecture of the AIW
- Input and Output Data of the AIW
- Functions of the AIMs
- Input and Output Data of the AIMs
- JSON Metadata of the AIW and its AIMs.
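As a purely illustrative example of the kind of information such JSON Metadata conveys, a non-normative AIW entry could resemble the Python dictionary below; the field names and AIM names are assumptions, not the MPAI-AIF metadata schema:

```python
# Hypothetical shape of AIW metadata; fields and AIM names are invented for
# illustration and do not reproduce the normative MPAI-AIF schema.
aiw_metadata = {
    "title": "Videoconference Client Transmitter",
    "specification": "MPAI-PAF",
    "version": "1.1",
    "input_data": ["Avatar Model", "Language Preferences",
                   "Speech Object", "Face Object"],
    "output_data": ["Portable Avatar"],
    "aims": [
        {"name": "ExampleAIM1", "function": "builds Avatar Descriptors"},    # invented
        {"name": "ExampleAIM2", "function": "packages the Portable Avatar"}  # invented
    ],
}
```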