1. Technical Specifications
Technical Specification: Portable Avatar Format (MPAI-PAF) V1.5 provides, jointly with other MPAI Technical Specifications, technologies for the digital representation of 3D Model Data that enable the Avatar-Based Videoconference: a form of videoconference held in a Virtual Environment populated by speaking Avatars and implemented as an AI Workflow specified by Technical Specification: AI Framework (MPAI-AIF) V2.2.
Table 1 lists the AIWs specified by MPAI-PAF V1.5; its links lead to the individual MPAI-PAF V1.5 AIW specifications. Each specification includes the AIW's Function, Reference Model, and I/O Data and, for each AIM, its Functions and I/O Data, together with links to the AIMs and the JSON Metadata of the AI Workflow.
All previously specified MPAI-PAF AI-Workflows are superseded by those specified by V1.5 but may be used if their version is explicitly mentioned.
Table 1 – AIWs specified by MPAI-PAF V1.5
Acronym | Names and Specifications of AI Workflows | JSON |
------- | ---------------------------------------- | ---- |
PAF-CTX | Videoconference Client Transmitter | X |
MMC-VMS | Virtual Meeting Secretary | X |
PAF-AVS | Avatar Videoconference Server | X |
PAF-CRX | Videoconference Client Receiver | X |
Figure 1 depicts the Avatar-Based Videoconference system composed of four subsystems (AI Workflows).
Figure 1 – Avatar-Based Videoconference end-to-end diagram
The components of the PAF-ABV system are:
- Participant: a human joining an ABV either individually or as a member of a group of humans in the same physical space.
- Audio-Visual Scene: a Virtual Audio-Visual Environment populated by Visual Objects such as a Table, chairs etc., and Audio Objects. The Environment is described by Audio-Visual Scene Descriptors.
- Portable Avatar: an MPAI-PAF-specified Data Type including data representing a human participant.
- Videoconference Client Transmitter (VCT):
- At the beginning of the conference, VCT:
- Receives from Participants and sends to the Server Portable Avatars containing Avatar Model(s) and Language Selectors.
- Sends to the Server Speech Object and Face Object for Authentication.
- Continuously updates the Server with Portable Avatars containing Avatar Descriptors and Speech.
- Avatar Videoconference Server (AVS):
- At the beginning of the conference, AVS:
- Selects the Audio-Visual Scene Descriptors representing the meeting room.
- Equips the room with objects, e.g., table and chairs.
- Places Avatar Models around the Table with a specific Spatial Attitude.
- Distributes Portable Avatars containing Audio-Visual Scene Descriptors, Avatar Models, their Speech Objects and Spatial Attitudes to all Videoconference Client Receivers.
- Assigns IDs to Avatars after authenticating Speech and Face Objects.
- Sets the common conference language.
- Continuously:
- Translates and sends to Participants Speech according to their Language Selectors.
- Sends Portable Avatars containing Avatar Descriptors, Speech, and Spatial Attitude of Participants and Virtual Meeting Secretary to all Receiving Clients and Virtual Meeting Secretary.
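The placement step above — Avatar Models seated around the Table, each with a Spatial Attitude — can be sketched as follows. The function name, coordinate conventions, and `yaw_deg` field are illustrative assumptions, not MPAI-specified data structures:

```python
import math

def seat_avatars(n_avatars, table_radius=1.0):
    """Illustrative sketch: compute a Spatial Attitude (position + yaw)
    for each avatar, spaced evenly around a round table centred at the
    origin. Conventions here are assumptions, not MPAI-PAF normative."""
    attitudes = []
    for i in range(n_avatars):
        angle = 2 * math.pi * i / n_avatars
        x = table_radius * math.cos(angle)
        z = table_radius * math.sin(angle)
        yaw = (math.degrees(angle) + 180) % 360  # face the table centre
        attitudes.append({"position": (x, 0.0, z), "yaw_deg": yaw})
    return attitudes

# Example: four participants seated at 90-degree intervals.
for attitude in seat_avatars(4):
    print(attitude)
```

A real Server would express these values as Spatial Attitudes inside Portable Avatars before distributing them to the Videoconference Client Receivers.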
- Virtual Meeting Secretary: represented as an Avatar that does not correspond to any human Participant. It continuously:
- Uses the common meeting language.
- Understands Text Objects and Speech Objects of all Avatars and extracts their Personal Statuses.
- Drafts a Summary of its understanding of Avatars’ Text Objects, Speech Objects, and Personal Statuses.
- Sends the Summary outside of the Virtual Environment for participants to read and edit directly, and/or displays the Summary in the Visual Space for Avatars to comment on, e.g., via Text Objects.
- Refines the Summary based on comments received.
- Sends its Portable Avatar containing its Avatar Descriptors to the Server.
- Videoconference Client Receiver (VCR):
- At the beginning of the conference, the VCR:
- Receives Audio-Visual Scene Descriptors and Portable Avatars containing Avatar Models with their Spatial Attitudes.
- Continuously:
- Receives Portable Avatars with Avatar Descriptors and Speech.
- Produces Visual Scene Descriptors and Audio Scene Descriptors.
- Renders the Audio-Visual Scene by spatially attaching each Avatar's Speech Object at the Spatial Attitude of that Avatar's Mouth.
Rendering may be done from a Point of View possibly different from the Position assigned to the participant's Avatar in the Visual Scene. The Point of View is selected by participants, who use a device of their choice (Head-Mounted Display or 2D display with earpads) to experience the Audio-Visual Scene.
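Spatial rendering from a selected Point of View can be sketched with simple distance attenuation and equal-power panning. The function and its conventions are assumptions for illustration; MPAI-PAF does not prescribe a rendering algorithm:

```python
import math

def spatialize(listener_pos, listener_yaw_deg, source_pos):
    """Illustrative sketch: derive per-ear gains for a Speech Object
    located at an Avatar's Mouth, relative to the participant's chosen
    Point of View. Inverse-distance attenuation and equal-power panning
    are assumptions, not part of the MPAI-PAF specification."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[2] - listener_pos[2]
    dist = math.hypot(dx, dz)
    gain = 1.0 / max(dist, 1.0)          # clamp to avoid blow-up when close
    # Bearing of the source relative to the listener's facing direction.
    bearing = math.atan2(dx, dz) - math.radians(listener_yaw_deg)
    pan = math.sin(bearing)              # -1 = hard left, +1 = hard right
    left = gain * math.sqrt((1 - pan) / 2)
    right = gain * math.sqrt((1 + pan) / 2)
    return left, right

# A speech source two metres directly to the right of a listener facing +z.
left, right = spatialize((0, 0, 0), 0, (2, 0, 0))
```

In this sketch the source lands entirely in the right channel, as expected for a sound directly to the listener's right.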
Each component of the Avatar-Based Videoconference Use Case is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs). Each AIW includes the following elements:
No. | Element | Description |
--- | ------- | ----------- |
1 | Functions of the AIW | The functions performed by the AIW implementing the Use Case. |
2 | Reference Model of the AIW | The Topology of AIMs in the AIW. |
3 | Input and Output Data of the AIW | Input and Output Data of the AIW. |
4 | Functions of the AIMs | Functions performed by the AIMs. |
5 | Input and Output Data of the AIMs | Input and Output Data of the AIMs. |
6 | AIW, AIMs, and JSON Metadata | Links to the summary specifications on the web of the AIMs and the corresponding JSON Metadata [2]. |
2. Reference Software
As a rule, MPAI provides Reference Software implementing a Technical Specification. The Reference Software is released with the BSD-3-Clause licence and with the following disclaimers:
- The purpose of the Reference Software is to demonstrate a working Implementation of an AIW, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any purpose other than that of the MPAI-PAF Standard, and does not guarantee that it offers the best performance or that it is secure.
- Users shall verify that they have the right to use any third-party software required by this Reference Software, e.g., by accepting the licences from third-party repositories.
Note that at this stage only part of the AIMs required to operate the MPAI-PAF AIWs have a Reference Software Implementation.
3. Conformance Testing
An implementation of an AI Workflow conforms with MPAI-PAF if it accepts as input and produces as output Data and/or Data Objects (Data of a Data Type and its Qualifier) conforming with those specified by MPAI-PAF.
The Conformance of a Data instance is expressed by a sentence such as “Data validates against the Data Type Schema”. This means that:
- Any Data Sub-Type is as indicated in the Qualifier.
- The Data Format is indicated by the Qualifier.
- Any File and/or Stream have the Formats indicated by the Qualifier.
- Any Attribute of the Data is of the type or validates against the Schema specified in the Qualifier.
The method to Test the Conformance of a Data or Data Object instance is specified in the Data Types chapter.
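The Qualifier-driven checks above can be sketched as a minimal validation routine. All field names (`subType`, `format`, `attributes`) are hypothetical stand-ins, not taken from the MPAI-PAF schemas:

```python
def conforms(data_object, qualifier):
    """Illustrative sketch of a Conformance check: verify that a Data
    Object's Sub-Type, Format, and Attributes match those declared by
    its Qualifier. Field names are assumptions, not MPAI-normative."""
    if data_object.get("subType") != qualifier.get("subType"):
        return False
    if data_object.get("format") != qualifier.get("format"):
        return False
    # Every Attribute named by the Qualifier must be present and
    # of the declared type.
    types = {"string": str, "number": (int, float), "object": dict}
    for name, type_name in qualifier.get("attributes", {}).items():
        if not isinstance(data_object.get(name), types[type_name]):
            return False
    return True

qualifier = {"subType": "Speech", "format": "wav",
             "attributes": {"sampleRate": "number"}}
ok = conforms({"subType": "Speech", "format": "wav",
               "sampleRate": 48000}, qualifier)
bad = conforms({"subType": "Speech", "format": "mp3",
                "sampleRate": 48000}, qualifier)
```

A real Conformance Testing tool would instead validate the instance against the published JSON Schema of the Data Type, but the pass/fail logic follows the same pattern.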
4. Performance Assessment
Performance is a multidimensional attribute because it can have various connotations. Therefore, the Performance Assessment Specification should provide methods to measure how well an AIW performs its function, using a metric that depends on the nature of the function, such as:
- Quality: the Performance of a Videoconference Client Transmitter AIW can measure how well the AIW represents the human Participant.
- Bias: the Performance of a Videoconference Client Receiver AIW can measure how faithfully the AIW reproduces the avatar videoconference.
- Legal compliance: the Performance of an AIW can measure the compliance of the AIW to a regulation, e.g., the European AI Act.
- Ethical compliance: the Performance Assessment of an AIW can measure the compliance of an AIW to a target ethical standard.
Note that at this stage MPAI-PAF AIWs do not have a Performance Assessment Specification.