1. Technical Specification
Technical Specification: Portable Avatar Format (MPAI-PAF) V1.3 provides, jointly with other MPAI Technical Specifications, technologies for the digital representation of 3D Model Data that enable the Avatar-Based Videoconference, a form of videoconference held in a Virtual Environment populated by speaking Avatars and implemented as AI Workflows executed in the AI Framework specified by Technical Specification: AI Framework (MPAI-AIF) V2.1.
Table 1 displays the full list of AIWs specified by MPAI-PAF V1.3. Click a listed AIW to access its dedicated page, which includes its Functions, Reference Model, I/O Data, the Functions of its AIMs, the I/O Data of its AIMs, and a table providing links to the AIW, its AIMs, and the related JSON Metadata.
All previously specified MPAI-PAF AI Workflows are superseded by those specified by V1.3 but may still be used if their version is explicitly mentioned.
Acronym | Names and Specifications of AI Workflows | JSON |
PAF-CTX | Videoconference Client Transmitter | X |
MMC-VMS | Virtual Meeting Secretary | X |
PAF-AVS | Avatar Videoconference Server | X |
PAF-CRX | Videoconference Client Receiver | X |
Figure 1 depicts the system composed of four types of subsystems specified as AI Workflows.
Figure 1 – Avatar-Based Videoconference end-to-end diagram
The components of the PAF-ABV system are:
- Participant: a human joining an ABV either individually or as a member of a group of humans in the same physical space.
- Audio-Visual Scene: a Virtual Audio-Visual Environment, described by Audio-Visual Scene Descriptors, equipped with Visual Objects (e.g., a Table and an appropriate number of Chairs) and Audio Objects.
- Portable Avatar: a data set specified by MPAI-PAF including data representing a human participant (a minimal sketch of such a payload follows this list).
- Videoconference Client Transmitter:
- At the beginning of the conference:
- Receives from Participants and sends to the Server Portable Avatars containing the Avatar Models and Language Selectors.
- Sends to the Server Speech Object and Face Object for Authentication.
- Continuously sends to the Server Portable Avatars containing Avatar Descriptors and Speech.
- Avatar Videoconference Server:
- At the beginning of the conference:
- Selects the Audio-Visual Scene Descriptors, e.g., a Meeting Room.
- Equips the Room with Objects, i.e., Table and Chairs.
- Places Avatar Models around the Table with a given Spatial Attitude.
- Distributes Portable Avatars containing Avatar Models, their Speech Objects and Spatial Attitudes, and the Audio-Visual Scene Descriptors to all Receiving Clients.
- Authenticates Speech and Face Objects and assigns IDs to Avatars.
- Sets the common conference language.
- Continuously:
- Translates Speech to Participants according to their Language Selectors.
- Sends Portable Avatars containing the Avatar Descriptors, Speech, and Spatial Attitudes of the Participants and of the Virtual Meeting Secretary to all Receiving Clients and to the Virtual Meeting Secretary.
- Virtual Meeting Secretary: an Avatar not corresponding to any Participant that continuously:
- Uses the common meeting language.
- Understands Text Objects and Speech Objects of all Avatars and extracts their Personal Statuses.
- Drafts a Summary of its understanding of Avatars’ Text Objects, Speech Objects, and Personal Status.
- Displays the Summary either:
- Outside the Virtual Environment, for Participants to read and edit directly, or
- In the Visual Space, for Avatars to comment on, e.g., via Text Objects.
- Refines the Summary.
- Sends its Portable Avatar containing its Avatar Descriptors to the Server.
- Videoconference Client Receiver:
- At the beginning of the conference:
- Receives Audio-Visual Scene Descriptors and Portable Avatars containing Avatar Models with their Spatial Attitudes.
- Continuously:
- Receives Portable Avatars with Avatar Descriptors and Speech.
- Produces Visual Scene Descriptors and Audio Scene Descriptors.
- Renders the Audio-Visual Scene by spatially attaching each Avatar’s Speech Object to the Spatial Attitude of that Avatar’s Mouth. Each Participant may render the Scene from a selected Point of View, possibly different from the Position assigned to their Avatar in the Visual Scene, and experience the Audio-Visual Scene on a device of their choice (Head Mounted Display or 2D display/earpads).
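All of these exchanges are carried by Portable Avatars. The following is a minimal Python sketch of such a payload; the field names and types are illustrative assumptions, not the normative MPAI-PAF syntax:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpatialAttitude:
    """Position and orientation in the Virtual Environment.
    Illustrative only: MPAI-PAF specifies the normative representation."""
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)
    orientation: tuple[float, float, float] = (0.0, 0.0, 0.0)  # Euler angles

@dataclass
class PortableAvatar:
    """Hypothetical container mirroring the Portable Avatar concept:
    a data set including data representing a human participant."""
    avatar_id: Optional[str] = None             # assigned by the Server after Authentication
    avatar_model: Optional[bytes] = None        # sent once, at the beginning of the conference
    avatar_descriptors: Optional[bytes] = None  # sent continuously (animation data)
    speech: Optional[bytes] = None              # the Participant's Speech Object
    language_selector: Optional[str] = None     # preferred language, e.g., "en"
    spatial_attitude: Optional[SpatialAttitude] = None

# At the beginning of the conference, the Client Transmitter sends Model and Language:
setup = PortableAvatar(avatar_model=b"<3D model>", language_selector="en")

# Continuously, it sends Avatar Descriptors and Speech only:
update = PortableAvatar(avatar_descriptors=b"<descriptors>", speech=b"<speech frame>")
```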
Each component of the Avatar-Based Videoconference Use Case is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs). The specification of each AIW includes the following elements (a sketch of an AIW topology follows the table):
1 | Functions of the AIW | The functions performed by the AIW implementing the Use Case. |
2 | Reference Model of the AIW | The Topology of AIMs in the AIW. |
3 | Input and Output Data of the AIW | Input and Output Data of the AIW. |
4 | Functions of the AIMs | Functions performed by the AIMs. |
5 | Input and Output Data of the AIMs | Input and Output Data of the AIMs. |
6 | AIW, AIMs, and JSON Metadata | Links to the summary specifications of the AIW and its AIMs on the web and to the corresponding JSON Metadata [2]. |
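The Topology of AIMs (element 2) can be viewed as a directed graph of AIMs connected by the Data they exchange. The Python sketch below uses placeholder AIM and Data names; the normative topology of each AIW is defined by its JSON Metadata:

```python
# A minimal, hypothetical view of an AIW as a directed graph of AIMs.
# AIM names and connections are illustrative placeholders only.
aiw_topology = {
    "name": "PAF-CTX (Videoconference Client Transmitter)",
    "aims": {
        "CaptureAIM":     {"inputs": ["AudioVisualCapture"],
                           "outputs": ["SpeechObject", "FaceObject"]},
        "DescriptionAIM": {"inputs": ["FaceObject"],
                           "outputs": ["AvatarDescriptors"]},
        "MultiplexerAIM": {"inputs": ["AvatarDescriptors", "SpeechObject"],
                           "outputs": ["PortableAvatar"]},
    },
}

def consumers(topology: dict, data_name: str) -> list[str]:
    """Return the names of the AIMs that take data_name as an input."""
    return [aim for aim, io in topology["aims"].items() if data_name in io["inputs"]]

print(consumers(aiw_topology, "SpeechObject"))  # -> ['MultiplexerAIM']
```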
2. Reference Software
As a rule, MPAI provides Reference Software implementing the Technical Specification, released with the BSD-3-Clause licence and the following disclaimers:
- The purpose of the Reference Software is to demonstrate a working Implementation of an AIW, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for purposes other than those of the MPAI-PAF Standard and does not guarantee that it offers the best performance or that it is secure.
- Users shall verify that they have the right to use any third-party software required by this Reference Software, e.g., by accepting the licences from third-party repositories.
Note that at this stage only part of the AIMs required to operate the MPAI-PAF AIWs have a Reference Software Implementation.
3. Conformance Testing
An implementation of an AI Workflow conforms with MPAI-PAF if it accepts as input and produces as output Data and/or Data Objects (Data of a Data Type and its Qualifier) conforming with those specified by MPAI-PAF.
The Conformance of a Data instance is to be expressed by a sentence like “Data validates against the Data Type Schema”. This means that:
- Any Data Sub-Type is as indicated in the Qualifier.
- The Data Format is indicated by the Qualifier.
- Any File and/or Stream has the Format indicated by the Qualifier.
- Any Attribute of the Data is of the type or validates against the Schema specified in the Qualifier.
The method to Test the Conformance of a Data or Data Object instance is specified in the Data Types chapter.
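Where a Data Type Schema is available as a JSON Schema, the check “Data validates against the Data Type Schema” can be sketched in Python with the jsonschema package. The schema and instance below are illustrative placeholders, not the normative MPAI-PAF schemas:

```python
from jsonschema import validate, ValidationError

# Placeholder schema for a Spatial Attitude-like Data Type: a Position and
# an Orientation, each a 3-element numeric array. Illustrative only.
spatial_attitude_schema = {
    "type": "object",
    "properties": {
        "Position":    {"type": "array", "items": {"type": "number"},
                        "minItems": 3, "maxItems": 3},
        "Orientation": {"type": "array", "items": {"type": "number"},
                        "minItems": 3, "maxItems": 3},
    },
    "required": ["Position", "Orientation"],
}

instance = {"Position": [1.0, 0.0, 2.5], "Orientation": [0.0, 90.0, 0.0]}

try:
    validate(instance=instance, schema=spatial_attitude_schema)
    print("Data validates against the Data Type Schema")
except ValidationError as err:
    print(f"Conformance failure: {err.message}")
```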
4. Performance Assessment
Performance is a multidimensional attribute because it can have various connotations. Therefore, a Performance Assessment Specification should provide methods to measure how well an AIW performs its function, using a metric that depends on the nature of the function, such as:
- Quality: the Performance of a Videoconference Client Transmitter AIW can measure how well the AIW represents the human Participant.
- Bias: the Performance of a Videoconference Client Receiver AIW can measure how faithfully, i.e., without bias, the AIW reproduces the Avatar Videoconference.
- Legal compliance: the Performance of an AIW can measure the compliance of the AIW to a regulation, e.g., the European AI Act.
- Ethical compliance: the Performance Assessment of an AIW can measure the compliance of an AIW to a target ethical standard.
Note that at this stage MPAI-PAF AIWs do not have a Performance Assessment Specification.
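Although no Performance Assessment Specification exists yet, the Quality dimension above can be illustrated by a hypothetical, non-normative metric: the mean distance between face landmarks captured from the Participant and landmarks reconstructed from the transmitted Avatar Descriptors. A minimal sketch, assuming both are available as lists of 3D points:

```python
import math

def mean_landmark_error(captured: list[tuple[float, float, float]],
                        reconstructed: list[tuple[float, float, float]]) -> float:
    """Mean Euclidean distance between corresponding landmarks.
    A hypothetical quality proxy, not an MPAI-specified metric."""
    assert len(captured) == len(reconstructed), "landmark sets must correspond"
    total = sum(math.dist(c, r) for c, r in zip(captured, reconstructed))
    return total / len(captured)

# Toy example: two landmarks with a small reconstruction error.
captured = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reconstructed = [(0.0, 0.1, 0.0), (1.0, 1.0, 0.9)]
print(f"mean landmark error: {mean_landmark_error(captured, reconstructed):.3f}")
```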