1. Technical Specification
2. Reference Software
3. Conformance Testing
4. Performance Assessment

1. Technical Specification

Technical Specification: Portable Avatar Format (MPAI-PAF) V1.5, jointly with other MPAI Technical Specifications, provides technologies for the digital representation of 3D Model Data that enable the Avatar-Based Videoconference, a form of videoconference held in a Virtual Environment populated by speaking Avatars and implemented as AI Workflows specified according to Technical Specification: AI Framework (MPAI-AIF) V2.2.

Table 1 lists the AIWs specified by MPAI-PAF V1.5; the links lead to the individual AIW specifications. Each specification includes the AIW's Function, Reference Model, and Input/Output Data and, for each AIM, its Function, Input/Output Data, and links to the AIM specifications and to the JSON metadata of the AI Workflow.

All previously specified MPAI-PAF AI Workflows are superseded by those specified in V1.5 but may still be used if their version is explicitly referenced.

Table 1 – AIWs specified by MPAI-PAF V1.5

Acronym   Names and Specifications of AI Workflows   JSON
PAF-CTX   Videoconference Client Transmitter         X
MMC-VMS   Virtual Meeting Secretary                  X
PAF-AVS   Avatar Videoconference Server              X
PAF-CRX   Videoconference Client Receiver            X

Figure 1 depicts the Avatar-Based Videoconference system composed of four subsystems (AI Workflows).

Figure 1 – Avatar-Based Videoconference end-to-end diagram

The components of the PAF-ABV system are the following (a code sketch of the resulting message flow is given after the list):

  1. Participant: a human joining an ABV either individually or as a member of a group of humans in the same physical space.
  2. Audio-Visual Scene: a Virtual Audio-Visual Environment populated by Visual Objects such as a Table, chairs etc., and Audio Objects. The Environment is described by Audio-Visual Scene Descriptors.
  3. Portable Avatar: an MPAI-PAF-specified Data Type including data representing a human participant.
  4. Videoconference Client Transmitter (VCT):
    • At the beginning of the conference, VCT:
      • Receives Portable Avatars containing Avatar Model(s) and Language Selectors from Participants and sends them to the Server.
      • Sends to the Server Speech Object and Face Object for Authentication.
    • Continuously updates the Server with Portable Avatars containing Avatar Descriptors and Speech.
  5. Avatar Videoconference Server (AVS):
    • At the beginning of the conference, AVS:
      • Selects the Audio-Visual Scene Descriptors representing the meeting room.
      • Equips the room with objects, e.g., table and chairs.
      • Places Avatar Models around the Table with a specific Spatial Attitude.
      • Distributes Portable Avatars containing Audio-Visual Scene Descriptors, Avatar Models, their Speech Objects and Spatial Attitudes to all Videoconference Client Receivers.
      • Assigns IDs to Avatars after authenticating Speech and Face Objects.
      • Sets the common conference language.
    • Continuously:
      • Translates Speech and sends it to Participants according to their Language Selectors.
      • Sends Portable Avatars containing the Avatar Descriptors, Speech, and Spatial Attitudes of Participants and of the Virtual Meeting Secretary to all Videoconference Client Receivers and to the Virtual Meeting Secretary.
  6. Virtual Meeting Secretary (VMS): an Avatar that does not correspond to any Participant and continuously:
    • Uses the common meeting language.
    • Understands Text Objects and Speech Objects of all Avatars and extracts their Personal Statuses.
    • Drafts a Summary of its understanding of Avatars’ Text Objects, Speech Objects, and Personal Statuses.
    • Sends the Summary outside of the Virtual Environment for participants to read and edit directly, and/or displays the Summary in the Visual Space for Avatars to comment on, e.g., via Text Objects.
    • Refines the Summary based on comments received.
    • Sends its Portable Avatar containing its Avatar Descriptors to the Server.
  7. Videoconference Client Receiver (VCR):
    • At the beginning of the conference, the VCR:
      • Receives Audio-Visual Scene Descriptors and Portable Avatars containing Avatar Models with their Spatial Attitudes.
    • Continuously:
      • Receives Portable Avatars with Avatar Descriptors and Speech.
      • Produces Visual Scene Descriptors and Audio Scene Descriptors.
      • Renders the Audio-Visual Scene by spatially attaching each Avatar's Speech Object at the Spatial Attitude of that Avatar's Mouth.
        Rendering may be done from a Point of View possibly different from the Position assigned to the Participant's Avatar in the Visual Scene. The Point of View is selected by Participants, who experience the Audio-Visual Scene on a device of their choice (Head-Mounted Display or 2D display/earpads).
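To make the flow above concrete, here is a minimal Python sketch of a Portable Avatar payload and of the Transmitter's two phases. All class, field, and method names are illustrative assumptions chosen for readability; the normative data formats are those specified by MPAI-PAF, not this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# NOTE: every name below is an illustrative assumption, not the
# normative MPAI-PAF data format.

@dataclass
class SpatialAttitude:
    """Position and orientation of an object in the Virtual Environment."""
    position: Tuple[float, float, float]
    orientation: Tuple[float, float, float]

@dataclass
class PortableAvatar:
    """Container loosely modelled on the Portable Avatar Data Type."""
    avatar_id: Optional[str] = None             # assigned by the Server after Authentication
    avatar_model: Optional[bytes] = None        # sent once, at conference start
    avatar_descriptors: Optional[bytes] = None  # refreshed continuously
    speech: Optional[bytes] = None              # the Participant's Speech Object
    language_selector: Optional[str] = None     # e.g., "en": requested translation language
    spatial_attitude: Optional[SpatialAttitude] = None

class VideoconferenceClientTransmitter:
    """Mirrors the VCT's two phases: start-up, then continuous updates."""

    def __init__(self, server):
        self.server = server  # hypothetical transport towards the AVS

    def start_conference(self, model: bytes, language: str,
                         speech_sample: bytes, face_sample: bytes) -> None:
        # At the beginning of the conference: Avatar Model and Language Selector...
        self.server.send(PortableAvatar(avatar_model=model,
                                        language_selector=language))
        # ...plus Speech and Face Objects for Authentication.
        self.server.authenticate(speech_sample, face_sample)

    def update(self, descriptors: bytes, speech: bytes) -> None:
        # Continuously: refresh Avatar Descriptors and Speech.
        self.server.send(PortableAvatar(avatar_descriptors=descriptors,
                                        speech=speech))
```

The Server and Receiver sides would mirror this pattern: the AVS composes Portable Avatars that also carry Audio-Visual Scene Descriptors and Spatial Attitudes, and each VCR turns the received stream into the rendered Audio-Visual Scene.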

Each component of the Avatar-Based Videoconference Use Case is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs). Each AIW specification includes the following elements:

1. Functions of the AIW: the functions performed by the AIW implementing the Use Case.
2. Reference Model of the AIW: the topology of the AIMs in the AIW.
3. Input and Output Data of the AIW: the Input and Output Data of the AIW.
4. Functions of the AIMs: the functions performed by the AIMs.
5. Input and Output Data of the AIMs: the Input and Output Data of the AIMs.
6. AIW, AIMs, and JSON Metadata: links to the summary specifications of the AIMs on the web and to the corresponding JSON Metadata [2].
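Since each AIW and its AIMs are published with JSON Metadata [2], a first practical step when implementing one is simply to fetch and inspect that Metadata. The sketch below does only that; the URL is a placeholder standing in for the links of Table 1, and no assumption is made about the Metadata's field names.

```python
import json
from urllib.request import urlopen

# Placeholder URL; substitute one of the JSON Metadata links of Table 1.
METADATA_URL = "https://example.org/PAF-CTX.json"

with urlopen(METADATA_URL) as response:
    metadata = json.load(response)

# List the top-level structure without assuming specific field names.
for key, value in metadata.items():
    print(f"{key}: {type(value).__name__}")
```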

2. Reference Software

As a rule, MPAI provides Reference Software implementing its Technical Specifications, released under the BSD-3-Clause licence with the following disclaimers:

  1. The purpose of the Reference Software is to demonstrate a working Implementation of an AIW, not to provide a ready-to-use product.
  2. MPAI disclaims the suitability of the Software for any purposes other than those of the MPAI-PAF Standard, and does not guarantee that it offers the best performance or that it is secure.
  3. Users shall verify that they have the right to use any third-party software required by this Reference Software, e.g., by accepting the licences from third-party repositories.

Note that at this stage only part of the AIMs required to operate the MPAI-PAF AIWs have a Reference Software Implementation.

3. Conformance Testing

An implementation of an AI Workflow conforms with MPAI-PAF if it accepts as input and produces as output Data and/or Data Objects (Data of a Data Type and its Qualifier) conforming with those specified by MPAI-PAF.

The Conformance of a Data instance is expressed by a sentence such as “the Data validates against the Data Type Schema”. This means that:

  • Any Data Sub-Type is as indicated in the Qualifier.
  • The Data Format is as indicated by the Qualifier.
  • Any File and/or Stream has the Format indicated by the Qualifier.
  • Any Attribute of the Data is of the type, or validates against the Schema, specified in the Qualifier.

The method to Test the Conformance of a Data or Data Object instance is specified in the Data Types chapter.
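As an illustration of such a test, the sketch below checks a Data instance against a JSON Schema with the Python jsonschema package. The Schema shown is a placeholder invented for the example; the normative Schemas are those given in the Data Types chapter.

```python
import jsonschema

# Placeholder Schema standing in for an MPAI-PAF Data Type Schema;
# the normative Schemas are those of the Data Types chapter.
DATA_TYPE_SCHEMA = {
    "type": "object",
    "properties": {
        "DataSubType": {"type": "string"},
        "DataFormat": {"type": "string"},
        "Attributes": {"type": "object"},
    },
    "required": ["DataFormat"],
}

def conforms(instance: dict) -> bool:
    """Return True if the Data instance validates against the Schema."""
    try:
        jsonschema.validate(instance=instance, schema=DATA_TYPE_SCHEMA)
        return True
    except jsonschema.ValidationError as err:
        print(f"Non-conforming Data: {err.message}")
        return False

print(conforms({"DataFormat": "glTF", "Attributes": {}}))  # True
print(conforms({"Attributes": {}}))                        # False: DataFormat missing
```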

4. Performance Assessment

Performance is a multidimensional notion with several possible connotations. A Performance Assessment Specification should therefore provide methods to measure how well an AIW performs its function, using metrics that depend on the nature of that function, such as:

  1. Quality: the Performance of a Videoconference Client Transmitter AIW can measure how well the AIW represents the human Participant.
  2. Bias: the Performance of a Videoconference Client Receiver AIW can measure how faithfully the AIW reproduces the avatar videoconference.
  3. Legal compliance: the Performance of an AIW can measure the compliance of the AIW to a regulation, e.g., the European AI Act.
  4. Ethical compliance: the Performance Assessment of an AIW can measure the compliance of an AIW to a target ethical standard.

Note that at this stage MPAI-PAF AIWs do not have a Performance Assessment Specification.
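Pending such a Specification, a Performance Assessment harness could nevertheless be pictured along the following lines; the interface and the metric names are purely hypothetical and only mirror the four dimensions listed above.

```python
from typing import Callable, Dict

# Hypothetical: a metric maps an AIW's output and some reference data to a
# score; nothing of this kind is specified by MPAI-PAF at this stage.
Metric = Callable[[object, object], float]

def assess(aiw_output: object, reference: object,
           metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Run every registered metric and collect the scores."""
    return {name: metric(aiw_output, reference)
            for name, metric in metrics.items()}

# One hypothetical entry per dimension listed above.
metrics: Dict[str, Metric] = {
    "quality":            lambda out, ref: 1.0,  # fidelity of the Avatar to the Participant
    "bias":               lambda out, ref: 1.0,  # faithfulness of the reproduced conference
    "legal_compliance":   lambda out, ref: 1.0,  # e.g., against the European AI Act
    "ethical_compliance": lambda out, ref: 1.0,  # against a target ethical standard
}

print(assess(aiw_output=None, reference=None, metrics=metrics))
```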
