<-Foreword     Go to ToC      Scope->

(Informative)

There is a long history of computer-created objects called “digital humans”, i.e., digital objects that have a human appearance when rendered. In most cases, the underlying assumption has been that creation, animation, and rendering are done in a closed environment. Such digital humans had little or no need for standards.

In communication contexts, and even more so in the metaverse, a digital human is often not confined to a closed environment, which calls for some form of standardisation. Technical Specification: Portable Avatar Format (MPAI-PAF) V1.3, referred to below as MPAI-PAF V1.3 or MPAI-PAF, responds to the requirements of these new usage contexts. MPAI-PAF specifies the Portable Avatar Format (PAF), which enables a receiving party to render a digital human as intended by the sending party.

MPAI-PAF V1.3 specifies the Avatar-Based Videoconference (PAF-ABV) AI Workflow, in which:

  1. Client Transmitters send to the Server PAFs containing:
    • At the beginning: Avatar Models, Language Selector, and the Speech Object and Face Object used for participant authentication.
    • Continuously: Avatar Descriptors and Speech Objects.
  2. The Avatar Videoconference Server:
    • At the beginning:
      • Selects an Environment, i.e., a meeting room, and equips it with objects, i.e., a meeting table and chairs.
      • Places the Avatar Models around the table.
      • Distributes to all receiving clients, for each participant, a PAF containing the Environment, Avatar Models, and their positions.
    • Continuously sends to receiving clients:
      • Speech from participants, translated according to the Language Selectors.
      • PAFs containing Avatar Descriptors and translated Speech.
  3. Client Receivers:
    • At the beginning: receive from the Server the Environment and the PAFs containing Avatar Models and Language Selectors.
    • Continuously:
      • Receive from the Server PAFs containing Avatar Descriptors and translated Speech.
      • Create Audio and Visual Scene Descriptors.
      • Render the Audio-Visual Scene as seen from the Point of View selected by the human participant.
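
The message flow above can be illustrated with a minimal Python sketch. It is not the normative PAF-ABV workflow: the `PortableAvatar` fields, the `VideoconferenceServer` class, and the `translate` helper are hypothetical names chosen for illustration, plain strings stand in for Speech Objects, and authentication and rendering are omitted. It only shows the two phases handled by the Server: seating participants at the beginning, then relaying Avatar Descriptors with Speech translated according to each receiver's Language Selector.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PortableAvatar:
    """Hypothetical, simplified stand-in for a PAF; not the MPAI-PAF schema."""
    participant_id: str
    avatar_model: Optional[str] = None          # sent at the beginning
    language: Optional[str] = None              # Language Selector, sent at the beginning
    avatar_descriptors: Optional[dict] = None   # sent continuously
    speech: Optional[str] = None                # text stands in for a Speech Object
    position: Optional[Tuple[int, int]] = None  # seat assigned by the server
    environment: Optional[str] = None           # e.g. an identifier of the meeting room


def translate(text: str, source: str, target: str) -> str:
    """Placeholder: tags the text with the language pair instead of translating it."""
    if source == target:
        return text
    return f"[{source}->{target}] {text}"


class VideoconferenceServer:
    def __init__(self, environment: str, seats: List[Tuple[int, int]]):
        self.environment = environment
        self.free_seats = list(seats)
        self.participants: Dict[str, PortableAvatar] = {}

    def join(self, paf: PortableAvatar) -> None:
        """Beginning: record model and language, and place the avatar at a free seat."""
        seat = self.free_seats.pop(0)
        self.participants[paf.participant_id] = PortableAvatar(
            participant_id=paf.participant_id,
            avatar_model=paf.avatar_model,
            language=paf.language,
            position=seat,
            environment=self.environment,
        )

    def initial_pafs(self) -> List[PortableAvatar]:
        """Beginning: one PAF per participant, distributed to all receiving clients."""
        return list(self.participants.values())

    def relay(self, paf: PortableAvatar) -> Dict[str, PortableAvatar]:
        """Continuously: build one outgoing PAF per receiver, in the receiver's language."""
        source = self.participants[paf.participant_id].language
        outgoing = {}
        for receiver_id, receiver in self.participants.items():
            if receiver_id == paf.participant_id:
                continue  # the sender does not receive its own update
            outgoing[receiver_id] = PortableAvatar(
                participant_id=paf.participant_id,
                avatar_descriptors=paf.avatar_descriptors,
                speech=translate(paf.speech, source, receiver.language),
            )
        return outgoing


# Usage: two participants join, then one sends a continuous update.
server = VideoconferenceServer("meeting-room", seats=[(0, 1), (1, 0)])
server.join(PortableAvatar("alice", avatar_model="alice-model", language="en"))
server.join(PortableAvatar("bob", avatar_model="bob-model", language="it"))
update = PortableAvatar("alice", avatar_descriptors={"head_yaw": 0.1}, speech="Hello")
outgoing = server.relay(update)
print(outgoing["bob"].speech)  # [en->it] Hello
```

In this sketch the initial and continuous messages share one data class with optional fields, mirroring the description above in which both kinds of content travel inside PAFs; a real implementation would follow the data types defined by the Technical Specification.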

In all Chapters and Sections, Terms beginning with a capital letter are defined in Table 1 if they are specific to this Technical Specification and in Table 2 if they are common to all MPAI Technical Specifications. All Chapters and Sections are Normative unless they are labelled as Informative.

 
