1     Functions 2     Reference Model 3     Input and Output Data
4     Functions of AI Modules 5     I/O Data of AI Modules 6    AIW, AIMs and JSON Metadata  

1      Functions of Client Transmitter

The function of a Client Transmitter is to:

  1. Receive from a Participant:
    • Input Audio from the microphone.
    • Input Visual from the camera.
    • Participant’s Avatar Model.
    • Participant’s language preferences (e.g., EN-US, IT-CH).
  2. Send to the Server:
    • Speech Object (for Authentication).
    • Face Object (for Authentication).
    • Input Portable Avatars containing:
      • Language preferences (at the start).
      • Avatar Model (at the start).
      • Speech.
      • Avatar Descriptors.
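
As an illustration of the list above, the following minimal sketch models a Portable Avatar message in which the language preferences and Avatar Model travel only at the start, while Speech and Avatar Descriptors travel in every message. The class name, field names, and the `make_portable_avatar` helper are assumptions made for this example; they are not taken from the Portable Avatar specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortableAvatar:
    """Hypothetical container; field names are illustrative, not normative."""
    speech: bytes                               # speech segment of the participant
    avatar_descriptors: dict                    # e.g. face and body descriptors
    language_preferences: Optional[list] = None  # present only at the start
    avatar_model: Optional[bytes] = None         # present only at the start


def make_portable_avatar(speech, descriptors, at_start=False,
                         languages=None, model=None):
    """Build one message; start-only fields are attached when at_start is True."""
    if at_start and (languages is None or model is None):
        raise ValueError(
            "language preferences and avatar model are required at the start")
    return PortableAvatar(
        speech=speech,
        avatar_descriptors=descriptors,
        language_preferences=list(languages) if at_start else None,
        avatar_model=model if at_start else None,
    )
```

A first message would be built with `at_start=True`; later messages omit the start-only fields and carry only Speech and Avatar Descriptors.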

2      Reference Model of Client Transmitter

Figure 1 gives the Reference Model of Client Transmitter AIW. Red text refers to data sent at meeting start.

Figure 1 – Reference Model of Client Transmitter AIW

At the start, each participant provides:

  1. Language Selector.
  2. Avatar Model.
  3. Speech Object (for Authentication).
  4. Face Object (for Authentication).
  5. Participant ID.

During the videoconference:

  1. Audio-Visual Scene Description produces Speech Objects, Face Objects, Face Descriptors, Body Descriptors and Audio-Visual Scene Geometry.
  2. Automatic Speech Recognition produces Recognised Text.
  3. Personal Status Extraction produces Personal Status.
  4. Portable Avatar Multiplexing multiplexes Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Language Selector, Avatar Model, and Participant ID into Portable Avatars.
  5. Videoconference Client Transmitter sends the Portable Avatars to the Avatar Videoconference Server, which processes them and redistributes them to Client Receivers.
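
The processing chain listed above can be sketched as a sequence of function calls. The sketch below is only an illustration of the data flow: every function is a stub with a hypothetical name and signature, the returned values are placeholders, and the dictionary keys are assumptions made for this example.

```python
def audio_visual_scene_description(input_audio, input_visual):
    # Stub: a real module would separate speech and extract descriptors.
    return {
        "input_speech": input_audio,
        "face_descriptors": {"kind": "face", "source": input_visual},
        "body_descriptors": {"kind": "body", "source": input_visual},
    }


def automatic_speech_recognition(input_speech):
    # Stub: returns a placeholder instead of real recognised text.
    return f"<text recognised from {len(input_speech)} bytes>"


def personal_status_extraction(text, speech, face, body):
    # Stub: a real module would estimate the participant's Personal Status.
    return {"status": "placeholder"}


def portable_avatar_multiplexing(**fields):
    # Stub: bundles all named fields into one Portable Avatar dictionary.
    return dict(fields)


def client_transmitter_step(input_audio, input_visual, session):
    """Run the modules in the order in which their inputs become available."""
    scene = audio_visual_scene_description(input_audio, input_visual)
    text = automatic_speech_recognition(scene["input_speech"])
    status = personal_status_extraction(
        text, scene["input_speech"],
        scene["face_descriptors"], scene["body_descriptors"])
    return portable_avatar_multiplexing(
        recognised_text=text,
        personal_status=status,
        input_speech=scene["input_speech"],
        face_descriptors=scene["face_descriptors"],
        body_descriptors=scene["body_descriptors"],
        language_selector=session["language_selector"],
        avatar_model=session["avatar_model"],
        participant_id=session["participant_id"],
    )
```

The `session` dictionary stands for the data provided at the start (Language Selector, Avatar Model, Participant ID); sending the result to the Server is left out of the sketch.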

3      Input and Output Data of Client Transmitter

Table 1 gives the input and output data of the Client Transmitter AIW:

Table 1 – Input and output data of Client Transmitter AIW

| Input | Description |
|---|---|
| Input Text | Chat text used by a human to communicate with the Virtual Meeting Secretary or other participants. |
| Language Selector | The language the participant wishes to speak and hear. |
| Input Audio | Audio of the speech of participants in a meeting room. |
| Input Visual | Video of participants in a meeting room. |
| Avatar Model | The avatar model selected by the participant. |
| Output | Description |
| Speech Object | An utterance of a Participant used by the Server for authentication. |
| Participant Portable Avatar | Portable Avatar produced by the Client Transmitter. |
| Face Object | Participant’s face used by the Server for authentication. |

4      Functions of Client Transmitter’s AI Modules

Table 2 gives the functions of AI Modules of the Client Transmitter AIW.

Table 2 – AI Modules of Client Transmitter AIW

| AIM | Function |
|---|---|
| Audio-Visual Scene Description | 1. Receives Input Audio and Input Visual. 2. Provides Input Speech, Speech Object, Face Descriptors, Body Descriptors, and Face Object. |
| Automatic Speech Recognition | 1. Receives Input Speech. 2. Provides Recognised Text. |
| Personal Status Extraction | 1. Receives Recognised Text, Input Speech, Face Descriptors, and Body Descriptors. 2. Provides the Participant’s Personal Status. |
| Portable Avatar Multiplexing | 1. Receives Language Selector, Avatar Model, Input Text, Input Speech, Recognised Text, Personal Status, Participant ID, Face Descriptors, and Body Descriptors. 2. Provides Participant Portable Avatars. |
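
The receive/provide relations in Table 2 determine the order in which the AI Modules can run. The sketch below encodes them as plain Python sets and derives an execution order with the standard-library `graphlib` module (Python 3.9 or later). The data item names are normalised to those used in Table 1 and Table 3, and treating Participant ID as an input of the workflow is an assumption based on the data provided at the start; the `execution_order` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Data assumed to enter the workflow from outside (Table 1 plus Participant ID).
AIW_INPUTS = {"Input Text", "Language Selector", "Input Audio",
              "Input Visual", "Avatar Model", "Participant ID"}

# AIM name -> (data received, data provided), transcribed from Table 2.
AIMS = {
    "Audio-Visual Scene Description": (
        {"Input Audio", "Input Visual"},
        {"Input Speech", "Speech Object", "Face Object",
         "Face Descriptors", "Body Descriptors"}),
    "Automatic Speech Recognition": (
        {"Input Speech"}, {"Recognised Text"}),
    "Personal Status Extraction": (
        {"Recognised Text", "Input Speech",
         "Face Descriptors", "Body Descriptors"},
        {"Personal Status"}),
    "Portable Avatar Multiplexing": (
        {"Language Selector", "Avatar Model", "Input Text", "Input Speech",
         "Recognised Text", "Personal Status", "Participant ID",
         "Face Descriptors", "Body Descriptors"},
        {"Portable Avatar"}),
}


def execution_order(aims, aiw_inputs):
    """Return AIM names ordered so that every producer precedes its consumers."""
    producer = {item: name
                for name, (_, outputs) in aims.items() for item in outputs}
    graph = {}
    for name, (inputs, _) in aims.items():
        deps = set()
        for item in inputs:
            if item in producer:
                deps.add(producer[item])
            elif item not in aiw_inputs:
                raise ValueError(f"{name}: nothing provides {item!r}")
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())
```

The same helper doubles as a consistency check: an input that is neither a workflow input nor the output of another module raises a `ValueError`.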

5      I/O Data of Client Transmitter’s AI Modules

Table 3 gives the input and output data of the AI Modules of the Client Transmitter AIW.

Table 3 – I/O Data of AI Modules of Client Transmitter AIW

| AIM | Input | Output |
|---|---|---|
| Audio-Visual Scene Description | Input Audio, Input Visual | Input Speech, Speech Object, Face Object, Face Descriptors, Body Descriptors |
| Automatic Speech Recognition | Speech Objects | Recognised Text |
| Personal Status Extraction | Recognised Text, Input Speech, Face Object, Body Object | Personal Status |
| Portable Avatar Multiplexing | Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Input Text, Language Selector, Avatar Model, Participant ID | Portable Avatars |

6      AIW, AIMs, and JSON Metadata of Videoconference Client Transmitter

Table 4 – AIW, AIMs, and JSON Metadata

| AIW | AIMs/1 | AIMs/2 | Name | JSON |
|---|---|---|---|---|
| PAF-CTX | | | Videoconference Client Transmitter | X |
| | OSD-AVS | | Audio-Visual Scene Description | X |
| | | CAE-ASD | Audio Scene Description | X |
| | | CAE-AAT | Audio Analysis Transform | X |
| | | CAE-ASL | Audio Source Localisation | X |
| | | CAE-ASE | Audio Separation and Enhancement | X |
| | | CAE-AST | Audio Synthesis Transform | X |
| | | CAE-AMX | Audio Descriptors Multiplexing | X |
| | | OSD-VSD | Visual Scene Description | X |
| | | OSD-AVA | Audio-Visual Alignment | X |
| | MMC-ASR | | Automatic Speech Recognition | X |
| | MMC-PSE | | Personal Status Extraction | X |
| | | MMC-ITD | Entity Text Description | X |
| | | MMC-ISD | Entity Speech Description | X |
| | | PAF-IFD | Entity Face Description | X |
| | | PAF-IBD | Entity Body Description | X |
| | | MMC-PTI | PS-Text Interpretation | X |
| | | MMC-PSI | PS-Speech Interpretation | X |
| | | PAF-PFI | PS-Face Interpretation | X |
| | | PAF-PGI | PS-Gesture Interpretation | X |
| | | MMC-PMX | Personal Status Multiplexing | X |
| | PAF-PMX | | Portable Avatar Multiplexing | X |
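
The identifiers in Table 4 share a small set of three-letter prefixes, which presumably indicate the specification that defines each module. As a minimal sketch, the identifiers can be kept in a list and grouped by prefix, for example to see which specifications a Client Transmitter implementation depends on. The `group_by_prefix` helper is hypothetical, and the sketch does not read or assume the structure of the JSON Metadata itself.

```python
from collections import defaultdict

# (identifier, name) pairs transcribed from Table 4.
ENTRIES = [
    ("PAF-CTX", "Videoconference Client Transmitter"),
    ("OSD-AVS", "Audio-Visual Scene Description"),
    ("CAE-ASD", "Audio Scene Description"),
    ("CAE-AAT", "Audio Analysis Transform"),
    ("CAE-ASL", "Audio Source Localisation"),
    ("CAE-ASE", "Audio Separation and Enhancement"),
    ("CAE-AST", "Audio Synthesis Transform"),
    ("CAE-AMX", "Audio Descriptors Multiplexing"),
    ("OSD-VSD", "Visual Scene Description"),
    ("OSD-AVA", "Audio-Visual Alignment"),
    ("MMC-ASR", "Automatic Speech Recognition"),
    ("MMC-PSE", "Personal Status Extraction"),
    ("MMC-ITD", "Entity Text Description"),
    ("MMC-ISD", "Entity Speech Description"),
    ("PAF-IFD", "Entity Face Description"),
    ("PAF-IBD", "Entity Body Description"),
    ("MMC-PTI", "PS-Text Interpretation"),
    ("MMC-PSI", "PS-Speech Interpretation"),
    ("PAF-PFI", "PS-Face Interpretation"),
    ("PAF-PGI", "PS-Gesture Interpretation"),
    ("MMC-PMX", "Personal Status Multiplexing"),
    ("PAF-PMX", "Portable Avatar Multiplexing"),
]


def group_by_prefix(entries):
    """Group identifiers by the part before the hyphen, keeping table order."""
    groups = defaultdict(list)
    for code, _name in entries:
        prefix, _, _ = code.partition("-")
        groups[prefix].append(code)
    return dict(groups)
```

Calling `group_by_prefix(ENTRIES)` yields one list of identifiers per prefix (PAF, OSD, CAE, MMC).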