1 Functions of Client Transmitter
The function of a Client Transmitter is to:
- Receive from a Participant:
  - Input Audio from the microphone.
  - Input Visual from the camera.
  - Participant’s Avatar Model.
  - Participant’s language preferences (e.g., EN-US, IT-CH).
- Send to the Server:
  - Speech Object (for Authentication).
  - Face Object (for Authentication).
  - Input Portable Avatars containing:
    - Language preferences (at the start).
    - Avatar Model (at the start).
    - Speech.
    - Avatar Descriptors.
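The split between data sent once at the start and data sent throughout the meeting can be sketched in Python. This is a minimal illustration, not the MPAI-PAF data format: the names `PortableAvatar` and `ClientTransmitterSketch` and all field names are hypothetical, "at the start" is assumed to mean the first Portable Avatar a participant sends, and including the Participant ID in every record (following the Portable Avatar Multiplexing description) is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for one Portable Avatar; field names are illustrative only.
@dataclass
class PortableAvatar:
    participant_id: str
    speech: bytes                                      # speech for this time slice
    avatar_descriptors: dict                           # e.g. face and body descriptors
    language_preferences: Optional[list[str]] = None   # present only at the start
    avatar_model: Optional[bytes] = None               # present only at the start


class ClientTransmitterSketch:
    """Emits Portable Avatars, attaching start-of-meeting data only to the first one."""

    def __init__(self, participant_id: str, language_preferences: list[str], avatar_model: bytes):
        self.participant_id = participant_id
        self.language_preferences = language_preferences
        self.avatar_model = avatar_model
        self.started = False

    def next_avatar(self, speech: bytes, avatar_descriptors: dict) -> PortableAvatar:
        first = not self.started
        self.started = True
        return PortableAvatar(
            participant_id=self.participant_id,
            speech=speech,
            avatar_descriptors=avatar_descriptors,
            # Start-only fields are filled in once, then left as None.
            language_preferences=self.language_preferences if first else None,
            avatar_model=self.avatar_model if first else None,
        )
```

Keeping the start-only fields optional lets every later Portable Avatar carry only Speech and Avatar Descriptors; an actual implementation would instead follow the format defined by MPAI-PAF.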
2 Reference Model of Client Transmitter
Figure 1 gives the Reference Model of Client Transmitter AIW. Red text refers to data sent at meeting start.
Figure 1 – Reference Model of Client Transmitter AIW
At the start, each participant provides:
- Language Selector.
- Avatar Model.
- Speech Object (for Authentication).
- Face Object (for Authentication).
- Participant ID.
During the videoconference:
- Audio-Visual Scene Description produces Input Speech, Speech Objects, Face Objects, Face Descriptors, Body Descriptors and Audio-Visual Scene Geometry.
- Automatic Speech Recognition produces Recognised Text.
- Personal Status Extraction produces Personal Status.
- Portable Avatar Multiplexing multiplexes Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Language Selector, Avatar Model, and Participant ID into Portable Avatars.
- The Videoconference Client Transmitter sends the Portable Avatars to the Avatar Videoconference Server, which processes them and redistributes them to the Client Receivers.
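The data flow described above can be traced with stub modules. In this Python sketch every function name is hypothetical and each AIM returns placeholder values, so only the order of calls and the routing of data between AIMs reflect the Reference Model. It does not model which fields are sent only at the start, and it omits Audio-Visual Scene Geometry and Input Text.

```python
# Hypothetical stand-ins for the four AIMs. Each returns placeholder data;
# only the wiring between modules follows the Reference Model.
def audio_visual_scene_description(input_audio, input_visual):
    return {
        "input_speech": input_audio,    # placeholder: a real AIM would separate the speech
        "speech_object": input_audio,   # sent to the Server for authentication
        "face_object": input_visual,    # sent to the Server for authentication
        "face_descriptors": {"placeholder": "face"},
        "body_descriptors": {"placeholder": "body"},
    }


def automatic_speech_recognition(input_speech):
    # Placeholder: reports how many samples it received instead of real text.
    return f"<text recognised from {len(input_speech)} samples>"


def personal_status_extraction(recognised_text, input_speech, face_descriptors, body_descriptors):
    # Placeholder: a real AIM would infer the participant's Personal Status.
    return {"emotion": "neutral"}


def portable_avatar_multiplexing(**fields):
    # Placeholder: bundles all fields into one Portable Avatar record.
    return dict(fields)


def client_transmitter_step(input_audio, input_visual, language_selector, avatar_model, participant_id):
    avs = audio_visual_scene_description(input_audio, input_visual)
    text = automatic_speech_recognition(avs["input_speech"])
    status = personal_status_extraction(
        text, avs["input_speech"], avs["face_descriptors"], avs["body_descriptors"]
    )
    avatar = portable_avatar_multiplexing(
        recognised_text=text,
        personal_status=status,
        input_speech=avs["input_speech"],
        face_descriptors=avs["face_descriptors"],
        body_descriptors=avs["body_descriptors"],
        language_selector=language_selector,
        avatar_model=avatar_model,
        participant_id=participant_id,
    )
    # The authentication objects travel to the Server alongside the Portable Avatar.
    return avatar, avs["speech_object"], avs["face_object"]
```

Each stub can be replaced by a real module without changing `client_transmitter_step`, since the function only depends on which data each AIM consumes and produces.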
3 Input and Output Data of Client Transmitter
Table 1 gives the input and output data of the Client Transmitter AIW:
Table 1 – Input and output data of Client Transmitter AIW
Input | Description |
Input Text | Chat text used by a human to communicate with the Virtual Meeting Secretary or with other participants. |
Language Selector | The language the participant wishes to speak and hear. |
Input Audio | Audio of the speech of participants in a meeting room. |
Input Visual | Video of participants in a meeting room. |
Avatar Model | The avatar model selected by the participant. |
Output | Description |
Speech Object | An utterance of a Participant used by the Server for authentication. |
Participant Portable Avatar | Portable Avatar produced by the Client Transmitter. |
Face Object | Participant’s face used by the Server for authentication. |
4 Functions of Client Transmitter’s AI Modules
Table 2 gives the functions of AI Modules of the Client Transmitter AIW.
Table 2 – AI Modules of Client Transmitter AIW
AIM | Function |
Audio-Visual Scene Description | 1. Receives Input Audio and Input Visual. 2. Provides Input Speech, Speech Object, Face Descriptors, Body Descriptors, Face Object. |
Automatic Speech Recognition | 1. Receives Input Speech. 2. Provides Recognised Text. |
Personal Status Extraction | 1. Receives Recognised Text, Input Speech, Face Descriptors, Body Descriptors. 2. Provides the Participant’s Personal Status. |
Portable Avatar Multiplexing | 1. Receives Language Selector, Avatar Model, Input Text, Input Speech, Recognised Text, Personal Status, Participant ID, Face Descriptors, Body Descriptors. 2. Provides Participant Portable Avatars. |
5 I/O Data of Client Transmitter’s AI Modules
Table 3 gives the input and output data of the AI Modules of the Client Transmitter AIW.
Table 3 – I/O Data of AI Modules of Client Transmitter AIW
AIM | Input | Output |
Audio-Visual Scene Description | Input Audio, Input Visual | Input Speech, Speech Object, Face Object, Face Descriptors, Body Descriptors |
Automatic Speech Recognition | Input Speech | Recognised Text |
Personal Status Extraction | Recognised Text, Input Speech, Face Descriptors, Body Descriptors | Personal Status |
Portable Avatar Multiplexing | Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Input Text, Language Selector, Avatar Model, Participant ID | Participant Portable Avatars |
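Because each AIM consumes data produced either by the AIW or by an upstream AIM, the I/O table can be checked mechanically. The Python sketch below transcribes it using the data names of Table 2, treats the inputs of Table 1 plus Participant ID (which participants provide at the start) as AIW inputs, and reports any AIM input that has no source. The names `TABLE_3`, `AIW_INPUTS` and `unresolved_inputs` are illustrative, and listing the AIMs in execution order is an assumption the check relies on.

```python
# AIM -> (inputs, outputs), listed in execution order, using Table 2's data names.
TABLE_3 = {
    "Audio-Visual Scene Description": (
        ["Input Audio", "Input Visual"],
        ["Input Speech", "Speech Object", "Face Object", "Face Descriptors", "Body Descriptors"],
    ),
    "Automatic Speech Recognition": (["Input Speech"], ["Recognised Text"]),
    "Personal Status Extraction": (
        ["Recognised Text", "Input Speech", "Face Descriptors", "Body Descriptors"],
        ["Personal Status"],
    ),
    "Portable Avatar Multiplexing": (
        ["Recognised Text", "Personal Status", "Input Speech", "Face Descriptors",
         "Body Descriptors", "Input Text", "Language Selector", "Avatar Model", "Participant ID"],
        ["Participant Portable Avatars"],
    ),
}

# AIW-level inputs: Table 1 plus Participant ID, provided by participants at the start.
AIW_INPUTS = {"Input Text", "Language Selector", "Input Audio", "Input Visual",
              "Avatar Model", "Participant ID"}


def unresolved_inputs(table, aiw_inputs):
    """Return (aim, input) pairs whose input is neither an AIW input nor produced upstream."""
    available = set(aiw_inputs)
    missing = []
    for aim, (inputs, outputs) in table.items():  # dicts keep insertion order
        for name in inputs:
            if name not in available:
                missing.append((aim, name))
        available.update(outputs)  # later AIMs may consume these outputs
    return missing
```

With the data above the check returns an empty list; removing Participant ID from the AIW inputs makes it report `("Portable Avatar Multiplexing", "Participant ID")`, which shows that Table 1 does not list this input.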
6 AIW, AIM, and JSON Metadata of Videoconference Client Transmitter
Table 4 – AIW, AIMs, and JSON Metadata
AIW/AIM | Name | JSON |
PAF-CTX | Videoconference Client Transmitter | X |
OSD-AVS | Audio-Visual Scene Description | X |
CAE-ASD | Audio Scene Description | X |
CAE-AAT | Audio Analysis Transform | X |
CAE-ASL | Audio Source Localisation | X |
CAE-ASE | Audio Separation and Enhancement | X |
CAE-AST | Audio Synthesis Transform | X |
CAE-AMX | Audio Descriptors Multiplexing | X |
OSD-VSD | Visual Scene Description | X |
OSD-AVA | Audio-Visual Alignment | X |
MMC-ASR | Automatic Speech Recognition | X |
MMC-PSE | Personal Status Extraction | X |
MMC-ITD | Entity Text Description | X |
MMC-ISD | Entity Speech Description | X |
PAF-IFD | Entity Face Description | X |
PAF-IBD | Entity Body Description | X |
MMC-PTI | PS-Text Interpretation | X |
MMC-PSI | PS-Speech Interpretation | X |
PAF-PFI | PS-Face Interpretation | X |
PAF-PGI | PS-Gesture Interpretation | X |
MMC-PMX | Personal Status Multiplexing | X |
PAF-PMX | Portable Avatar Multiplexing | X |
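The identifiers in Table 4 share a `PREFIX-ACRONYM` pattern. Assuming the prefix names the MPAI standard in which the module is specified (CAE Context-based Audio Enhancement, MMC Multimodal Conversation, OSD Object and Scene Description, PAF Portable Avatar Format), the Python sketch below groups the identifiers by prefix. `TABLE_4_IDS` and `group_by_standard` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Identifiers transcribed from Table 4, in table order.
TABLE_4_IDS = [
    "PAF-CTX", "OSD-AVS", "CAE-ASD", "CAE-AAT", "CAE-ASL", "CAE-ASE",
    "CAE-AST", "CAE-AMX", "OSD-VSD", "OSD-AVA", "MMC-ASR", "MMC-PSE",
    "MMC-ITD", "MMC-ISD", "PAF-IFD", "PAF-IBD", "MMC-PTI", "MMC-PSI",
    "PAF-PFI", "PAF-PGI", "MMC-PMX", "PAF-PMX",
]


def group_by_standard(identifiers):
    """Split each 'PREFIX-ACRONYM' identifier at the first hyphen and group by prefix."""
    groups = defaultdict(list)
    for ident in identifiers:
        standard, module = ident.split("-", 1)
        groups[standard].append(module)
    return dict(groups)


if __name__ == "__main__":
    for standard, modules in sorted(group_by_standard(TABLE_4_IDS).items()):
        print(standard, len(modules), modules)
```

The grouping also shows that the acronym alone is not unique: `PMX` appears under both MMC (Personal Status Multiplexing) and PAF (Portable Avatar Multiplexing), so the full identifier is needed to refer to a module unambiguously.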