1 Functions | 2 Reference Model | 3 Input and Output Data |
4 Functions of AI Modules | 5 I/O Data of AI Modules |
6 AIW, AIM, and JSON Metadata |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
The functions of the Videoconference Client Transmitter are to:
- Receive from a Participant:
  - Input Audio from the microphone.
  - Input Visual from the camera.
  - Participant’s Avatar Model.
  - Participant’s Language Selector (e.g., EN-US, IT-CH).
- Send to the Server:
  - Speech Object (for Authentication).
  - Face Object (for Authentication).
  - Input Portable Avatars containing:
    - Language Selector (at the start).
    - Avatar Model (at the start).
    - Input Speech.
    - Avatar Descriptors.
2 Reference Model
Figure 1 gives the Reference Model of the Client Transmitter AIW. Red text refers to data sent at the start of the meeting.
Figure 1 – Reference Model of Videoconference Client Transmitter (PAF-ABV)
At the start, each participant provides:
- Language Selector.
- Avatar Model.
- Speech Object (for Authentication).
- Face Object (for Authentication).
- Participant ID.
During the videoconference:
- Audio-Visual Scene Description produces Speech Objects, Face Objects, Face Descriptors, Body Descriptors and Audio-Visual Scene Geometry.
- Automatic Speech Recognition produces Recognised Text.
- Personal Status Extraction produces Personal Status.
- Portable Avatar Multiplexing multiplexes Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Language Selector, Avatar Model, and Participant ID.
- Videoconference Client Transmitter sends Portable Avatars to the Avatar Videoconference Server, which processes them and redistributes them to Client Receivers.
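The data flow described above can be sketched as a short pipeline. This is a minimal illustration only, not the reference software: the function names, the `SceneDescription` and `PortableAvatar` containers, and the placeholder return values are all assumptions introduced here, and the AIM outputs follow the names used in the AIM tables of this page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SceneDescription:
    """Outputs of Audio-Visual Scene Description, named as in the AIM tables."""
    input_speech: Any
    speech_object: Any
    face_object: Any
    face_descriptors: Any
    body_descriptors: Any


@dataclass
class PortableAvatar:
    """Hypothetical container for the data multiplexed into a Portable Avatar."""
    participant_id: str
    language_selector: str
    avatar_model: Any
    input_speech: Any
    recognised_text: str
    personal_status: Any
    face_descriptors: Any
    body_descriptors: Any
    input_text: Optional[str] = None


# Stub AIMs: they return placeholder values so the wiring can be run end to end.
def audio_visual_scene_description(input_audio: Any, input_visual: Any) -> SceneDescription:
    return SceneDescription(
        input_speech=input_audio,
        speech_object={"speech": input_audio},
        face_object={"face": input_visual},
        face_descriptors={"from": input_visual},
        body_descriptors={"from": input_visual},
    )


def automatic_speech_recognition(input_speech: Any) -> str:
    return f"recognised({input_speech})"


def personal_status_extraction(recognised_text, input_speech,
                               face_descriptors, body_descriptors) -> dict:
    return {"status": "placeholder"}


def portable_avatar_multiplexing(**fields: Any) -> PortableAvatar:
    return PortableAvatar(**fields)


def client_transmitter_step(participant_id, language_selector, avatar_model,
                            input_audio, input_visual, input_text=None):
    """Run the four AIMs once, in the order their data dependencies require."""
    scene = audio_visual_scene_description(input_audio, input_visual)
    text = automatic_speech_recognition(scene.input_speech)
    status = personal_status_extraction(
        text, scene.input_speech, scene.face_descriptors, scene.body_descriptors
    )
    avatar = portable_avatar_multiplexing(
        participant_id=participant_id,
        language_selector=language_selector,
        avatar_model=avatar_model,
        input_speech=scene.input_speech,
        recognised_text=text,
        personal_status=status,
        face_descriptors=scene.face_descriptors,
        body_descriptors=scene.body_descriptors,
        input_text=input_text,
    )
    # Speech Object and Face Object are sent to the Server for authentication.
    return scene.speech_object, scene.face_object, avatar
```

A call such as `client_transmitter_step("p1", "EN-US", "model-a", "audio0", "video0")` returns the two authentication objects and one Portable Avatar; a real client would repeat the step for each captured segment.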
3 Input and Output Data
Table 1 gives the input and output data of the Client Transmitter AIW:
Table 1 – Input and output data of Client Transmitter AIW
Input | Description |
Input Text | Chat text used by a participant to communicate with Virtual Meeting Secretary or other participants |
Language Selector | The language(s) a participant wishes to speak and hear. |
Input Audio | Audio of participants’ speech in a meeting room. |
Input Visual | Visual of participants in a meeting room. |
Avatar Model | The avatar model selected by the participant. |
Output | Description |
Speech Object | A participant’s utterance used by Server for authentication. |
Participant Portable Avatar | Portable Avatar produced by Client Transmitter. |
Face Object | Participant’s face used by Server for authentication. |
4 Functions of AI Modules
Table 2 gives the functions of AI Modules of the Client Transmitter AIW.
Table 2 – AI Modules of Client Transmitter AIW
AIM | Function |
Audio-Visual Scene Description | 1. Receives Input Audio and Input Visual. 2. Provides Input Speech, Speech Object, Face Descriptors, Body Descriptors, Face Object. |
Automatic Speech Recognition | 1. Receives Input Speech. 2. Provides Recognised Text. |
Personal Status Extraction | 1. Receives Recognised Text, Input Speech, Face Descriptors, Body Descriptors. 2. Provides the Participant’s Personal Status. |
Portable Avatar Multiplexing | 1. Receives Language Selector, Avatar Model, Input Text, Input Speech, Recognised Text, Personal Status, Participant ID, Face Descriptors, Body Descriptors. 2. Provides Participant Portable Avatars. |
5 I/O Data of AI Modules
Table 3 gives the input and output data of the AI Modules of the Client Transmitter AIW.
Table 3 – I/O Data of AI Modules of Client Transmitter AIW
AIM | Input | Output |
Audio-Visual Scene Description | Input Audio, Input Visual | Input Speech, Speech Object, Face Object, Face Descriptors, Body Descriptors |
Automatic Speech Recognition | Input Speech | Recognised Text |
Personal Status Extraction | Recognised Text, Input Speech, Face Descriptors, Body Descriptors | Personal Status |
Portable Avatar Multiplexing | Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Input Text, Language Selector, Avatar Model, Participant ID | Portable Avatar |
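The AIM inputs and outputs listed above imply an execution order. The sketch below, which is an illustration and not part of the specification, encodes the table as a Python dictionary and derives that order with the standard-library `graphlib` module (Python 3.9 or later). The names `AIMS`, `WORKFLOW_INPUTS`, and `execution_order` are assumptions made for this example. The workflow inputs are taken to be the Table 1 inputs plus Participant ID, which Section 2 lists among the data provided at the start.

```python
from graphlib import TopologicalSorter

# Data supplied from outside the workflow (assumed: Table 1 inputs plus Participant ID).
WORKFLOW_INPUTS = {
    "Input Text", "Language Selector", "Input Audio",
    "Input Visual", "Avatar Model", "Participant ID",
}

# AIM name -> (inputs, outputs), transcribed from the AIM I/O table.
AIMS = {
    "Audio-Visual Scene Description": (
        {"Input Audio", "Input Visual"},
        {"Input Speech", "Speech Object", "Face Object",
         "Face Descriptors", "Body Descriptors"},
    ),
    "Automatic Speech Recognition": (
        {"Input Speech"},
        {"Recognised Text"},
    ),
    "Personal Status Extraction": (
        {"Recognised Text", "Input Speech", "Face Descriptors", "Body Descriptors"},
        {"Personal Status"},
    ),
    "Portable Avatar Multiplexing": (
        {"Recognised Text", "Personal Status", "Input Speech", "Face Descriptors",
         "Body Descriptors", "Input Text", "Language Selector", "Avatar Model",
         "Participant ID"},
        {"Portable Avatar"},
    ),
}


def execution_order(aims, workflow_inputs):
    """Return AIM names ordered so that every input is available before use."""
    produced_by = {item: name for name, (_, outs) in aims.items() for item in outs}
    graph = {}
    for name, (inputs, _) in aims.items():
        upstream = set()
        for item in inputs:
            if item in produced_by:
                upstream.add(produced_by[item])
            elif item not in workflow_inputs:
                # Neither a workflow input nor the output of another AIM.
                raise ValueError(f"{name}: no source for {item!r}")
        graph[name] = upstream
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(AIMS, WORKFLOW_INPUTS), start=1):
        print(step, name)
```

Because each AIM depends on the one before it, the order is unique: scene description, speech recognition, personal status extraction, then multiplexing. Omitting Participant ID from `WORKFLOW_INPUTS` makes the check raise an error, which shows that the multiplexer needs an input that Table 1 does not list.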
6 AIW, AIM, and JSON Metadata
Table 4 – AIW, AIMs, and JSON Metadata
AIW | AIMs/1 | AIMs/2 | AIMs/3 | Name | JSON |
PAF-CTX | | | | Videoconference Client Transmitter | X |
| OSD-AVS | | | Audio-Visual Scene Description | X |
| | CAE-ASD | | Audio Scene Description | X |
| | | CAE-AAT | Audio Analysis Transform | X |
| | | CAE-ASL | Audio Source Localisation | X |
| | | CAE-ASE | Audio Separation and Enhancement | X |
| | | CAE-AST | Audio Synthesis Transform | X |
| | | CAE-AMX | Audio Descriptors Multiplexing | X |
| | OSD-VSD | | Visual Scene Description | X |
| | OSD-AVA | | Audio-Visual Alignment | X |
| MMC-ASR | | | Automatic Speech Recognition | X |
| MMC-PSE | | | Personal Status Extraction | X |
| | MMC-ITD | | Entity Text Description | X |
| | MMC-ISD | | Entity Speech Description | X |
| | PAF-IFD | | Entity Face Description | X |
| | PAF-IBD | | Entity Body Description | X |
| | MMC-PTI | | PS-Text Interpretation | X |
| | MMC-PSI | | PS-Speech Interpretation | X |
| | PAF-PFI | | PS-Face Interpretation | X |
| | PAF-PGI | | PS-Gesture Interpretation | X |
| | MMC-PMX | | Personal Status Multiplexing | X |
| PAF-PMX | | | Portable Avatar Multiplexing | X |
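The AIMs/1 to AIMs/3 columns suggest that some AIMs are composites of other AIMs. The sketch below represents that nesting as a tree and walks it depth first. The nesting shown is an assumption inferred from the column headers, the row order, and the four top-level AIMs of Table 2; it is not stated explicitly on this page, and the `AIW_TREE` and `walk` names are hypothetical.

```python
# Each node is (code, children). The nesting is an assumption inferred from
# the AIMs/1..AIMs/3 column headers and the row order of the metadata table.
AIW_TREE = ("PAF-CTX", [
    ("OSD-AVS", [
        ("CAE-ASD", [
            ("CAE-AAT", []),
            ("CAE-ASL", []),
            ("CAE-ASE", []),
            ("CAE-AST", []),
            ("CAE-AMX", []),
        ]),
        ("OSD-VSD", []),
        ("OSD-AVA", []),
    ]),
    ("MMC-ASR", []),
    ("MMC-PSE", [
        ("MMC-ITD", []),
        ("MMC-ISD", []),
        ("PAF-IFD", []),
        ("PAF-IBD", []),
        ("MMC-PTI", []),
        ("MMC-PSI", []),
        ("PAF-PFI", []),
        ("PAF-PGI", []),
        ("MMC-PMX", []),
    ]),
    ("PAF-PMX", []),
])


def walk(node, depth=0):
    """Yield (depth, code) pairs in depth-first order; depth 0 is the AIW."""
    code, children = node
    yield depth, code
    for child in children:
        yield from walk(child, depth + 1)


if __name__ == "__main__":
    for depth, code in walk(AIW_TREE):
        print("  " * depth + code)
```

Walking the tree depth first reproduces the row order of the table, and the depth of each node corresponds to the column (AIW, AIMs/1, AIMs/2, AIMs/3) in which its code would appear.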
7 Reference Software
8 Conformance Testing
9 Performance Assessment