MPAI-PAF V1.3 AIWs Videoconference Client Transmitter


1 Functions	2 Reference Model	3 Input and Output Data
4 Functions of AI Modules	5 I/O Data of AI Modules	6 AIW, AIM, and JSON Metadata
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

The functions of the Videoconference Client Transmitter are to:

Receive from a Participant:
- Input Audio from the microphone.
- Input Visual from the camera.
- Participant’s Avatar Model.
- Participant’s Language Selector (e.g., EN-US, IT-CH).
Send to the Server:
- Speech Object (for Authentication).
- Face Object (for Authentication).
- Input Portable Avatars containing:
  - Language Selector (at the start).
  - Avatar Model (at the start).
  - Input Speech.
  - Avatar Descriptors.

2 Reference Model

Figure 1 gives the Reference Model of Client Transmitter AIW. Red text refers to data sent at meeting start.

Figure 1 – Reference Model of Videoconference Client Transmitter (PAF-ABV)

At the start, each participant provides:

Language Selector.
Avatar Model.
Speech Object (for Authentication).
Face Object (for Authentication).
Participant ID.

During the videoconference:

Audio-Visual Scene Description produces Speech Objects, Face Objects, Face Descriptors, Body Descriptors and Audio-Visual Scene Geometry.
Automatic Speech Recognition produces Recognised Text.
Personal Status Extraction produces Personal Status.
Portable Avatar Multiplexing multiplexes Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Language Selector, Avatar Model, and Participant ID.
Videoconference Client Transmitter sends Portable Avatars to the Avatar Videoconference Server, which processes them and redistributes them to Client Receivers.
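The data flow listed above can be sketched as a chain of function calls. This is a minimal illustration only, not the reference software: every function and class name is hypothetical, all payloads are placeholder strings, and each AIM is a stub that merely tags its input so the wiring can be followed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SceneDescription:
    """Outputs of Audio-Visual Scene Description (placeholder string payloads)."""
    input_speech: str
    speech_object: str
    face_object: str
    face_descriptors: str
    body_descriptors: str


@dataclass
class PortableAvatar:
    """Items multiplexed by Portable Avatar Multiplexing."""
    participant_id: str
    input_speech: str
    recognised_text: str
    personal_status: str
    face_descriptors: str
    body_descriptors: str
    language_selector: Optional[str] = None  # sent at the start only
    avatar_model: Optional[str] = None       # sent at the start only


def audio_visual_scene_description(input_audio: str, input_visual: str) -> SceneDescription:
    # Stub: a real AIM would analyse the audio and visual scene.
    return SceneDescription(
        input_speech=f"speech<{input_audio}>",
        speech_object=f"speech_object<{input_audio}>",
        face_object=f"face_object<{input_visual}>",
        face_descriptors=f"face_desc<{input_visual}>",
        body_descriptors=f"body_desc<{input_visual}>",
    )


def automatic_speech_recognition(input_speech: str) -> str:
    # Stub: returns the Recognised Text.
    return f"text<{input_speech}>"


def personal_status_extraction(text: str, speech: str, face: str, body: str) -> str:
    # Stub: combines the four inputs into a Personal Status placeholder.
    return "status<" + "|".join([text, speech, face, body]) + ">"


def client_transmitter_step(
    input_audio: str,
    input_visual: str,
    participant_id: str,
    language_selector: Optional[str] = None,
    avatar_model: Optional[str] = None,
) -> Tuple[str, str, PortableAvatar]:
    """Run the AIMs in order; return (Speech Object, Face Object, Portable Avatar)."""
    scene = audio_visual_scene_description(input_audio, input_visual)
    text = automatic_speech_recognition(scene.input_speech)
    status = personal_status_extraction(
        text, scene.input_speech, scene.face_descriptors, scene.body_descriptors
    )
    avatar = PortableAvatar(
        participant_id=participant_id,
        input_speech=scene.input_speech,
        recognised_text=text,
        personal_status=status,
        face_descriptors=scene.face_descriptors,
        body_descriptors=scene.body_descriptors,
        language_selector=language_selector,
        avatar_model=avatar_model,
    )
    return scene.speech_object, scene.face_object, avatar
```

In this sketch the Language Selector and Avatar Model are optional arguments, reflecting that they are provided at the start; later calls would leave them unset.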

3 Input and Output Data

Table 1 gives the input and output data of the Client Transmitter AIW:

Table 1 – Input and output data of Client Transmitter AIW

Input	Description
Input Text	Chat text used by a participant to communicate with Virtual Meeting Secretary or other participants
Input Selector	The language(s) a participant wishes to speak and hear.
Input Audio	Audio of a participant’s Speech in a meeting room.
Input Visual	Visual of participants in a meeting room.
Avatar Model	The avatar model selected by the participant.
Output	Description
Speech Object	A participant’s utterance used by Server for authentication.
Participant Portable Avatar	Portable Avatar produced by Client Transmitter.
Face Object	Participant’s face used by Server for authentication.

4 Functions of AI Modules

Table 2 gives the functions of AI Modules of the Client Transmitter AIW.

Table 2 – AI Modules of Client Transmitter AIW

AIM	Function
Audio-Visual Scene Description	1. Receives Input Audio and Input Visual. 2. Provides Input Speech, Speech Object, Face Descriptors, Body Descriptors, Face Object.
Automatic Speech Recognition	1. Receives Input Speech. 2. Provides Recognised Text.
Personal Status Extraction	1. Receives Recognised Text, Input Speech, Face Descriptors, Body Descriptors. 2. Provides the Participant’s Personal Status.
Portable Avatar Multiplexing	1. Receives Language Selector, Avatar Model, Input Text, Input Speech, Recognised Text, Personal Status, Participant ID, Face Descriptors, Body Descriptors. 2. Provides Participant Portable Avatars.

5 I/O Data of AI Modules

Table 3 gives the input and output data of the AI Modules of the Client Transmitter AIW.

Table 3 – I/O Data of AI Modules of Client Transmitter AIW

AIM	Input	Output
Audio-Visual Scene Description	Input Audio, Input Visual	Input Speech, Speech Object, Face Object, Face Descriptors, Body Descriptors
Automatic Speech Recognition	Input Speech	Recognised Text
Personal Status Extraction	Recognised Text, Input Speech, Face Descriptors, Body Descriptors	Personal Status
Portable Avatar Multiplexing	Recognised Text, Personal Status, Input Speech, Face Descriptors, Body Descriptors, Input Text, Language Selector, Avatar Model, Participant ID	Portable Avatar
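The table above can be checked mechanically: every input of an AIM should be either an input of the AIW or an output of an AIM that runs earlier. The following is a minimal sketch of such a check, transcribing the rows of Table 3. It assumes that Language Selector is the same data as the Selector input of Table 1 and that Participant ID is available to the AIW; the names `AIW_INPUTS`, `AIMS`, and `unresolved_inputs` are illustrative only.

```python
# AIW-level data assumed to be available before any AIM runs.
AIW_INPUTS = {
    "Input Text", "Language Selector", "Input Audio",
    "Input Visual", "Avatar Model", "Participant ID",
}

# (AIM name, inputs, outputs), listed in execution order as in Table 3.
AIMS = [
    ("Audio-Visual Scene Description",
     {"Input Audio", "Input Visual"},
     {"Input Speech", "Speech Object", "Face Object",
      "Face Descriptors", "Body Descriptors"}),
    ("Automatic Speech Recognition",
     {"Input Speech"},
     {"Recognised Text"}),
    ("Personal Status Extraction",
     {"Recognised Text", "Input Speech", "Face Descriptors", "Body Descriptors"},
     {"Personal Status"}),
    ("Portable Avatar Multiplexing",
     {"Recognised Text", "Personal Status", "Input Speech", "Face Descriptors",
      "Body Descriptors", "Input Text", "Language Selector", "Avatar Model",
      "Participant ID"},
     {"Portable Avatar"}),
]


def unresolved_inputs(aiw_inputs, aims):
    """Return {AIM name: set of inputs that nothing upstream provides}."""
    available = set(aiw_inputs)
    missing = {}
    for name, inputs, outputs in aims:
        gap = inputs - available
        if gap:
            missing[name] = gap
        # Outputs of this AIM become available to the AIMs that follow.
        available |= outputs
    return missing
```

An empty result means the workflow is fully wired under the stated assumptions; removing an item from `AIW_INPUTS` shows which AIM would be left without it.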

6 AIW, AIM, and JSON Metadata

Table 4 – AIW, AIMs, and JSON Metadata

AIW	AIMs/1	AIMs/2	AIMs/3	Name	JSON
PAF-CTX				Videoconference Client Transmitter	X
	OSD-AVS			Audio-Visual Scene Description	X
		CAE-ASD		Audio Scene Description	X
			CAE-AAT	Audio Analysis Transform	X
			CAE-ASL	Audio Source Localisation	X
			CAE-ASE	Audio Separation and Enhancement	X
			CAE-AST	Audio Synthesis Transform	X
			CAE-AMX	Audio Descriptors Multiplexing	X
		OSD-VSD		Visual Scene Description	X
		OSD-AVA		Audio-Visual Alignment	X
	MMC-ASR			Automatic Speech Recognition	X
	MMC-PSE			Personal Status Extraction	X
		MMC-ETD		Entity Text Description	X
		MMC-ESD		Entity Speech Description	X
		PAF-EFD		Entity Face Description	X
		PAF-EBD		Entity Body Description	X
		MMC-PTI		PS-Text Interpretation	X
		MMC-PSI		PS-Speech Interpretation	X
		PAF-PFI		PS-Face Interpretation	X
		PAF-PGI		PS-Gesture Interpretation	X
		MMC-PMX		Personal Status Multiplexing	X
	PAF-PMX			Portable Avatar Multiplexing	X
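The indentation of the table above encodes a hierarchy: the AIW contains AIMs, some of which are Composite AIMs containing further AIMs. As an illustration only, the same hierarchy can be written as a nested dictionary and walked depth-first. The acronyms are transcribed from Table 4; the names `AIW_TREE` and `walk` are hypothetical.

```python
# Nested view of Table 4: each key is an acronym, each value its sub-AIMs.
AIW_TREE = {
    "PAF-CTX": {
        "OSD-AVS": {
            "CAE-ASD": {
                "CAE-AAT": {}, "CAE-ASL": {}, "CAE-ASE": {},
                "CAE-AST": {}, "CAE-AMX": {},
            },
            "OSD-VSD": {},
            "OSD-AVA": {},
        },
        "MMC-ASR": {},
        "MMC-PSE": {
            "MMC-ETD": {}, "MMC-ESD": {}, "PAF-EFD": {}, "PAF-EBD": {},
            "MMC-PTI": {}, "MMC-PSI": {}, "PAF-PFI": {}, "PAF-PGI": {},
            "MMC-PMX": {},
        },
        "PAF-PMX": {},
    }
}


def walk(tree, depth=0):
    """Yield (depth, acronym) pairs in the row order of Table 4."""
    for acronym, children in tree.items():
        yield depth, acronym
        yield from walk(children, depth + 1)


if __name__ == "__main__":
    for depth, acronym in walk(AIW_TREE):
        print("\t" * depth + acronym)
```

Each acronym yielded by `walk` corresponds to one row of the table, and therefore to one JSON Metadata entry.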

7 Reference Software

8 Conformance Testing

Table 5 provides the Conformance Testing Method for the PAF-CTX AIW.

If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 5 – Conformance Testing Method for PAF-CTX AIW

Receives	Input Text	Shall validate against Text Object Schema. Text Data shall conform with Text Qualifier.
	Input Selector	Shall validate against Selector Schema.
	Input Audio	Shall validate against Audio Object Schema. Audio Data shall conform with Audio Qualifier.
	Input Visual	Shall validate against Visual Object Schema. Visual Data shall conform with Visual Qualifier.
	Avatar Model	Shall validate against 3D Model Object Schema. 3D Model Data shall conform with 3D Model Qualifier.
Produces	Speech Object	Shall validate against Speech Object Schema. Speech Data shall conform with Speech Qualifier.
	Participant Portable Avatar	Shall validate against Portable Avatar Schema. Portable Avatar Data shall conform with respective Qualifiers.
	Face Object	Shall validate against Visual Object Schema. Face Data shall conform with Visual Qualifier.
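The table above pairs each data item with a schema and, in most rows, a Qualifier. A test harness could hold that pairing as a lookup table and apply it uniformly. The sketch below is one hypothetical arrangement, not the normative procedure: `validates` and `conforms` are caller-supplied callables standing in for real schema validation and Qualifier checking, and none of these names come from the specification.

```python
# Data item -> (schema name, qualifier name or None), transcribed from the table.
CONFORMANCE = {
    "Input Text": ("Text Object Schema", "Text Qualifier"),
    "Input Selector": ("Selector Schema", None),
    "Input Audio": ("Audio Object Schema", "Audio Qualifier"),
    "Input Visual": ("Visual Object Schema", "Visual Qualifier"),
    "Avatar Model": ("3D Model Object Schema", "3D Model Qualifier"),
    "Speech Object": ("Speech Object Schema", "Speech Qualifier"),
    "Participant Portable Avatar": ("Portable Avatar Schema", "respective Qualifiers"),
    "Face Object": ("Visual Object Schema", "Visual Qualifier"),
}


def check_conformance(data_items, validates, conforms):
    """Return a list of (data item, failed schema or qualifier) pairs.

    data_items: dict mapping a data item name to its payload.
    validates(schema_name, payload) -> bool   (hypothetical schema validator)
    conforms(qualifier_name, payload) -> bool (hypothetical qualifier checker)
    """
    failures = []
    for name, payload in data_items.items():
        schema, qualifier = CONFORMANCE[name]
        if not validates(schema, payload):
            failures.append((name, schema))
        # Rows without a Qualifier (e.g. the Selector) skip this check.
        if qualifier is not None and not conforms(qualifier, payload):
            failures.append((name, qualifier))
    return failures
```

An empty list indicates that every supplied item passed both checks; a real harness would also need to follow references to secondary schemas as described above, which this sketch leaves to the `validates` callable.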

9 Performance Assessment