
1       AI Workflows

1.1      Avatar Videoconference Server

PAF-AVS is an AIW composed of the following collaborating AIMs:

Portable Avatar Demultiplexing: Makes available the components of all avatars received by PAF-AVS.
Text and Speech Translation: Translates the Speech Objects according to the avatars’ Language preferences.
Service Participant Authentication: Uses participants’ speech and faces to authenticate a participant as a legitimate videoconference service user.
Portable Avatar Multiplexing: Combines the following into the Portable Avatars dispatched to participants:
– Modified avatar components.
– Avatar IDs (at session start).
– Scene descriptors (at session start).
– Selected avatars’ Positions and Orientations (at session start).

Figure 1 – Reference Model of Avatar Videoconference Server


PAF-AVS performs Interpretation Level Operations (MMC-TST and PAF-SPA).
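As an informative, non-normative illustration of the data flow described above, the following Python sketch models the Server as a demultiplex-translate-multiplex pipeline gated by participant authentication. All type and function names (PortableAvatar, translate_speech, authenticate_participant, avatar_videoconference_server) are hypothetical stand-ins, not defined by this specification.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified stand-ins for the data types handled by PAF-AVS.
@dataclass
class SpeechObject:
    text: str
    language: str

@dataclass
class PortableAvatar:
    avatar_id: str
    speech: SpeechObject
    language_preference: str
    scene_descriptors: Optional[dict] = None
    position: Optional[tuple] = None
    orientation: Optional[tuple] = None

def translate_speech(speech: SpeechObject, target_language: str) -> SpeechObject:
    """Stand-in for the Text and Speech Translation AIM (MMC-TST)."""
    if speech.language == target_language:
        return speech
    return SpeechObject(text=f"[{target_language}] {speech.text}", language=target_language)

def authenticate_participant(avatar: PortableAvatar) -> bool:
    """Stand-in for Service Participant Authentication (PAF-SPA)."""
    return avatar.avatar_id != ""

def avatar_videoconference_server(incoming: list[PortableAvatar],
                                  session_start: bool) -> list[PortableAvatar]:
    """Demultiplex the received avatars, translate their Speech Objects, and
    multiplex the modified components into the avatars dispatched to participants."""
    outgoing = []
    for receiver in incoming:
        if not authenticate_participant(receiver):
            continue
        for sender in incoming:
            if sender.avatar_id == receiver.avatar_id:
                continue
            outgoing.append(PortableAvatar(
                avatar_id=sender.avatar_id,
                speech=translate_speech(sender.speech, receiver.language_preference),
                language_preference=sender.language_preference,
                # Scene descriptors, positions, and orientations are only
                # dispatched at session start, as listed above.
                scene_descriptors=sender.scene_descriptors if session_start else None,
                position=sender.position if session_start else None,
                orientation=sender.orientation if session_start else None,
            ))
    return outgoing
```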

1.2      Videoconference Client Receiver

PAF-VCR is an AIW composed of collaborating AIMs performing the following operations:

Portable Avatar Demultiplexing: Makes available the components of all avatars received from PAF-AVS.
Visual Scene Creation: Identifies the Spatial Attitude of the mouth of each avatar.
Audio Scene Creation: Adds the Speech Object to the mouth of each avatar.
Audio-Visual Scene Rendering: Renders the speech and visual components of the scene from a user-selected Point of View.

Figure 2 – Reference Model of Videoconference Client Receiver


PAF-VCR performs Descriptors Level Operations.
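A minimal sketch of the Receiver’s scene-composition steps is given below, again with hypothetical names (SpatialAttitude, AvatarComponent, create_audio_scene, render_scene); a real implementation would delegate rendering to audio and graphics engines.

```python
from dataclasses import dataclass

# Hypothetical types used only to show the shape of the data flow.
@dataclass
class SpatialAttitude:
    position: tuple       # e.g. (x, y, z)
    orientation: tuple    # e.g. (yaw, pitch, roll)

@dataclass
class AvatarComponent:
    avatar_id: str
    speech: bytes                     # Speech Object received from PAF-AVS
    mouth_attitude: SpatialAttitude   # identified by Visual Scene Creation
    visual_descriptors: dict          # avatar geometry and appearance

def create_audio_scene(avatars: list[AvatarComponent]) -> list[tuple]:
    """Audio Scene Creation: attach each Speech Object to the Spatial
    Attitude of the corresponding avatar's mouth."""
    return [(a.speech, a.mouth_attitude) for a in avatars]

def render_scene(audio_scene: list[tuple],
                 visual_scene: list[dict],
                 point_of_view: SpatialAttitude) -> None:
    """Audio-Visual Scene Rendering from a user-selected Point of View.
    Left empty here: the actual rendering back-ends are out of scope."""
    ...
```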

1.3      Videoconference Client Transmitter

PAF-VCT is an AIW composed of the following collaborating AIMs:

AV Scene Description: Digitally represents the audio-visual scene, removing the audio component and retaining the speech component.
Automatic Speech Recognition: Converts the Input Speech into Text.
Personal Status Extraction: Extracts the Personal Status from the input Text, Speech, Face, and Gesture.
Personal Status Multiplexing: Uses Text and Personal Status to improve the quality of the produced Avatar and multiplexes Input Selector, Avatar Model, and Participant ID with other data internal to the AIW.

Figure 3 – Reference Model of Videoconference Client Transmitter (PAF-VCT)


PAF-VCT performs Descriptors Level Operations.
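The Transmitter chain (speech recognition, Personal Status extraction, multiplexing) could be sketched as follows; the names (speech_to_text, extract_personal_status, client_transmitter, TransmitterPayload) are illustrative placeholders only.

```python
from dataclasses import dataclass

# Illustrative stand-ins; none of these names are normative.
@dataclass
class PersonalStatus:
    emotion: str
    cognitive_state: str
    social_attitude: str

@dataclass
class TransmitterPayload:
    participant_id: str
    avatar_model: str
    input_selector: str
    text: str
    personal_status: PersonalStatus

def speech_to_text(speech: bytes) -> str:
    """Stand-in for Automatic Speech Recognition."""
    return "<recognised text>"

def extract_personal_status(text: str, speech: bytes, face, gesture) -> PersonalStatus:
    """Stand-in for Personal Status Extraction from Text, Speech, Face, and Gesture."""
    return PersonalStatus("neutral", "attentive", "polite")

def client_transmitter(speech: bytes, face, gesture,
                       participant_id: str, avatar_model: str,
                       input_selector: str) -> TransmitterPayload:
    text = speech_to_text(speech)
    status = extract_personal_status(text, speech, face, gesture)
    # Personal Status Multiplexing: combine Text and Personal Status with
    # Input Selector, Avatar Model, and Participant ID for dispatch to the Server.
    return TransmitterPayload(participant_id, avatar_model, input_selector, text, status)
```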

2       AI Modules

2.1        Audio-Visual Scene Rendering

PAF-AVR:

Receives:
– Point of View: To be used in rendering the scene and its objects.
– AV Scene Descriptors: Jointly with, or alternatively to, the Portable Avatar (PA).
– Portable Avatar: Jointly with, or alternatively to, the AV Scene Descriptors.
Transforms:
– Portable Avatar: Into generic Audio-Visual Scene Descriptors if the input is a PA.
Produces:
– Output Speech: Resulting from the rendering of the Speech Scene Descriptors from the human-selected Point of View.
– Output Audio: Resulting from the rendering of the Audio Scene Descriptors from the human-selected Point of View.
– Output Visual: Resulting from the rendering of the Visual Scene Descriptors from the human-selected Point of View.

A PAF-AVR implementation requires graphics rendering capabilities to render the Audio-Visual Scene and the Avatar from the user-selected Point of View.

PAF-AVR performs Descriptors Level Operations.
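The input/output relations above can be summarised in the following sketch; the converter portable_avatar_to_descriptors and the data types are assumptions for the example, not part of the specification.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical type names used only for illustration.
@dataclass
class PointOfView:
    position: tuple
    orientation: tuple

@dataclass
class AVSceneDescriptors:
    speech: list = field(default_factory=list)
    audio: list = field(default_factory=list)
    visual: list = field(default_factory=list)

def portable_avatar_to_descriptors(portable_avatar) -> AVSceneDescriptors:
    """Transform a Portable Avatar into generic AV Scene Descriptors."""
    return AVSceneDescriptors()

def audio_visual_scene_rendering(pov: PointOfView,
                                 descriptors: Optional[AVSceneDescriptors] = None,
                                 portable_avatar=None):
    """PAF-AVR accepts AV Scene Descriptors and/or a Portable Avatar; when only
    a Portable Avatar is supplied, it is converted first."""
    if descriptors is None and portable_avatar is not None:
        descriptors = portable_avatar_to_descriptors(portable_avatar)
    if descriptors is None:
        raise ValueError("Either AV Scene Descriptors or a Portable Avatar is required")
    # Rendering proper is delegated to audio and graphics back-ends; only the
    # (descriptors, point-of-view) pairs to be rendered are returned here.
    output_speech = (descriptors.speech, pov)
    output_audio = (descriptors.audio, pov)
    output_visual = (descriptors.visual, pov)
    return output_speech, output_audio, output_visual
```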

2.2      Face Identity Recognition

PAF-FIR:

Receives:
– Text Object: Text that is related to the Face to be identified.
– Image (Visual Object): Image containing the Face to be identified.
– Face Time: Time at which the Face should be identified.
– Visual Scene Geometry: Geometry of the scene where the Face is located.
Finds:
– Bounding Boxes: Boxes that include Faces, using spatial information.
Applies:
– Face ID algorithm: An algorithm that references a specific Face Taxonomy.
Finds:
– The best match: Between the detected Faces and those in a database.
Produces:
– Face Identities: Face Instance Identifiers.
– Bounding Boxes: Bounding Boxes that include Faces.

PAF-FIR performs Descriptors (Bounding Boxes) and Interpretation (Face IDs) Level Operations.
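A non-normative sketch of this two-stage flow (Bounding Boxes at the Descriptors level, Face IDs at the Interpretation level) follows; detect_faces and match_face are hypothetical placeholders for the actual detection and matching algorithms.

```python
from dataclasses import dataclass

# Illustrative only; the face detection and matching algorithms are not specified here.
@dataclass
class BoundingBox:
    x: float
    y: float
    width: float
    height: float

@dataclass
class FaceIdentity:
    face_instance_id: str
    bounding_box: BoundingBox

def detect_faces(image, scene_geometry) -> list[BoundingBox]:
    """Placeholder detector: return bounding boxes of faces found in the image,
    aided by the Visual Scene Geometry."""
    return []

def match_face(image, box: BoundingBox, face_db: dict) -> str:
    """Placeholder matcher: return the identifier of the best-matching face
    in the reference database (organised by a Face Taxonomy)."""
    return next(iter(face_db), "unknown")

def face_identity_recognition(image, scene_geometry, face_db: dict) -> list[FaceIdentity]:
    boxes = detect_faces(image, scene_geometry)             # Descriptors level output
    return [FaceIdentity(match_face(image, b, face_db), b)  # Interpretation level output
            for b in boxes]
```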

2.3      Personal Status Display

PAF-PSD:

Receives:
– Machine ID: ID used to identify the Avatar in the Portable Avatar.
– Text Object: Text associated with the Avatar in the Portable Avatar.
– Personal Status: Personal Status associated with the Avatar in the Portable Avatar.
– Avatar Model: 3D Model associated with the Avatar in the Portable Avatar.
– Speech Model: Speech Model associated with the Avatar in the Portable Avatar.
Synthesises:
– Speech Object: Possibly through an input Speech Model.
Generates:
– Face Descriptors: Using the Speech Object, the input Avatar Model, and the Face Personal Status.
Produces:
– Portable Avatar: Including Machine ID, Speech Model, Text Object, Avatar Model, and the internally generated Speech Object and Avatar.
Enables:
– PAF-AVR: To render the Portable Avatar produced by PAF-PSD.

PAF-PSD performs Descriptors Level Operations.
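The sketch below illustrates how the inputs listed above could be combined into the output Portable Avatar; synthesise_speech and generate_face_descriptors are assumed placeholder functions, not normative interfaces.

```python
from dataclasses import dataclass

# Hypothetical names used only to illustrate the PAF-PSD data flow.
@dataclass
class PortableAvatar:
    machine_id: str
    text: str
    speech_model: str
    avatar_model: str
    speech_object: bytes
    face_descriptors: list

def synthesise_speech(text: str, speech_model: str) -> bytes:
    """Stand-in text-to-speech step, possibly driven by an input Speech Model."""
    return text.encode("utf-8")

def generate_face_descriptors(speech: bytes, avatar_model: str, face_status: str) -> list:
    """Stand-in step driven by the Speech Object, the Avatar Model, and the
    Face component of the Personal Status (e.g. lip movement, expression)."""
    return []

def personal_status_display(machine_id: str, text: str, personal_status: dict,
                            avatar_model: str, speech_model: str) -> PortableAvatar:
    speech = synthesise_speech(text, speech_model)
    face = generate_face_descriptors(speech, avatar_model, personal_status.get("face", ""))
    # The resulting Portable Avatar can be passed to PAF-AVR for rendering.
    return PortableAvatar(machine_id, text, speech_model, avatar_model, speech, face)
```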

2.4      Service Participant Authentication

PAF-SPA:

Receives:
– Participant ID: ID of a Participant in a session of a Service, provided by an upstream AIM or another AIW.
– Face Visual Object: Face of the Participant.
– Speech Object: Speech segment of the Participant.
Recognises:
– Face ID: From the Face Visual Object.
– Speech ID: From the Speech Object.
Uses:
– Speech and Face IDs: To search a Service ID database.
Produces:
– Subscriber ID: ID of the Service Subscriber.

PAF-SPA can be implemented using Neural Networks.

PAF-SPA performs Interpretation Level Operations.
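As a closing illustration, the authentication flow could look like the sketch below; the recognisers and the (Face ID, Speech ID)-keyed subscriber database are assumptions made for the example only.

```python
from typing import Optional

# Illustrative sketch; the recognition models themselves are not specified here.
def recognise_face_id(face_visual_object) -> str:
    """Stand-in face-recognition step returning a Face ID."""
    return "face-001"

def recognise_speech_id(speech_object) -> str:
    """Stand-in speaker-recognition step returning a Speech ID."""
    return "speech-001"

def service_participant_authentication(participant_id: str,
                                       face_visual_object,
                                       speech_object,
                                       service_db: dict) -> Optional[str]:
    """Combine the Face ID and Speech ID recognised from the Participant's
    Face and Speech to look up the Subscriber ID in the Service database."""
    face_id = recognise_face_id(face_visual_object)
    speech_id = recognise_speech_id(speech_object)
    return service_db.get((face_id, speech_id))
```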