1 AI Workflows
1.1 Avatar Videoconference Server
PAF-AVS is an AIW composed of the following collaborating AIMs:
| Portable Avatar Demultiplexing | Makes available the components of all avatars received by PAF-AVS. |
| Text and Speech Translation | Translates the Speech Objects based on avatars’ Language preferences. |
| Service Participant Authentication | Uses participants’ speech and faces to authenticate a participant as a legitimate videoconference service user. |
| Portable Avatar Multiplexing | Combines into the Portable Avatars dispatched to participants the following: modified avatar components; Avatar IDs (at session start); Scene Descriptors (at session start); selected avatars’ Positions and Orientations (at session start). |
Figure 1 – Reference Model of Avatar Videoconference Server
The following links analyse the AI Modules:
- Portable Avatar Demultiplexing
- Text and Speech Translation
- Service Participant Authentication
- Portable Avatar Multiplexing
PAF-AVS performs Interpretation Level Operations (MMC-TST and PAF-SPA).
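To make the data flow concrete, the following Python sketch chains the four AIMs over one server pass: demultiplexing the received Portable Avatars, translating the Speech Objects according to the participants' Language preferences, authenticating participants, and re-multiplexing the results for dispatch. The class, helper names, and data layouts (PortableAvatar, demultiplex, translate_speech, authenticate, multiplex) are illustrative assumptions, not defined by this specification.

```python
from dataclasses import dataclass, field

@dataclass
class PortableAvatar:
    avatar_id: str
    language: str
    speech: str                                     # stand-in for a Speech Object
    components: dict = field(default_factory=dict)  # other avatar components

def demultiplex(pa: PortableAvatar) -> dict:
    """Portable Avatar Demultiplexing: make the avatar's components available."""
    return {"avatar_id": pa.avatar_id, "speech": pa.speech, **pa.components}

def translate_speech(speech: str, source: str, target: str) -> str:
    """Text and Speech Translation: stub that only tags the language pair."""
    return f"[{source}->{target}] {speech}"

def authenticate(parts: dict) -> bool:
    """Service Participant Authentication: stub check on the available data."""
    return bool(parts.get("speech"))

def multiplex(parts: dict, language: str) -> PortableAvatar:
    """Portable Avatar Multiplexing: rebuild a Portable Avatar for dispatch."""
    return PortableAvatar(parts["avatar_id"], language, parts["speech"])

def avs_step(received: list[PortableAvatar],
             language_prefs: dict[str, str]) -> list[PortableAvatar]:
    """One PAF-AVS pass over the avatars received from the Client Transmitters."""
    dispatched = []
    for pa in received:
        parts = demultiplex(pa)
        if not authenticate(parts):
            continue                                # drop unauthenticated participants
        for participant, language in language_prefs.items():
            speech = parts["speech"]
            if language != pa.language:
                speech = translate_speech(speech, pa.language, language)
            dispatched.append(multiplex(dict(parts, speech=speech), language))
    return dispatched

print(avs_step([PortableAvatar("a1", "en", "hello")], {"p2": "it"}))
```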
1.2 Videoconference Client Receiver
PAF-VCR is an AIW composed of collaborating AIMs performing the following operations:
| Portable Avatar Demultiplexing | Makes available the components of all avatars received by PAF-VCR. |
| Visual Scene Creation | Identifies the Spatial Attitude of the mouth of each avatar. |
| Audio Scene Creation | Adds the Speech Object to the mouth of each avatar. |
| Audio-Visual Scene Rendering | Renders the speech and visual components of the scene from a user-selected Point of View. |

Figure 2 – Reference Model of Videoconference Client Receiver
The following links analyse the AI Modules:
- Portable Avatar Demultiplexing
- Visual Scene Creation
- Audio Scene Creation
- Audio-Visual Scene Rendering
PAF-VCR performs Descriptors Level Operations.
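A minimal sketch of the receiver chain is given below; the types and helpers (PortableAvatar, the mouth-placement convention, render) are illustrative assumptions rather than normative definitions.

```python
from dataclasses import dataclass

@dataclass
class PortableAvatar:
    avatar_id: str
    speech: str          # stand-in for a Speech Object
    position: tuple      # stand-in for the avatar's Spatial Attitude

def demultiplex(pas):
    """Portable Avatar Demultiplexing: expose each avatar's components."""
    return [{"id": p.avatar_id, "speech": p.speech, "position": p.position} for p in pas]

def visual_scene(parts):
    """Visual Scene Creation: locate the Spatial Attitude of each avatar's mouth."""
    # Hypothetical convention: the mouth sits at a fixed offset above the position.
    return {p["id"]: (p["position"][0], p["position"][1] + 1.6, p["position"][2])
            for p in parts}

def audio_scene(parts, mouths):
    """Audio Scene Creation: attach each Speech Object at the avatar's mouth."""
    return [{"speech": p["speech"], "at": mouths[p["id"]]} for p in parts]

def render(visual, audio, point_of_view):
    """Audio-Visual Scene Rendering: stub returning what would be drawn/played."""
    return {"pov": point_of_view, "visual": visual, "audio": audio}

avatars = [PortableAvatar("a1", "ciao", (0.0, 0.0, 0.0))]
parts = demultiplex(avatars)
mouths = visual_scene(parts)
print(render(mouths, audio_scene(parts, mouths), point_of_view=(2.0, 1.7, 2.0)))
```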
1.3 Videoconference Client Transmitter
PAF-VCT is an AIW composed of the following collaborating AIMs:
| Audio-Visual Scene Description | Digitally represents the audio-visual scene, removing the audio component and retaining the speech component. |
| Automatic Speech Recognition | Converts the Input Speech into Text. |
| Personal Status Extraction | Extracts the Personal Status from input Text-Speech-Face-Gesture. |
| Portable Avatar Multiplexing | Uses Text and Personal Status to improve the quality of the produced Avatar and multiplexes Input Selector, Avatar Model, and Participant ID with other data internal to the AIW. |

Figure 3 – Reference Model of Videoconference Client Transmitter (PAF-VCT)
The following links analyse the AI Modules:
- Audio-Visual Scene Description
- Automatic Speech Recognition
- Personal Status Extraction
- Portable Avatar Multiplexing
PAF-VCT performs Descriptors Level Operations.
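The sketch below illustrates one possible transmitter pass through these AIMs; the PersonalStatus fields and the helper functions are assumptions made here for illustration only.

```python
from dataclasses import dataclass

@dataclass
class PersonalStatus:
    emotion: str
    cognitive_state: str
    social_attitude: str

def describe_av_scene(audio: bytes, visual: bytes) -> dict:
    """Audio-Visual Scene Description: keep the speech component, drop other audio."""
    return {"speech": audio, "visual": visual}       # stub: treat the audio as speech only

def recognise_speech(speech: bytes) -> str:
    """Automatic Speech Recognition: stub conversion of Input Speech into Text."""
    return "hello everyone"

def extract_personal_status(text, speech, face, gesture) -> PersonalStatus:
    """Personal Status Extraction from Text-Speech-Face-Gesture (stubbed)."""
    return PersonalStatus("neutral", "attentive", "cooperative")

def multiplex(avatar_model, participant_id, text, status, input_selector) -> dict:
    """Portable Avatar Multiplexing: bundle the data sent to PAF-AVS."""
    return {
        "avatar_model": avatar_model,
        "participant_id": participant_id,
        "text": text,
        "personal_status": status,
        "input_selector": input_selector,
    }

scene = describe_av_scene(audio=b"...", visual=b"...")
text = recognise_speech(scene["speech"])
status = extract_personal_status(text, scene["speech"], face=b"...", gesture=b"...")
print(multiplex("model-01", "participant-42", text, status, input_selector="speech+text"))
```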
2 AI Modules
2.1 Audio-Visual Scene Rendering
PAF-AVR:
| Receives | Point of View | To be used in rendering the scene and its objects. |
| | AV Scene Descriptors | Jointly with or alternatively to Portable Avatar (PA). |
| | Portable Avatar | Jointly with or alternatively to AV Scene Descriptors. |
| Transforms | Portable Avatar | Into generic Audio-Visual Scene Descriptors if the input is a PA. |
| Produces | Output Speech | Resulting from the rendering of Speech Scene Descriptors from the human-selected Point of View. |
| | Output Audio | Resulting from the rendering of Audio Scene Descriptors from the human-selected Point of View. |
| | Output Visual | Resulting from the rendering of Visual Scene Descriptors from the human-selected Point of View. |
A PAF-AVR implementation requires graphic rendering capabilities to render the Audio-Visual Scene and the Avatar from the user-selected Point of View.
PAF-AVR performs Descriptors Level Operations.
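The following sketch illustrates the PAF-AVR input/output contract, accepting either AV Scene Descriptors or a Portable Avatar together with a Point of View; the SceneDescriptors type and the conversion logic are assumptions for illustration, not part of the specification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SceneDescriptors:
    speech: list
    audio: list
    visual: list

def pa_to_descriptors(portable_avatar: dict) -> SceneDescriptors:
    """Transform a Portable Avatar into generic Audio-Visual Scene Descriptors."""
    return SceneDescriptors(
        speech=[portable_avatar.get("speech")],
        audio=[],
        visual=[portable_avatar.get("avatar_model")],
    )

def render(point_of_view: tuple,
           descriptors: Optional[SceneDescriptors] = None,
           portable_avatar: Optional[dict] = None) -> dict:
    """Produce Output Speech, Output Audio and Output Visual from the chosen PoV."""
    if descriptors is None:
        descriptors = pa_to_descriptors(portable_avatar or {})
    return {
        "output_speech": {"pov": point_of_view, "sources": descriptors.speech},
        "output_audio": {"pov": point_of_view, "sources": descriptors.audio},
        "output_visual": {"pov": point_of_view, "objects": descriptors.visual},
    }

print(render((0.0, 1.7, 3.0), portable_avatar={"speech": "hi", "avatar_model": "model-01"}))
```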
2.2 Face Identity Recognition
PAF-FIR:
| Receives | Text Object | Text related to the Face to be identified. |
| | Image Visual Object | Image containing the Face to be identified. |
| | Face Time | Time at which the Face should be identified. |
| | Visual Scene Geometry | Geometry of the scene where the Face is located. |
| Finds | Bounding Boxes | That include Faces, using spatial information. |
| Applies | Face ID algorithm | That references a specific Face Taxonomy. |
| Finds | The best match | Between the Faces and those in a database. |
| Produces | Face Identities | Face Instance Identifiers. |
| | Bounding Boxes | Bounding Boxes that include Faces. |
PAF-FIR performs Descriptors (Bounding Boxes) and Interpretation (Face IDs) Level Operations.
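A minimal sketch of this recognition flow is given below, assuming a stub detector, a stub embedding standing in for the Face ID algorithm, and a toy database of reference embeddings; none of these helpers is defined by the specification.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: int
    y: int
    w: int
    h: int

def detect_faces(image, scene_geometry) -> list[BoundingBox]:
    """Find Bounding Boxes that include Faces, using the spatial information."""
    return [BoundingBox(10, 20, 64, 64)]             # stub detection

def embed_face(image, box: BoundingBox) -> tuple:
    """Face ID algorithm referencing a specific Face Taxonomy (stub embedding)."""
    return (0.1, 0.9)

def best_match(embedding: tuple, database: dict) -> str:
    """Return the Face Instance Identifier whose reference embedding is closest."""
    def distance(ref):                               # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(embedding, ref))
    return min(database, key=lambda face_id: distance(database[face_id]))

database = {"face-007": (0.1, 0.8), "face-012": (0.9, 0.1)}
boxes = detect_faces(image=b"...", scene_geometry=None)
face_ids = [best_match(embed_face(b"...", box), database) for box in boxes]
print(face_ids, boxes)   # Face Identities plus the Bounding Boxes that include Faces
```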
2.3 Personal Status Display
PAF-PSD:
| Receives | Machine ID | ID used to identify the Avatar in the Portable Avatar. |
| | Text Object | Text associated with the Avatar in the Portable Avatar. |
| | Personal Status | Personal Status associated with the Avatar in the Portable Avatar. |
| | Avatar Model | 3D Model associated with the Avatar in the Portable Avatar. |
| | Speech Model | Speech Model associated with the Avatar in the Portable Avatar. |
| Synthesises | Speech Object | Possibly through an input Speech Model. |
| Generates | Face Descriptors | Using the Speech Object, the input Avatar Model, and the Face Personal Status. |
| Produces | Portable Avatar | Including Machine ID, Speech Model, Text Object, Avatar Model, and the internally generated Speech Object and Avatar. |
| Enables | PAF-AVR | To render the Portable Avatar produced by PAF-PSD. |
PAF-PSD performs Descriptors Level Operations.
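The sketch below shows how the PAF-PSD inputs could be combined into a Portable Avatar ready for rendering by PAF-AVR; the synthesis and descriptor helpers are stubs introduced here for illustration only.

```python
def synthesise_speech(text: str, speech_model: str) -> bytes:
    """Speech synthesis, possibly through the input Speech Model."""
    return f"{speech_model}:{text}".encode()

def generate_face_descriptors(speech: bytes, avatar_model: str, face_status: str) -> dict:
    """Face Descriptors derived from the Speech Object and the Face Personal Status."""
    return {"model": avatar_model, "lip_sync_frames": len(speech), "expression": face_status}

def personal_status_display(machine_id, text, personal_status, avatar_model, speech_model):
    speech = synthesise_speech(text, speech_model)
    face = generate_face_descriptors(speech, avatar_model, personal_status["face"])
    # The resulting Portable Avatar is ready for rendering by PAF-AVR.
    return {
        "machine_id": machine_id,
        "text": text,
        "speech_model": speech_model,
        "avatar_model": avatar_model,
        "speech": speech,
        "avatar": face,
    }

print(personal_status_display("m-01", "Good morning", {"face": "smiling"},
                              "model-01", "voice-01"))
```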
2.4 Service Participant Authentication
PAF-SPA:
| Receives | Participant ID | ID of a Participant in a session of a Service, from an upstream AIM or another AIW. |
| | Face Visual Object | Face of the Participant. |
| | Speech Object | Speech segment of the Participant. |
| Recognises | Face ID | From the Face Visual Object. |
| | Speech ID | From the Speech Object. |
| Uses | Speech & Face ID | To search a Service ID database. |
| Produces | Subscriber ID | ID of the Service Subscriber. |
PAF-SPA can be implemented using Neural Networks.
PAF-SPA performs Interpretation Level Operations.
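A minimal sketch of the authentication flow follows, assuming stub Face and Speech recognisers and a hypothetical Service ID database keyed by (Face ID, Speech ID) pairs; these choices are illustrative, not prescribed by the specification.

```python
from typing import Optional

def recognise_face(face_visual_object: bytes) -> str:
    """Face ID recognised from the Face Visual Object (stub)."""
    return "face-007"

def recognise_speaker(speech_object: bytes) -> str:
    """Speech ID recognised from the Speech Object (stub)."""
    return "voice-013"

def authenticate(participant_id: str, face: bytes, speech: bytes,
                 service_db: dict) -> Optional[str]:
    """Return the Subscriber ID if the recognised IDs match a registered pair."""
    key = (recognise_face(face), recognise_speaker(speech))
    # The Participant ID from the upstream AIM/AIW could be cross-checked here.
    return service_db.get(key)

service_db = {("face-007", "voice-013"): "subscriber-42"}
print(authenticate("participant-9", b"...", b"...", service_db))
```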