MPAI-PAF V1.5 AIMs Response and Scene Rendering

1 Functions	2 Reference Model	3 Input/Output Data
4 SubAIMs	5 JSON Metadata	6 Profiles
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

The Response and Scene Rendering (PAF-RSR) AIM produces Audio, Speech, and Visual data representing a speaking avatar in an Audio-Visual Scene from a selection of input AV Scene Descriptors, input Point of View of the viewer, input Speech or Text, input Speech Model, input Avatar Model, and input Personal Status:

Receives	Point of View	Selected by viewer to see/hear the scene.
	AV Scene Descriptors	Describing the Audio-Visual Scene.
	Avatar PoV	The Point of View of the Avatar in the AV Scene.
	Input Speech	Speech alternative to Speech synthesised from Text Object.
	Speech Model	The Speech Model used to synthesise Speech.
	Text Object	To be used to: 1. Synthesise Speech using a provided or an internal Speech Model; 2. Generate Face Descriptors and Body Descriptors.
	Personal Status	To be used as additional information to synthesise Speech and generate Face and Body Descriptors.
	Avatar Model	When a specific Avatar Model is externally provided.
Produces	Output Speech	The output Speech Object.
	Output Audio	The Audio component of the Audio-Visual Scene Descriptors.
	Output Visual	The rendering of the animated avatar and the visual scene.
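The alternatives in the table above (Input Speech versus Speech synthesised from the Text Object, a provided versus an internal Speech Model, an externally provided Avatar Model versus a default one) can be illustrated with a minimal Python sketch. The class `RSRInputs`, the function `plan_rendering`, and all field names are hypothetical and not part of the specification. The rule that Input Speech takes precedence over the Text Object when both are supplied is an assumption made only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RSRInputs:
    """Hypothetical container for the PAF-RSR inputs (names are illustrative)."""
    point_of_view: Any
    av_scene_descriptors: Any
    avatar_pov: Optional[Any] = None
    input_speech: Optional[Any] = None
    speech_model: Optional[Any] = None
    text_object: Optional[str] = None
    personal_status: Optional[Any] = None
    avatar_model: Optional[Any] = None


def plan_rendering(inputs: RSRInputs) -> dict:
    """Report which of the optional inputs would drive the outputs."""
    if inputs.input_speech is not None:
        # Assumption: Input Speech, when present, replaces synthesis from text.
        speech_source, speech_model = "input_speech", None
    elif inputs.text_object is not None:
        speech_source = "synthesised_from_text"
        # A provided Speech Model is used if given, otherwise an internal one.
        speech_model = "provided" if inputs.speech_model is not None else "internal"
    else:
        raise ValueError("Neither Input Speech nor Text Object was supplied")

    # An externally provided Avatar Model is used if given, otherwise a default.
    avatar_model = "external" if inputs.avatar_model is not None else "default"
    return {
        "speech_source": speech_source,
        "speech_model": speech_model,
        "avatar_model": avatar_model,
    }
```

For example, calling `plan_rendering` with only a Text Object selects synthesis with an internal Speech Model and a default Avatar Model.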

2 Reference Model

Figure 1 depicts the Reference Model of the Response and Scene Rendering (PAF-RSR) AIM.

Figure 1 – The Response and Scene Rendering (PAF-RSR) AIM

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Response and Scene Rendering (PAF-RSR) AIM.

Table 1 – I/O Data of the Response and Scene Rendering (PAF-RSR) AIM

Input	Description
Point of View	Selected by viewer to see/hear the scene.
Audio-Visual Scene Descriptors	Describing the Audio-Visual Scene.
Avatar PoV	The Point of View of the Avatar in the AV Scene.
Input Speech	Speech alternative to Speech synthesised from Text Object.
Speech Model	The Speech Model used to synthesise Speech.
Text Object	For speech synthesis and Face and Body Descriptors.
Personal Status	Additional information to synthesise Speech and generate Face and Body Descriptors.
Avatar Model	When a specific Avatar Model is externally provided.
Output	Description
Output Speech	The output Speech Object.
Output Audio	The Audio component of the Audio-Visual Scene Descriptors.
Output Visual	The rendering of the animated avatar and the visual scene.

4 SubAIMs

A Response and Scene Rendering (PAF-RSR) AIM may be implemented as a Composite AIM, as depicted in Figure 2.

Figure 2 – Response and Scene Rendering (PAF-RSR) Composite AIM Reference Model

The AIMs composing the Response and Scene Rendering (PAF-RSR) Composite AIM are:

AIM	AIMs	Names	JSON
PAF-RSR		Response and Scene Rendering	X
	PAF-PSD	Personal Status Display	X
	MMC-TTS	Text-To-Speech	X
	PAF-EFD	Entity Face Description	X
	PAF-EBD	Entity Body Description	X
	PAF-SAR	Scene and Avatar Rendering	X

5 JSON Metadata

https://schemas.mpai.community/PAF/V1.5/AIMs/AvatarAnimationAndSpeech.json

6 Profiles

No Profiles

7 Reference Software

8 Conformance Testing

Table 2 provides the Conformance Testing Method for the PAF-RSR AIM.

If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for PAF-RSR AIM

Input	Description
Point of View	Shall validate against Point of View Schema.
Audio-Visual Scene Descriptors	Shall validate against Audio-Visual Scene Descriptors Schema.
Avatar PoV	Shall validate against Point of View Schema.
Input Speech	Shall validate against Speech Object Schema.
Speech Model	Shall validate against ML Model Schema.
Text Object	Shall validate against Text Object Schema.
Personal Status	Shall validate against Personal Status Schema.
Avatar Model	Shall validate against 3D Model Schema.
Output	Description
Output Speech	Shall validate against Speech Object Schema.
Output Audio	Shall validate against Audio Object Schema.
Output Visual	Shall validate against Visual Object Schema.
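The mapping in Table 2 can be expressed as a small lookup driving a conformance check, sketched below in Python. The names `SCHEMA_FOR` and `check_conformance` are hypothetical, and the `validate` callable stands in for whatever JSON Schema validator an implementation uses (for instance one built on a third-party JSON Schema library). The sketch does not load the actual MPAI schemas and does not cover referenced secondary schemas or Qualifiers.

```python
from typing import Any, Callable, Dict

# Data item -> name of the schema it shall validate against (from Table 2).
SCHEMA_FOR: Dict[str, str] = {
    "Point of View": "Point of View",
    "Audio-Visual Scene Descriptors": "Audio-Visual Scene Descriptors",
    "Avatar PoV": "Point of View",
    "Input Speech": "Speech Object",
    "Speech Model": "ML Model",
    "Text Object": "Text Object",
    "Personal Status": "Personal Status",
    "Avatar Model": "3D Model",
    "Output Speech": "Speech Object",
    "Output Audio": "Audio Object",
    "Output Visual": "Visual Object",
}

# Hypothetical validator signature: (schema name, data) -> True if valid.
Validator = Callable[[str, Any], bool]


def check_conformance(data: Dict[str, Any], validate: Validator) -> Dict[str, bool]:
    """Validate each supplied data item against the schema Table 2 assigns to it."""
    results: Dict[str, bool] = {}
    for name, value in data.items():
        if name not in SCHEMA_FOR:
            raise KeyError(f"{name!r} is not listed in Table 2")
        results[name] = validate(SCHEMA_FOR[name], value)
    return results
```

Keeping the validator as a parameter lets the same table be reused with any schema engine. Note that Avatar PoV and Point of View share one schema, as do Input Speech and Output Speech.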

9 Performance Assessment