| 1 Functions | 2 Reference Model | 3 Input/Output Data |
| 4 SubAIMs | 5 JSON Metadata | 6 Profiles |
| 7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
The Response and Scene Rendering (PAF-RSR) AIM produces Audio, Speech, and Visual data representing a speaking avatar in an Audio-Visual Scene from a selection of input AV Scene Descriptors, input Point of View of the viewer, input Speech or Text, input Speech Model, input Avatar Model, and input Personal Status:
| Receives | Point of View | Selected by viewer to see/hear the scene. |
| | AV Scene Descriptors | Describing the Audio-Visual Scene. |
| | Avatar PoV | The Point of View of the Avatar in the AV Scene. |
| | Input Speech | Speech alternative to Speech synthesised from Text Object. |
| | Speech Model | The Speech Model used to synthesise Speech. |
| | Text Object | To be used to: 1. Synthesise into Speech using a provided or an internal Speech Model. 2. Generate Face Descriptors and Body Descriptors. |
| | Personal Status | To be used as additional information to synthesise Speech and generate Face and Body Descriptors. |
| | Avatar Model | When a specific Avatar Model is externally provided. |
| Produces | Output Speech | The output Speech Object. |
| | Output Audio | The Audio component of the Audio-Visual Scene Descriptors. |
| | Output Visual | The rendering of the animated avatar and the visual scene. |
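The relationship between Input Speech, Text Object, and Speech Model can be illustrated with a minimal Python sketch. It is not part of the specification: the names `SpeechPlan`, `plan_speech`, and `INTERNAL_SPEECH_MODEL` are hypothetical, and the rule that Input Speech takes precedence when both Input Speech and a Text Object are supplied is an assumption, since the table only states that Input Speech is an alternative to synthesised Speech.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder standing in for the AIM's internal Speech Model.
INTERNAL_SPEECH_MODEL = "internal"


@dataclass
class SpeechPlan:
    source: str                  # "input_speech" or "synthesised"
    speech_model: Optional[str]  # model used for synthesis; None when Input Speech is used


def plan_speech(input_speech: Optional[bytes],
                text_object: Optional[str],
                speech_model: Optional[str] = None) -> SpeechPlan:
    """Decide where the Output Speech would come from."""
    # Assumption: Input Speech, when present, replaces synthesis from text.
    if input_speech is not None:
        return SpeechPlan("input_speech", None)
    # Otherwise synthesise from the Text Object, using the provided
    # Speech Model if there is one and the internal model if not.
    if text_object is not None:
        return SpeechPlan("synthesised", speech_model or INTERNAL_SPEECH_MODEL)
    raise ValueError("either Input Speech or a Text Object is required")
```

In this sketch, calling `plan_speech(None, "Hello")` selects synthesis with the internal model, while passing a third argument selects the externally provided Speech Model.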
2 Reference Model
Figure 1 depicts the Reference Model of the Response and Scene Rendering (PAF-RSR) AIM.
Figure 1 – The Response and Scene Rendering (PAF-RSR) AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Response and Scene Rendering (PAF-RSR) AIM.
Table 1 – I/O Data of the Response and Scene Rendering (PAF-RSR) AIM
| Input | Description |
| Point of View | Selected by viewer to see/hear the scene. |
| Audio-Visual Scene Descriptors | Describing the Audio-Visual Scene. |
| Avatar PoV | The Point of View of the Avatar in the AV Scene. |
| Input Speech | Speech alternative to Speech synthesised from Text Object. |
| Speech Model | The Speech Model used to synthesise Speech. |
| Text Object | For speech synthesis and Face and Body Descriptors. |
| Personal Status | Additional information to synthesise Speech and generate Face and Body Descriptors. |
| Avatar Model | When a specific Avatar Model is externally provided. |
| Output | Description |
| Output Speech | The output Speech Object. |
| Output Audio | The Audio component of the Audio-Visual Scene Descriptors. |
| Output Visual | The rendering of the animated avatar and the visual scene. |
4 SubAIMs
A Response and Scene Rendering (PAF-RSR) AIM may be implemented as a Composite AIM, as depicted in Figure 2.

Figure 2 – Response and Scene Rendering (PAF-RSR) Composite AIM Reference Model
The AIMs composing the Response and Scene Rendering (PAF-RSR) Composite AIM are:
| AIM | AIMs | Names | JSON |
| PAF-RSR | | Response and Scene Rendering | X |
| | PAF-PSD | Personal Status Display | X |
| | MMC-TTS | Text-To-Speech | X |
| | PAF-EFD | Entity Face Description | X |
| | PAF-EBD | Entity Body Description | X |
| | PAF-SAR | Scene and Avatar Rendering | X |
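The composition listed above can be written down as a small data structure. The following Python sketch is illustrative only: the `AIM` class and the `sub_aim_ids` helper are hypothetical names, the sub-AIMs are shown as a flat list under PAF-RSR as in the table, and no data flow between them is modelled because the table does not describe it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AIM:
    identifier: str
    name: str
    sub_aims: List["AIM"] = field(default_factory=list)


def sub_aim_ids(aim: AIM) -> List[str]:
    """Return the identifiers of all AIMs nested under `aim`, depth first."""
    ids: List[str] = []
    for child in aim.sub_aims:
        ids.append(child.identifier)
        ids.extend(sub_aim_ids(child))
    return ids


# Identifiers and names are taken from the table of composing AIMs.
PAF_RSR = AIM("PAF-RSR", "Response and Scene Rendering", [
    AIM("PAF-PSD", "Personal Status Display"),
    AIM("MMC-TTS", "Text-To-Speech"),
    AIM("PAF-EFD", "Entity Face Description"),
    AIM("PAF-EBD", "Entity Body Description"),
    AIM("PAF-SAR", "Scene and Avatar Rendering"),
])
```

A recursive helper is used so that the same structure would still work if one of the listed AIMs were itself a Composite AIM.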
5 JSON Metadata
https://schemas.mpai.community/PAF/V1.5/AIMs/AvatarAnimationAndSpeech.json
6 Profiles
No Profiles
7 Reference Software
8 Conformance Testing
Table 2 provides the Conformance Testing Method for the PAF-RSR AIM.
If a schema references other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against that schema, if present, and conform with the Qualifier, if present.
Table 2 – Conformance Testing Method for the PAF-RSR AIM
| Input | Description |
| Point of View | Shall validate against Point of View Schema. |
| Audio-Visual Scene Descriptors | Shall validate against Audio-Visual Scene Descriptors Schema. |
| Avatar PoV | Shall validate against Point of View Schema. |
| Input Speech | Shall validate against Speech Object Schema. |
| Speech Model | Shall validate against ML Model Schema. |
| Text Object | Shall validate against Text Object Schema. |
| Personal Status | Shall validate against Personal Status Schema. |
| Avatar Model | Shall validate against 3D Model Schema. |
| Output | Description |
| Output Speech | Shall validate against Speech Object Schema. |
| Output Audio | Shall validate against Audio Object Schema. |
| Output Visual | Shall validate against Visual Object Schema. |
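The mapping in Table 2 lends itself to a table-driven check. The sketch below is a minimal Python illustration, not the reference conformance tool: `SCHEMA_FOR`, `Validator`, and `check_conformance` are hypothetical names, and the actual schema validation is delegated to caller-supplied functions because the schema files themselves are not reproduced here.

```python
from typing import Any, Callable, Dict, List

# Data name -> schema name, copied from Table 2.
SCHEMA_FOR: Dict[str, str] = {
    "Point of View": "Point of View",
    "Audio-Visual Scene Descriptors": "Audio-Visual Scene Descriptors",
    "Avatar PoV": "Point of View",
    "Input Speech": "Speech Object",
    "Speech Model": "ML Model",
    "Text Object": "Text Object",
    "Personal Status": "Personal Status",
    "Avatar Model": "3D Model",
    "Output Speech": "Speech Object",
    "Output Audio": "Audio Object",
    "Output Visual": "Visual Object",
}

# A validator takes a data item and reports whether it validates.
Validator = Callable[[Any], bool]


def check_conformance(data: Dict[str, Any],
                      validators: Dict[str, Validator]) -> List[str]:
    """Return a list of failure messages; an empty list means all items passed."""
    failures: List[str] = []
    for name, value in data.items():
        schema = SCHEMA_FOR.get(name)
        if schema is None:
            failures.append(f"{name}: not listed in Table 2")
            continue
        validate = validators.get(schema)
        if validate is None:
            failures.append(f"{name}: no validator for {schema} Schema")
        elif not validate(value):
            failures.append(f"{name}: does not validate against {schema} Schema")
    return failures
```

Keying the validators by schema name rather than by data name means that Point of View and Avatar PoV, which share a schema in Table 2, are checked by the same function.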