1 Functions
The Response and Scene Rendering (PAF‑RSR) AIM produces Audio, Speech, and Visual data representing a speaking avatar in an Audio-Visual Scene from a selection of input AV Scene Descriptors, input Point of View of the viewer, input Speech or Text, input Speech Model, input Avatar Model, and input Personal Status:
| Receives | Point of View | Selected by viewer to see/hear the scene. |
| | AV Scene Descriptors | Describing the Audio-Visual Scene. |
| | Avatar PoV | The Point of View of the Avatar in the AV Scene. |
| | Input Speech | Speech alternative to Speech synthesised from Text Object. |
| | Speech Model | The Speech Model used to synthesise Speech. |
| | Text Object | To be used to: (1) synthesise into Speech using a provided or an internal Speech Model; (2) generate Face Descriptors and Body Descriptors. |
| | Personal Status | Additional information to synthesise Speech and generate Face and Body Descriptors. |
| | Avatar Model | When a specific Avatar Model is externally provided. |
| Produces | Output Speech | The output Speech Object. |
| | Output Audio | The Audio of the Audio-Visual Scene Descriptors. |
| | Output Visual | The rendering of the animated avatar and the visual scene. |
2 Reference Model
Figure 1 depicts the Reference Model of the Response and Scene Rendering (PAF‑RSR) AIM.

Figure 1 – The Response and Scene Rendering (PAF‑RSR) AIM
3 I/O Data
Table 1 specifies the Input and Output Data of the Response and Scene Rendering (PAF‑RSR) AIM.
Table 1 – I/O Data of the Response and Scene Rendering (PAF‑RSR) AIM
| Input | Description |
| Point of View | Selected by viewer to see/hear the scene. |
| Audio-Visual Scene Descriptors | Describing the Audio-Visual Scene. |
| Avatar PoV | The Point of View of the Avatar in the AV Scene. |
| Input Speech | Speech alternative to Speech synthesised from Text Object. |
| Speech Model | The Speech Model used to synthesise Speech. |
| Text Object | For speech synthesis and Face and Body Descriptors generation. |
| Personal Status | Additional information to synthesise Speech and generate Face and Body Descriptors. |
| Avatar Model | When a specific Avatar Model is externally provided. |
| Output | Description |
| Output Speech | The output Speech Object. |
| Output Audio | The Audio component of the Audio-Visual Scene Descriptors. |
| Output Visual | The rendering of the animated avatar and the visual scene. |
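The Input and Output Data listed above can be mirrored in a small data structure. The following Python sketch is illustrative only and is not part of the specification: the class and field names (`RSRInputs`, `RSROutputs`, `speech_source`) are hypothetical, the choice of which inputs are optional is an assumption drawn from the descriptions (for example, the Speech Model may be "provided or internal"), and the rule that Input Speech takes precedence over the Text Object is assumed rather than specified.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RSRInputs:
    """Inputs of the PAF-RSR AIM (hypothetical field names)."""
    # Assumed to be always present.
    point_of_view: Any
    av_scene_descriptors: Any
    avatar_pov: Any
    # Assumed to be optional.
    input_speech: Optional[Any] = None
    speech_model: Optional[Any] = None   # None: fall back to an internal Speech Model
    text_object: Optional[Any] = None
    personal_status: Optional[Any] = None
    avatar_model: Optional[Any] = None   # None: no externally provided Avatar Model

    def speech_source(self) -> str:
        """Name the input that Output Speech would be derived from.

        Assumption: Input Speech, when present, is used instead of
        Speech synthesised from the Text Object.
        """
        if self.input_speech is not None:
            return "input_speech"
        if self.text_object is not None:
            return "text_object"
        raise ValueError("either Input Speech or a Text Object is required")


@dataclass
class RSROutputs:
    """Outputs of the PAF-RSR AIM (hypothetical field names)."""
    output_speech: Any
    output_audio: Any
    output_visual: Any
```

Typed `Any` fields are used because the concrete data formats are defined by the respective schemas, not by this sketch.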
4 SubAIMs
4.1 Reference Model
A Response and Scene Rendering (PAF‑RSR) AIM may be implemented as a Composite AIM depicted in Figure 2.

Figure 2 – The Response and Scene Rendering (PAF‑RSR) Composite AIM
4.2 Operation
Personal Status De-Multiplexing extracts the available media-specific Personal Statuses and provides them to Text‑To‑Speech, Entity Face Description, and Entity Body Description. These add Personal Status to Machine Speech, Machine Face Descriptors, and Machine Body Descriptors, respectively. Scene and Avatar Rendering adds the Speech and the Avatar to the input Scene Descriptors and renders the resulting Scene as Speech, Audio, and Visual.
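The data flow described above can be sketched as a chain of function calls. This is a minimal illustration, not an implementation of the AIMs: module names follow Table 2, but every function signature, the dictionary keys (`"speech"`, `"face"`, `"body"`), and the exact wiring between modules are assumptions, since Figure 2 is not reproduced here. Each module is a stub that only records what it received.

```python
from typing import Any, Dict, Optional

# Hypothetical keys for the media-specific Personal Statuses.
MEDIA = ("speech", "face", "body")


def personal_status_demultiplexing(personal_status: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Extract the available media-specific Personal Statuses (None if absent)."""
    personal_status = personal_status or {}
    return {medium: personal_status.get(medium) for medium in MEDIA}


def text_to_speech(text: str, ps_speech: Any, speech_model: Any = None) -> Dict[str, Any]:
    """Stub: a real module would synthesise Machine Speech."""
    return {"text": text, "personal_status": ps_speech, "model": speech_model or "internal"}


def entity_face_description(text: str, ps_face: Any) -> Dict[str, Any]:
    """Stub: a real module would produce Machine Face Descriptors."""
    return {"text": text, "personal_status": ps_face}


def entity_body_description(text: str, ps_body: Any) -> Dict[str, Any]:
    """Stub: a real module would produce Machine Body Descriptors."""
    return {"text": text, "personal_status": ps_body}


def scene_and_avatar_rendering(scene, point_of_view, avatar_pov, avatar_model,
                               speech, face, body) -> Dict[str, Any]:
    """Stub: add Speech and Avatar to the scene and return the three outputs."""
    avatar = {"model": avatar_model or "default", "pov": avatar_pov,
              "face": face, "body": body}
    return {
        "output_speech": speech,
        "output_audio": scene.get("audio"),
        "output_visual": {"scene": scene.get("visual"), "avatar": avatar,
                          "point_of_view": point_of_view},
    }


def response_and_scene_rendering(scene, point_of_view, avatar_pov, text,
                                 personal_status=None, speech_model=None,
                                 avatar_model=None) -> Dict[str, Any]:
    """Composite AIM: de-multiplex, describe, then render."""
    ps = personal_status_demultiplexing(personal_status)
    speech = text_to_speech(text, ps["speech"], speech_model)
    face = entity_face_description(text, ps["face"])
    body = entity_body_description(text, ps["body"])
    return scene_and_avatar_rendering(scene, point_of_view, avatar_pov,
                                      avatar_model, speech, face, body)
```

The sketch covers only the Text Object path; handling of Input Speech as an alternative to synthesised Speech is omitted.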
4.3 Functions of AI Modules
Table 2 specifies the Function of the AI Modules.
Table 2 – Functions of AI Modules
| Personal Status De-Multiplexing | Extracts available media-specific Personal Statuses. |
| Text‑To‑Speech | Synthesises speech from text. |
| Entity Face Description | Describes the Face of the Avatar. |
| Entity Body Description | Describes the Body of the Avatar. |
| Scene and Avatar Rendering | Renders Avatar in the Scene provided as input. |
4.4 I/O Data of AI Modules
Table 2 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.
| AIM1 | AIM2 | Name | JSON |
| PAF‑RSR | | Response and Scene Rendering | X |
| | PAF‑PSD | Personal Status De-Multiplexing | X |
| | MMC‑TTS | Text‑To‑Speech | X |
| | PAF‑EFD | Entity Face Description | X |
| | PAF‑EBD | Entity Body Description | X |
| | PAF‑SAR | Scene and Avatar Rendering | X |
5 JSON Metadata
https://schemas.mpai.community/PAF/V1.6/AIMs/ResponseAndSceneRendering.json
6 Profiles
No Profiles.
7 Reference Software
Not part of this specification.
8 Conformance Testing
Table 3 provides the Conformance Testing Method for the PAF‑RSR AIM.
If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
| Point of View | Shall validate against Point of View schema. |
| Audio-Visual Scene Descriptors | Shall validate against Audio-Visual Scene Descriptors schema. |
| Avatar PoV | Shall validate against Point of View schema. |
| Input Speech | Shall validate against Speech Object schema. |
| Speech Model | Shall validate against ML Model schema. |
| Text Object | Shall validate against Text Object schema. |
| Personal Status | Shall validate against Personal Status schema. |
| Avatar Model | Shall validate against 3D Model schema. |
| Output Speech | Shall validate against Speech Object schema. |
| Output Audio | Shall validate against Audio Object schema. |
| Output Visual | Shall validate against Visual Object schema. |
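The conformance rules above amount to a lookup from each data item to the schema it shall validate against. The sketch below encodes that lookup and runs a caller-supplied validator over a set of data items. The names `SCHEMA_FOR_DATA`, `check_conformance`, and the `validate` callback are hypothetical; the callback stands in for whatever JSON Schema validator an implementer chooses, and the sketch does not cover the Qualifier checks or referenced secondary schemas.

```python
from typing import Any, Callable, Dict, List

# Data item -> schema name, transcribed from the conformance table.
SCHEMA_FOR_DATA: Dict[str, str] = {
    "Point of View": "Point of View",
    "Audio-Visual Scene Descriptors": "Audio-Visual Scene Descriptors",
    "Avatar PoV": "Point of View",
    "Input Speech": "Speech Object",
    "Speech Model": "ML Model",
    "Text Object": "Text Object",
    "Personal Status": "Personal Status",
    "Avatar Model": "3D Model",
    "Output Speech": "Speech Object",
    "Output Audio": "Audio Object",
    "Output Visual": "Visual Object",
}


def check_conformance(data: Dict[str, Any],
                      validate: Callable[[str, Any], bool]) -> List[str]:
    """Return the names of data items that fail validation.

    `validate(schema_name, instance)` is a hypothetical callback that
    returns True when `instance` validates against the named schema.
    Items whose name is not in the table are reported as failures.
    """
    failures = []
    for name, instance in data.items():
        schema_name = SCHEMA_FOR_DATA.get(name)
        if schema_name is None or not validate(schema_name, instance):
            failures.append(name)
    return failures
```

Passing the validator as a parameter keeps the table lookup independent of any particular schema library.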
9 Performance Assessment
Not part of this specification.