1 Functions
The Audio-Visual Alignment (OSD-AVA) AIM provides the Descriptors of an Audio-Visual Scene in which Audio Objects, Speech Objects, 3D Model Objects, and Visual Objects that have the same Position are given compatible Identifiers.
| Receives | Speech Scene Descriptors | Descriptors of the potentially present Speech Scene. |
| | Audio Scene Descriptors | Descriptors of the potentially present Audio Scene. |
| | Visual Scene Descriptors | Descriptors of the potentially present Visual Scene. |
| | 3D Model Scene Descriptors | Descriptors of the potentially present 3D Model Scene. |
| | Context Capture Directive | Instructions from A-User Control. |
| Aligns | Speech, Audio, 3D Model, and Visual Objects | That share the same Spatial Attitude. |
| Produces | Speech Scene Descriptors | Aligned Descriptors of the present Speech Objects in the Speech Scene. |
| | Audio Scene Descriptors | Aligned Descriptors of the present Audio Objects in the Audio Scene. |
| | Visual Scene Descriptors | Aligned Descriptors of the present Visual Objects in the Visual Scene. |
| | 3D Model Scene Descriptors | Aligned Descriptors of the present 3D Model Objects in the 3D Model Scene. |
| | Context Capture Directive | Report to A-User Control. |
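"Aligned", when prefixed to an Object, means that the Object is co-located with one or more Objects of the other modality: an Aligned 3D Model or Visual Object is co-located with an Audio or Speech Object, and an Aligned Audio or Speech Object is co-located with a 3D Model or Visual Object.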
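The alignment step described above can be sketched as follows. This is a minimal, non-normative Python illustration, assuming that each Object's Spatial Attitude is reduced to a 3D Position, that "the same Position" means "within a hypothetical distance `tolerance`", and that a "compatible Identifier" is a shared group label `aligned_id`. The names `SceneObject`, `align`, `tolerance`, and `aligned_id` are illustrative and are not defined by the specification.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass
class SceneObject:
    """Hypothetical minimal Object Descriptor.

    The Scene Descriptors in the specification carry a full Spatial Attitude;
    this sketch keeps only the Position.
    """
    modality: str                      # "Speech", "Audio", "Visual", or "3D Model"
    object_id: str
    position: Position
    aligned_id: Optional[str] = None   # stands in for the "compatible Identifier"


def align(objects: List[SceneObject], tolerance: float = 0.1) -> List[SceneObject]:
    """Assign the same aligned_id to Objects whose Positions coincide.

    Greedy grouping: each Object joins the first existing group whose anchor
    Position is within `tolerance` (Euclidean distance); otherwise it starts
    a new group anchored at its own Position.
    """
    anchors: List[Tuple[Position, str]] = []
    for obj in objects:
        for anchor_pos, group_id in anchors:
            if dist(anchor_pos, obj.position) <= tolerance:
                obj.aligned_id = group_id
                break
        else:  # no existing group matched: start a new one
            group_id = f"AV-{len(anchors)}"
            anchors.append((obj.position, group_id))
            obj.aligned_id = group_id
    return objects


scene = [
    SceneObject("Speech", "s1", (1.0, 0.0, 2.0)),
    SceneObject("Visual", "v1", (1.05, 0.0, 2.0)),
    SceneObject("Audio", "a1", (-3.0, 0.0, 1.0)),
    SceneObject("3D Model", "m1", (1.0, 0.02, 2.0)),
]
for o in align(scene):
    print(o.modality, o.object_id, o.aligned_id)
```

In this example `s1`, `v1`, and `m1` share `AV-0`, while the distant `a1` receives `AV-1`. The greedy grouping is order-dependent and ignores orientation; an actual implementation would use the full Spatial Attitude and might resolve ambiguous matches more carefully.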
2 Reference Model
Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.
Table 1 – I/O Data of the Audio-Visual Alignment AIM
| Input | Description |
| Speech Scene Descriptors | The Objects and the Spatial Attitudes of the Scene’s Speech Objects. |
| Audio Scene Descriptors | The Objects and the Spatial Attitudes of the Scene’s Audio Objects. |
| Visual Scene Descriptors | The Objects and the Spatial Attitudes of the Scene’s Visual Objects. |
| 3D Model Scene Descriptors | The Objects and the Spatial Attitudes of the Scene’s 3D Model Objects. |
| Output | Description |
| Speech Scene Descriptors | The Aligned Objects and the Spatial Attitudes of the Scene’s Speech Objects. |
| Audio Scene Descriptors | The Aligned Objects and the Spatial Attitudes of the Scene’s Audio Objects. |
| Visual Scene Descriptors | The Aligned Objects and the Spatial Attitudes of the Scene’s Visual Objects. |
| 3D Model Scene Descriptors | The Aligned Objects and the Spatial Attitudes of the Scene’s 3D Model Objects. |
4 SubAIMs
No SubAIMs.
5 JSON Metadata
https://schemas.mpai.community/OSD/V1.4/AIMs/AudioVisualAlignment.json
6 Profiles
No profiles.
7 Reference Software
8 Conformance Testing
Table 2 provides the Conformance Testing Method for OSD-AVA AIM.
If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
Table 2 – Conformance Testing Method for OSD-AVA AIM
| Receives | Speech Scene Descriptors | Shall validate against Speech Scene Descriptors schema. |
| | Audio Scene Descriptors | Shall validate against Audio Scene Descriptors schema. |
| | Visual Scene Descriptors | Shall validate against Visual Scene Descriptors schema. |
| | 3D Model Scene Descriptors | Shall validate against 3D Model Scene Descriptors schema. |
| Produces | Audio-Visual Scene Descriptors | Shall validate against AV Scene Descriptors schema. |
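The conformance rule stated before Table 2 can be sketched as a recursive check. This is a minimal, non-normative Python illustration that supports only a tiny subset of JSON Schema keywords (`type`, `required`, `properties`, `items`, and `$ref` resolved by name in a local registry); a real conformance test would use a complete JSON Schema validator against the published MPAI schemas. It assumes one reading of the rule: a referenced secondary schema is applied only if it is present in the registry, and a Qualifier is checked only if the data carries one and a check is registered for that schema. The field names (`Objects`, `ID`, `SpatialAttitude`, `Position`, `Unit`), the registry, and the Qualifier check are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Schema = Dict[str, Any]
QualifierCheck = Callable[[Any], bool]

# Only the JSON Schema "type" values used in this sketch.
_TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def validate(data: Any, schema: Schema, registry: Dict[str, Schema],
             qualifiers: Dict[str, QualifierCheck], path: str = "$") -> List[str]:
    """Return a list of error messages; an empty list means the data conforms."""
    errors: List[str] = []
    ref = schema.get("$ref")
    if ref is not None:
        # Secondary schema: validate against it only if it is present.
        target = registry.get(ref)
        if target is not None:
            errors += validate(data, target, registry, qualifiers, path)
        # Qualifier: check it only if the data carries one and a check exists.
        check = qualifiers.get(ref)
        if isinstance(data, dict) and "Qualifier" in data and check is not None:
            if not check(data["Qualifier"]):
                errors.append(f"{path}: Qualifier does not conform for {ref}")
        return errors

    expected = schema.get("type")
    if expected is not None and not isinstance(data, _TYPES[expected]):
        return [f"{path}: expected {expected}"]

    if expected == "object":
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing {key}")
        for key, sub in schema.get("properties", {}).items():
            if key in data:
                errors += validate(data[key], sub, registry, qualifiers, f"{path}.{key}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(data):
            errors += validate(item, schema["items"], registry, qualifiers, f"{path}[{i}]")
    return errors


# Hypothetical secondary schema, Qualifier check, and primary schema.
REGISTRY = {
    "SpatialAttitude": {
        "type": "object",
        "required": ["Position"],
        "properties": {"Position": {"type": "array", "items": {"type": "number"}}},
    }
}
QUALIFIERS = {"SpatialAttitude": lambda q: q.get("Unit") == "m"}
SPEECH_SCENE = {
    "type": "object",
    "required": ["Objects"],
    "properties": {"Objects": {"type": "array", "items": {
        "type": "object",
        "required": ["ID", "SpatialAttitude"],
        "properties": {"ID": {"type": "string"},
                       "SpatialAttitude": {"$ref": "SpatialAttitude"}},
    }}},
}

good = {"Objects": [{"ID": "s1", "SpatialAttitude": {"Position": [1.0, 0.0, 2.0]}}]}
print(validate(good, SPEECH_SCENE, REGISTRY, QUALIFIERS))  # []
```

Returning a list of errors rather than raising on the first failure lets a test report every non-conforming field of a Scene Descriptor in one pass.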
9 Performance Assessment