Go To MPAI-OSD AI Modules

1 Functions 2 Reference Model 3 Input/Output Data
4 SubAIMs 5 JSON Metadata 6 Profiles
7 Reference Software 8 Conformance Testing 9 Performance Assessment

1 Functions

The Audio-Visual Alignment (OSD-AVA) AIM provides the Descriptors of an Audio-Visual Scene whose Audio Objects, Speech Objects, 3D Model Objects, and Visual Objects have compatible Identifiers if they have the same Position.

Receives
Speech Scene Descriptors – Descriptors of the potentially present Speech Scene.
Audio Scene Descriptors – Descriptors of the potentially present Audio Scene.
Visual Scene Descriptors – Descriptors of the potentially present Visual Scene.
3D Model Scene Descriptors – Descriptors of the potentially present 3D Model Scene.
Context Capture Directive – Instructions from A-User Control.
Aligns
Speech, Audio, 3D Model, and Visual Objects sharing the same Spatial Attitude.
Produces
Speech Scene Descriptors – Aligned Descriptors of the Speech Objects present in the Speech Scene.
Audio Scene Descriptors – Aligned Descriptors of the Audio Objects present in the Audio Scene.
Visual Scene Descriptors – Aligned Descriptors of the Visual Objects present in the Visual Scene.
3D Model Scene Descriptors – Aligned Descriptors of the 3D Model Objects present in the 3D Model Scene.
Context Capture Directive – Report to A-User Control.

Aligned, when prefixed to a 3D Model or Visual Object, means that the Object is co-located with an Audio or Speech Object; when prefixed to an Audio or Speech Object, it means that the Object is co-located with a 3D Model or Visual Object.
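The alignment rule above (Objects of different kinds that share a Position receive compatible Identifiers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the `SceneObject` record, the `align` function, the `AV<n>` identifier format and the 0.1 m `tolerance` are all hypothetical, and each Spatial Attitude is reduced to an (x, y, z) Position. A tolerance is assumed because Positions estimated from audio and from video rarely coincide exactly.

```python
import math
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class SceneObject:
    # Hypothetical minimal record: kind ("Speech", "Audio", "Visual", "3D Model"),
    # the identifier assigned upstream, and an (x, y, z) Position in metres.
    modality: str
    local_id: str
    position: Tuple[float, float, float]
    aligned_id: Optional[str] = None


def align(objects: List[SceneObject], tolerance: float = 0.1) -> Dict[str, List[Tuple[str, str]]]:
    """Give the same aligned identifier to Objects whose Positions lie within
    `tolerance` of the first Object of an existing group (greedy, order-dependent)."""
    new_ids = count(1)
    anchors: List[Tuple[Tuple[float, float, float], str]] = []  # (anchor Position, aligned id)
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for obj in objects:
        for anchor_pos, gid in anchors:
            if math.dist(anchor_pos, obj.position) <= tolerance:
                obj.aligned_id = gid
                break
        else:
            # No existing group is close enough: open a new one anchored at this Object.
            obj.aligned_id = f"AV{next(new_ids)}"
            anchors.append((obj.position, obj.aligned_id))
        groups.setdefault(obj.aligned_id, []).append((obj.modality, obj.local_id))
    return groups


scene = [
    SceneObject("Speech", "s1", (1.0, 0.0, 2.0)),
    SceneObject("Visual", "v1", (1.05, 0.0, 2.0)),
    SceneObject("Audio", "a1", (-3.0, 0.0, 1.0)),
]
print(align(scene))
# {'AV1': [('Speech', 's1'), ('Visual', 'v1')], 'AV2': [('Audio', 'a1')]}
```

Greedy first-come grouping keeps the sketch short; an actual aligner would more plausibly compare full Spatial Attitudes and resolve ambiguous matches across Object kinds.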

2 Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.

Table 1 – I/O Data of the Audio-Visual Alignment AIM

Input
Speech Scene Descriptors – The Speech Objects of the Scene and their Spatial Attitudes.
Audio Scene Descriptors – The Audio Objects of the Scene and their Spatial Attitudes.
Visual Scene Descriptors – The Visual Objects of the Scene and their Spatial Attitudes.
3D Model Scene Descriptors – The 3D Model Objects of the Scene and their Spatial Attitudes.
Output
Speech Scene Descriptors – The Aligned Speech Objects of the Scene and their Spatial Attitudes.
Audio Scene Descriptors – The Aligned Audio Objects of the Scene and their Spatial Attitudes.
Visual Scene Descriptors – The Aligned Visual Objects of the Scene and their Spatial Attitudes.
3D Model Scene Descriptors – The Aligned 3D Model Objects of the Scene and their Spatial Attitudes.

4 SubAIMs

No SubAIMs.

5 JSON Metadata

https://schemas.mpai.community/OSD/V1.4/AIMs/AudioVisualAlignment.json

6 Profiles

No profiles.

7 Reference Software

8 Conformance Testing

Table 2 provides the Conformance Testing Method for OSD-AVA AIM.

If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against that schema, if present, and conform to the relevant Qualifier, if present.

Table 2 – Conformance Testing Method for OSD-AVA AIM

Receives
Speech Scene Descriptors – Shall validate against the Speech Scene Descriptors schema.
Audio Scene Descriptors – Shall validate against the Audio Scene Descriptors schema.
Visual Scene Descriptors – Shall validate against the Visual Scene Descriptors schema.
3D Model Scene Descriptors – Shall validate against the 3D Model Scene Descriptors schema.
Produces
Audio-Visual Scene Descriptors – Shall validate against the AV Scene Descriptors schema.
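The rule that data referencing a secondary schema must also validate against that schema can be illustrated with a small recursive validator. This is a sketch under loud assumptions: it implements only a tiny subset of JSON Schema (`$ref` resolved by exact URI lookup in a `registry` dict, plus `type`, `required`, `properties` and `items`), it reads "if present" as "skip a referenced schema that is not available", and the schemas and the example.org URI are hypothetical stand-ins, not the MPAI schemas. Actual conformance testing should use a complete JSON Schema validator.

```python
from typing import Any, Dict, List

# Subset of JSON Schema type names mapped to Python types.
_TYPES = {"object": dict, "array": list, "string": str,
          "number": (int, float), "integer": int, "boolean": bool}


def validate(data: Any, schema: Dict[str, Any],
             registry: Dict[str, Dict[str, Any]], path: str = "$") -> List[str]:
    """Return error messages; an empty list means the data conforms to this subset."""
    if "$ref" in schema:
        ref = schema["$ref"]
        if ref not in registry:
            return []  # assumption: an absent secondary schema is skipped ("if present")
        return validate(data, registry[ref], registry, path)  # recurse into the secondary schema
    errors: List[str] = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if isinstance(data, bool) and expected in ("number", "integer"):
            return [f"{path}: expected {expected}"]
        if not isinstance(data, _TYPES[expected]):
            return [f"{path}: expected {expected}"]
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in data:
                errors.extend(validate(data[key], sub, registry, f"{path}.{key}"))
    if isinstance(data, list) and "items" in schema:
        for i, item in enumerate(data):
            errors.extend(validate(item, schema["items"], registry, f"{path}[{i}]"))
    return errors


# Hypothetical secondary schema, referenced from the primary schema by URI.
SPATIAL = "https://example.org/SpatialAttitude.json"
REGISTRY = {SPATIAL: {"type": "object", "required": ["Position"],
                      "properties": {"Position": {"type": "array", "items": {"type": "number"}}}}}
PRIMARY = {"type": "object", "required": ["Objects"],
           "properties": {"Objects": {"type": "array", "items": {
               "type": "object", "required": ["ID", "SpatialAttitude"],
               "properties": {"ID": {"type": "string"}, "SpatialAttitude": {"$ref": SPATIAL}}}}}}

good = {"Objects": [{"ID": "s1", "SpatialAttitude": {"Position": [1.0, 0.0, 2.0]}}]}
bad = {"Objects": [{"ID": "s1", "SpatialAttitude": {"Position": ["x"]}}]}
print(validate(good, PRIMARY, REGISTRY))  # []
print(validate(bad, PRIMARY, REGISTRY))   # ['$.Objects[0].SpatialAttitude.Position[0]: expected number']
```

The `bad` instance passes every check in the primary schema and fails only inside the referenced schema, which is the case the conformance rule is meant to catch.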

9 Performance Assessment
