MPAI-OSD V1.4 AIM Audio-Visual Alignment

1 Functions	2 Reference Model	3 Input/Output Data
4 SubAIMs	5 JSON Metadata	6 Profiles
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

The Audio-Visual Alignment (OSD-AVA) AIM provides the Descriptors of an Audio-Visual Scene whose Audio Objects, Speech Objects, 3D Model Objects, and Visual Objects have compatible Identifiers if they have the same Position.

Receives	Speech Scene Descriptors	Descriptors of a potentially present Speech Scene.
	Audio Scene Descriptors	Descriptors of a potentially present Audio Scene.
	Visual Scene Descriptors	Descriptors of a potentially present Visual Scene.
	3D Model Scene Descriptors	Descriptors of a potentially present 3D Model Scene.
	Context Capture Directive	Instructions from A-User Control.
Aligns	Speech, Audio, Visual, and 3D Model Objects	Sharing the same Spatial Attitude.
Produces	Speech Scene Descriptors	Aligned Descriptors of the Speech Objects present in the Speech Scene.
	Audio Scene Descriptors	Aligned Descriptors of the Audio Objects present in the Audio Scene.
	Visual Scene Descriptors	Aligned Descriptors of the Visual Objects present in the Visual Scene.
	3D Model Scene Descriptors	Aligned Descriptors of the 3D Model Objects present in the 3D Model Scene.
	Context Capture Directive	Report to A-User Control.

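"Aligned", when prefixed to a 3D Model Object or Visual Object, means that the Object is co-located with an Audio Object or Speech Object. When prefixed to an Audio Object or Speech Object, it means that the Object is co-located with a 3D Model Object or Visual Object.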
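
The alignment rule above can be illustrated with a minimal Python sketch. It is not the normative procedure. It assumes that "same Position" can be approximated by a Euclidean distance below a tolerance and that "compatible Identifiers" can be modelled by giving co-located Objects one shared identifier. The names SceneObject, align, tolerance, and the "AV-n" identifier format are hypothetical and do not come from the MPAI-OSD schemas.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class SceneObject:
    """Hypothetical stand-in for one Object taken from a Scene Descriptor."""
    kind: str                              # "speech", "audio", "visual" or "3d_model"
    object_id: str                         # identifier as received
    position: tuple[float, float, float]   # assumed Cartesian position


def align(objects, tolerance=0.1):
    """Map each received identifier to a shared (aligned) identifier.

    Objects whose positions lie within `tolerance` of an already seen
    position receive that position's identifier (assumption: this is how
    "same Position" is approximated in this sketch).
    """
    aligned = {}
    anchors = []  # (position, aligned identifier) pairs seen so far
    for obj in objects:
        for position, aligned_id in anchors:
            if dist(position, obj.position) <= tolerance:
                aligned[obj.object_id] = aligned_id
                break
        else:
            # No co-located Object found: open a new aligned identifier.
            aligned_id = f"AV-{len(anchors)}"
            anchors.append((obj.position, aligned_id))
            aligned[obj.object_id] = aligned_id
    return aligned


scene = [
    SceneObject("speech", "S1", (1.0, 0.0, 2.0)),
    SceneObject("visual", "V7", (1.02, 0.0, 2.0)),
    SceneObject("audio", "A3", (-3.0, 0.0, 1.0)),
]
print(align(scene))
```

In this example the Speech Object S1 and the Visual Object V7 are within the tolerance and share an identifier, while the Audio Object A3 gets its own. A real implementation would also need to compare orientation, since the table above refers to Spatial Attitude rather than position alone.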

2 Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.

Table 1 – I/O Data of the Audio-Visual Alignment AIM

Input	Description
Speech Scene Descriptors	The Objects and the Spatial Attitudes of the Scene’s Speech Objects.
Audio Scene Descriptors	The Objects and the Spatial Attitudes of the Scene’s Audio Objects.
Visual Scene Descriptors	The Objects and the Spatial Attitudes of the Scene’s Visual Objects.
3D Model Scene Descriptors	The Objects and the Spatial Attitudes of the Scene’s 3D Model Objects.
Output	Description
Speech Scene Descriptors	The Aligned Objects and the Spatial Attitudes of the Scene’s Speech Objects.
Audio Scene Descriptors	The Aligned Objects and the Spatial Attitudes of the Scene’s Audio Objects.
Visual Scene Descriptors	The Aligned Objects and the Spatial Attitudes of the Scene’s Visual Objects.
3D Model Scene Descriptors	The Aligned Objects and the Spatial Attitudes of the Scene’s 3D Model Objects.

4 SubAIMs

No SubAIMs.

5 JSON Metadata

https://schemas.mpai.community/OSD/V1.4/AIMs/AudioVisualAlignment.json

6 Profiles

No profiles.

7 Reference Software

8 Conformance Testing

Table 2 provides the Conformance Testing Method for the OSD-AVA AIM.

If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
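
As an illustration of the rule on referenced schemas, the following Python sketch lists every secondary schema that a primary JSON Schema points to through the standard "$ref" keyword, so that each one can then be validated in turn. The function name collect_refs is hypothetical, and the schema fragment with its example.org URLs is invented for the example. It is not a copy of any MPAI schema.

```python
def collect_refs(schema):
    """Return the set of "$ref" targets found anywhere in a JSON Schema.

    The schema is expected as already parsed JSON (dicts, lists, scalars).
    """
    refs = set()
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key == "$ref" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= collect_refs(value)
    elif isinstance(schema, list):
        for item in schema:
            refs |= collect_refs(item)
    return refs


# Hypothetical primary schema; property names and URLs are placeholders.
primary_schema = {
    "type": "object",
    "properties": {
        "SpeechScene": {"$ref": "https://example.org/schemas/SpeechScene.json"},
        "Objects": {
            "type": "array",
            "items": {"$ref": "https://example.org/schemas/Object.json"},
        },
    },
}

for ref in sorted(collect_refs(primary_schema)):
    print("also validate against:", ref)
```

A conformance tester could feed each listed reference to a JSON Schema validator of its choice. The sketch only enumerates the references and does not perform validation or check Qualifiers.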

Table 2 – Conformance Testing Method for OSD-AVA AIM

Receives	Speech Scene Descriptors	Shall validate against Speech Scene Descriptors schema
	Audio Scene Descriptors	Shall validate against Audio Scene Descriptors schema
	Visual Scene Descriptors	Shall validate against Visual Scene Descriptors schema
	3D Model Scene Descriptors	Shall validate against 3D Model Scene Descriptors schema
Produces	Audio-Visual Scene Descriptors	Shall validate against AV Scene Descriptors schema

9 Performance Assessment