MPAI-OSD V1.2 AIM Audio-Visual Alignment

1 Functions	2 Reference Model	3 Input/Output Data
4 SubAIMs	5 JSON Metadata	6 Profiles
7 Reference Software	8 Conformance Testing	9 Performance Assessment

1 Functions

Audio-Visual Alignment (OSD-AVA):

Receives	Speech Scene Descriptors	Descriptors of potentially present Speech Scene.
	Audio Scene Descriptors	Descriptors of potentially present Audio Scene.
	Visual Scene Descriptors	Descriptors of Visual Scene.
Aligns	Speech, Audio, and Visual Objects	Sharing the same Spatial Attitude
Produces	Audio-Visual Scene Descriptors	Where Speech Objects, Audio Objects, and Visual Objects having the same Spatial Attitude have the same or compatible Identifiers.
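The alignment step above can be illustrated with a minimal Python sketch. It is not taken from the Reference Software: the `SceneObject` class, its field names, the `align` function, and the `tolerance` value are all hypothetical, and only the position component of the Spatial Attitude is compared, using plain Euclidean distance.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class SceneObject:
    # Hypothetical container; the real Descriptors carry more information.
    object_id: str
    position: tuple  # (x, y, z), position part of the Spatial Attitude only


def align(sound_objects, visual_objects, tolerance=0.5):
    """Map each Speech/Audio Object ID to the ID of the nearest Visual Object,
    provided that Visual Object lies within `tolerance` (an assumed threshold)."""
    shared = {}
    for s in sound_objects:
        nearest = min(
            visual_objects,
            key=lambda v: dist(s.position, v.position),
            default=None,  # no Visual Objects at all
        )
        if nearest is not None and dist(s.position, nearest.position) <= tolerance:
            shared[s.object_id] = nearest.object_id
    return shared


speech = [SceneObject("S1", (1.0, 0.0, 2.0)), SceneObject("S2", (5.0, 5.0, 5.0))]
visual = [SceneObject("V1", (1.1, 0.0, 2.0)), SceneObject("V2", (-3.0, 0.0, 1.0))]
print(align(speech, visual))  # {'S1': 'V1'}
```

Here S1 and V1 are close enough to be treated as sharing a Spatial Attitude, so they would receive the same or compatible Identifiers; S2 has no Visual counterpart and stays unaligned.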

2 Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.

Table 1 – I/O Data of the Audio-Visual Alignment AIM

Input	Description
Speech Scene Descriptors	The IDs and the geometry of the Speech Objects of the Scene.
Audio Scene Descriptors	The IDs and the geometry of the Audio Objects of the Scene.
Visual Scene Descriptors	The IDs and the geometry of the Visual Objects of the Scene.
Output	Description
Audio-Visual Scene Descriptors	The IDs and the geometry of the Audio, Visual and Audio-Visual Objects of the Scene.

4 SubAIMs

No SubAIMs.

5 JSON Metadata

https://schemas.mpai.community/OSD/V1.2/AIMs/AudioVisualAlignment.json

6 Profiles

No profiles.

7 Reference Software

7.1 Disclaimers

This OSD-AVA Reference Software Implementation is released with the BSD-3-Clause licence.
The purpose of this Reference Software is to show a working Implementation of OSD-AVA, not to provide a ready-to-use product.
MPAI disclaims the suitability of this Reference Software for any other purposes and does not guarantee that it is secure.
Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2 Guide to OSD-AVA code

OSD-AVA arranges the output Visual Objects and Speech Objects according to their Time information, namely scene cuts/transitions and speakers’ turns. Each Object is bounded by two adjacent times from a list of unique times that are either 1) scene cuts/transitions or 2) starts and ends of speakers’ turns.

This Reference Software for the OSD-AVA AI Module is intended for developers who are familiar with Python, Docker, and RabbitMQ.

OSD-AVA computes the segments as the unique intervals obtained by merging the scene bounds with the starts and ends of the speech segments, and outputs the Visual Objects and Speech Objects bounded by those intervals.
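The interval computation described above can be sketched as follows. This is an illustrative reading of the text, not the code in the repository: the function name `unique_segments`, its argument shapes, and the use of seconds as the time unit are assumptions.

```python
def unique_segments(scene_bounds, speech_turns):
    """Return adjacent (start, end) pairs built from the sorted set of all
    boundary times: scene cuts/transitions plus starts and ends of turns."""
    times = set(scene_bounds)
    for start, end in speech_turns:
        times.update((start, end))
    ordered = sorted(times)
    # Pair each time with its successor to obtain non-overlapping intervals.
    return list(zip(ordered, ordered[1:]))


scene_bounds = [0.0, 10.0, 20.0]            # assumed: scene cuts, in seconds
speech_turns = [(2.0, 8.0), (8.0, 14.0)]    # assumed: (start, end) of each turn
print(unique_segments(scene_bounds, speech_turns))
# [(0.0, 2.0), (2.0, 8.0), (8.0, 10.0), (10.0, 14.0), (14.0, 20.0)]
```

Using a set removes duplicate times (8.0 ends one turn and starts the next), so every resulting interval is bounded by two adjacent unique times.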

The OSD-AVA Reference Software is available at the MPAI GitLab site. It contains:

src: a folder with the Python code implementing the AIM
Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
requirements.txt: dependencies installed in the Docker image.

7.3 Acknowledgements

This version of the OSD-AVA Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8 Conformance Testing

Table 2 provides the Conformance Testing Method for OSD-AVA AIM.

If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against that schema, if present, and conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for OSD-AVA AIM

Receives	Speech Scene Descriptors	Shall validate against Speech Scene Descriptors schema
	Audio Scene Descriptors	Shall validate against Audio Scene Descriptors schema
	Visual Scene Descriptors	Shall validate against Visual Scene Descriptors schema
Produces	Audio-Visual Scene Descriptors	Shall validate against Audio-Visual Scene Descriptors schema
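A conformance run over the rows of Table 2 could be organised as in the sketch below. It is a hedged illustration only: `check_conformance` and `require_object` are hypothetical names, and `require_object` is a stand-in that merely checks for a JSON object. A real test would plug in a full JSON Schema validator for each row, for example a wrapper around `jsonschema.validate(instance, schema)` from the third-party `jsonschema` package.

```python
def check_conformance(data, validators):
    """Validate each supplied data item with the validator registered under
    its name. Returns {name: None} on success or {name: error text} on failure."""
    report = {}
    for name, instance in data.items():
        try:
            validators[name](instance)  # a validator signals failure by raising
            report[name] = None
        except Exception as exc:
            report[name] = f"{type(exc).__name__}: {exc}"
    return report


def require_object(instance):
    # Stand-in validator (assumption): accepts any JSON object.
    if not isinstance(instance, dict):
        raise TypeError("expected a JSON object")


names = [
    "Speech Scene Descriptors",
    "Audio Scene Descriptors",
    "Visual Scene Descriptors",
    "Audio-Visual Scene Descriptors",
]
validators = {name: require_object for name in names}
report = check_conformance(
    {"Speech Scene Descriptors": {}, "Visual Scene Descriptors": []},
    validators,
)
print(report)
```

Only the items actually supplied are checked, which matches the "potentially present" wording for the Speech and Audio Scene Descriptors; an implementation is treated as conforming when every entry in the report is `None`.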

9 Performance Assessment