Go To MPAI-OSD AI Modules

1 Functions 2 Reference Model 3 Input/Output Data
4 SubAIMs 5 JSON Metadata 6 Profiles
7 Reference Software 8 Conformance Testing 9 Performance Assessment

1 Functions

Audio-Visual Alignment (OSD-AVA) V1.3 provides the Descriptors of an Audio-Visual Scene in which co-located Audio and Visual Objects have a similar Spatial Attitude.

Receives
  Speech Scene Descriptors: Descriptors of the Speech Scene, if present.
  Audio Scene Descriptors: Descriptors of the Audio Scene, if present.
  Visual Scene Descriptors: Descriptors of the Visual Scene.
Aligns
  Speech, Audio, and Visual Objects that share the same Spatial Attitude.
Produces
  Audio-Visual Scene Descriptors: Descriptors in which Speech Objects, Audio Objects, and Visual Objects having the same Spatial Attitude have the same or compatible Identifiers.
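The alignment step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a Spatial Attitude is reduced to a 3-D position, that two Objects count as co-located when their positions lie within a hypothetical `tolerance`, and that each Speech or Audio Object takes the Identifier of the nearest co-located Visual Object. `SceneObject`, `align`, and all field names are hypothetical; the Reference Software may use a different matching criterion.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneObject:
    obj_id: str                           # hypothetical identifier field
    kind: str                             # "speech", "audio" or "visual"
    position: tuple[float, float, float]  # assumption: stands in for the full Spatial Attitude


def align(speech, audio, visual, tolerance=0.5):
    """Give each Speech/Audio Object the ID of the nearest Visual Object within `tolerance`."""
    aligned = list(visual)
    for obj in [*speech, *audio]:
        # Visual Objects close enough to count as co-located (hypothetical criterion).
        nearby = [v for v in visual if math.dist(obj.position, v.position) <= tolerance]
        if nearby:
            nearest = min(nearby, key=lambda v: math.dist(obj.position, v.position))
            aligned.append(replace(obj, obj_id=nearest.obj_id))
        else:
            aligned.append(obj)  # no co-located Visual Object: keep the original ID
    return aligned


visual = [SceneObject("V1", "visual", (0.0, 0.0, 2.0)), SceneObject("V2", "visual", (3.0, 0.0, 2.0))]
speech = [SceneObject("S1", "speech", (0.1, 0.0, 2.0))]
audio = [SceneObject("A1", "audio", (10.0, 0.0, 0.0))]
for o in align(speech, audio, visual):
    print(o.kind, o.obj_id)
# prints: visual V1 / visual V2 / speech V1 / audio A1 (one per line)
```

Greedy nearest-neighbour matching keeps the sketch short; a fuller approach could also compare orientation and resolve conflicts when several Objects compete for the same Visual Object.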

2 Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM

3 Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.

Table 1 – I/O Data of the Audio-Visual Alignment AIM

Input
  Speech Scene Descriptors: The IDs and the geometry of the Speech Objects of the Scene.
  Audio Scene Descriptors: The IDs and the geometry of the Audio Objects of the Scene.
  Visual Scene Descriptors: The IDs and the geometry of the Visual Objects of the Scene.
Output
  Audio-Visual Scene Descriptors: The IDs and the geometry of the Audio, Visual, and Audio-Visual Objects of the Scene.

4 SubAIMs

No SubAIMs.

5 JSON Metadata


6 Profiles

No profiles.

7 Reference Software

7.1 Disclaimers

  1. This OSD-AVA Reference Software Implementation is released with the BSD-3-Clause licence.
  2. The purpose of this Reference Software is to show a working Implementation of OSD-AVA, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of this Reference Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2 Guide to OSD-AVA code

This Reference Software is intended for developers who are familiar with Python, Docker, and RabbitMQ.

OSD-AVA arranges the output Visual Objects and Speech Objects according to their Time information, i.e., scene cuts/transitions and speakers’ turns. It builds a list of unique times, each of which is either 1) a scene cut/transition or 2) the start or end of a speaker’s turn, and computes segments as the intervals between adjacent times in that list. Each output Object is bounded by two adjacent times of this list.
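The segmentation described above can be sketched as follows, assuming times are floats in seconds, scene cuts are given as a list of times, and speaker turns as `(start, end)` pairs. The functions `segment_times` and `objects_per_segment` are hypothetical illustrations, not code taken from the Reference Software.

```python
def segment_times(scene_cuts, speaker_turns):
    """Unique, sorted boundary times -> list of (start, end) segments between adjacent times."""
    times = set(scene_cuts)              # scene cuts/transitions
    for start, end in speaker_turns:     # starts and ends of speakers' turns
        times.update((start, end))
    ordered = sorted(times)
    return list(zip(ordered, ordered[1:]))


def objects_per_segment(objects, segments):
    """Map each segment to the IDs of the Objects whose (hypothetical) time span overlaps it."""
    return {
        (seg_start, seg_end): [oid for oid, start, end in objects if start < seg_end and end > seg_start]
        for seg_start, seg_end in segments
    }


scene_cuts = [0.0, 12.5, 30.0]             # hypothetical values, in seconds
speaker_turns = [(2.0, 8.0), (8.0, 15.0)]  # (start, end) of each turn
segments = segment_times(scene_cuts, speaker_turns)
print(segments)
# [(0.0, 2.0), (2.0, 8.0), (8.0, 12.5), (12.5, 15.0), (15.0, 30.0)]
objects = [("S1", 2.0, 8.0), ("S2", 8.0, 15.0), ("V1", 0.0, 12.5), ("V2", 12.5, 30.0)]
print(objects_per_segment(objects, segments)[(8.0, 12.5)])
# ['S2', 'V1']
```

Using a set removes duplicate boundaries (here 8.0, which both ends one turn and starts the next), so no zero-length segment is produced.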

The OSD-AVA Reference Software is available at the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
  3. requirements.txt: dependencies installed in the Docker image.

7.3 Acknowledgements

This version of the OSD-AVA Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8 Conformance Testing

Table 2 provides the Conformance Testing Method for OSD-AVA AIM.

If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 2 – Conformance Testing Method for OSD-AVA AIM

Receives
  Speech Scene Descriptors: Shall validate against the Speech Scene Descriptors schema.
  Audio Scene Descriptors: Shall validate against the Audio Scene Descriptors schema.
  Visual Scene Descriptors: Shall validate against the Visual Scene Descriptors schema.
Produces
  Audio-Visual Scene Descriptors: Shall validate against the AV Scene Descriptors schema.
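The rule that data referencing a secondary schema must also validate against that schema can be illustrated with a small validator. It handles only a subset of JSON Schema (`type`, `required`, `properties`, `items`, and `$ref` into a local registry) and ignores Qualifiers. The schema names and fields in `REGISTRY` are toy, hypothetical stand-ins, not the published MPAI schemas; an actual conformance test would use those schemas with a complete JSON Schema validator.

```python
from typing import Any

PY_TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def validate(data: Any, schema: dict, registry: dict, path: str = "$") -> list[str]:
    """Check data against a small JSON Schema subset, following "$ref" into the registry."""
    if "$ref" in schema:
        target = registry.get(schema["$ref"])
        if target is None:
            return [f"{path}: unknown schema {schema['$ref']}"]
        # Data referencing a secondary schema must validate against that schema too.
        return validate(data, target, registry, path)
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so exclude it from "number" explicitly.
        ok = isinstance(data, PY_TYPES[expected]) and not (expected == "number" and isinstance(data, bool))
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}: missing {key}")
        for key, sub in schema.get("properties", {}).items():
            if key in data:
                errors += validate(data[key], sub, registry, f"{path}.{key}")
    if isinstance(data, list) and "items" in schema:
        for i, item in enumerate(data):
            errors += validate(item, schema["items"], registry, f"{path}[{i}]")
    return errors


# Hypothetical toy schemas: a primary schema that references a secondary one.
REGISTRY = {
    "SpatialAttitude": {
        "type": "object",
        "required": ["Position"],
        "properties": {"Position": {"type": "array", "items": {"type": "number"}}},
    },
    "SpeechSceneDescriptors": {
        "type": "object",
        "required": ["SpeechObjects"],
        "properties": {
            "SpeechObjects": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["ID", "SpatialAttitude"],
                    "properties": {
                        "ID": {"type": "string"},
                        "SpatialAttitude": {"$ref": "SpatialAttitude"},
                    },
                },
            }
        },
    },
}

good = {"SpeechObjects": [{"ID": "S1", "SpatialAttitude": {"Position": [0.1, 0.0, 2.0]}}]}
bad = {"SpeechObjects": [{"ID": "S1", "SpatialAttitude": {"Position": [0.1, "x", 2.0]}}]}
print(validate(good, REGISTRY["SpeechSceneDescriptors"], REGISTRY))
# []
print(validate(bad, REGISTRY["SpeechSceneDescriptors"], REGISTRY))
# ['$.SpeechObjects[0].SpatialAttitude.Position[1]: expected number']
```

The error in `bad` is found only because the validator follows the reference from the primary schema into the secondary `SpatialAttitude` schema.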

9 Performance Assessment

