Go To MPAI-OSD AI Modules

1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Audio-Visual Alignment (OSD-AVA):

Receives Speech Scene Descriptors, Audio Scene Descriptors, and Visual Scene Descriptors.
Aligns the Speech, Audio, and Visual Objects that share the same Spatial Attitude.
Produces Audio-Visual Scene Descriptors in which Speech Objects, Audio Objects, and Visual Objects having the same Spatial Attitude have the same or compatible Identifiers.
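
The alignment step described above can be illustrated with a minimal Python sketch. It is not the Reference Software: the `SceneObject` class, the `align` function, the `tolerance` parameter, and the `AV-` identifier prefix are hypothetical names introduced here. The sketch also assumes that Spatial Attitude can be reduced to a 3D position and that two Objects share it when their positions lie within a fixed distance, which is a simplification.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class SceneObject:
    # Hypothetical stand-in for an entry of the input Scene Descriptors.
    object_id: str                         # identifier carried in the input
    kind: str                              # "speech", "audio" or "visual"
    position: tuple[float, float, float]   # simplified Spatial Attitude (assumption)


def align(objects, tolerance=0.5):
    """Map each input identifier to a shared identifier.

    Objects whose position lies within `tolerance` of the first member
    of an existing group join that group; otherwise they start a new one.
    """
    groups = []
    for obj in objects:
        for group in groups:
            if dist(obj.position, group[0].position) <= tolerance:
                group.append(obj)
                break
        else:
            # No existing group is close enough: open a new group.
            groups.append([obj])

    shared = {}
    for index, group in enumerate(groups):
        for obj in group:
            shared[obj.object_id] = f"AV-{index}"  # hypothetical identifier scheme
    return shared


objects = [
    SceneObject("s1", "speech", (0.0, 0.0, 1.0)),
    SceneObject("v1", "visual", (0.1, 0.0, 1.0)),
    SceneObject("a1", "audio", (5.0, 0.0, 0.0)),
]
print(align(objects))
```

In this example the Speech Object and the Visual Object are close enough to receive the same identifier, while the distant Audio Object receives its own. A real implementation would also have to compare orientation and decide what "compatible" Identifiers means.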

2     Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM

3    Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.

Table 1 – I/O Data of the Audio-Visual Alignment AIM

Input                            Description
Speech Scene Descriptors         The IDs and the geometry of the Speech Objects of the Scene.
Audio Scene Descriptors          The IDs and the geometry of the Audio Objects of the Scene.
Visual Scene Descriptors         The IDs and the geometry of the Visual Objects of the Scene.
Output                           Description
Audio-Visual Scene Descriptors   The IDs and the geometry of the Audio, Visual and Audio-Visual Objects of the Scene.

4     SubAIMs

No SubAIMs.

5     JSON Metadata

https://schemas.mpai.community/OSD/V1.1/AIMs/AudioVisualAlignment.json

6     Profiles

No profiles.

7     Reference Software

7.1    Disclaimers

  1. This OSD-AVA Reference Software Implementation is released with the BSD-3-Clause licence.
  2. The purpose of this Reference Software is to show a working Implementation of OSD-AVA, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of this Reference Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2    Guide to OSD-AVA code

OSD-AVA arranges the output Visual Objects and Speech Objects with corresponding Time information: scene cuts/transitions and speakers’ turns. Each Object is bounded by two adjacent times from a list of unique times that are either 1) scene cuts/transitions or 2) starts and ends of speakers’ turns.

This Reference Software for the OSD-AVA AI Module is intended for developers who are familiar with Python, Docker, and RabbitMQ.

OSD-AVA computes segments as the unique intervals derived from scene bounds and speech segments, and outputs the Visual Objects and Speech Objects bounded by those intervals.
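
The segment computation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in the repository: the function names `build_segments` and `speakers_in`, the representation of scene bounds as a list of times in seconds, and the representation of speakers' turns as `(start, end, speaker_id)` tuples are all hypothetical.

```python
def build_segments(scene_bounds, speech_turns):
    """Return adjacent (start, end) intervals over the unique boundary times.

    scene_bounds: times of scene cuts/transitions (assumed format).
    speech_turns: (start, end, speaker_id) tuples (assumed format).
    """
    times = set(scene_bounds)
    for start, end, _speaker in speech_turns:
        times.update((start, end))
    ordered = sorted(times)
    # Pair each time with the next one to obtain the intervals.
    return list(zip(ordered, ordered[1:]))


def speakers_in(segment, speech_turns):
    """List the speakers whose turn fully covers the given segment."""
    seg_start, seg_end = segment
    return [
        speaker
        for start, end, speaker in speech_turns
        if start <= seg_start and seg_end <= end
    ]


scene_bounds = [0.0, 10.0, 20.0]
speech_turns = [(2.0, 12.0, "spk1"), (12.0, 18.0, "spk2")]

for segment in build_segments(scene_bounds, speech_turns):
    print(segment, speakers_in(segment, speech_turns))
```

Because every scene cut and every turn boundary is a segment boundary, no segment straddles a cut or a change of speaker, so each segment can be associated with one scene and a fixed set of speakers.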

The OSD-AVA Reference Software is available at the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
  3. requirements.txt: dependencies installed in the Docker image.

7.3    Acknowledgements

This version of the OSD-AVA Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8     Conformance Testing

9     Performance Assessment

 
