Go To MPAI-OSD AI Modules

1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Audio-Visual Scene Description (OSD-AVS):

Receives    Space-Time information of output Audio-Visual Scene Descriptors
            Speech Objects
            Audio Objects
            Visual Objects
Creates     Audio-Visual Scene Descriptors
Produces    Audio-Visual Scene Descriptors

2     Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Scene Description (OSD-AVS) AIM.

Figure 1 – The Audio-Visual Scene Description (OSD-AVS) AIM

3     Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Scene Description (OSD-AVS) AIM. Links are to the Data Type specifications.

Table 1 – I/O Data of the Audio-Visual Scene Description (OSD-AVS) AIM

Input                             Description
Space-Time                        Space-Time information of output Audio-Visual Scene Descriptors.
Speech Object                     Speech Object.
Audio Objects                     Audio Objects.
Visual Objects                    Visual Objects.
Audio-Visual Scene Descriptors    The Audio-Visual Descriptors of the Scene part of the target Audio-Visual Scene.

Output                            Description
Audio-Visual Scene Descriptors    The Audio-Visual Descriptors of the Scene.
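
As an illustration only, the inputs and output listed in Table 1 could be held in simple Python containers such as the ones below. The class names, field names, and the use of plain dictionaries are assumptions made for this sketch; the normative formats are those of the linked Data Type specifications.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class AVSInput:
    """Hypothetical container with one field per Input row of Table 1."""
    space_time: Optional[Dict[str, Any]] = None
    speech_object: Optional[Dict[str, Any]] = None
    audio_objects: List[Dict[str, Any]] = field(default_factory=list)
    visual_objects: List[Dict[str, Any]] = field(default_factory=list)
    # The Audio-Visual Scene Descriptors input row of Table 1.
    scene_descriptors: Optional[Dict[str, Any]] = None


@dataclass
class AVSOutput:
    """Hypothetical container for the single Output row of Table 1."""
    scene_descriptors: Dict[str, Any]


def missing_inputs(data: AVSInput) -> List[str]:
    """Return the names of the inputs that are absent or empty."""
    return [f.name for f in fields(data) if not getattr(data, f.name)]
```

A caller could use `missing_inputs` to check which of the Table 1 inputs have been supplied before invoking the AIM.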

4     SubAIMs

Figure 2 specifies the Reference Model of the Audio-Visual Scene Description (OSD-AVS) Composite AIM.

Figure 2 – The Audio-Visual Scene Description (OSD-AVS) Composite AIM

Table 2 provides the links to the specifications of the OSD-AVS AIMs.

Table 2 – AIMs of the Audio-Visual Scene Description (OSD-AVS) Composite AIM

AIMs       Names                       JSON
MMC-SSD    Speech Scene Description    X
CAE-ASD    Audio Scene Description     X
OSD-VSD    Visual Scene Description    X
OSD-AVA    Audio-Visual Alignment      X
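
The following sketch shows one way the four AIMs of Table 2 might be chained inside the Composite AIM. The data flow is an assumption (the three scene description AIMs run first and OSD-AVA aligns their results); the function names, dictionary keys, and call signatures are hypothetical and are not taken from the Reference Software.

```python
from typing import Any, Callable, Dict

Descriptors = Dict[str, Any]


def run_composite(
    inputs: Dict[str, Any],
    sub_aims: Dict[str, Callable[..., Descriptors]],
) -> Descriptors:
    """Call the Table 2 AIMs in an assumed order and return the aligned result."""
    space_time = inputs["space_time"]
    # Assumption: each scene description AIM receives its objects plus Space-Time.
    speech_scene = sub_aims["MMC-SSD"](inputs["speech_object"], space_time)
    audio_scene = sub_aims["CAE-ASD"](inputs["audio_objects"], space_time)
    visual_scene = sub_aims["OSD-VSD"](inputs["visual_objects"], space_time)
    # Assumption: OSD-AVA consumes the three scene descriptions.
    return sub_aims["OSD-AVA"](speech_scene, audio_scene, visual_scene)


def make_stub_aims() -> Dict[str, Callable[..., Descriptors]]:
    """Stand-in AIMs that only echo their inputs, for trying out the wiring."""
    return {
        "MMC-SSD": lambda objs, st: {"speech": objs},
        "CAE-ASD": lambda objs, st: {"audio": objs},
        "OSD-VSD": lambda objs, st: {"visual": objs},
        "OSD-AVA": lambda s, a, v: {**s, **a, **v},
    }
```

Passing the AIMs in as callables keeps the wiring independent of how each AIM is actually deployed.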

5     JSON Metadata

https://schemas.mpai.community/OSD/V1.1/AIMs/AudioVisualSceneDescription.json
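
A minimal sketch of retrieving this metadata from Python, using only the standard library. The helper names are hypothetical, and the URL pattern (standard, version, AIM name) is inferred from the single address above, so it may not hold for other AIMs or versions.

```python
import json
import urllib.request
from typing import Any, Dict


def metadata_url(standard: str, version: str, aim_name: str) -> str:
    """Build a metadata URL following the pattern of the address above (assumed)."""
    return f"https://schemas.mpai.community/{standard}/V{version}/AIMs/{aim_name}.json"


def fetch_metadata(url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Download and parse the JSON metadata; requires network access."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


OSD_AVS_URL = metadata_url("OSD", "1.1", "AudioVisualSceneDescription")
```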

6     Profiles

No Profiles.

7     Reference Software

7.1    Disclaimers

  1. This OSD-AVS Reference Software Implementation is released under the BSD-3-Clause licence.
  2. The purpose of this OSD-AVS Reference Software is to show a working Implementation of OSD-AVS, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2    Guide to the OSD-AVS code

OSD-AVS arranges the aligned visual and speech objects from OSD-AVA into Audio-Visual Scene Descriptors.

This Reference Software for the OSD-AVS AI Module is intended for developers who are familiar with Python, Docker, and RabbitMQ.
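
To give a feel for the arranging step described above, here is a hedged sketch of a message handler that takes a JSON-encoded request and returns JSON-encoded descriptors. The message layout (`space_time`, `aligned_objects`, `object_id`, `kind`, `data`) and both function names are assumptions made for illustration; they are not the format used by the Reference Software. In a RabbitMQ deployment, a function like `handle_message` would typically be called from the consumer callback with the message body.

```python
import json
from typing import Any, Dict, List


def arrange_scene(
    space_time: Dict[str, Any],
    aligned_objects: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Group aligned objects that share an identifier into one scene entry."""
    by_id: Dict[str, Dict[str, Any]] = {}
    for obj in aligned_objects:
        # Hypothetical keys: 'object_id' links speech and visual data of one object.
        entry = by_id.setdefault(obj["object_id"], {"object_id": obj["object_id"]})
        entry[obj["kind"]] = obj["data"]
    objects = sorted(by_id.values(), key=lambda e: e["object_id"])
    return {"space_time": space_time, "objects": objects}


def handle_message(body: bytes) -> bytes:
    """Decode a request, arrange the scene, and encode the descriptors."""
    request = json.loads(body)
    descriptors = arrange_scene(request["space_time"], request["aligned_objects"])
    return json.dumps(descriptors).encode("utf-8")
```

Keeping the arranging logic separate from the message decoding makes it possible to test it without a running broker.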

The OSD-AVS Reference Software is available on the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
  3. requirements.txt: dependencies installed in the Docker image.

7.3    Acknowledgements

This version of the OSD-AVS Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8     Conformance Testing

9     Performance Assessment
