
1     Functions
2     Reference Model
3     Input/Output Data
4     SubAIMs
5     JSON Metadata
6     Profiles
7     Reference Software
8     Conformance Testing
9     Performance Assessment

1     Functions

Audio-Visual Scene Description (OSD-AVS):

Receives
  Space-Time of the output Audio-Visual Scene Descriptors.
  Speech Objects.
  Audio Objects.
  Visual Objects.
  Audio-Visual Scene Descriptors of the Scene to be augmented.
Augments the input Audio-Visual Scene Descriptors with the received Objects.
Produces the augmented Audio-Visual Scene Descriptors.
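The receive/augment/produce behaviour above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MPAI data model: the class names SpaceTime, SceneObject and AVSceneDescriptors, their fields, and the rule of skipping objects already present in the Scene are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpaceTime:
    # Hypothetical Space-Time of the output descriptors: a position and a time in seconds.
    position: tuple
    time: float


@dataclass
class SceneObject:
    # Hypothetical common form for a Speech, Audio or Visual Object.
    kind: str       # "speech", "audio" or "visual"
    object_id: str


@dataclass
class AVSceneDescriptors:
    # Hypothetical, much-simplified Audio-Visual Scene Descriptors.
    space_time: SpaceTime
    objects: List[SceneObject] = field(default_factory=list)


def augment(scene: AVSceneDescriptors, space_time: SpaceTime,
            speech: List[SceneObject], audio: List[SceneObject],
            visual: List[SceneObject]) -> AVSceneDescriptors:
    """Return new descriptors holding the existing objects plus the received
    ones, stamped with the Space-Time of the output descriptors."""
    known = {(o.kind, o.object_id) for o in scene.objects}
    merged = list(scene.objects)  # copy, so the input Scene is not modified
    for obj in [*speech, *audio, *visual]:
        # Skip objects that the Scene to be augmented already contains.
        if (obj.kind, obj.object_id) not in known:
            merged.append(obj)
            known.add((obj.kind, obj.object_id))
    return AVSceneDescriptors(space_time=space_time, objects=merged)
```

For example, augmenting a Scene that already holds visual object "v1" with speech object "s1" and visual objects "v1" and "v2" yields objects in the order "v1", "s1", "v2", with the new Space-Time attached.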

2     Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Scene Description (OSD-AVS) AIM.

Figure 1 – The Audio-Visual Scene Description (OSD-AVS) AIM

3    Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Scene Description (OSD-AVS) AIM. Links are to the Data Type specifications.

Table 1 – I/O Data of the Audio-Visual Scene Description (OSD-AVS) AIM

Input                           Description
Space-Time                      Space-Time information of the output Audio-Visual Scene Descriptors.
Speech Objects                  Speech Objects.
Audio Objects                   Audio Objects.
Visual Objects                  Visual Objects.
Audio-Visual Scene Descriptors  The Audio-Visual Scene Descriptors of the Scene to be augmented, part of the target Audio-Visual Scene.

Output                          Description
Audio-Visual Scene Descriptors  The Audio-Visual Scene Descriptors of the augmented Scene.
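Table 1 can be read as a contract on the AIM's input message. The sketch below checks that contract on a JSON-encoded message; the JSON field names in REQUIRED_INPUTS are hypothetical placeholders, not the names used by the MPAI schemas, and the only structural rule checked (object inputs are lists) is an assumption made for illustration.

```python
import json

# Hypothetical JSON field names for the five inputs listed in Table 1.
REQUIRED_INPUTS = (
    "SpaceTime",
    "SpeechObjects",
    "AudioObjects",
    "VisualObjects",
    "AudioVisualSceneDescriptors",
)

# Inputs assumed (for this sketch) to carry a list of Objects.
OBJECT_INPUTS = ("SpeechObjects", "AudioObjects", "VisualObjects")


def check_inputs(message: str) -> list:
    """Return a list of problems found in a JSON-encoded input message."""
    data = json.loads(message)
    problems = [f"missing {name}" for name in REQUIRED_INPUTS if name not in data]
    for name in OBJECT_INPUTS:
        if name in data and not isinstance(data[name], list):
            problems.append(f"{name} should be a list")
    return problems
```

An empty message yields one "missing ..." entry per input; a complete message whose SpeechObjects field is a JSON object rather than an array yields only "SpeechObjects should be a list".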

4     SubAIMs

Figure 2 specifies the Reference Model of the Audio-Visual Scene Description (OSD-AVS) Composite AIM.

Figure 2 – The Audio-Visual Scene Description (OSD-AVS) Composite AIM

Table 2 provides the links to the specifications of the AIMs composing the OSD-AVS Composite AIM.

Table 2 – AIMs of the Audio-Visual Scene Description (OSD-AVS) Composite AIM

AIM       Name                          JSON
MMC-SSD   Speech Scene Description      X
CAE-ASD   Audio Scene Description       X
OSD-VSD   Visual Scene Description      X
OSD-AVA   Audio-Visual Alignment        X
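A rough sketch of how the four AIMs in Table 2 could be chained inside the Composite AIM. Figure 2 is not reproduced here, so the topology is an assumption: each scene description AIM is taken to handle one object type, and OSD-AVA is taken to combine their outputs. All function names are hypothetical stand-ins, and the "alignment" step only merges descriptors; a real OSD-AVA would match speech, audio and visual objects to each other.

```python
from typing import Dict

# Hypothetical, simplified descriptors: object type -> list of objects.
Descriptors = Dict[str, list]


def speech_scene_description(speech_objects: list) -> Descriptors:
    # Stand-in for MMC-SSD.
    return {"speech": list(speech_objects)}


def audio_scene_description(audio_objects: list) -> Descriptors:
    # Stand-in for CAE-ASD.
    return {"audio": list(audio_objects)}


def visual_scene_description(visual_objects: list) -> Descriptors:
    # Stand-in for OSD-VSD.
    return {"visual": list(visual_objects)}


def audio_visual_alignment(*scene_descriptors: Descriptors) -> Descriptors:
    # Stand-in for OSD-AVA: merges the per-modality descriptors only.
    aligned: Descriptors = {}
    for descriptors in scene_descriptors:
        aligned.update(descriptors)
    return aligned


def osd_avs(space_time, speech: list, audio: list, visual: list,
            scene: Descriptors) -> dict:
    """Run the sub-AIMs and add their output to the Scene to be augmented."""
    aligned = audio_visual_alignment(
        speech_scene_description(speech),
        audio_scene_description(audio),
        visual_scene_description(visual),
    )
    merged = {key: list(values) for key, values in scene.items()}
    for key, values in aligned.items():
        merged.setdefault(key, []).extend(values)
    return {"space_time": space_time, "descriptors": merged}
```

Keeping each sub-AIM as a separate function mirrors the Composite AIM structure: any stand-in can be replaced by a call to the corresponding AIM without touching the others.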

5     JSON Metadata

https://schemas.mpai.community/OSD/V1.2/AIMs/AudioVisualSceneDescription.json

6     Profiles

No Profiles.

7     Reference Software

7.1    Disclaimers

  1. This OSD-AVS Reference Software Implementation is released with the BSD-3-Clause licence.
  2. The purpose of this OSD-AVS Reference Software is to show a working Implementation of OSD-AVS, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2    Guide to the OSD-AVS code

OSD-AVS arranges the aligned visual and speech objects into Audio-Visual Scene Descriptors.

This Reference Software for the OSD-AVS AI Module is intended for developers familiar with Python, Docker, and RabbitMQ.

The OSD-AVS Reference Software is available on the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
  3. requirements.txt: dependencies installed in the Docker image.
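Since the AIM runs in a container and exchanges data through RabbitMQ, its core is typically a message callback. The sketch below assumes the pika client library, whose basic_consume(queue=..., on_message_callback=...) expects a callback taking (channel, method, properties, body). The queue names and the process() body are hypothetical placeholders, not the Reference Software's actual code; process() only concatenates speech and visual objects to stand in for arranging the aligned objects into descriptors.

```python
import json

# Hypothetical queue names; the Reference Software may use different ones.
INPUT_QUEUE = "osd_avs_input"
OUTPUT_QUEUE = "osd_avs_output"


def process(request: dict) -> dict:
    # Placeholder for the AIM logic: gather the (already aligned) speech and
    # visual objects into a descriptor-like output.
    return {
        "SpaceTime": request.get("SpaceTime"),
        "Objects": request.get("SpeechObjects", []) + request.get("VisualObjects", []),
    }


def on_message(channel, method, properties, body: bytes) -> None:
    """pika-style consumer callback: decode, process, publish, acknowledge."""
    response = process(json.loads(body))
    channel.basic_publish(exchange="", routing_key=OUTPUT_QUEUE,
                          body=json.dumps(response).encode("utf-8"))
    # Acknowledge the input only after the output has been published.
    channel.basic_ack(delivery_tag=method.delivery_tag)

# With pika, the callback could be registered on an open channel as:
#   channel.basic_consume(queue=INPUT_QUEUE, on_message_callback=on_message)
#   channel.start_consuming()
```

Separating the pure process() function from the transport callback keeps the AIM logic testable without a running RabbitMQ broker.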

7.3    Acknowledgements

This OSD-AVS Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8     Conformance Testing

Table 3 provides the Conformance Testing Method for the OSD-AVS AIM. The Conformance Testing of the individual AIMs of the OSD-AVS Composite AIM is specified in the respective AIM Specifications.

Note that a schema may contain references to other schemas. In this case, validation of data for the primary schema implies that any data that refers to a secondary schema shall also validate.

Table 3 – Conformance Testing Method for OSD-AVS AIM

Receives   Space-Time                       Shall validate against the Space-Time schema.
           Speech Objects                   Shall validate against the Speech Objects schema.
                                            Speech Data shall conform with the Qualifier.
           Audio Objects                    Shall validate against the Audio Objects schema.
                                            Audio Data shall conform with the Qualifier.
           Visual Objects                   Shall validate against the Visual Objects schema.
                                            Visual Data shall conform with the Qualifier.
Produces   Audio-Visual Scene Descriptors   Shall validate against the Audio-Visual Scene Descriptors schema.
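The note on referenced schemas means that validating against a primary schema also validates the data its $ref entries point to. The sketch below illustrates this with a deliberately small, stdlib-only subset of JSON Schema ($ref, type, required, properties, items); in practice a full validator such as the jsonschema Python package would be used. REGISTRY and PRIMARY are hypothetical example schemas, far simpler than the MPAI schemas, and the Qualifier conformance checks are not covered.

```python
from typing import Any, Dict

TYPES = {"object": dict, "array": list, "string": str,
         "number": (int, float), "integer": int, "boolean": bool}

# Hypothetical secondary schema, looked up by the URI used in "$ref".
REGISTRY: Dict[str, dict] = {
    "SpaceTime.json": {"type": "object", "required": ["time"],
                       "properties": {"time": {"type": "number"}}},
}

# Hypothetical primary schema that refers to the secondary one.
PRIMARY = {"type": "object", "required": ["SpaceTime"],
           "properties": {"SpaceTime": {"$ref": "SpaceTime.json"}}}


def type_ok(data: Any, expected: str) -> bool:
    # bool is a subclass of int in Python, so treat it separately.
    if isinstance(data, bool):
        return expected == "boolean"
    return isinstance(data, TYPES[expected])


def validate(data: Any, schema: dict, registry: Dict[str, dict]) -> bool:
    """Validate data against a small JSON Schema subset, following $ref."""
    if "$ref" in schema:
        # The referenced (secondary) schema must also be satisfied.
        return validate(data, registry[schema["$ref"]], registry)
    if "type" in schema and not type_ok(data, schema["type"]):
        return False
    if isinstance(data, dict):
        if any(key not in data for key in schema.get("required", [])):
            return False
        for key, sub in schema.get("properties", {}).items():
            if key in data and not validate(data[key], sub, registry):
                return False
    if isinstance(data, list) and "items" in schema:
        return all(validate(item, schema["items"], registry) for item in data)
    return True
```

With these example schemas, {"SpaceTime": {"time": 1.5}} validates, while {"SpaceTime": {"time": "late"}} fails because the referenced Space-Time schema requires a number.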

9     Performance Assessment
