Go to MPAI-OSD V1.5 AI Modules

Function
Ref. Model
I/O Data
SubAIMs
JSON MData
Profiles
Ref. Software
Conformance
Performance

1 Functions

The Audio-Visual Scene Description (OSD-MSD) Composite AIM receives Audio-Visual Objects, the Descriptors of the Scene the Objects belong to, and their Space-Time information as inputs and produces the Descriptors of a Scene composed of Audio-Visual Objects and Scenes. The OSD-MSD AIM may also receive an Alert conveying information on potential anomalies in the input Audio-Visual Objects.

| Action | Data | Description |
|---|---|---|
| Receives | Space-Time | Of output Audio-Visual Scene Descriptors. |
| | Speech Objects | Individual Speech Objects. |
| | Audio Objects | Individual Audio Objects. |
| | Visual Objects | Individual Visual Objects. |
| | Audio-Visual Scene Descriptors | Of Scene to be augmented. |
| Augments | Audio-Visual Scene Descriptors | With the input Objects. |
| Produces | Audio-Visual Scene Descriptors | The augmented Audio-Visual Scene Descriptors. |
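
The receive, augment, and produce steps listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Reference Software: the `SceneDescriptors` class, its fields, and the `augment` function are hypothetical names, and Objects are represented as plain dictionaries rather than the data types defined by MPAI-OSD.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical, simplified stand-in for Audio-Visual Scene Descriptors.
@dataclass
class SceneDescriptors:
    space_time: Dict[str, Any]
    speech_objects: List[Dict[str, Any]] = field(default_factory=list)
    audio_objects: List[Dict[str, Any]] = field(default_factory=list)
    visual_objects: List[Dict[str, Any]] = field(default_factory=list)


def augment(scene, space_time, speech=(), audio=(), visual=()):
    """Return new descriptors: the input scene plus the input Objects.

    `space_time` is the Space-Time information of the output descriptors.
    The input scene is left unmodified.
    """
    return SceneDescriptors(
        space_time=space_time,
        speech_objects=scene.speech_objects + list(speech),
        audio_objects=scene.audio_objects + list(audio),
        visual_objects=scene.visual_objects + list(visual),
    )
```

Returning a new object instead of mutating the input keeps the Scene to be augmented available for comparison with the output.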

2 Reference Model

Figure 1 depicts the Reference Model of the Audio-Visual Scene Description (OSD-MSD) Composite AIM.

Audio-Visual Scene Description OSD-AVS AIM

Figure 1 – The Audio-Visual Scene Description (OSD-MSD) Composite AIM

3 I/O Data

Table 1 gives the Input and Output Data of the Audio-Visual Scene Description (OSD-MSD) Composite AIM.

Table 1 – I/O Data of the Audio-Visual Scene Description (OSD-MSD) Composite AIM

| Input | Description |
|---|---|
| Space-Time | Space-Time information of output Audio-Visual Scene Descriptors. |
| Speech Objects | Speech Objects. |
| Audio Objects | Audio Objects. |
| Visual Objects | Visual Objects. |
| Audio-Visual Scene Descriptors | The Audio-Visual Descriptors of the Scene that is part of the target Audio-Visual Scene. |

| Output | Description |
|---|---|
| Audio-Visual Scene Descriptors | The Audio-Visual Descriptors of the Scene. |

4 SubAIMs

4.1 Functions of SubAIMs

Figure 2 depicts the Reference Model of the Audio-Visual Scene Description (OSD-MSD) Composite AIM.

Audio-Visual Scene Description OSD-AVS Composite AIM

Figure 2 – The Audio-Visual Scene Description (OSD-MSD) Composite AIM

Table 2 gives the functions of the Audio-Visual Scene Description SubAIMs.

Table 2 – Functions of the Audio-Visual Scene Description (OSD-MSD) SubAIMs

| SubAIM | Function |
|---|---|
| Speech Scene Description | Produces the Descriptors of a Scene composed of Speech Objects and Scenes. |
| Audio Scene Description | Produces the Descriptors of a Scene composed of Audio Objects and Scenes. |
| Visual Scene Description | Produces the Descriptors of a Scene composed of Visual Objects and Scenes. |
| Audio-Visual Alignment | Produces the Descriptors of an Audio-Visual Scene whose Objects have compatible Identifiers if they have the same Position. |
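
The Audio-Visual Alignment rule (Objects at the same Position get compatible Identifiers) can be illustrated with a short Python sketch. Several assumptions are made here that this specification does not state: each Object is a dictionary with hypothetical `id` and `position` keys, "same Position" is taken to mean exact equality of coordinates (a real implementation would more likely use a tolerance), and "compatible Identifiers" is modelled as a shared `aligned_id` value.

```python
from collections import defaultdict


def align_identifiers(*object_lists):
    """Give Objects that share a position a common identifier.

    Each Object is a dict with hypothetical keys "id" and "position".
    Positions are compared by exact equality, which is an assumption.
    """
    by_position = defaultdict(list)
    for objects in object_lists:
        for obj in objects:
            by_position[tuple(obj["position"])].append(obj)

    aligned = []
    # Sort positions so that the assigned identifiers are deterministic.
    for index, position in enumerate(sorted(by_position)):
        for obj in by_position[position]:
            aligned.append({**obj, "aligned_id": f"AV{index}"})
    return aligned
```

For example, a Speech Object and a Visual Object both located at `(1.0, 0.0, 2.0)` receive the same `aligned_id`, while a Visual Object at a different position receives a different one.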

4.2 Operation

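The OSD-MSD AIM receives input media and an Audio-Visual Scene. It produces an Audio-Visual Scene. The Speech, Audio, and Visual Scene Description SubAIMs each produce the Descriptors of their own Scene, and the Audio-Visual Alignment SubAIM combines these into the output Audio-Visual Scene Descriptors.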

4.3 I/O Data of SubAIMs

Table 3 gives, for each SubAIM, the Input and Output Data of the Audio-Visual Scene Description.

Table 3 – I/O Data of the Audio-Visual Scene Description (OSD-MSD) SubAIMs

| SubAIM | Input | Output |
|---|---|---|
| Speech Scene Description | Space-Time; Speech Object; Scene Descriptors | Speech Scene Descriptors |
| Audio Scene Description | Space-Time; Audio Object; Scene Descriptors | Audio Scene Descriptors |
| Visual Scene Description | Space-Time; Visual Object; Scene Descriptors | Visual Scene Descriptors |
| Audio-Visual Alignment | Speech Scene Descriptors; Audio Scene Descriptors; Visual Scene Descriptors | Audio-Visual Scene Descriptors |
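
The data flow in Table 3 can be sketched as a composition of functions. This is an illustration of the wiring only, assuming plain dictionaries and lists in place of the MPAI-OSD data types; the function names `describe_scene`, `audio_visual_alignment`, and `audio_visual_scene_description` are hypothetical and do not come from the Reference Software.

```python
def describe_scene(kind, space_time, objects, scene_descriptors):
    """Stand-in for the Speech, Audio, or Visual Scene Description SubAIM.

    Adds the input Objects to those already in the input Scene Descriptors.
    """
    return {
        "kind": kind,
        "space_time": space_time,
        "objects": list(scene_descriptors.get(kind, [])) + list(objects),
    }


def audio_visual_alignment(speech_sd, audio_sd, visual_sd):
    """Stand-in for Audio-Visual Alignment: merges the three descriptors."""
    return {
        "space_time": visual_sd["space_time"],
        "speech": speech_sd["objects"],
        "audio": audio_sd["objects"],
        "visual": visual_sd["objects"],
    }


def audio_visual_scene_description(space_time, speech, audio, visual,
                                   scene_descriptors):
    """Composite AIM sketch: three per-medium SubAIMs, then alignment."""
    speech_sd = describe_scene("speech", space_time, speech, scene_descriptors)
    audio_sd = describe_scene("audio", space_time, audio, scene_descriptors)
    visual_sd = describe_scene("visual", space_time, visual, scene_descriptors)
    return audio_visual_alignment(speech_sd, audio_sd, visual_sd)
```

The three per-medium calls do not depend on one another, so an implementation could run them concurrently before the alignment step.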

4.4 AIMs and JSON Metadata

Table 4 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.

Table 4 – AIMs and JSON Metadata of the Audio-Visual Scene Description (OSD-AVS)

| AIM1 | AIM2 | Name | JSON |
|---|---|---|---|
| OSD-MSD | | Audio-Visual Scene Description | X |
| | OSD-SSD | Speech Scene Description | X |
| | OSD-ASD | Audio Scene Description | X |
| | OSD-VSD | Visual Scene Description | X |
| | OSD-AVA | Audio-Visual Alignment | X |

5 JSON Metadata

https://schemas.mpai.community/OSD/V1.5/AIMs/AudioVisualSceneDescription.json

6 Profiles

No Profiles.

7 Reference Software

7.1 Disclaimers

  1. This OSD-AVS Reference Software Implementation is released with the BSD-3-Clause licence.
  2. The purpose of this OSD-AVS Reference Software is to show a working Implementation of OSD-AVS, not to provide a ready-to-use product.
  3. MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
  4. Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.

7.2 Guide to the OSD-AVS code

OSD-NSS arranges the aligned visual and speech objects into Audio-Visual Scene Descriptors.

This Reference Software for the OSD-MSD AI Module is intended for developers who are familiar with Python, Docker, and RabbitMQ.

The OSD-MSD Reference Software is available at the MPAI GitLab site. It contains:

  1. src: a folder with the Python code implementing the AIM.
  2. Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container.
  3. requirements.txt: dependencies installed in the Docker image.

7.3 Acknowledgements

This OSD-AVS Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).

8 Conformance Testing

Table 5 provides the Conformance Testing Method for the Audio-Visual Scene Description (OSD-MSD) Composite AIM. Conformance Testing of the individual SubAIMs of the OSD-MSD Composite AIM is given by the individual AIM specifications.

If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.

Table 5 – Conformance Testing Method for the Audio-Visual Scene Description (OSD-MSD) Composite AIM

| Action | Data | Requirement |
|---|---|---|
| Receives | Space-Time | Shall validate against Space-Time schema. |
| | Speech Objects | Shall validate against Speech Objects schema. Speech Data shall conform with Qualifier. |
| | Audio Objects | Shall validate against Audio Objects schema. Audio Data shall conform with Qualifier. |
| | Visual Objects | Shall validate against Visual Objects schema. Visual Data shall conform with Qualifier. |
| Produces | Audio-Visual Scene Descriptors | Shall validate against Audio-Visual Scene Descriptors schema. |
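
The rule on referenced schemas stated before Table 5 is recursive: data that conforms to a primary schema must also conform to each secondary schema it references, where that data is present. The following toy Python check illustrates only that recursion. It is not a JSON Schema validator, and the `SCHEMAS` registry, its `required` and `refs` keys, and the schema names are hypothetical; actual conformance testing would validate against the published MPAI schemas with a full JSON Schema validator.

```python
def conforms(data, schema_name, schemas):
    """Toy check: required keys exist and referenced schemas also hold."""
    schema = schemas[schema_name]
    if not isinstance(data, dict):
        return False
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, ref_name in schema.get("refs", {}).items():
        # "If present": a secondary schema applies only when the data exists.
        if key in data and not conforms(data[key], ref_name, schemas):
            return False
    return True


# Hypothetical registry; names and fields are invented for illustration.
SCHEMAS = {
    "SceneDescriptors": {
        "required": ["space_time"],
        "refs": {"space_time": "SpaceTime"},
    },
    "SpaceTime": {"required": ["time", "position"]},
}
```

With this registry, a `SceneDescriptors` instance whose `space_time` entry lacks `position` fails the check even though the primary schema's own required key is present.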

9 Performance Assessment

Not part of this specification.
