1    Functions

The Audio-Visual Scene Description AIM (OSD-AVS) provides standard descriptors of an Audio-Visual Scene. It:

  1. Receives Input Audio and Input Visual.
  2. Produces Audio-Visual Scene Descriptors.

2      Reference Architecture

The Reference Architecture is depicted in Figure 1.

Figure 1 – The Audio-Visual Scene Description AIM

3      I/O Data

Table 1 specifies the Input and Output Data of the Audio-Visual Scene Description AIM. Links are to the Data Type specifications.

Table 1 – I/O Data of the Audio-Visual Scene Description AIM

Input                           Description
Input Visual                    Visual Scene captured by the AIM.
Input Audio                     Audio Scene captured by the AIM.
Output                          Description
Audio-Visual Scene Descriptors  Descriptors of the Audio-Visual Scene.
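The I/O contract in Table 1 can be summarised as a typed interface: Input Audio and Input Visual in, Audio-Visual Scene Descriptors out. The Python sketch below is an illustration only. The class names, field names and the `describe` method are hypothetical and are not defined by this specification, which delegates the actual formats to the Data Type specifications.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


# Hypothetical stand-ins for the Input Audio, Input Visual and
# Audio-Visual Scene Descriptors Data Types; the normative formats are
# given in the MPAI Data Type specifications, not here.
@dataclass
class InputAudio:
    samples: List[float]   # assumed: PCM samples of the captured Audio Scene
    sample_rate: int       # assumed: samples per second


@dataclass
class InputVisual:
    frames: List[bytes]    # assumed: encoded frames of the captured Visual Scene
    frame_rate: float      # assumed: frames per second


@dataclass
class AudioVisualSceneDescriptors:
    # Assumed split into audio and visual parts, for illustration only.
    audio_objects: List[str] = field(default_factory=list)
    visual_objects: List[str] = field(default_factory=list)


class AudioVisualSceneDescription(Protocol):
    """Hypothetical OSD-AVS interface: two inputs in, one output out."""

    def describe(self, audio: InputAudio, visual: InputVisual) -> AudioVisualSceneDescriptors:
        ...


class StubAudioVisualSceneDescription:
    """Placeholder implementation that only reports how much input it received."""

    def describe(self, audio: InputAudio, visual: InputVisual) -> AudioVisualSceneDescriptors:
        return AudioVisualSceneDescriptors(
            audio_objects=[f"{len(audio.samples)} audio samples"],
            visual_objects=[f"{len(visual.frames)} visual frames"],
        )
```

Typing the AIM as a `Protocol` keeps the contract separate from any particular implementation, so a real descriptor extractor could replace the stub without changing callers.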

4     SubAIMs

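Table 2 lists the AIMs that make up the Audio-Visual Scene Description AIM. The JSON column links to each AIM's JSON Metadata.

Table 2 – AIMs of the Audio-Visual Scene Description AIM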

AIM      Name                              JSON
OSD-AVS  Audio-Visual Scene Description    X
CAE-ASD  Audio Scene Description           X
CAE-AAT  Audio Analysis Transform          X
CAE-ASL  Audio Source Localisation         X
CAE-ASE  Audio Separation and Enhancement  X
CAE-AST  Audio Synthesis Transform         X
CAE-AMX  Audio Descriptor Multiplexing     X
OSD-VSD  Visual Scene Description          X
OSD-AVA  Audio-Visual Alignment            X
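The composition listed in Table 2 can be sketched as a simple orchestration. This is a sketch under assumptions, since Figure 1 is not reproduced here: it assumes that CAE-ASD processes the Input Audio by running CAE-AAT, CAE-ASL, CAE-ASE, CAE-AST and CAE-AMX in that order, that OSD-VSD processes the Input Visual, and that OSD-AVA combines the two resulting descriptor sets. The function names, the `stages` mapping and the shared-workspace design are hypothetical.

```python
from typing import Any, Callable, Dict, List

# A stage is any callable that reads the shared workspace and returns its output.
Stage = Callable[[Dict[str, Any]], Any]

# Assumed execution order of the CAE-ASD sub-AIMs (not taken from Figure 1).
ASD_ORDER: List[str] = ["CAE-AAT", "CAE-ASL", "CAE-ASE", "CAE-AST", "CAE-AMX"]


def run_audio_scene_description(audio: Any, stages: Dict[str, Stage]) -> Any:
    """Hypothetical CAE-ASD: run each audio sub-AIM over a shared workspace."""
    workspace: Dict[str, Any] = {"input_audio": audio}
    for name in ASD_ORDER:
        # Each stage sees the Input Audio and every earlier stage's output.
        workspace[name] = stages[name](workspace)
    # Assumed: CAE-AMX multiplexes the descriptors into the Audio Scene Descriptors.
    return workspace["CAE-AMX"]


def run_audio_visual_scene_description(audio: Any, visual: Any, stages: Dict[str, Stage]) -> Any:
    """Hypothetical OSD-AVS: describe audio and visual separately, then align them."""
    audio_descriptors = run_audio_scene_description(audio, stages)
    visual_descriptors = stages["OSD-VSD"]({"input_visual": visual})
    return stages["OSD-AVA"]({"audio": audio_descriptors, "visual": visual_descriptors})
```

Passing a workspace dictionary instead of chaining outputs one by one lets a stage such as CAE-ASE read both the transform and the localisation results without assuming a strictly linear data flow.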

5     JSON Metadata

https://schemas.mpai.community/OSD/V1.1/AIMs/AudioVisualSceneDescription.json

6     Profiles

No Profiles.