1    Functions

Audio-Visual Scene Description (OSD-AVS) provides standard descriptors of an Audio-Visual Scene.

Receives Audio Objects and Visual Objects.
Creates Visual or Audio-Visual Scene Descriptors if there is only one Visual or Audio-Visual Object, respectively, and no Scene Geometry.
Produces Audio-Visual Scene Descriptors.
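The functions listed above can be sketched in Python as a single call that takes Audio Objects and Visual Objects and returns Scene Descriptors. This is a minimal illustration under stated assumptions, not the normative behaviour: the `SceneDescriptors` container, the `describe_scene` function, the string object identifiers, and the rule used to choose between "Visual" and "Audio-Visual" are all hypothetical and are not taken from the MPAI JSON schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SceneDescriptors:
    """Hypothetical container; not the MPAI JSON schema."""
    kind: str                          # "Visual" or "Audio-Visual"
    audio_objects: List[str]
    visual_objects: List[str]
    scene_geometry: Optional[dict] = None  # None models "no Scene Geometry"


def describe_scene(audio_objects: List[str],
                   visual_objects: List[str],
                   scene_geometry: Optional[dict] = None) -> SceneDescriptors:
    # Assumption: when no Audio Objects are supplied the result describes a
    # Visual Scene; otherwise it describes an Audio-Visual Scene.
    kind = "Audio-Visual" if audio_objects else "Visual"
    return SceneDescriptors(kind, list(audio_objects),
                            list(visual_objects), scene_geometry)


# Usage: a lone Visual Object versus a speaker with audio and video.
single_visual = describe_scene([], ["person_1"])
speaker = describe_scene(["speech_1"], ["person_1"])
print(single_visual.kind, speaker.kind)
```

Both calls omit Scene Geometry, which corresponds to the single-object case mentioned in the list above.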

2      Reference Architecture

The Reference Architecture is depicted in Figure 1.

Figure 1 – The Audio-Visual Scene Description AIM

3      I/O Data

Table 1 specifies the Input and Output Data of the Audio-Visual Scene Description AIM. Links are to the Data Type specifications.

Table 1 – I/O Data of the Audio-Visual Scene Description AIM

Input                            Description
Audio Objects                    Audio Objects.
Visual Objects                   Visual Objects.
Output                           Description
Audio-Visual Scene Descriptors   The Audio-Visual Descriptors of the Scene.

4     SubAIMs

Audio-Visual Scene Description (OSD-AVS) is a Composite AIM whose structure is depicted in Figure 2.

Figure 2 – The Audio-Visual Scene Description (OSD-AVS) Composite AIM

Table 2 provides the links to the specifications of the OSD-AVS Basic AIMs.

Table 2 – Basic AIMs of the Audio-Visual Scene Description (OSD-AVS) Composite AIM

AIMs      Names
CAE-ASD   Audio Scene Description
CAE-AAT   Audio Analysis Transform
CAE-ASL   Audio Source Localisation
CAE-ASE   Audio Separation and Enhancement
CAE-AST   Audio Synthesis Transform
CAE-AMX   Audio Descriptors Multiplexing
OSD-VSD   Visual Scene Description
OSD-AVA   Audio-Visual Alignment
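To illustrate how a Composite AIM could chain the AIMs of Table 2, the following Python sketch records the order in which stub stages would run. The identifiers and names come from Table 2; everything else is an assumption. In particular, Figure 2 is not reproduced here, so the wiring (the five CAE AIMs chained in table order as the audio branch, OSD-VSD as the visual branch, OSD-AVA combining both) is only inferred from the AIM names, and `make_stub`, `run_chain`, and `describe_audio_visual_scene` are hypothetical helpers.

```python
from typing import Callable, Dict, List

# AIM identifiers and names copied from Table 2.
SUB_AIMS: Dict[str, str] = {
    "CAE-ASD": "Audio Scene Description",
    "CAE-AAT": "Audio Analysis Transform",
    "CAE-ASL": "Audio Source Localisation",
    "CAE-ASE": "Audio Separation and Enhancement",
    "CAE-AST": "Audio Synthesis Transform",
    "CAE-AMX": "Audio Descriptors Multiplexing",
    "OSD-VSD": "Visual Scene Description",
    "OSD-AVA": "Audio-Visual Alignment",
}

# Assumption: the audio branch runs these AIMs in the order Table 2 lists them.
AUDIO_CHAIN = ["CAE-AAT", "CAE-ASL", "CAE-ASE", "CAE-AST", "CAE-AMX"]

Stage = Callable[[List[str]], List[str]]


def make_stub(aim_id: str) -> Stage:
    """Return a placeholder stage that only appends its AIM id to a trace."""
    def stub(trace: List[str]) -> List[str]:
        return trace + [aim_id]
    return stub


def run_chain(trace: List[str], stages: List[Stage]) -> List[str]:
    """Apply each stage in turn, feeding the output of one into the next."""
    for stage in stages:
        trace = stage(trace)
    return trace


def describe_audio_visual_scene() -> List[str]:
    audio = run_chain([], [make_stub(a) for a in AUDIO_CHAIN])
    visual = make_stub("OSD-VSD")([])
    # Assumed final step: alignment consumes both branches.
    return make_stub("OSD-AVA")(audio + visual)


trace = describe_audio_visual_scene()
print(" -> ".join(trace))
```

Real AIMs would exchange Audio Objects, Visual Objects, and Scene Descriptors rather than a trace of identifiers; the stubs only make the assumed execution order visible.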

5     JSON Metadata

https://schemas.mpai.community/OSD/V1.1/AIMs/AudioVisualSceneDescription.json

6     Profiles

No Profiles.