The Audio-Visual Scene Description Composite AIM is specified in the following six sections.

1     Functions of Audio-Visual Scene Description

2     Reference Architecture of Audio-Visual Scene Description

3     Input/output data of Audio-Visual Scene Description

4     Functions of Audio-Visual Scene Description AI Modules

5     I/O Data of Audio-Visual Scene Description AI Modules

6     Specification of Audio-Visual Scene Description AIMs and JSON Metadata

1      Functions of Audio-Visual Scene Description

The Audio-Visual Scene Description (OSD-AVS) Composite AIM receives Audio Scene Descriptors and Visual Scene Descriptors that were developed independently in the same Virtual Space, and produces Audio-Visual Scene Descriptors in which co-located Audio Objects and Visual Objects have the same or related identifiers.
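The idea of giving co-located Audio Objects and Visual Objects the same identifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not defined by this specification: the `SceneObject` class, the `(x, y, z)` position field, the `max_distance` threshold, and the greedy nearest-neighbour rule used to decide what counts as "co-located".

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class SceneObject:
    object_id: str                        # identifier assigned by the upstream descriptor
    position: Tuple[float, float, float]  # hypothetical (x, y, z) in the shared Virtual Space


def align_identifiers(audio_objects: List[SceneObject],
                      visual_objects: List[SceneObject],
                      max_distance: float = 0.5) -> List[SceneObject]:
    """Return audio objects in which each co-located one takes the identifier
    of its nearest visual object (greedy match, one visual object per audio object)."""
    unmatched = list(visual_objects)
    aligned = []
    for audio in audio_objects:
        # Closest visual object that has not been matched yet, if any remain.
        nearest = min(unmatched,
                      key=lambda v: dist(audio.position, v.position),
                      default=None)
        if nearest is not None and dist(audio.position, nearest.position) <= max_distance:
            unmatched.remove(nearest)
            aligned.append(SceneObject(nearest.object_id, audio.position))
        else:
            # No visual object close enough: keep the original identifier.
            aligned.append(audio)
    return aligned


audio = [SceneObject("A1", (1.0, 0.0, 2.0)), SceneObject("A2", (5.0, 0.0, 1.0))]
visual = [SceneObject("V7", (1.1, 0.0, 2.0)), SceneObject("V8", (-3.0, 0.0, 4.0))]
aligned = align_identifiers(audio, visual)
print([o.object_id for o in aligned])  # ['V7', 'A2']
```

Here the first audio object lies 0.1 units from `V7` and takes its identifier, while the second has no visual object within the threshold and keeps its own. A real implementation might assign related rather than identical identifiers.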

2      Reference Architecture of Audio-Visual Scene Description

Figure 1 gives the Reference Model of Audio-Visual Scene Description.

Figure 1 – Reference Model of Audio-Visual Scene Description

3      Input/output data of Audio-Visual Scene Description

Table 1 gives the input/output data of Audio-Visual Scene Description.

Table 1 – I/O data of Audio-Visual Scene Description

| Input data | From | Comment |
|---|---|---|
| Input Audio | A real environment | The Input Audio and Input Visual originate from the same scene. |
| Input Visual | A real environment | The Input Audio and Input Visual originate from the same scene. |
| Output data | To | Comment |
| Audio-Visual Scene Descriptors | Downstream AIM | The co-located Audio and Visual Objects in the Scene convey the same or related identifiers. |

4      Functions of Audio-Visual Scene Description AI Modules

Table 2 gives the functions of the AIMs.

Table 2 – AI Modules of Audio-Visual Scene Description

| AIM | Function |
|---|---|
| Audio Scene Description | Produces the Audio Scene Descriptors (Geometry + Objects). |
| Visual Scene Description | Produces the Visual Scene Descriptors (Geometry + Objects). |
| Audio-Visual Alignment | Identifies co-located Audio and Visual Objects. Assigns the same or related Identifiers to the co-located Audio and Visual Objects. Updates the Audio-Visual Scene Geometry. |
| Audio-Visual Scene Multiplexing | Multiplexes the new Audio-Visual Scene Geometry and the Audio and Visual Objects. |

5      I/O Data of Audio-Visual Scene Description AI Modules

Table 3 gives the input and output data of the AIMs.

Table 3 – I/O data of AI Modules of Audio-Visual Scene Description

| AIM | Receives | Produces |
|---|---|---|
| Audio Scene Description | Input Audio | Audio Objects, Audio Scene Geometry |
| Visual Scene Description | Input Visual | Visual Objects, Visual Scene Geometry |
| Audio-Visual Alignment | Audio Scene Geometry, Visual Scene Geometry | Audio-Visual Scene Geometry |
| Audio-Visual Scene Multiplexing | Audio Objects, Visual Objects, Audio-Visual Scene Geometry | Audio-Visual Scene Descriptors |
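The data flow listed in Table 3 can be sketched as four stub functions wired together. This is only an illustration of which AIM consumes which data: the function names, the placeholder return values, and the dictionary layout of the result are all hypothetical and are not taken from the specification or its JSON Metadata.

```python
from typing import Any, Dict, List, Tuple

Geometry = Dict[str, Tuple[float, float, float]]  # hypothetical: object id -> position


def audio_scene_description(input_audio: Any) -> Tuple[List[str], Geometry]:
    """Stub: Input Audio -> (Audio Objects, Audio Scene Geometry)."""
    return ["audio-object-1"], {"audio-object-1": (1.0, 0.0, 2.0)}


def visual_scene_description(input_visual: Any) -> Tuple[List[str], Geometry]:
    """Stub: Input Visual -> (Visual Objects, Visual Scene Geometry)."""
    return ["visual-object-1"], {"visual-object-1": (1.0, 0.0, 2.0)}


def audio_visual_alignment(audio_geometry: Geometry,
                           visual_geometry: Geometry) -> Geometry:
    """Stub: the two geometries -> Audio-Visual Scene Geometry.
    A real AIM would also identify co-located objects and align identifiers."""
    return {**audio_geometry, **visual_geometry}


def audio_visual_scene_multiplexing(audio_objects: List[str],
                                    visual_objects: List[str],
                                    av_geometry: Geometry) -> Dict[str, Any]:
    """Stub: objects + geometry -> Audio-Visual Scene Descriptors."""
    return {
        "geometry": av_geometry,
        "audio_objects": audio_objects,
        "visual_objects": visual_objects,
    }


def audio_visual_scene_description(input_audio: Any, input_visual: Any) -> Dict[str, Any]:
    """Composite AIM: chains the four AIMs in the order implied by Table 3."""
    audio_objects, audio_geometry = audio_scene_description(input_audio)
    visual_objects, visual_geometry = visual_scene_description(input_visual)
    av_geometry = audio_visual_alignment(audio_geometry, visual_geometry)
    return audio_visual_scene_multiplexing(audio_objects, visual_objects, av_geometry)


descriptors = audio_visual_scene_description(b"audio samples", b"video frames")
print(sorted(descriptors))  # ['audio_objects', 'geometry', 'visual_objects']
```

Note that, following Table 3, only the two geometries pass through Audio-Visual Alignment, while the Audio Objects and Visual Objects go directly to Audio-Visual Scene Multiplexing.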

6      Specification of Audio-Visual Scene Description AIMs and JSON Metadata

Table 4 – AIM and JSON Metadata

| AIM | Name | JSON |
|---|---|---|
| OSD-AVS | Audio-Visual Scene Description | X |
| CAE-ASD | Audio Scene Description | X |
| OSD-VSD | Visual Scene Description | X |
| OSD-AVA | Audio-Visual Alignment | X |
| OSD-AMX | Audio-Visual Scene Multiplexing | X |