The Audio-Visual Scene Description Composite AIM is specified in the following six sections.
1 Functions of Audio-Visual Scene Description
2 Reference Architecture of Audio-Visual Scene Description
3 Input/output data of Audio-Visual Scene Description
4 Functions of Audio-Visual Scene Description AI Modules
5 I/O Data of Audio-Visual Scene Description AI Modules
6 Specification of Audio-Visual Scene Description AIMs and JSON Metadata
1 Functions of Audio-Visual Scene Description
The Audio-Visual Scene Description (OSD-AVS) Composite AIM receives independently developed Audio Scene Descriptors and Visual Scene Descriptors for the same Virtual Space and produces Audio-Visual Scene Descriptors in which co-located Audio Objects and Visual Objects carry the same or related identifiers.
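The identifier assignment described above can be illustrated with a minimal sketch. It assumes that each object carries a position in the shared Virtual Space and that "co-located" means "closer than a threshold". The `SceneObject` class, the `align_identifiers` function, and the `max_distance` value are hypothetical names introduced for illustration; the specification does not prescribe a matching algorithm.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple


@dataclass
class SceneObject:
    """Hypothetical stand-in for an Audio Object or a Visual Object."""
    object_id: str
    position: Tuple[float, float, float]  # coordinates in the shared Virtual Space


def align_identifiers(
    audio_objects: List[SceneObject],
    visual_objects: List[SceneObject],
    max_distance: float = 0.5,  # assumed co-location threshold, not from the specification
) -> Dict[str, str]:
    """Map each audio object_id to the object_id of its co-located visual object.

    Greedy nearest-neighbour matching: candidate pairs are visited from the
    closest to the farthest, and each object is matched at most once.
    """
    candidates = sorted(
        (dist(a.position, v.position), a.object_id, v.object_id)
        for a in audio_objects
        for v in visual_objects
    )
    matches: Dict[str, str] = {}
    used_visual = set()
    for distance, audio_id, visual_id in candidates:
        if distance > max_distance:
            break  # every remaining pair is farther apart than the threshold
        if audio_id in matches or visual_id in used_visual:
            continue  # one of the two objects is already paired
        matches[audio_id] = visual_id
        used_visual.add(visual_id)
    return matches
```

A downstream step could then rewrite both identifiers of each matched pair to a common value, or record the pair as "related", depending on how the identifiers are used.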
2 Reference Architecture of Audio-Visual Scene Description
Figure 1 gives the Reference Model of Audio-Visual Scene Description.

Figure 1 – Reference Model of Audio-Visual Scene Description
3 Input/output data of Audio-Visual Scene Description
Table 1 gives the input/output data of Audio-Visual Scene Description.
Table 1 – I/O data of Audio-Visual Scene Description
| Input data | From | Comment |
| Input Audio | A real environment | The Input Audio and Input Visual originate from the same scene |
| Input Visual | A real environment | The Input Audio and Input Visual originate from the same scene |
| Output data | To | Comments |
| Audio-Visual Scene Descriptors | Downstream AIM | The co-located Audio and Visual Objects in the Scene convey the same or related identifiers. |
4 Functions of Audio-Visual Scene Description AI Modules
Table 2 gives the functions of the AIMs.
Table 2 – AI Modules of Audio-Visual Scene Description
| AIM | Function |
| Audio Scene Description | Produces the Audio Scene Descriptors (Geometry+Objects). |
| Visual Scene Description | Produces the Visual Scene Descriptors (Geometry+Objects). |
| Audio-Visual Alignment | Identifies co-located Audio and Visual Objects. Assigns the same or related Identifiers to the co-located Audio and Visual Objects. Updates the Audio-Visual Scene Geometry. |
| Audio-Visual Scene Multiplexing | Multiplexes the new Audio-Visual Scene Geometry and the Audio and Visual Objects. |
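The way the four AIMs of Table 2 chain together can be sketched as a simple function composition. This is an illustrative sketch only: the function names, the `Descriptors` dictionary shape, and the tuple returned by the alignment step are assumptions, not interfaces defined by the specification.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed shape of scene descriptors: {"geometry": ..., "objects": [...]}
Descriptors = Dict[str, Any]


def multiplex(av_geometry: Any, audio_objects: List[Any], visual_objects: List[Any]) -> Descriptors:
    """Toy Audio-Visual Scene Multiplexing: bundle geometry and objects together."""
    return {
        "geometry": av_geometry,
        "audio_objects": audio_objects,
        "visual_objects": visual_objects,
    }


def describe_av_scene(
    input_audio: Any,
    input_visual: Any,
    audio_scene_description: Callable[[Any], Descriptors],
    visual_scene_description: Callable[[Any], Descriptors],
    audio_visual_alignment: Callable[[Descriptors, Descriptors], Tuple[Any, List[Any], List[Any]]],
    audio_visual_scene_multiplexing: Callable[[Any, List[Any], List[Any]], Descriptors] = multiplex,
) -> Descriptors:
    """Run the four AIMs in the order suggested by Table 2."""
    # 1. Each modality is described independently (Geometry + Objects).
    audio_sd = audio_scene_description(input_audio)
    visual_sd = visual_scene_description(input_visual)
    # 2. Alignment assigns identifiers and returns the updated AV Scene Geometry.
    av_geometry, audio_objects, visual_objects = audio_visual_alignment(audio_sd, visual_sd)
    # 3. Multiplexing combines the new geometry with both object lists.
    return audio_visual_scene_multiplexing(av_geometry, audio_objects, visual_objects)
```

Passing the AIMs in as callables keeps the sketch independent of any particular implementation of the individual modules.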
5 I/O Data of Audio-Visual Scene Description AI Modules
Table 3 gives the list of the AIMs with their I/O data.
Table 3 – I/O Data of AI Modules of Audio-Visual Scene Description
6 Specification of Audio-Visual Scene Description AIMs and JSON Metadata
Table 4 – AIM and JSON Metadata
| Composite AIM | AIM | Name | JSON |
| OSD-AVS | | Audio-Visual Scene Description | X |
| – | CAE-ASD | Audio Scene Description | X |
| – | OSD-VSD | Visual Scene Description | X |
| – | OSD-AVA | Audio-Visual Alignment | X |
| – | OSD-AMX | Audio-Visual Scene Multiplexing | X |
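The composite structure listed in Table 4 can be represented in code as a small nested record. The field names `id`, `name`, and `sub_aims` below are hypothetical and chosen for illustration; they are not the schema of the JSON Metadata referenced in the table.

```python
# Illustrative representation of Table 4; field names are assumptions,
# not the JSON Metadata schema referenced by the specification.
COMPOSITE_AIM = {
    "id": "OSD-AVS",
    "name": "Audio-Visual Scene Description",
    "sub_aims": [
        {"id": "CAE-ASD", "name": "Audio Scene Description"},
        {"id": "OSD-VSD", "name": "Visual Scene Description"},
        {"id": "OSD-AVA", "name": "Audio-Visual Alignment"},
        {"id": "OSD-AMX", "name": "Audio-Visual Scene Multiplexing"},
    ],
}


def sub_aim_ids(aim: dict) -> list:
    """Return the identifiers of the AIMs contained in a composite AIM."""
    return [sub["id"] for sub in aim.get("sub_aims", [])]
```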