1     Functions

Audio-Visual Alignment (OSD-AVS) produces the Audio-Visual Scene Descriptors from Input Audio and Input Visual of a scene.

3      Reference Architecture

Figure 1 depicts the OSD-AVS Reference Model.

Figure 1 – OSD-AVS Reference Model

4      I/O Data

Table 1 specifies the I/O Data of OSD-AVS.

Table 1 – I/O Data of OSD-AVS

Input                            Description
Input Audio                      The digital representation of the captured audio.
Input Visual                     The digital representation of the captured visual.

Output                           Description
Audio-Visual Scene Descriptors   The Descriptors of the Audio-Visual Scene.
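
As an illustration only, the I/O Data of Table 1 can be sketched as a typed interface. Every name below (InputAudio, InputVisual, AudioVisualSceneDescriptors, AudioVisualAlignment, PlaceholderAlignment, process) and every field layout is a hypothetical choice made for this sketch; the normative data formats are those referenced by the JSON Metadata, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class InputAudio:
    """Digital representation of the captured audio (payload format assumed)."""
    data: bytes


@dataclass(frozen=True)
class InputVisual:
    """Digital representation of the captured visual (payload format assumed)."""
    data: bytes


@dataclass(frozen=True)
class AudioVisualSceneDescriptors:
    """Descriptors of the Audio-Visual Scene (field layout assumed)."""
    descriptors: dict


class AudioVisualAlignment(Protocol):
    """Hypothetical interface: two inputs in, one descriptor object out."""

    def process(
        self, audio: InputAudio, visual: InputVisual
    ) -> AudioVisualSceneDescriptors: ...


class PlaceholderAlignment:
    """Stand-in implementation: records payload sizes only, no real analysis."""

    def process(
        self, audio: InputAudio, visual: InputVisual
    ) -> AudioVisualSceneDescriptors:
        return AudioVisualSceneDescriptors(
            descriptors={
                "audio_bytes": len(audio.data),
                "visual_bytes": len(visual.data),
            }
        )
```

The placeholder only shows the shape of the data flow (Input Audio and Input Visual in, Audio-Visual Scene Descriptors out); it does not implement any alignment or scene analysis.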

5      SubAIMs

No SubAIMs.

6     JSON Metadata

https://schemas.mpai.community/OSD/V1.1/AIMs/AudioVisualAlignment.json