1     Functions 2     Reference Model 3     Input/Output Data
4     SubAIMs 5     JSON Metadata 6     Profiles
7     Reference Software 8     Conformance Testing 9     Performance Assessment

1     Functions

Audio-Visual Alignment (OSD-AVA):

Receives Speech Scene Descriptors, Audio Scene Descriptors, and Visual Scene Descriptors.
Aligns the Speech, Audio, and Visual Objects that share the same Spatial Attitude.
Produces Audio-Visual Scene Descriptors in which Speech Objects, Audio Objects, and Visual Objects having the same Spatial Attitude have the same or compatible Identifiers.
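
The alignment step can be illustrated with a minimal sketch. It is not the Reference Software and does not follow the normative data formats: the `SceneObject` class, the `align` function, the `AV-n` identifier scheme, and the `tolerance` parameter are all hypothetical, and Spatial Attitude is reduced here to a 3D position, which is a simplifying assumption.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class SceneObject:
    # Hypothetical, simplified stand-in for one object in a Scene Descriptor.
    object_id: str
    kind: str        # "speech", "audio", or "visual"
    position: tuple  # (x, y, z); assumed simplification of Spatial Attitude


def align(speech, audio, visual, tolerance=0.1):
    """Group objects of different kinds whose positions lie within
    `tolerance` of a group's first member, then give every member of a
    group the same identifier (assumed scheme: "AV-<group index>")."""
    groups = []  # each group is a list of SceneObject
    for obj in [*visual, *audio, *speech]:
        for group in groups:
            anchor = group[0]
            kind_taken = any(member.kind == obj.kind for member in group)
            if not kind_taken and dist(anchor.position, obj.position) <= tolerance:
                group.append(obj)
                break
        else:
            # No existing group matched, so this object starts a new one.
            groups.append([obj])

    aligned = []
    for index, group in enumerate(groups):
        shared_id = f"AV-{index}"  # hypothetical identifier scheme
        for member in group:
            aligned.append({
                "id": shared_id,
                "source_id": member.object_id,
                "kind": member.kind,
                "position": member.position,
            })
    return aligned


speech = [SceneObject("S1", "speech", (1.0, 0.0, 0.0))]
audio = [SceneObject("A1", "audio", (5.0, 0.0, 0.0))]
visual = [
    SceneObject("V1", "visual", (1.02, 0.0, 0.0)),
    SceneObject("V2", "visual", (9.0, 9.0, 9.0)),
]
result = align(speech, audio, visual)
```

In this example the Speech Object `S1` and the Visual Object `V1` are close enough to receive the same identifier, while `A1` and `V2` keep identifiers of their own. A real implementation would compare full Spatial Attitudes and would need a policy for ambiguous matches.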

2     Reference Model

Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.

Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM

3     Input/Output Data

Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.

Table 1 – I/O Data of the Audio-Visual Alignment AIM

Input                            Description
Speech Scene Descriptors         The IDs and the geometry of the Speech Objects of the Scene.
Audio Scene Descriptors          The IDs and the geometry of the Audio Objects of the Scene.
Visual Scene Descriptors         The IDs and the geometry of the Visual Objects of the Scene.

Output                           Description
Audio-Visual Scene Descriptors   The IDs and the geometry of the Audio, Visual and Audio-Visual Objects of the Scene.

4     SubAIMs

No SubAIMs.

5     JSON Metadata

https://schemas.mpai.community/OSD/V1.1/AIMs/AudioVisualAlignment.json

6     Profiles

No profiles.

7     Reference Software

7.1    Disclaimers

  1. The purpose of this Reference Software is to show a working Implementation of OSD-AVA, not to provide a ready-to-use product.
  2. MPAI disclaims the suitability of this Reference Software for any other purposes and does not guarantee that it is secure.
  3. Users shall verify that they have the right to use any third-party software that this Reference Software may require.

7.2 Guide to OSD-AVA Reference Software

7.3 Acknowledgement

8     Conformance Testing

9     Performance Assessment