1 Functions | 2 Reference Model | 3 Input/Output Data |
4 SubAIMs | 5 JSON Metadata | 6 Profiles |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
Audio-Visual Alignment (OSD-AVA):
Receives | Speech Scene Descriptors | Descriptors of potentially present Speech Scene. |
Receives | Audio Scene Descriptors | Descriptors of potentially present Audio Scene. |
Receives | Visual Scene Descriptors | Descriptors of Visual Scene. |
Aligns | Speech, Audio, and Visual Objects | Sharing the same Spatial Attitude |
Produces | Audio-Visual Scene Descriptors | Where Speech Objects, Audio Objects, and Visual Objects having the same Spatial Attitude have the same or compatible Identifiers. |
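The alignment step in the table above can be illustrated with a minimal sketch. It is not taken from the Reference Software. It assumes that a Spatial Attitude can be reduced to a 3D position and that two Objects share it when their positions lie within a tolerance; the names SceneObject, align_identifiers, and tolerance are hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class SceneObject:
    object_id: str   # identifier carried in the input Scene Descriptors
    position: tuple  # (x, y, z): a simplified stand-in for a Spatial Attitude


def align_identifiers(speech_objects, visual_objects, tolerance=0.1):
    """Pair each speech object with the nearest visual object within `tolerance`.

    Returns a dict mapping speech object IDs to the ID of the visual object
    found at (approximately) the same position. Unmatched objects are omitted.
    """
    shared = {}
    for speech in speech_objects:
        # Nearest visual object by Euclidean distance (None if the list is empty).
        nearest = min(
            visual_objects,
            key=lambda v: dist(speech.position, v.position),
            default=None,
        )
        if nearest is not None and dist(speech.position, nearest.position) <= tolerance:
            shared[speech.object_id] = nearest.object_id
    return shared


# Hypothetical example data, not drawn from any MPAI dataset.
speech = [
    SceneObject("speech-1", (1.0, 0.0, 2.0)),
    SceneObject("speech-2", (5.0, 0.0, 1.0)),
]
visual = [
    SceneObject("face-A", (1.02, 0.0, 2.01)),
    SceneObject("face-B", (-3.0, 0.0, 4.0)),
]
print(align_identifiers(speech, visual))
```

In this sketch only speech-1 is paired, because speech-2 has no visual object within the tolerance. A real implementation would also have to compare orientation and handle Audio Objects in the same way.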
2 Reference Model
Figure 1 specifies the Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM.
Figure 1 – Reference Model of the Audio-Visual Alignment (OSD-AVA) AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Audio-Visual Alignment (OSD-AVA) AIM.
Table 1 – I/O Data of the Audio-Visual Alignment AIM
Input | Description |
Speech Scene Descriptors | The IDs and the geometry of the Speech Objects of the Scene. |
Audio Scene Descriptors | The IDs and the geometry of the Audio Objects of the Scene. |
Visual Scene Descriptors | The IDs and the geometry of the Visual Objects of the Scene. |
Output | Description |
Audio-Visual Scene Descriptors | The IDs and the geometry of the Audio, Visual and Audio-Visual Objects of the Scene. |
4 SubAIMs
No SubAIMs.
5 JSON Metadata
https://schemas.mpai.community/OSD/V1.2/AIMs/AudioVisualAlignment.json
6 Profiles
No profiles.
7 Reference Software
7.1 Disclaimers
- This OSD-AVA Reference Software Implementation is released with the BSD-3-Clause licence.
- The purpose of this Reference Software is to show a working Implementation of OSD-AVA, not to provide a ready-to-use product.
- MPAI disclaims the suitability of this Reference Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to OSD-AVA code
OSD-AVA arranges the output Visual Objects and Speech Objects by their Time information: scene cuts/transitions and speakers’ turns. Each Object is bounded by two adjacent times from a sorted list of unique times that are either 1) scene cuts/transitions or 2) starts and ends of speakers’ turns.
This Reference Software for the OSD-AVA AI Module is intended for developers who are familiar with Python, Docker, and RabbitMQ.
OSD-AVA computes segments as the unique intervals derived from scene bounds and speech segments, and outputs the corresponding Visual Objects and Speech Objects.
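The segment computation described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the repository: scene bounds are assumed to be a list of times in seconds, speakers' turns are assumed to be (start, end, speaker) tuples, and the function names unique_intervals and active_speaker are hypothetical.

```python
def unique_intervals(scene_bounds, speaker_turns):
    """Split the timeline at every scene bound and every turn start/end.

    Returns a list of (start, end) pairs bounded by adjacent unique times.
    """
    times = set(scene_bounds)
    for start, end, _speaker in speaker_turns:
        times.update((start, end))
    ordered = sorted(times)
    # Pair each time with its successor to form adjacent intervals.
    return list(zip(ordered, ordered[1:]))


def active_speaker(interval, speaker_turns):
    """Return the speaker whose turn fully covers `interval`, or None."""
    start, end = interval
    for turn_start, turn_end, speaker in speaker_turns:
        if turn_start <= start and end <= turn_end:
            return speaker
    return None


# Hypothetical example data.
scene_bounds = [0.0, 10.0, 25.0]
speaker_turns = [(2.0, 12.0, "spk1"), (12.0, 20.0, "spk2")]

segments = unique_intervals(scene_bounds, speaker_turns)
for segment in segments:
    print(segment, active_speaker(segment, speaker_turns))
```

Because every scene bound and every turn boundary is a cut point, no resulting interval straddles a scene change or a change of speaker, so each interval can be labelled with at most one speaker.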
The OSD-AVA Reference Software is available at the MPAI GitLab site. It contains:
- src: a folder with the Python code implementing the AIM
- Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
- requirements.txt: dependencies installed in the Docker image.
7.3 Acknowledgements
This version of the OSD-AVA Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).
8 Conformance Testing
Table 2 provides the Conformance Testing Method for the OSD-AVA AIM.
If a schema contains references to other schemas, conformance of data to the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
Table 2 – Conformance Testing Method for OSD-AVA AIM
Receives | Speech Scene Descriptors | Shall validate against Speech Scene Descriptors schema |
Receives | Audio Scene Descriptors | Shall validate against Audio Scene Descriptors schema |
Receives | Visual Scene Descriptors | Shall validate against Visual Scene Descriptors schema |
Produces | Audio-Visual Scene Descriptors | Shall validate against AV Scene Descriptors schema |
9 Performance Assessment