1 Functions | 2 Reference Model | 3 Input/Output Data |
4 SubAIMs | 5 JSON Metadata | 6 Profiles |
7 Reference Software | 8 Conformance Testing | 9 Performance Assessment |
1 Functions
Audio-Visual Scene Description (OSD-AVS):
Receives | Space-Time | Of output Audio-Visual Scene Descriptors. |
| Speech Objects | |
| Audio Objects | |
| Visual Objects | |
| Audio-Visual Scene Descriptors | Of Scene to be augmented. |
Augments | Audio-Visual Scene Descriptors | |
Produces | Audio-Visual Scene Descriptors | |
2 Reference Model
Figure 1 specifies the Reference Model of the Audio-Visual Scene Description (OSD-AVS) AIM.
Figure 1 – The Audio-Visual Scene Description (OSD-AVS) AIM
3 Input/Output Data
Table 1 specifies the Input and Output Data of the Audio-Visual Scene Description (OSD-AVS) AIM. Links are to the Data Type specifications.
Table 1 – I/O Data of the Audio-Visual Scene Description (OSD-AVS) AIM
Input | Description |
Space-Time | Space-Time information of output Audio-Visual Scene Descriptors |
Speech Objects | Speech Objects. |
Audio Objects | Audio Objects. |
Visual Objects | Visual Objects. |
Audio-Visual Scene Descriptors | The Audio-Visual Scene Descriptors of the Scene to be augmented, which is part of the target Audio-Visual Scene. |
Output | Description |
Audio-Visual Scene Descriptors | The Audio-Visual Scene Descriptors of the Scene. |
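The input/output relationship in Table 1 can be illustrated with a minimal Python sketch. It is not the Reference Software: the class `AVSceneDescriptors`, the function `describe_scene`, and all field names are hypothetical, and the normative structure of the data is the one defined by the Data Type specifications. The sketch only shows the two modes implied by the inputs: producing new Descriptors, or augmenting the Descriptors of an existing Scene.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AVSceneDescriptors:
    """Hypothetical container; the normative structure is set by the MPAI schemas."""
    space_time: dict[str, Any]
    speech_objects: list[dict[str, Any]] = field(default_factory=list)
    audio_objects: list[dict[str, Any]] = field(default_factory=list)
    visual_objects: list[dict[str, Any]] = field(default_factory=list)


def describe_scene(
    space_time: dict[str, Any],
    speech_objects: list[dict[str, Any]],
    audio_objects: list[dict[str, Any]],
    visual_objects: list[dict[str, Any]],
    existing: Optional[AVSceneDescriptors] = None,
) -> AVSceneDescriptors:
    """Produce new Descriptors, or augment `existing` without mutating it."""
    if existing is None:
        # No Scene to augment: the output holds only the received objects.
        return AVSceneDescriptors(
            space_time, list(speech_objects), list(audio_objects), list(visual_objects)
        )
    # Augmentation (assumed here to be a simple append): existing objects
    # come first, and the new Space-Time applies to the output Descriptors.
    return AVSceneDescriptors(
        space_time=space_time,
        speech_objects=existing.speech_objects + list(speech_objects),
        audio_objects=existing.audio_objects + list(audio_objects),
        visual_objects=existing.visual_objects + list(visual_objects),
    )
```

Treating augmentation as an append is an assumption made for brevity; an actual implementation may need to merge or re-align objects.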
4 SubAIMs
Figure 2 specifies the Reference Model of the Audio-Visual Scene Description (OSD-AVS) Composite AIM.
Figure 2 – The Audio-Visual Scene Description (OSD-AVS) Composite AIM
Table 2 provides the links to the specifications of the OSD-AVS AIMs.
Table 2 – AIMs of the Audio-Visual Scene Description (OSD-AVS) Composite AIM
AIMs | Names | JSON |
MMC-SSD | Speech Scene Description | X |
CAE-ASD | Audio Scene Description | X |
OSD-VSD | Visual Scene Description | X |
OSD-AVA | Audio-Visual Alignment | X |
5 JSON Metadata
https://schemas.mpai.community/OSD/V1.2/AIMs/AudioVisualSceneDescription.json
6 Profiles
No Profiles.
7 Reference Software
7.1 Disclaimers
- This OSD-AVS Reference Software Implementation is released with the BSD-3-Clause licence.
- The purpose of this OSD-AVS Reference Software is to show a working Implementation of OSD-AVS, not to provide a ready-to-use product.
- MPAI disclaims the suitability of the Software for any other purposes and does not guarantee that it is secure.
- Use of this Reference Software may require acceptance of licences from the respective repositories. Users shall verify that they have the right to use any third-party software required by this Reference Software.
7.2 Guide to the OSD-AVS code
OSD-AVS arranges the aligned visual and speech objects into Audio-Visual Scene Descriptors.
This Reference Software for the OSD-AVS AI Module is intended for developers who are familiar with Python, Docker, and RabbitMQ.
The OSD-AVS Reference Software is available at the MPAI GitLab site. It contains:
- src: a folder with the Python code implementing the AIM
- Dockerfile: a Docker file containing only the libraries required to build the Docker image and run the container
- requirements.txt: dependencies installed in the Docker image.
7.3 Acknowledgements
This OSD-AVS Reference Software has been developed by the MPAI AI Framework Development Committee (AIF-DC).
8 Conformance Testing
Table 3 provides the Conformance Testing Method for the OSD-AVS AIM. Conformance Testing of the individual AIMs of the OSD-AVS Composite AIM is given by the individual AIM Specifications.
If a schema contains references to other schemas, conformance of data for the primary schema implies that any data referencing a secondary schema shall also validate against the relevant schema, if present, and conform with the Qualifier, if present.
Table 3 – Conformance Testing Method for OSD-AVS AIM
Receives | Space-Time | Shall validate against Space-Time schema. |
| Speech Objects | Shall validate against Speech Objects schema. Speech Data shall conform with Qualifier. |
| Audio Objects | Shall validate against Audio Objects schema. Audio Data shall conform with Qualifier. |
| Visual Objects | Shall validate against Visual Objects schema. Visual Data shall conform with Qualifier. |
Produces | Audio-Visual Scene Descriptors | Shall validate against AV Scene Descriptors schema. |
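The rule that data referencing a secondary schema shall also validate against that schema can be illustrated with a minimal sketch. The schemas below are hypothetical, heavily simplified stand-ins for the MPAI schemas, and the hand-written `validate` function supports only a small subset of JSON Schema (`type`, `required`, `properties`, `items`, and a local `$ref`). Conformance with Qualifiers is not modelled. A real conformance test would use the published schemas and a full JSON Schema validator.

```python
from typing import Any

# Hypothetical stand-ins: the primary schema refers to two secondary schemas.
SCHEMAS: dict[str, dict[str, Any]] = {
    "SpaceTime": {
        "type": "object",
        "required": ["time"],
        "properties": {"time": {"type": "number"}},
    },
    "VisualObject": {
        "type": "object",
        "required": ["id"],
        "properties": {"id": {"type": "string"}},
    },
    "AVSceneDescriptors": {
        "type": "object",
        "required": ["space_time", "visual_objects"],
        "properties": {
            "space_time": {"$ref": "SpaceTime"},
            "visual_objects": {"type": "array", "items": {"$ref": "VisualObject"}},
        },
    },
}

PYTHON_TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def validate(data: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of error messages; an empty list means the data validates."""
    if "$ref" in schema:
        # Data referencing a secondary schema must validate against it too.
        return validate(data, SCHEMAS[schema["$ref"]], path)
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(data, PYTHON_TYPES[expected])
        if expected == "number" and isinstance(data, bool):
            ok = False  # bool is a subclass of int in Python; reject it here
        if not ok:
            return [f"{path}: expected {expected}"]
    errors: list[str] = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in data:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                errors += validate(data[key], sub_schema, f"{path}.{key}")
    if expected == "array" and "items" in schema:
        for index, item in enumerate(data):
            errors += validate(item, schema["items"], f"{path}[{index}]")
    return errors
```

Calling `validate(descriptors, SCHEMAS["AVSceneDescriptors"])` on produced data checks the primary schema and, through the `$ref` entries, every nested Space-Time and Visual Object against its own schema.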