| 1 Definition | 2 Functional Requirements | 3 Syntax |
| 4 Semantics | 5 Conformance Testing | 6 Performance Assessment |
1 Definition
A Data Type including the Audio-Visual Scene’s Objects and Sub-Scenes and their arrangement in the Scene.
2 Functional Requirements
Audio-Visual Scene Descriptors include Sub-Scenes in addition to Objects.
3 Syntax
https://schemas.mpai.community/OSD/V1.3/data/AudioVisualSceneDescriptors.json
4 Semantics
| Label | Size | Description |
| Header | N1 Bytes | Audio-Visual Scene Descriptors Header |
| – Standard-AVSceneDescriptors | 9 Bytes | The characters “OSD-AVS-V” |
| – Version | N2 Bytes | Major version – 1 or 2 characters |
| – Dot-separator | 1 Byte | The character “.” |
| – Subversion | N3 Bytes | Minor version – 1 or 2 characters |
| MInstanceID | N4 Bytes | Identifier of M-Instance. |
| AVBasicSceneDescriptorsID | N5 Bytes | Identifier of the AV Object. |
| ObjectCount | N6 Bytes | Number of Objects in Scene |
| AVSceneSpaceTime | N7 Bytes | Data about Space and Time |
| SpeechObjects[] | N8 Bytes | Set of Speech Objects |
| – SpeechObject | N9 Bytes | Speech Object |
| – SpeechObjectSpaceTime | N10 Bytes | Space-Time of Speech Object |
| AudioObjects[] | N11 Bytes | Set of Audio Objects |
| – AudioObject | N12 Bytes | ID of Audio Object |
| – AudioObjectSpaceTime | N13 Bytes | Space-Time of Audio Object |
| VisualObjects[] | N14 Bytes | Set of Visual Objects |
| – VisualObjectID | N15 Bytes | ID of Visual Object |
| – VisualObjectSpaceTime | N16 Bytes | Space-Time of Visual Object |
| AudioVisualObjects[] | N17 Bytes | Set of Audio-Visual Objects |
| – AudioVisualObjectID | N18 Bytes | ID of Audio-Visual Object |
| – AudioVisualObjectSpaceTime | N19 Bytes | Space-Time of Audio-Visual Object |
| SubSceneCount | N20 Bytes | Number of Sub-Scenes in Scene |
| SpeechSubScenes[] | N21 Bytes | Set of Speech SubScenes |
| – SpeechSubScene | N22 Bytes | Speech SubScene |
| – SpeechSubSceneSpaceTime | N23 Bytes | Space-Time of Speech SubScene |
| AudioSubScenes[] | N24 Bytes | Set of Audio SubScenes |
| – AudioSubScene | N25 Bytes | ID of Audio SubScene |
| – AudioSubSceneSpaceTime | N26 Bytes | Space-Time of Audio SubScene |
| VisualSubScenes[] | N27 Bytes | Set of Visual SubScenes |
| – VisualSubSceneID | N28 Bytes | ID of Visual SubScene |
| – VisualSubSceneSpaceTime | N29 Bytes | Space-Time of Visual SubScene |
| AudioVisualSubScenes[] | N30 Bytes | Set of Audio-Visual SubScenes |
| – AudioVisualSubSceneID | N31 Bytes | ID of Audio-Visual SubScene |
| – AudioVisualSubSceneSpaceTime | N32 Bytes | Space-Time of Audio-Visual SubScene |
| DescrMetadata | N33 Bytes | Descriptive Metadata |
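The Header rows above describe a short string: the fixed characters “OSD-AVS-V”, a Version of 1 or 2 characters, the character “.”, and a Subversion of 1 or 2 characters. The following is a minimal Python sketch of reading and writing such a string. The function names `parse_header` and `build_header` are hypothetical, and the restriction of Version and Subversion to ASCII decimal digits is an assumption; the table only states “1 or 2 characters”.

```python
import re
from typing import Optional, Tuple

# Fixed prefix taken from the Semantics table.
PREFIX = "OSD-AVS-V"

# Assumption: Version and Subversion are ASCII decimal digits.
# The table itself only says "1 or 2 characters".
_HEADER_RE = re.compile(re.escape(PREFIX) + r"([0-9]{1,2})\.([0-9]{1,2})")


def parse_header(header: str) -> Optional[Tuple[int, int]]:
    """Return (version, subversion), or None if the string is not a header."""
    match = _HEADER_RE.fullmatch(header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def build_header(version: int, subversion: int) -> str:
    """Build a header string; each number must fit in 1 or 2 digits."""
    if not (0 <= version <= 99 and 0 <= subversion <= 99):
        raise ValueError("version and subversion must each fit in 1 or 2 digits")
    return f"{PREFIX}{version}.{subversion}"
```

Under these assumptions, `parse_header("OSD-AVS-V1.3")` yields the pair `(1, 3)`, and a string with a three-character Version is rejected.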
5 Conformance Testing
A Data instance Conforms with Audio-Visual Scene Descriptors (OSD-AVS) V1.3 if:
- The Data validates against the Audio-Visual Scene Descriptors’ JSON Schema.
- All Data in the Audio-Visual Scene Descriptors’ JSON Schema:
  - Have the specified type.
  - Validate against their JSON Schemas.
  - Conform with their Data Qualifiers, if present.
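The conditions above can be read as one top-level check followed by three checks per contained Data item. The following Python sketch shows that structure only and is not a normative test procedure. `DataRule` and `conforms` are hypothetical names, and the individual checks are passed in as plain callables, so any JSON Schema validator could supply them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# A check takes a Data value and returns True when the value passes.
Check = Callable[[Any], bool]


@dataclass
class DataRule:
    """Hypothetical bundle of the three per-Data conditions."""
    expected_type: type                      # "have the specified type"
    schema_check: Check                      # "validate against their JSON Schemas"
    qualifier_check: Optional[Check] = None  # Data Qualifier, only if present


def conforms(instance: Dict[str, Any],
             top_level_check: Check,
             rules: Dict[str, DataRule]) -> List[str]:
    """Return failure messages; an empty list means every check passed."""
    failures: List[str] = []
    if not top_level_check(instance):
        failures.append("instance does not validate against the top-level JSON Schema")
    for name, rule in rules.items():
        if name not in instance:
            continue  # assumption: presence is enforced by the top-level schema
        value = instance[name]
        if not isinstance(value, rule.expected_type):
            failures.append(f"{name}: wrong type")
        if not rule.schema_check(value):
            failures.append(f"{name}: fails its JSON Schema")
        if rule.qualifier_check is not None and not rule.qualifier_check(value):
            failures.append(f"{name}: fails its Data Qualifier")
    return failures
```

Returning a list of failures rather than a single boolean is a design choice of this sketch: it lets a test report name every condition that was not met in one pass.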