Definition
Audio-Visual Scene Descriptors is a Data Type that contains the Geometry and the Objects of an Audio-Visual Scene.
Syntax
https://schemas.mpai.community/OSD/V1.0/data/AudioVisualSceneDescriptors.json
Semantics
| Label | Size | Description |
|---|---|---|
| HEADER | 9 Bytes | |
| – Standard | 7 Bytes | The string "OSD-AVS" |
| – Version | 1 Byte | Major version |
| – Subversion | 1 Byte | Minor version |
| AVDID | 16 Bytes | UUID identifying the complete set of Audio-Visual Scene Descriptors |
| Time | 17 Bytes | Time type flag, reserved bits, start time and end time |
| – TimeType | bit 0 | 0 = Relative: time starts at 0000/00/00T00:00; 1 = Absolute: time starts at 1970/01/01T00:00 |
| – Reserved | bits 1-7 | Reserved |
| – StartTime | 8 Bytes | Start time of the current Audio-Visual Scene Descriptors (in microseconds) |
| – EndTime | 8 Bytes | End time of the current Audio-Visual Scene Descriptors (in microseconds) |
| AVObjectCount | 1 Byte | Number of Objects in the Scene |
| AVObjectData | N1 Bytes | Data associated with each Object |
| – AVObjectID | 1 Byte | ID of a specific Object in the Scene |
| – SamplingRate | bits 0-3 | 0: 8 kHz, 1: 16 kHz, 2: 24 kHz, 3: 32 kHz, 4: 44.1 kHz, 5: 48 kHz, 6: 64 kHz, 7: 96 kHz, 8: 192 kHz |
| – SampleType | bits 4-5 | 0: 16 bit, 1: 24 bit, 2: 32 bit, 3: 64 bit |
| – Reserved | bits 6-7 | Reserved |
| – SpatialAttitude | N2 Bytes | As specified in MPAI-OSD V1 |
| – AudioObject | N3 Bytes | |
| – – FormatID | 1 Byte | Audio Object Format Identifier |
| – – SpeakerID | 1 Byte | Instance ID of the Speaker |
| – – Length | 4 Bytes | Number of Bytes in the Audio Object |
| – – DataInObject | N4 Bytes | Data of the Audio Object |
| – VisualObject | N5 Bytes | |
| – – FormatID | 1 Byte | Visual Object Format Identifier |
| – – FaceID | 1 Byte | Instance ID of the Face |
| – – Length | 4 Bytes | Number of Bytes in the Visual Object |
| – – DataInObject | N6 Bytes | Data of the Visual Object |
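The fixed-size fields at the start of this layout (HEADER, AVDID, Time, AVObjectCount) can be read with Python's standard `struct` module. The sketch below is a minimal, non-normative parser written under two assumptions that the table does not state: multi-byte integers are big-endian, and "bit 0" is the least significant bit of its byte. It stops before AVObjectData, because the sizes N1 to N6 and the SpatialAttitude encoding are defined elsewhere in MPAI-OSD. The names `AVSHeader`, `parse_fixed_part`, `decode_audio_format` and `start_datetime` are hypothetical.

```python
import struct
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Lookup tables taken from the SamplingRate and SampleType rows.
SAMPLING_RATES_HZ = {0: 8000, 1: 16000, 2: 24000, 3: 32000, 4: 44100,
                     5: 48000, 6: 64000, 7: 96000, 8: 192000}
SAMPLE_BITS = {0: 16, 1: 24, 2: 32, 3: 64}

# Standard(7s) Version(B) Subversion(B) AVDID(16s)
# Time flags(B) StartTime(Q) EndTime(Q) AVObjectCount(B): 9 + 16 + 17 + 1 = 43 bytes.
# ASSUMPTION: big-endian byte order (">"), no padding between fields.
FIXED_FORMAT = ">7sBB16sBQQB"


@dataclass
class AVSHeader:
    version: int
    subversion: int
    avdid: uuid.UUID
    absolute_time: bool  # TimeType: False = Relative, True = Absolute
    start_us: int        # StartTime in microseconds
    end_us: int          # EndTime in microseconds
    object_count: int    # AVObjectCount


def parse_fixed_part(buf: bytes) -> AVSHeader:
    """Decode the fields that precede AVObjectData (hypothetical helper)."""
    (standard, version, subversion, avdid, flags,
     start_us, end_us, count) = struct.unpack_from(FIXED_FORMAT, buf, 0)
    if standard != b"OSD-AVS":
        raise ValueError(f"unexpected Standard field: {standard!r}")
    # ASSUMPTION: TimeType is the least significant bit; bits 1-7 are ignored.
    return AVSHeader(version, subversion, uuid.UUID(bytes=avdid),
                     bool(flags & 0x01), start_us, end_us, count)


def decode_audio_format(byte: int) -> tuple[int, int]:
    """Split the SamplingRate/SampleType byte into (rate in Hz, bits per sample)."""
    rate_code = byte & 0x0F         # bits 0-3
    type_code = (byte >> 4) & 0x03  # bits 4-5; bits 6-7 are reserved
    return SAMPLING_RATES_HZ[rate_code], SAMPLE_BITS[type_code]


def start_datetime(header: AVSHeader) -> datetime:
    """Convert StartTime to a UTC datetime; only defined for Absolute time."""
    if not header.absolute_time:
        raise ValueError("Relative time has no calendar origin")
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=header.start_us)


if __name__ == "__main__":
    demo = struct.pack(FIXED_FORMAT, b"OSD-AVS", 1, 0, uuid.uuid4().bytes,
                       0x01, 0, 1_000_000, 2)
    print(parse_fixed_part(demo))
    print(decode_audio_format(0x15))  # rate code 5, type code 1 -> (48000, 24)
```

Using `struct.unpack_from` with a single format string keeps the byte offsets in one place, so changing the byte-order assumption only requires editing `FIXED_FORMAT`.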