Definition

Audio-Visual Scene Descriptors is a Data Type including the Geometry and the Objects of an Audio-Visual Scene.

Syntax

https://schemas.mpai.community/OSD/V1.0/data/AudioVisualSceneDescriptors.json

Semantics

Label Size Description
HEADER 9 Bytes
– Standard 7 Bytes The string OSD-AVS
– Version 1 Byte Major version
– Subversion 1 Byte Minor version
AVDID 16 Bytes UUID Identifier of the total set of Audio-Visual Scene Descriptors.
Time 17 Bytes One Byte of bit fields (TimeType and Reserved) followed by StartTime and EndTime.
– TimeType 0 bit 0=Relative: time starts at 0000/00/00T00:00
1=Absolute: time starts at 1970/01/01T00:00.
– Reserved 1-7 bits Reserved.
– StartTime 8 Bytes Start time of the current Audio-Visual Scene Descriptors (in microseconds).
– EndTime 8 Bytes End time of the current Audio-Visual Scene Descriptors (in microseconds).
AVObjectCount 1 Byte Number of Objects in the Scene.
AVObjectData N1 Bytes Data associated with each Object.
– AVObjectID 1 Byte ID of a specific Object in the Scene.
– SamplingRate 0-3 bits 0: 8 kHz, 1: 16 kHz, 2: 24 kHz, 3: 32 kHz, 4: 44.1 kHz, 5: 48 kHz, 6: 64 kHz, 7: 96 kHz, 8: 192 kHz
– SampleType 4-5 bits 0: 16 bit, 1: 24 bit, 2: 32 bit, 3: 64 bit
– Reserved 6-7 bits Reserved.
– SpatialAttitude N2 Bytes According to MPAI-OSD V1.
AudioObject N3 Bytes
   – FormatID 1 Byte Audio Object Format Identifier
   – Length 4 Bytes Number of Bytes in Audio Object
   – DataInObject N4 Bytes Data of Audio Object
VisualObject N5 Bytes
   – FormatID 1 Byte Visual Object Format Identifier
   – Length 4 Bytes Number of Bytes in Visual Object
   – DataInObject N6 Bytes Data of Visual Object
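SamplingRate (bits 0-3), SampleType (bits 4-5) and Reserved (bits 6-7) together fill one byte of each Object's data. The sketch below shows how that byte could be packed and unpacked. It assumes that bit 0 is the least significant bit. The standard does not say this. The helper names are hypothetical. The code tables are copied from the rows above.

```python
# Code tables from the SamplingRate and SampleType rows.
SAMPLING_RATES_HZ = {
    0: 8_000, 1: 16_000, 2: 24_000, 3: 32_000, 4: 44_100,
    5: 48_000, 6: 64_000, 7: 96_000, 8: 192_000,
}
SAMPLE_BITS = {0: 16, 1: 24, 2: 32, 3: 64}


def decode_audio_format(value: int) -> tuple[int, int]:
    """Return (sampling rate in Hz, bits per sample) from the packed byte.

    Assumption: bit 0 is the least significant bit.
    """
    rate_code = value & 0x0F         # bits 0-3: SamplingRate
    type_code = (value >> 4) & 0x03  # bits 4-5: SampleType; bits 6-7 ignored
    if rate_code not in SAMPLING_RATES_HZ:
        raise ValueError(f"unassigned SamplingRate code {rate_code}")
    return SAMPLING_RATES_HZ[rate_code], SAMPLE_BITS[type_code]


def encode_audio_format(rate_hz: int, sample_bits: int) -> int:
    """Pack a rate and sample size into one byte with the Reserved bits set to 0.

    Raises KeyError for a rate or sample size that has no code in the tables.
    """
    rate_code = {hz: code for code, hz in SAMPLING_RATES_HZ.items()}[rate_hz]
    type_code = {bits: code for code, bits in SAMPLE_BITS.items()}[sample_bits]
    return rate_code | (type_code << 4)
```

With this bit order, `0x15` decodes to 48 kHz with 24-bit samples. Codes 9 to 15 fit in the 4-bit SamplingRate field but have no assigned rate. The decoder therefore rejects them instead of guessing a value.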
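AudioObject and VisualObject share the same layout: a 1-Byte FormatID, a 4-Byte Length, then the payload. Because of the Length field, a reader can skip an object whose format it does not support. The sketch below reads and writes one such record. It makes two assumptions the table does not settle: Length counts only the DataInObject bytes (not FormatID and Length themselves), and integers are big-endian. `MediaObject`, `read_media_object` and `write_media_object` are hypothetical names.

```python
import struct
from typing import NamedTuple

# FormatID (1 byte) + Length (4 bytes); big-endian is an assumption.
_OBJ_HDR = struct.Struct(">BI")


class MediaObject(NamedTuple):
    format_id: int
    data: bytes  # DataInObject


def read_media_object(buf: bytes, offset: int = 0) -> tuple[MediaObject, int]:
    """Read one Audio/Visual Object at offset; return it and the next offset.

    Assumption: Length is the size of DataInObject only.
    """
    if len(buf) - offset < _OBJ_HDR.size:
        raise ValueError("truncated object header")
    format_id, length = _OBJ_HDR.unpack_from(buf, offset)
    start = offset + _OBJ_HDR.size
    end = start + length
    if end > len(buf):
        raise ValueError(f"Length {length} exceeds remaining {len(buf) - start} bytes")
    return MediaObject(format_id, bytes(buf[start:end])), end


def write_media_object(obj: MediaObject) -> bytes:
    """Serialize an object as FormatID, Length, DataInObject."""
    return _OBJ_HDR.pack(obj.format_id, len(obj.data)) + obj.data
```

An AudioObject followed by a VisualObject can then be read back with two calls. The offset returned by the first call is passed to the second.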