Definition
Audio-Visual Scene Descriptors is a Data Type including the Geometry and the Objects of an Audio-Visual Scene.
Syntax
https://schemas.mpai.community/OSD/V1.0/data/AudioVisualSceneDescriptors.json
Semantics
Label | Size | Description | ||
HEADER | 9 Bytes | |||
– Standard | 7 Bytes | The string OSD-AVS | ||
– Version | 1 Byte | Major version | ||
– Subversion | 1 Byte | Minor version | ||
AVDID | 16 Bytes | UUID Identifier of the total set of Audio-Visual Scene Descriptors. | ||
Time | 17 Bytes | Time type flag (1 Byte) followed by the start and end times (8 Bytes each). | ||
– TimeType | 0 bit | 0 = Relative: time starts at 0000/00/00T00:00; 1 = Absolute: time starts at 1970/01/01T00:00. | ||
– Reserved | 1-7 bits | reserved | ||
– StartTime | 8 Bytes | Start time of current Audio-Visual Scene Descriptors (in microseconds). | ||
– EndTime | 8 Bytes | End time of current Audio-Visual Scene Descriptors (in microseconds). | ||
AVObjectCount | 1 Byte | Number of Objects in Scene. | ||
AVObjectData | N1 Bytes | Data associated with each Object. | ||
– AVObjectID | 1 Byte | ID of a specific Object in the Scene. | ||
– SamplingRate | 0-3 bits | 0: 8kHz, 1: 16kHz, 2: 24kHz, 3: 32kHz, 4: 44.1kHz, 5: 48kHz, 6: 64kHz, 7: 96kHz, 8: 192kHz | ||
– SampleType | 4-5 bits | 0: 16bit, 1: 24bit, 2: 32bit, 3: 64bit | ||
– Reserved | 6-7 bits | |||
– SpatialAttitude | N2 Bytes | According to MPAI-OSD V1 | ||
– AudioObject | N3 Bytes | |||
– FormatID | 1 Byte | Audio Object Format Identifier | ||
– SpeakerID | 1 Byte | Instance ID of Speaker | ||
– Length | 4 Bytes | Number of Bytes in Audio Object | ||
– DataInObject | N4 Bytes | Data of Audio Object | ||
– VisualObject | N5 Bytes | |||
– FormatID | 1 Byte | Visual Object Format Identifier | ||
– FaceID | 1 Byte | Instance ID of Face | ||
– Length | 4 Bytes | Number of Bytes in Visual Object | ||
– DataInObject | N6 Bytes | Data of Visual Object |
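
Example
The fixed-size fields at the start of the table (HEADER, AVDID, Time, AVObjectCount) add up to 9 + 16 + 17 + 1 = 43 Bytes, so they can be read in one pass. The Python sketch below is illustrative only and is not part of the specification. It rests on assumptions the table does not state: multi-byte integers are big-endian, StartTime and EndTime are unsigned, "bit 0" is the least significant bit of its byte, and the SamplingRate, SampleType and Reserved bits share a single byte. The names PREFIX, ScenePrefix, parse_prefix and decode_audio_attributes are hypothetical.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import Tuple

# Lookup tables copied from the SamplingRate and SampleType rows.
SAMPLING_RATES_HZ = [8000, 16000, 24000, 32000, 44100, 48000, 64000, 96000, 192000]
SAMPLE_BITS = [16, 24, 32, 64]

# ASSUMPTION: big-endian, no padding between fields.
# 7s = Standard, B = Version, B = Subversion, 16s = AVDID,
# B = TimeType + Reserved bits, Q = StartTime, Q = EndTime, B = AVObjectCount
PREFIX = struct.Struct(">7sBB16sBQQB")


@dataclass
class ScenePrefix:
    version: int
    subversion: int
    avd_id: uuid.UUID
    absolute_time: bool   # TimeType: False = Relative, True = Absolute
    start_time_us: int    # microseconds
    end_time_us: int      # microseconds
    object_count: int


def parse_prefix(buf: bytes) -> ScenePrefix:
    """Parse the 43 fixed-size Bytes that precede AVObjectData."""
    if len(buf) < PREFIX.size:
        raise ValueError(f"need at least {PREFIX.size} bytes, got {len(buf)}")
    standard, version, subversion, avd_id, flags, start, end, count = (
        PREFIX.unpack_from(buf)
    )
    if standard != b"OSD-AVS":
        raise ValueError(f"unexpected Standard string: {standard!r}")
    return ScenePrefix(
        version=version,
        subversion=subversion,
        avd_id=uuid.UUID(bytes=avd_id),
        absolute_time=bool(flags & 0x01),  # ASSUMPTION: bit 0 is the LSB
        start_time_us=start,
        end_time_us=end,
        object_count=count,
    )


def decode_audio_attributes(byte: int) -> Tuple[int, int]:
    """Return (sampling rate in Hz, bits per sample) from the attribute byte."""
    rate_code = byte & 0x0F         # bits 0-3: SamplingRate
    type_code = (byte >> 4) & 0x03  # bits 4-5: SampleType
    if rate_code >= len(SAMPLING_RATES_HZ):
        raise ValueError(f"undefined SamplingRate code: {rate_code}")
    return SAMPLING_RATES_HZ[rate_code], SAMPLE_BITS[type_code]
```

The variable-length fields (SpatialAttitude, AudioObject, VisualObject) are left out because their sizes N2 to N6 depend on data not given in this table. A reader would continue from byte offset 43 and use each Length field to skip over the matching DataInObject.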