6.6.8 Audio-Visual Scene Descriptors.
6.6.9 Visual Scene Descriptors.
6.6.10 LiDAR Scene Descriptors.
6.6.11 RADAR Scene Descriptors.
6.6.12 Ultrasound Scene Descriptors.
6.6.13 Offline Maps Scene Descriptors.
6.6.14 Audio Scene Descriptors.
6.6.15 Traffic Signal Descriptors.
6.6.16 Basic Environment Representation.
1.1.1 Input Audio
1.1.1.1 Definition
Multichannel Audio provided by a Microphone Array used to:
- Create Audio Scene Descriptors to:
- Enable extraction of speech addressed to the HCI by humans outside or inside the CAV.
- Incorporate outdoor Audio information into the Basic Environment Representation.
- Suppress noise and individual sound sources outside the passenger cabin.
1.1.1.2 Functional Requirements
Microphone Arrays capture sound both outdoors and indoors. The captured sound is used to create Audio Scene Descriptors that serve to:
- Provide the location of sound sources.
- Enable extraction of speech addressed to CAV by humans.
- Remove unwanted noise from the passenger cabin.
- Incorporate Audio information into the Basic Environment Representation.
MPAI has developed specifications for Multichannel Audio, Multichannel Audio Stream, and Microphone Array Geometry.
1.1.1.3 Syntax
https://schemas.mpai.community/CAE1/V2.2/data/AudioFormatID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.1.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-IAU-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes. |
InputAudioID | N4 Bytes | Identifier of Audio Sensor (Microphone Array). |
InputAudioTimeSpaceAttributes | N5 Bytes | Time and Space of Input Audio Data. |
InputAudioData | N6 Bytes | |
· AudioFormatID | N7 Bytes | Format ID of Input Audio Data. |
· InputAudioDataLength | N8 Bytes | Data Length of Input Audio Data in Bytes. |
· InputAudioDataURI | N9 Bytes | Location of Input Audio Data. |
InputAudioAttributes[] | N10 Bytes | |
· AudioAttributeID | N11 Bytes | ID of Attribute of Input Audio Data |
· AudioAttributeFormatID | N12 Bytes | ID of Attribute Format of Input Audio Data |
· InputAudioAttributeLength | N13 Bytes | Number of Bytes in Input Audio Attribute Data |
· InputAudioAttributeDataURI | N14 Bytes | URI of Data of Input Audio Attribute Data |
DescrMetadata | N15 Bytes | Descriptive Metadata |
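The Header layout in the table above (Standard, Version, Dot-separator, Subversion) can be illustrated with a minimal sketch that builds and parses the Header bytes. The sketch assumes ASCII encoding and decimal digits for Version and Subversion; the table does not fix either choice, and the function names are hypothetical.

```python
import re
from typing import Tuple

STANDARD = "CAV-IAU-V"  # the 9 characters given in the Semantics table


def build_header(major: int, minor: int, standard: str = STANDARD) -> bytes:
    """Concatenate Standard, Version, Dot-separator and Subversion."""
    for name, value in (("major", major), ("minor", minor)):
        # Assumption: 1 or 2 Bytes means 1 or 2 decimal digits.
        if not 0 <= value <= 99:
            raise ValueError(f"{name} version must fit in 1 or 2 digits")
    return f"{standard}{major}.{minor}".encode("ascii")


def parse_header(data: bytes, standard: str = STANDARD) -> Tuple[int, int]:
    """Return (major, minor), or raise ValueError if the Header is malformed."""
    pattern = re.escape(standard.encode("ascii")) + rb"(\d{1,2})\.(\d{1,2})"
    match = re.fullmatch(pattern, data)
    if match is None:
        raise ValueError("not a valid Header")
    return int(match.group(1)), int(match.group(2))
```

The same two functions apply to the other Data Types in this section by passing a different `standard` string, e.g., “CAV-ILI-V” for Input LiDAR.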
1.1.1.5 Data Formats
Input Audio requires:
- Audio Format.
- Audio Attribute Format.
1.1.1.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the relevance and applicability of the above-mentioned three standards to CAV.
- Comment on the Functional Requirements.
- Propose motivated Functional Requirements for an Audio Array Format suitable to create a 3D sound field representation of the Environment for the stated purposes.
- Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.2 Input Visual
1.1.2.1 Definition
Digital representation of information captured in the visible range of the electromagnetic spectrum by a single camera or an array of cameras.
1.1.2.2 Functional Requirements
A visual scene can be captured by an array of visual sensors characterised by:
- Number and position of sensing devices.
- Number of horizontal and vertical sensors in a sensing device.
- Frame frequency.
- Colour space.
- Bit-depth information.
- Depth (distance from scene pixel) information.
Captured Data
- Provide pixel value, time and potentially depth.
- Are used to:
- Provide the position and orientation of individual visual objects.
- Provide Visual Scene Descriptors.
- Enable identification, tracking and representation of relevant visual objects, including humans.
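The sensor characteristics listed above determine the raw data rate of a sensing device. The following minimal sketch shows the arithmetic. The class and field names are hypothetical, the number of colour components is an assumption, and depth information is not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualSensorConfig:
    """Hypothetical description of one sensing device."""
    width: int             # number of horizontal sensors (pixels)
    height: int            # number of vertical sensors (pixels)
    frame_rate_hz: float   # frame frequency
    bit_depth: int         # bits per colour component
    components: int = 3    # assumption: three colour components per pixel

    def raw_bits_per_second(self) -> float:
        """Uncompressed data rate, excluding any depth channel."""
        bits_per_frame = self.width * self.height * self.components * self.bit_depth
        return bits_per_frame * self.frame_rate_hz


camera = VisualSensorConfig(width=1920, height=1080, frame_rate_hz=30.0, bit_depth=8)
print(camera.raw_bits_per_second())  # 1492992000.0 bits per second
```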
1.1.2.3 Syntax
https://schemas.mpai.community/PAF/V1.2/data/VisualFormatID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.2.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-IVI-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes. |
InputVisualID | N5 Bytes | Identifier of Visual Sensor. |
InputVisualTimeSpaceAttributes | N6 Bytes | Total duration of Input Visual Data. |
InputVisualData | N7 Bytes | |
· InputVisualFormatID | N8 Bytes | Format ID of Input Visual Data. |
· InputVisualDataLength | N9 Bytes | Data Length of Input Visual Data in Bytes. |
· InputVisualDataURI | N10 Bytes | Location of Input Visual Data. |
InputVisualAttributes[] | N11 Bytes | |
· InputVisualAttributeID | N12 Bytes | ID of Attribute of Input Visual Data |
· InputVisualFormatAttributeID | N13 Bytes | ID of Attribute Format of Input Visual Data |
· InputVisualAttributeLength | N14 Bytes | Number of Bytes in Input Visual Attribute Data |
· InputVisualAttributeDataURI | N15 Bytes | URI of Data of Input Visual Attribute Data |
DescrMetadata | N16 Bytes | Descriptive Metadata |
1.1.2.5 Data Types
Input Visual requires the following Formats:
- Visual Format
- Visual Attribute
- Visual Attribute Format
1.1.2.6 To Respondents
Respondents are invited to:
- Comment on the Functional Requirements or propose new ones.
- Comment on and propose formats (2D, 2D + depth, or 3D visual sensors) for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.3 Input RADAR
1.1.3.1 Definition
Data produced by a “time-of-flight”-based active sensor called Radio Detection and Ranging (RADAR) able to measure the distance and speed of objects from the time it takes for a signal emitted by the sensor to hit an object and be reflected.
RADAR operates in the millimetre-wave range. It readily detects vehicles (CAVs and trucks) because they typically reflect radar signals well, whereas smaller and less reflective objects, e.g., pedestrians and motorcycles, have poor reflectance. In a busy environment, the reflections of large vehicles can overwhelm those of a motorcycle or of a human next to a vehicle, causing missed detection of important objects, while a small object such as a can may produce an image out of proportion to its size.
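The time-of-flight principle in the Definition reduces to halving the round-trip path of the emitted signal. A minimal sketch follows; the function names are hypothetical and propagation at the vacuum speed of light is assumed.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # vacuum value, used as an approximation in air


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance to the reflecting object from the measured round-trip time."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The signal travels to the object and back, hence the division by two.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def round_trip_for_range(distance_m: float) -> float:
    """Round-trip time the sensor must resolve for an object at distance_m."""
    if distance_m < 0:
        raise ValueError("distance cannot be negative")
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S
```

For an object at 250 m, `round_trip_for_range(250.0)` is about 1.67 microseconds.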
1.1.3.2 Functional Requirements
The main features of Radar Data are:
- Ability to detect objects and measure their speed at distances ≤ 250 m (long-range RADAR in the 76-77 GHz band).
- Ability to provide a resolution of ~25-cm radial and ~1.5 degrees angular.
- Suitability to measure distance (short range radar in the 25 GHz band).
1.1.3.3 Syntax
https://schemas.mpai.community/CAV2/V1.0/data/RADARFormatID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.3.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-IRA-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes. |
InputRADARID | N5 Bytes | Identifier of RADAR Sensor. |
InputRADARTimeSpaceAttributes | N6 Bytes | Total duration of Input RADAR Data. |
InputRADARData | N7 Bytes | |
· InputRADARFormatID | N8 Bytes | Format ID of Input RADAR Data. |
· InputRADARDataLength | N9 Bytes | Data Length of Input RADAR Data in Bytes. |
· InputRADARDataURI | N10 Bytes | Location of Input RADAR Data. |
InputRADARAttributes[] | N11 Bytes | |
· InputRADARAttributeID | N12 Bytes | ID of Attribute of Input RADAR Data |
· RADARAttributeFormatID | N13 Bytes | ID of Attribute Format of Input RADAR Data |
· InputRADARAttributeLength | N14 Bytes | Number of Bytes in Input RADAR Attribute Data |
· InputRADARAttributeDataURI | N15 Bytes | URI of Data of Input RADAR Attribute Data |
1.1.3.5 Data Formats
Input RADAR requires:
- RADAR Format.
- RADAR Attribute Format.
1.1.3.6 To Respondents
Respondents are invited to:
- Identify functional requirements of the output data produced by RADAR sensors for indoor/outdoor (cabin) use.
- Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.4 Input LiDAR
1.1.4.1 Definition
Data produced by a “time-of-flight”-based active sensor called Light Detection and Ranging (LiDAR) able to measure the distance and speed of objects from the time it takes for a signal emitted by the sensor to hit an object and be reflected.
1.1.4.2 Functional Requirements
LiDAR Data provide the distance of a voxel from the sensor, its grayscale value from the intensity variation of the reflected light, its colour by using more than one wavelength, and its velocity either from the Doppler shift in frequency caused by motion or from positions taken at different times. The angular resolution is ~0.1º vertical and ~1º horizontal, with a maximum field of capture of ~40º vertical.
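The two ways of obtaining velocity mentioned above can be sketched as follows. The function names and the sign convention (positive shift for an approaching object) are assumptions made for illustration, and the Doppler formula is the usual approximation for a sensor that both emits and receives the signal.

```python
from typing import Sequence, Tuple


def radial_velocity_from_doppler(doppler_shift_hz: float, wavelength_m: float) -> float:
    """Radial velocity from the Doppler shift of the reflected light.

    The factor 2 accounts for the shift being applied on both the outgoing
    and the returning path.
    """
    if wavelength_m <= 0:
        raise ValueError("wavelength must be positive")
    return doppler_shift_hz * wavelength_m / 2.0


def velocity_from_positions(
    earlier: Sequence[float], later: Sequence[float], dt_s: float
) -> Tuple[float, ...]:
    """Velocity vector from two positions of the same voxel taken dt_s apart."""
    if dt_s <= 0:
        raise ValueError("time difference must be positive")
    return tuple((b - a) / dt_s for a, b in zip(earlier, later))
```

The first function yields only the velocity component along the line of sight, while the second yields a full vector but requires the same voxel to be identified in two successive captures.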
1.1.4.3 Syntax
https://schemas.mpai.community/CAV2/V1.0/data/LiDARFormatID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.4.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-ILI-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes. |
InputLiDARID | N4 Bytes | Identifier of LiDAR Sensor. |
InputLiDARTimeSpaceAttributes | N5 Bytes | Total duration of Input LiDAR Data. |
· Duration | N6 Bytes | Duration of LiDAR Data Block. |
· SpatialAttitude | N7 Bytes | CAV’s Spatial Attitude when getting Data. |
InputLiDARData | N8 Bytes | |
· InputLiDARFormatID | N9 Bytes | Format ID of Input LiDAR Data. |
· InputLiDARDataLength | N10 Bytes | Data Length of Input LiDAR Data in Bytes. |
· InputLiDARDataURI | N11 Bytes | Location of Input LiDAR Data. |
InputLiDARAttributes[] | N12 Bytes | |
· InputLiDARAttributeID | N13 Bytes | ID of Attribute of Input LiDAR Data |
· LiDARAttributeFormatID | N14 Bytes | ID of Attribute Format of Input LiDAR Data |
· InputLiDARAttributeLength | N15 Bytes | Number of Bytes in Input LiDAR Attribute Data |
· InputLiDARAttributeDataURI | N16 Bytes | URI of Data of Input LiDAR Attribute Data |
1.1.4.5 Data Formats
Input LiDAR requires:
- LiDAR Format.
- LiDAR Attribute Format.
1.1.4.6 To Respondents
Respondents are invited to:
- Comment or extend the functional requirements of the data produced by LiDAR sensors for indoor/outdoor (cabin) use.
- Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.5 Input Ultrasound
1.1.5.1 Definition
A Data Type representing analogue signals generated by information captured by an ultrasonic sensor.
1.1.5.2 Functional Requirements
Ultrasound Data are produced by an active time-of-flight sensor typically operating in the 40 kHz to 250 kHz range.
The main features of Ultrasound are:
- Ability to monitor the immediate surroundings of the vehicle (≤ 10 m).
- Operation frequency above 30 kHz.
- Low-resolution images.
1.1.5.3 Syntax
https://schemas.mpai.community/CAV2/V1.0/data/UltrasoundFormatID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.5.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-IUS-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes. |
InputUltrasoundID | N4 Bytes | Identifier of Ultrasound Sensor. |
InputUltrasoundTimeSpaceAttributes | N5 Bytes | Total duration of Input Ultrasound Data. |
· Duration | N6 Bytes | Duration of Ultrasound Data Block. |
· SpatialAttitude | N7 Bytes | CAV’s Spatial Attitude when getting Data. |
InputUltrasoundData | N8 Bytes | |
· InputUltrasoundFormatID | N9 Bytes | Format ID of Input Ultrasound Data. |
· InputUltrasoundDataLength | N10 Bytes | Data Length of Input Ultrasound Data in Bytes. |
· InputUltrasoundDataURI | N11 Bytes | Location of Input Ultrasound Data. |
InputUltrasoundAttributes[] | N12 Bytes | |
· InputUltrasoundAttributeID | N13 Bytes | ID of Attribute of Input Ultrasound Data |
· UltrasoundAttributeFormatID | N14 Bytes | ID of Attribute Format of Input Ultrasound Data |
· InputUltrasoundAttributeLength | N15 Bytes | Number of Bytes in Input Ultrasound Attribute Data |
· InputUltrasoundAttributeDataURI | N16 Bytes | URI of Data of Input Ultrasound Attribute Data |
1.1.5.5 Data Formats
Ultrasound Data Formats are required.
1.1.5.6 To Respondents
Respondents are invited to:
- Comment or elaborate on the functional requirements of Ultrasound image formats with the goal of achieving tracking and representation of objects for the Ultrasound Scene Description.
- Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.6 GNSS Data
Data produced by a Global Navigation Satellite System (GNSS), i.e., a constellation of satellites that transmit positioning and timing data to GNSS receivers, which use them to determine their location.
1.1.6.1 Functional Requirements
GNSS Data can come from four GNSSs – GPS (US), GLONASS (RU), Galileo (EU), BeiDou (CN) and two regional systems – QZSS (Japan) and IRNSS or NavIC (India). Position accuracy depends on the GNSS system.
1.1.6.2 Syntax
https://schemas.mpai.community/CAV2/V1.0/data/GNSSFormatID.json
https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json
1.1.6.3 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-IGN-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes. |
InputGNSSID | N4 Bytes | Identifier of GNSS Sensor. |
InputGNSSTimeSpaceAttributes | N5 Bytes | Total duration of Input GNSS Data. |
· Duration | N6 Bytes | Duration of GNSS Data Block. |
· SpatialAttitude | N7 Bytes | CAV’s Spatial Attitude when getting Data. |
InputGNSSData | N8 Bytes | |
· InputGNSSFormatID | N9 Bytes | Format ID of Input GNSS Data. |
· InputGNSSDataLength | N10 Bytes | Data Length of Input GNSS Data in Bytes. |
· InputGNSSDataURI | N11 Bytes | Location of Input GNSS Data. |
InputGNSSAttributes[] | N12 Bytes | |
· InputGNSSAttributeID | N13 Bytes | ID of Attribute of Input GNSS Data |
· GNSSAttributeFormatID | N14 Bytes | ID of Attribute Format of Input GNSS Data |
· InputGNSSAttributeLength | N15 Bytes | Number of Bytes in Input GNSS Attribute Data |
· InputGNSSAttributeDataURI | N16 Bytes | URI of Data of Input GNSS Attribute Data |
1.1.6.4 Data Formats
Some data formats are:
- GPS Exchange Format (GPX): an XML schema providing a common GPS data format that can be used to describe waypoints, tracks, and routes.
- World Geodetic System (WGS): definition of the coordinate system’s fundamental and derived constants, the ellipsoidal (normal) Earth Gravitational Model (EGM), a description of the associated World Magnetic Model (WMM), and a current list of local datum transformations.
- International GNSS Service (IGS) SSR: format used to disseminate real-time products to support the IGS (igs.org) Real-Time Service. The messages support multi-GNSS and include corrections for orbits, clocks, DCBs, phase-biases and ionospheric delays. Extensions are planned to also cover satellite attitude, phase centre offsets and variations and group delay variations.
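As an illustration of the first format in the list above, the following minimal sketch reads waypoints and track points from a GPX 1.1 document using the Python standard library. The sample document, its coordinates, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# XML namespace of GPX 1.1 documents.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample data, used only to exercise the functions below.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="45.0703" lon="7.6869"><name>Sample waypoint</name></wpt>
  <trk><trkseg>
    <trkpt lat="45.0700" lon="7.6860"/>
    <trkpt lat="45.0705" lon="7.6872"/>
  </trkseg></trk>
</gpx>"""


def read_waypoints(gpx_text: str) -> List[Tuple[float, float, Optional[str]]]:
    """Return (latitude, longitude, name) for each waypoint."""
    root = ET.fromstring(gpx_text)
    return [
        (
            float(wpt.get("lat")),
            float(wpt.get("lon")),
            wpt.findtext("gpx:name", namespaces=GPX_NS),
        )
        for wpt in root.findall("gpx:wpt", GPX_NS)
    ]


def read_track_points(gpx_text: str) -> List[Tuple[float, float]]:
    """Return (latitude, longitude) for each track point, in document order."""
    root = ET.fromstring(gpx_text)
    return [
        (float(pt.get("lat")), float(pt.get("lon")))
        for pt in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]
```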
1.1.6.5 To Respondents
Respondents are requested to:
- Comment on the functional requirements.
- Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.7 Offline Map Data
1.1.7.1 Definition
An Offline Map or HD map or 3D map is a roadmap with cm-level accuracy and a high environmental fidelity reporting the positions of pedestrian crossings, traffic lights/signs, barriers etc. at the time the Offline Map has been created.
1.1.7.2 Functional Requirements
The Offline Map Data Format used by a CAV should take into account the features of data formats such as:
- Navigation Data Standard (NDS) calls itself “The worldwide standard for map data in automotive eco-systems”. The NDS specification covers data model, storage format, interfaces, and protocols.
- SharedStreets Referencing System calls itself a global non-proprietary system for describing streets.
1.1.7.3 Syntax
1.1.7.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “CAV-OLM-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes |
OffLineMapSourceID | N4 Bytes | Identifier of Offline Map. |
OffLineMapDataFormatID | N5 Bytes | Format ID of Offline Map Data |
OffLineMapData | N7 Bytes | Offline Map Data. |
DescrMetadata | N11 Bytes | Descriptive Metadata. |
1.1.7.5 Data Formats
Several Data Formats are used in practice.
1.1.7.6 To Respondents
Respondents are requested to:
- Comment on the functional requirements of the Offline Map Data Format to support the most common offline map formats.
- Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).
1.1.8 Audio-Visual Scene Descriptors
1.1.8.1 Definition
Scene is a Data Type representing the outcome of the process involving:
- A specific Environment Sensing Technology (EST).
- Sensed data (EST Data).
- Processing the EST Data to represent the environment with a Scene.
1.1.8.2 Functional Requirements
To the extent possible, a Scene created from Data of a specific EST should have a compatible format to facilitate the fusion of the individual EST-based Scenes into the Basic Environment Representation to be passed to the Autonomous Motion Subsystem.
The operation of the Environment Sensing Subsystem unfolds as follows:
- A given EST produces EST Data at discrete Δt time increments that depend on the EST operating frequency. Different ESTs may use different Δt values.
- EST-specific Data are passed to the EST-specific Scene Description AIM.
- An EST-specific Scene Description AIM produces EST-specific Scene Descriptors. These may have a complex data structure that includes several elementary Data Types each having their own Data Formats.
- EST-specific Scene Descriptors are object-based, time-dependent, and constantly updated. They may contain Objects with different resolutions, e.g., an Object at 100 m and another Object at 10 m may be represented with different spatial and temporal resolutions.
- Scene Descriptors#1 produced from EST#1 Data may include Data Types not included in Scene Descriptors#2 produced from EST#2 Data. However, the Environment Sensing Subsystem (ESS) Data Fusion AIM is cognisant of both Data Formats.
- Scene Descriptors#1 from EST#1 Data may not represent the environment with the same Accuracy as or may provide values that conflict with the environment representation provided by Scene Descriptors#2 from EST#2 Data.
- The format of the Offline Maps should allow its EST Data to be transformed into Scene Descriptors without loss of information, so as to enable the fusion of its Scene Descriptors into the Basic Environment Representation produced by the ESS Data Fusion AIM.
- EST Scene Descriptors SD(t) at time t are obtained by:
- Using sensed EST Data at time t and previously computed Scene Descriptors SD(t-Δt), SD(t-2Δt), etc.
- Updating the Objects inherited from preceding SDs.
- Removing Objects present in previous SDs and no longer present in SD(t).
- Adding and assigning attributes to new Objects, i.e., entirely new Objects, the merge of two or more Objects, or the splitting of a previously merged Object.
- SD(t) is a list of the Objects detected and confirmed at time t with their attributes.
- EST Scene Description AIMs keep memory of past Scene Descriptors. Recent Objects may retain all attributes while Objects in a farther past may have coarser attributes or not be available at all.
- EST-specific Scene Descriptors allow for the description of an Object using one of a limited number of MPAI-standardised formats:
- The coordinates of a centre of gravity of an Object.
- The Bounding Box of the Object.
- 2D Scene Objects
- Static environment:
- Parametric free space representation represented as a single object.
- Alternative representations as individual static objects.
- Dynamic environment: object-based representation.
- 2.5D Scene Objects
- Static components of the scene
- Grid-based (elevation maps or Stixel World), represented as a single object.
- Object-based for traffic poles and signals (e.g., Stixel World, Multi-level surface map).
- Object-based for the dynamic parts (e.g., Stixel World, Multi-level surface map).
- 3D (Volumetric) Scene Objects
- Static components of the scene
- Voxel grids, meshes, possibly as a single object.
- Object-based for traffic poles and signals (voxel grids, meshes).
- Dynamic components of the scene (point clouds, voxel grids, meshes, …)
- An EST-specific Scene can contain Objects with different formats.
- At a given time that depends on the operating frequency of a specific EST, the Scene described by the Audio-Visual Scene Descriptors represents an EST-specific snapshot of the environment.
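The way SD(t) is obtained from SD(t-Δt) in the list above (update inherited Objects, remove Objects no longer present, add new Objects) can be sketched as follows. This is a simplified illustration under strong assumptions: detections are already associated with stable Object identifiers, each Object is described only by the coordinates of its centre, and merging or splitting of Objects is not modelled. All names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

Centre = Tuple[float, float, float]


@dataclass(frozen=True)
class SceneObject:
    object_id: str
    centre: Centre      # coordinates of the centre of the Object
    first_seen: float   # time of the SD in which the Object first appeared
    last_seen: float    # time of the most recent SD containing the Object


def update_scene(
    previous: Dict[str, SceneObject], detections: Dict[str, Centre], t: float
) -> Dict[str, SceneObject]:
    """Build SD(t) from SD(t - dt) and the detections confirmed at time t."""
    current: Dict[str, SceneObject] = {}
    for object_id, centre in detections.items():
        if object_id in previous:
            # Inherited Object: keep its history, refresh its attributes.
            current[object_id] = replace(
                previous[object_id], centre=centre, last_seen=t
            )
        else:
            # New Object: assign its initial attributes.
            current[object_id] = SceneObject(object_id, centre, t, t)
    # Objects in `previous` that were not detected at time t are not copied,
    # which removes them from SD(t).
    return current
```

Because each call returns a new dictionary and `SceneObject` is immutable, earlier Scene Descriptors remain available unchanged, which is one simple way of keeping memory of past Scene Descriptors.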
MPAI has developed specifications of Audio Object, Visual Object, Audio-Visual Object, and of Audio Scene Descriptors, Visual Scene Descriptors, and Audio-Visual Scene Descriptors supporting the functional requirements identified above.
The Syntax and Semantics of the Audio-Visual Basic Scene Descriptors, where the Scene is defined as a composition of Objects, are reported here from [13]. Other Scene Descriptors can easily be derived from this.
1.1.8.3 Syntax
This is provided by 5.5.9.3.
1.1.8.4 Semantics
This is provided by 5.5.9.4.
1.1.8.5 Data Formats and Attributes
Traffic Signalisation Descriptors can be considered as Attributes of the Scene and its Objects.
1.1.8.6 To Respondents
Respondents are:
- Invited to comment on the functional requirements identified above and on the MPAI specifications that provide the information identified in 6.6.8.2.
- Requested to propose motivated extensions or new technologies.
- Requested to propose Traffic Signalisation Descriptors as Attributes.
1.1.9 Visual Scene Descriptors
The Visual Scene Description AIM:
- Receives the Spatial Attitude from MAS.
- Retrieves the current Spatial Attitude.
- Receives or retrieves a specified subset of a prior Basic Environment Representation.
- Provides Visual Scene Descriptors, a machine-readable description of the Visual Scene’s:
- Spatial Attitudes of the Visual Objects.
- Visual Objects.
To Respondents
Respondents are requested to propose functional requirements of Visual Scene Descriptors that provide the information identified in 6.6.8.2.
1.1.10 LiDAR Scene Descriptors
The LiDAR Scene Description AIM receives LiDAR Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides LiDAR Scene Descriptors.
To Respondents
Respondents are requested to propose functional requirements of LiDAR Scene Descriptors that provide the information identified in 6.6.8.2.
1.1.11 RADAR Scene Descriptors
The RADAR Scene Description AIM receives RADAR Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides RADAR Scene Descriptors.
To Respondents
Respondents are requested to propose functional requirements of RADAR Scene Descriptors that provide the information identified in 6.6.8.2.
1.1.12 Ultrasound Scene Descriptors
The Ultrasound Scene Description AIM receives Ultrasound Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides Ultrasound Scene Descriptors.
To Respondents
Respondents are requested to propose functional requirements of Ultrasound Scene Descriptors that provide the information identified in 6.6.8.2.
1.1.13 Offline Maps Scene Descriptors
The Offline Map Scene Description AIM receives Offline Map Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides Offline Map Scene Descriptors.
To Respondents
Respondents are requested to propose functional requirements of Offline Map Scene Descriptors that provide the information identified in 6.6.8.2.
1.1.14 Audio Scene Descriptors
The Audio Scene Description AIM receives Audio Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides Audio Scene Descriptors.
To Respondents
Respondents are requested to propose functional requirements of Audio Scene Descriptors that provide the information identified in 6.6.8.2.
1.1.15 Traffic Signal Descriptors
1.1.15.1 Definition
The digital representation of the traffic signalisations used at a U-Location. For the sake of simplicity, it is assumed that Traffic Signal Descriptors are derived using Audio and Visual Scene Descriptors. The content of this Subsection can be easily extended to apply to Scene Descriptors of other Environment Sensing Technology Data.
1.1.15.2 Functional Requirements
Traffic Signal Descriptors include:
- Position and Orientation of the traffic audio and visual signals at the U-Location:
- Road signs
- Traffic signs
- Traffic lights
- Walkways
- Lanes
- Traffic sound
- Semantics of the traffic signals.
Traffic Signal Descriptors can be used as Attributes of the MPAI-specified Scene Descriptors.
1.1.15.3 Syntax
https://schemas.mpai.community/OSD/V1.1/data/AudioVisualSceneDescriptors.json
1.1.15.4 Semantics
Label | Size | Description |
Header | N1 Bytes | |
· Standard | 9 Bytes | The characters “MMM-TSD-V” |
· Version | N2 Bytes | Major version – 1 or 2 Bytes |
· Dot-separator | 1 Byte | The character “.” |
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes |
TrafficSignalConfigurationID | N4 Bytes | Identifier of TSD. |
TrafficSignalConfigurationData | N5 Bytes | |
· AVScene Descriptors | N6 Bytes | AV Scene Descriptors with added Object semantics (Traffic Signal Descriptors) |
DescrMetadata | N7 Bytes | Descriptive Metadata |
1.1.15.5 Data Types and Formats
Traffic Signal Descriptors are Attributes of the Audio-Visual Scene’s Objects.
1.1.15.6 To Respondents
Respondents are requested to:
- Comment on, extend, or reformulate the Functional Requirements.
- Comment on the use of MPAI Object and Scene Descriptors to meet the needs of Traffic Signal Descriptors.
- Propose alternative Traffic Signal Descriptors solutions.
1.1.16 Basic Environment Representation
1.1.16.1 Definition
Basic Environment Representation (BER) is the digital representation of the environment traversed by a CAV. The BER results from the integration of all data sensed by a CAV:
- Spatial information (e.g., GNSS, odometry).
- Audio-Visual Scene Descriptors obtained from the fusion of EST-specific Scene Descriptors.
- Road Topology.
- Environmental data (e.g., weather, temperature, air pressure, ice and water on the road, wind, fog etc.).
1.1.16.2 Functional Requirements
The functional requirements of the BER format are:
- Includes all available information that enables the Autonomous Motion Subsystem (AMS) to define a Path to be executed in a Decision Horizon Time.
- Describes the Environment in terms of Scene Descriptors (including static objects, e.g., from Offline Maps) and Topology (e.g., roads and lanes).
- Enables object tracking, inference of motion vectors, etc. by referencing the BERs of sufficient prior snapshots.
- Describes each Object with the following attributes:
- Time. This is specified as the Start and End Time of the validity of the Object Description.
- Object Identifier. An Identifier is assigned to an Object that is retained until the Object disappears.
- AIM Identifier. This identifies the AIM that provided the initial Data used to represent the Object.
- Object Format ID. MPAI is identifying a set of Object Format specifications that enable unambiguous reference to an Object Format.
- Identifiers of parent Objects corresponding to the current Object.
- Identifier of a parent Object that has spawned more than one current Object.
- ID of spatially corresponding Object of different Type.
- Spatial Attitude of Object.
- Object dimensionality (2D, 2.5D and 3D). Applicable only to Visual Objects.
- Visual Object shape.
- Semantic relationship with other Objects, e.g., identification of groups of Objects (platoon). The components of a platoon may broadcast Platooning Information, or a CAV may be able to deduce it by observing the behaviour of a group of CAVs over a period.
- Accuracy of all Object values.
- Allows for easy verification of the feasibility of a Trajectory (e.g., the AMS can easily check that the intended Trajectory of the ego CAV designed to reach the intended point does not collide with other Visual Objects in the Decision Horizon based on the current state of the BER).
- Has a scalable representation, i.e., it allows for:
- Gradual refinement of a BER when new EST-specific Scene Descriptors are added.
- Extraction of part of the BER based on a required Level of Detail (e.g., Object bounding boxes and their Spatial Attitudes).
- Easy addition of new data (e.g., adding shape of an Object when there was only the bounding box).
- Fast access to Object metadata, e.g.:
- Spatial Attitude.
- Shape (e.g., bounding box for a Visual Object).
- Selected (read) access to data required by different AIMs, e.g., the RADAR Scene Description AIM accesses the current BER to improve its description.
- Easy update of Objects and Scene from one snapshot to another.
- Possibility for a CAV to communicate a subset of its BER to another CAV, e.g., Objects with different levels of detail, starting from bounding boxes and their Position Attributes, depending on the available bandwidth.
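Some of the Object attributes and the Level-of-Detail extraction listed above can be illustrated with a minimal sketch. It is not a proposed BER format: the field names, the flat encoding of the Spatial Attitude, the bounding-box representation, and the use of a URI for the shape are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BERObject:
    """Hypothetical record holding a subset of the attributes listed above."""
    start_time: float                         # start of validity of the description
    end_time: float                           # end of validity of the description
    object_id: str                            # retained until the Object disappears
    aim_id: str                               # AIM that provided the initial Data
    object_format_id: str
    spatial_attitude: Tuple[float, ...]       # assumption: flat tuple of values
    bounding_box: Tuple[float, float, float]  # assumption: box extents in metres
    parent_ids: Tuple[str, ...] = ()
    dimensionality: Optional[str] = None      # "2D", "2.5D" or "3D"
    shape_uri: Optional[str] = None           # assumption: shape stored by reference
    accuracy: Optional[float] = None


def extract_subset(objects: List[BERObject], with_shape: bool) -> List[BERObject]:
    """Return the Objects at the requested Level of Detail.

    With with_shape=False only bounding boxes and Spatial Attitudes are kept,
    e.g., for transmission to another CAV over a narrow channel.
    """
    if with_shape:
        return list(objects)
    return [replace(obj, shape_uri=None) for obj in objects]
```

Since the records are immutable, extracting a coarse subset does not alter the BER held by the CAV.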
1.1.16.3 Syntax
1.1.16.4 Semantics
1.1.16.5 Data Formats
1.1.16.6 To Respondents
Respondents are requested to:
- Explore the use of the MPAI Audio-Visual Scene Descriptors to support the Basic Environment Representation by adding the missing functionalities.
- Comment on the Functional Requirements.
1.1.17 Alert
1.1.17.1 Definition
1.1.17.2 Functional Requirements
1.1.17.3 Syntax
1.1.17.4 Semantics
1.1.17.5 Data Formats
1.1.17.6 To Respondents