6.6.1    Input Audio

6.6.2    Input Visual

6.6.3    Input RADAR

6.6.4    Input LiDAR

6.6.5    Input Ultrasound

6.6.6    GNSS Data

6.6.7    Offline Map Data

6.6.8    Audio-Visual Scene Descriptors

6.6.9    Visual Scene Descriptors

6.6.10  LiDAR Scene Descriptors

6.6.11  RADAR Scene Descriptors

6.6.12  Ultrasound Scene Descriptors

6.6.13  Offline Maps Scene Descriptors

6.6.14  Audio Scene Descriptors

6.6.15  Traffic Signal Descriptors

6.6.16  Basic Environment Representation

6.6.17  Alert

6.6.1        Input Audio

6.6.1.1       Definition

Multichannel Audio provided by a Microphone Array, used to:

  1. Create Audio Scene Descriptors that:
    • Enable extraction of speech addressed to the HCI by humans outside or inside the CAV.
    • Incorporate outdoor Audio information into the Basic Environment Representation.
  2. Suppress noise and individual sound sources outside the passenger cabin.

6.6.1.2       Functional Requirements

Microphones (or microphone arrays) are used to capture outdoor and indoor sound for the purpose of creating Audio Scene Descriptors that:

  1. Provide the location of sound sources.
  2. Enable extraction of speech addressed to the CAV by humans.
  3. Remove unwanted noise from the passenger cabin.
  4. Incorporate Audio information into the Basic Environment Representation.

MPAI has developed specifications for Multichannel Audio, Multichannel Audio Stream, and Microphone Array Geometry.

6.6.1.3       Syntax

https://schemas.mpai.community/CAE1/V2.2/data/AudioFormatID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

6.6.1.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-IAU-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
InputAudioID | N4 Bytes | Identifier of Audio Sensor (Microphone Array).
InputAudioTimeSpaceAttributes | N5 Bytes | Time and Space of Input Audio Data.
InputAudioData | N6 Bytes |
· AudioFormatID | N7 Bytes | Format ID of Input Audio Data.
· InputAudioDataLength | N8 Bytes | Data Length of Input Audio Data in Bytes.
· InputAudioDataURI | N9 Bytes | Location of Input Audio Data.
InputAudioAttributes[] | N10 Bytes |
· AudioAttributeID | N11 Bytes | ID of Attribute of Input Audio Data.
· AudioAttributeFormatID | N12 Bytes | ID of Attribute Format of Input Audio Data.
· InputAudioAttributeLength | N13 Bytes | Number of Bytes in Input Audio Attribute Data.
· InputAudioAttributeDataURI | N14 Bytes | URI of Input Audio Attribute Data.
DescrMetadata | N15 Bytes | Descriptive Metadata.
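
The sketch below illustrates one possible reading of the Header layout in the table above: a 9-Byte magic string followed by a dot-separated Major.Minor version. It is a hypothetical illustration, not normative MPAI code, and it leaves the encodings of the variable-length N4..N15 fields open.

```python
# Hypothetical Header (de)serialization for Input Audio; assumes the
# buffer contains only the Header fields shown in the table.

MAGIC = b"CAV-IAU-V"  # the 9-Byte "Standard" field

def pack_header(major: int, minor: int) -> bytes:
    """Standard + Version + Dot-separator + Subversion."""
    return MAGIC + str(major).encode() + b"." + str(minor).encode()

def parse_header(buf: bytes) -> tuple[int, int]:
    """Recover (major, minor) from a Header produced by pack_header."""
    if not buf.startswith(MAGIC):
        raise ValueError("not an Input Audio Header")
    major, _, minor = buf[len(MAGIC):].partition(b".")
    return int(major), int(minor)

assert parse_header(pack_header(2, 1)) == (2, 1)
```

The same pattern would apply to the other Input Data Types below, with their respective magic strings (“CAV-IVI-V”, “CAV-IRA-V”, etc.).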

6.6.1.5       Data Formats

Input Audio requires:

  1. Audio Format.
  2. Audio Attribute Format.

6.6.1.6       To Respondents

Respondents are invited to:

  1. Comment or elaborate on the relevance and applicability to CAVs of the three above-mentioned specifications.
  2. Comment on the Functional Requirements.
  3. Propose motivated Functional Requirements for an Audio Array Format suitable to create a 3D sound field representation of the Environment for the stated purposes.
  4. Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.2        Input Visual

6.6.2.1       Definition

Digital representation of information captured in the visible range of the electromagnetic spectrum by a single camera or an array of cameras.

6.6.2.2       Functional Requirements

A visual scene can be captured by an array of visual sensors characterised by (a configuration sketch follows the lists below):

  1. Number and position of sensing devices.
  2. Number of horizontal and vertical sensors in a sensing device.
  3. Frame frequency.
  4. Colour space.
  5. Bit-depth information.
  6. Depth (distance from scene pixel) information.

Captured Data:

  1. Provide pixel values, time, and potentially depth.
  2. Are used to:
    1. Provide the position and orientation of individual visual objects.
    2. Provide Visual Scene Descriptors.
    3. Enable identification, tracking and representation of relevant visual objects, including humans.
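
As noted above, here is a minimal sketch of a record carrying the six characteristics of a visual sensor array; all field names and types are assumptions made for illustration, not part of the requirements.

```python
from dataclasses import dataclass

@dataclass
class VisualSensorArrayConfig:
    """Hypothetical container for the characteristics listed above."""
    sensor_positions: list[tuple[float, float, float]]  # 1. number/position of devices
    h_sensors: int                                      # 2. horizontal sensors per device
    v_sensors: int                                      #    vertical sensors per device
    frame_rate_hz: float                                # 3. frame frequency
    colour_space: str                                   # 4. e.g., "RGB", "YCbCr"
    bit_depth: int                                      # 5. bits per sample
    has_depth: bool                                     # 6. per-pixel depth available

cfg = VisualSensorArrayConfig(
    sensor_positions=[(0.0, 0.0, 1.5), (0.2, 0.0, 1.5)],
    h_sensors=1920, v_sensors=1080, frame_rate_hz=30.0,
    colour_space="YCbCr", bit_depth=10, has_depth=False,
)
```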

6.6.2.3       Syntax

https://schemas.mpai.community/PAF/V1.2/data/VisualFormatID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

6.6.2.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-IVI-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
InputVisualID | N4 Bytes | Identifier of Visual Sensor.
InputVisualTimeSpaceAttributes | N5 Bytes | Time and Space of Input Visual Data.
InputVisualData | N6 Bytes |
· InputVisualFormatID | N7 Bytes | Format ID of Input Visual Data.
· InputVisualDataLength | N8 Bytes | Data Length of Input Visual Data in Bytes.
· InputVisualDataURI | N9 Bytes | Location of Input Visual Data.
InputVisualAttributes[] | N10 Bytes |
· InputVisualAttributeID | N11 Bytes | ID of Attribute of Input Visual Data.
· InputVisualAttributeFormatID | N12 Bytes | ID of Attribute Format of Input Visual Data.
· InputVisualAttributeLength | N13 Bytes | Number of Bytes in Input Visual Attribute Data.
· InputVisualAttributeDataURI | N14 Bytes | URI of Input Visual Attribute Data.
DescrMetadata | N15 Bytes | Descriptive Metadata.

6.6.2.5       Data Formats

Input Visual requires the following Formats:

  1. Visual Format.
  2. Visual Attribute.
  3. Visual Attribute Format.

6.6.2.6       To Respondents

Respondents are invited to:

  1. Comment on the Functional Requirements or propose new ones.
  2. Comment on and propose formats (2D, 2D+depth, or 3D visual sensors) for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.3        Input RADAR

6.6.3.1       Definition

Data produced by a “time-of-flight”-based active sensor called Radio Detection and Ranging (RADAR), able to measure the distance and speed of objects from the time it takes for a signal emitted by the sensor to hit an object and be reflected.

RADAR operates in the mm-wave range. It detects vehicles (CAVs and trucks) well because they typically reflect radar signals, while smaller and less reflective objects, e.g., pedestrians and motorcycles, have poor reflectance. In a busy environment, the reflections of big vehicles can overwhelm a motorcycle’s, causing missed detection of important objects (e.g., a human next to a vehicle), while a metal can may produce a return out of proportion to its size.

6.6.3.2       Functional Requirements

The main features of RADAR Data are (a worked relation follows the list):

  1. Ability to detect objects and measure their speed at distances ≤ 250 m (long-range radar in the 76-77 GHz band).
  2. Ability to provide a resolution of ~25 cm radial and ~1.5° angular.
  3. Suitability to measure short distances (short-range radar in the 24 GHz band).
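
As referenced above, the quoted radial resolution follows from the standard relation between range resolution and signal bandwidth B; the bandwidth figure is inferred here, not stated in the requirements.

```latex
\Delta R = \frac{c}{2B}
\qquad\Rightarrow\qquad
\Delta R \approx \frac{3 \times 10^{8}\,\mathrm{m/s}}{2 \times 600\,\mathrm{MHz}} = 0.25\,\mathrm{m}
```

i.e., a ~25 cm radial resolution corresponds to a sweep of roughly 600 MHz, which fits within the 76-77 GHz band.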

6.6.3.3       Syntax

https://schemas.mpai.community/CAV2/V1.0/data/RADARFormatID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

6.6.3.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-IRA-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
InputRADARID | N4 Bytes | Identifier of RADAR Sensor.
InputRADARTimeSpaceAttributes | N5 Bytes | Time and Space of Input RADAR Data.
InputRADARData | N6 Bytes |
· InputRADARFormatID | N7 Bytes | Format ID of Input RADAR Data.
· InputRADARDataLength | N8 Bytes | Data Length of Input RADAR Data in Bytes.
· InputRADARDataURI | N9 Bytes | Location of Input RADAR Data.
InputRADARAttributes[] | N10 Bytes |
· InputRADARAttributeID | N11 Bytes | ID of Attribute of Input RADAR Data.
· RADARAttributeFormatID | N12 Bytes | ID of Attribute Format of Input RADAR Data.
· InputRADARAttributeLength | N13 Bytes | Number of Bytes in Input RADAR Attribute Data.
· InputRADARAttributeDataURI | N14 Bytes | URI of Input RADAR Attribute Data.

6.6.3.5       Data Formats

Input RADAR requires:

  1. RADAR Format.
  2. RADAR Attribute Format.

6.6.3.6       To Respondents

Respondents are invited to:

  1. Identify functional requirements of the output data produced by RADAR sensors for outdoor and indoor (cabin) use.
  2. Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.4        Input LiDAR

6.6.4.1       Definition

Data produced by a “time-of-flight”-based active sensor called Light Detection and Ranging (LiDAR), able to measure the distance and speed of objects from the time it takes for a signal emitted by the sensor to hit an object and be reflected.

6.6.4.2       Functional Requirements

LiDAR Data provide: the distance of a voxel from the sensor; its greyscale level, from the intensity variation of the reflected light; its colour, by using more than one wavelength; and its velocity, either from the Doppler shift in frequency caused by motion or by taking the position at different times. Typical angular resolution is ~0.1° vertically and ~1° horizontally, with a maximum vertical field of capture of ~40°.
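
The underlying round-trip relations are the standard time-of-flight ones, where Δt is the measured round-trip time, λ the laser wavelength, and Δf the Doppler shift:

```latex
d = \frac{c\,\Delta t}{2},
\qquad
v_r = \frac{\lambda\,\Delta f}{2}
```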

6.6.4.3       Syntax

https://schemas.mpai.community/CAV2/V1.0/data/LiDARFormatID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

6.6.4.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-ILI-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
InputLiDARID | N4 Bytes | Identifier of LiDAR Sensor.
InputLiDARTimeSpaceAttributes | N5 Bytes | Time and Space of Input LiDAR Data.
· Duration | N6 Bytes | Duration of LiDAR Data Block.
· SpatialAttitude | N7 Bytes | CAV’s Spatial Attitude when getting Data.
InputLiDARData | N8 Bytes |
· InputLiDARFormatID | N9 Bytes | Format ID of Input LiDAR Data.
· InputLiDARDataLength | N10 Bytes | Data Length of Input LiDAR Data in Bytes.
· InputLiDARDataURI | N11 Bytes | Location of Input LiDAR Data.
InputLiDARAttributes[] | N12 Bytes |
· InputLiDARAttributeID | N13 Bytes | ID of Attribute of Input LiDAR Data.
· LiDARAttributeFormatID | N14 Bytes | ID of Attribute Format of Input LiDAR Data.
· InputLiDARAttributeLength | N15 Bytes | Number of Bytes in Input LiDAR Attribute Data.
· InputLiDARAttributeDataURI | N16 Bytes | URI of Input LiDAR Attribute Data.

6.6.4.5       Data Formats

Input LiDAR requires:

  1. LiDAR Format.
  2. LiDAR Attribute Format.

6.6.4.6       To Respondents

Respondents are invited to:

  1. Comment on or extend the functional requirements of the data produced by LiDAR sensors for outdoor and indoor (cabin) use.
  2. Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.5        Input Ultrasound

6.6.5.1       Definition

A Data Type representing the signals captured by an ultrasonic sensor.

6.6.5.2       Functional Requirements

Ultrasound is produced by an active time-of-flight sensor typically operating in the 40 kHz to 250 kHz range.

The main features of Ultrasound are (the underlying relations follow the list):

  1. Ability to monitor the immediate surroundings of the vehicle (≤ 10 m).
  2. Operating frequency above 30 kHz.
  3. Low-resolution images.
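
As referenced above, the same time-of-flight principle applies with the speed of sound in air (v_s ≈ 343 m/s at 20 °C) in place of the speed of light, so the ≤ 10 m range implies round-trip times up to about 58 ms:

```latex
d = \frac{v_s\,\Delta t}{2}
\qquad\Rightarrow\qquad
\Delta t_{\max} \approx \frac{2 \times 10\,\mathrm{m}}{343\,\mathrm{m/s}} \approx 58\,\mathrm{ms}
```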

6.6.5.3       Syntax

https://schemas.mpai.community/CAV2/V1.0/data/UltrasoundFormatID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

6.6.5.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-IUS-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
InputUltrasoundID | N4 Bytes | Identifier of Ultrasound Sensor.
InputUltrasoundTimeSpaceAttributes | N5 Bytes | Time and Space of Input Ultrasound Data.
· Duration | N6 Bytes | Duration of Ultrasound Data Block.
· SpatialAttitude | N7 Bytes | CAV’s Spatial Attitude when getting Data.
InputUltrasoundData | N8 Bytes |
· InputUltrasoundFormatID | N9 Bytes | Format ID of Input Ultrasound Data.
· InputUltrasoundDataLength | N10 Bytes | Data Length of Input Ultrasound Data in Bytes.
· InputUltrasoundDataURI | N11 Bytes | Location of Input Ultrasound Data.
InputUltrasoundAttributes[] | N12 Bytes |
· InputUltrasoundAttributeID | N13 Bytes | ID of Attribute of Input Ultrasound Data.
· UltrasoundAttributeFormatID | N14 Bytes | ID of Attribute Format of Input Ultrasound Data.
· InputUltrasoundAttributeLength | N15 Bytes | Number of Bytes in Input Ultrasound Attribute Data.
· InputUltrasoundAttributeDataURI | N16 Bytes | URI of Input Ultrasound Attribute Data.

6.6.5.5       Data Formats

Ultrasound Data Formats are required.

6.6.5.6       To Respondents

Respondents are invited to:

  1. Comment on or elaborate the functional requirements of Ultrasound image formats, with the goal of enabling tracking and representation of objects in the Ultrasound Scene Descriptors.
  2. Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.6        GNSS Data

6.6.6.1       Definition

Global Navigation Satellite System (GNSS) Data come from constellations of satellites that transmit positioning and timing data to GNSS receivers, which use them to determine their location.

6.6.6.2       Functional Requirements

GNSS Data can come from four global systems – GPS (US), GLONASS (RU), Galileo (EU), and BeiDou (CN) – and two regional systems – QZSS (Japan) and IRNSS/NavIC (India). Position accuracy depends on the GNSS system.

6.6.6.3       Syntax

https://schemas.mpai.community/CAV2/V1.0/data/GNSSFormatID.json

https://schemas.mpai.community/OSD/V1.1/data/SpaceTime.json

6.6.6.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-IGN-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
InputGNSSID | N4 Bytes | Identifier of GNSS Sensor.
InputGNSSTimeSpaceAttributes | N5 Bytes | Time and Space of Input GNSS Data.
· Duration | N6 Bytes | Duration of GNSS Data Block.
· SpatialAttitude | N7 Bytes | CAV’s Spatial Attitude when getting Data.
InputGNSSData | N8 Bytes |
· InputGNSSFormatID | N9 Bytes | Format ID of Input GNSS Data.
· InputGNSSDataLength | N10 Bytes | Data Length of Input GNSS Data in Bytes.
· InputGNSSDataURI | N11 Bytes | Location of Input GNSS Data.
InputGNSSAttributes[] | N12 Bytes |
· InputGNSSAttributeID | N13 Bytes | ID of Attribute of Input GNSS Data.
· GNSSAttributeFormatID | N14 Bytes | ID of Attribute Format of Input GNSS Data.
· InputGNSSAttributeLength | N15 Bytes | Number of Bytes in Input GNSS Attribute Data.
· InputGNSSAttributeDataURI | N16 Bytes | URI of Input GNSS Attribute Data.

6.6.6.5       Data Formats

Some data formats are (a GPX parsing sketch follows the list):

  1. GPS Exchange Format (GPX): an XML schema providing a common GPS data format that can be used to describe waypoints, tracks, and routes.
  2. World Geodetic System (WGS): definition of the coordinate system’s fundamental and derived constants, the ellipsoidal (normal) Earth Gravitational Model (EGM), a description of the associated World Magnetic Model (WMM), and a current list of local datum transformations.
  3. International GNSS Service (IGS) SSR: format used to disseminate real-time products to support the IGS (igs.org) Real-Time Service. The messages support multi-GNSS and include corrections for orbits, clocks, DCBs, phase-biases, and ionospheric delays. Extensions are planned to also cover satellite attitude, phase centre offsets and variations, and group delay variations.
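
As referenced above, here is a minimal GPX 1.1 waypoint parsed with the Python standard library; the coordinates and names are invented for the example.

```python
import xml.etree.ElementTree as ET

GPX = """<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="45.0703" lon="7.6869">
    <ele>239.0</ele>
    <name>Example waypoint</name>
  </wpt>
</gpx>"""

NS = {"gpx": "http://www.topografix.com/GPX/1/1"}
root = ET.fromstring(GPX)
for wpt in root.findall("gpx:wpt", NS):
    print(wpt.get("lat"), wpt.get("lon"),
          wpt.findtext("gpx:ele", namespaces=NS),
          wpt.findtext("gpx:name", namespaces=NS))
# -> 45.0703 7.6869 239.0 Example waypoint
```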

6.6.6.6       To Respondents

Respondents are requested to:

  1. Comment on the functional requirements.
  2. Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.7        Offline Map Data

6.6.7.1       Definition

An Offline Map (also called HD map or 3D map) is a roadmap with cm-level accuracy and high environmental fidelity, reporting the positions of pedestrian crossings, traffic lights/signs, barriers, etc. at the time the Offline Map was created.

6.6.7.2       Functional Requirements

The Offline Map Data Format used by a CAV should consider the features of data formats such as:

  1. Navigation Data Standard (NDS), which calls itself “the worldwide standard for map data in automotive eco-systems”. The NDS specification covers data model, storage format, interfaces, and protocols.
  2. SharedStreets Referencing System, which calls itself a global, non-proprietary system for describing streets.

6.6.7.3       Syntax

6.6.7.4       Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-OLM-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
OffLineMapSourceID | N4 Bytes | Identifier of Offline Map.
OffLineMapDataFormatID | N5 Bytes | Format ID of Offline Map Data.
OffLineMapData | N6 Bytes | Offline Map Data.
DescrMetadata | N7 Bytes | Descriptive Metadata.

6.6.7.5       Data Formats

Several Data Formats are used in practice.

6.6.7.6       To Respondents

Respondents are requested to:

  1. Comment on the functional requirements that the Offline Map Data Format needs to support the most common offline map formats.
  2. Propose Data Formats and Attributes for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).

6.6.8        Audio-Visual Scene Descriptors

6.6.8.1       Definition

A Scene is a Data Type representing the outcome of a process involving:

  1. A specific Environment Sensing Technology (EST).
  2. Sensed data (EST Data).
  3. Processing the EST Data to represent the environment with a Scene.

6.6.8.2       Functional Requirements

To the extent possible, a Scene created from the Data of a specific EST should have a format compatible with those of the other ESTs, to facilitate the fusion of the individual EST-based Scenes into the Basic Environment Representation passed to the Autonomous Motion Subsystem.

The operation of the Environment Sensing Subsystem unfolds as follows:

  1. A given EST produces EST Data at discrete Δt time increments that depend on the EST operating frequency. Different ESTs may use different Δt values.
  2. EST-specific Data are passed to the EST-specific Scene Description AIM.
  3. An EST-specific Scene Description AIM produces EST-specific Scene Descriptors. These may have a complex data structure that includes several elementary Data Types, each having its own Data Format.
  4. EST-specific Scene Descriptors enable an object-based, time-dependent, and constantly updated Scene description that may contain Objects with different resolutions, e.g., an object at 100 m and another at 10 m may be represented with different spatial and temporal resolutions.
  5. Scene Descriptors#1 produced from EST#1 Data may include Data Types not included in Scene Descriptors#2 produced from EST#2 Data. However, the Environment Sensing Subsystem (ESS) Data Fusion AIM is cognisant of both Data Formats.
  6. Scene Descriptors#1 from EST#1 Data may not represent the environment with the same Accuracy as, or may provide values that conflict with, the environment representation provided by Scene Descriptors#2 from EST#2 Data.
  7. The format of the Offline Maps should allow for transformation of its EST Data into Scene Descriptors without loss of information, so as to enable the fusion of its Scene Descriptors into the Basic Environment Representation produced by the ESS Data Fusion AIM.
  8. EST Scene Descriptors SD(t) at time t are obtained by (a code sketch follows this list):
    • Using sensed EST Data at time t and previously computed Scene Descriptors SD(t-Δt), SD(t-2Δt), etc.
    • Updating the Objects inherited from preceding SDs.
    • Removing Objects present in previous SDs and no longer present in SD(t).
    • Adding and assigning attributes to new Objects, i.e., entirely new Objects, the merge of two or more Objects, or the splitting of a previously merged Object.
  9. SD(t) is a list of Objects detected and confirmed at time t, with their attributes.
  10. EST Scene Description AIMs keep memory of past Scene Descriptors. Recent Objects may retain all attributes, while Objects from farther in the past may have coarser attributes or not be available at all.
  11. EST-specific Scene Descriptors allow for the description of Objects using one of a limited number of MPAI-standardised formats:
    • The coordinates of the centre of gravity of an Object.
    • The Bounding Box of the Object.
    • 2D Scene Objects:
      • Static environment:
        • Parametric free-space representation, represented as a single object.
        • Alternative representations as individual static objects.
      • Dynamic environment: object-based representation.
    • 2.5D Scene Objects:
      • Static components of the scene:
        • Grid-based (elevation maps or Stixel World), represented as a single object.
        • Object-based for traffic poles and signals (e.g., Stixel World, Multi-level surface map).
      • Object-based for the dynamic parts (e.g., Stixel World, Multi-level surface map).
    • 3D (Volumetric) Scene Objects:
      • Static components of the scene:
        • Voxel grids, meshes, possibly as a single object.
        • Object-based for traffic poles and signals (voxel grids, meshes).
      • Dynamic components of the scene (point clouds, voxel grids, meshes, …).
  12. An EST-specific Scene can contain Objects with different formats.
  13. At a given time that depends on the operating frequency of a specific EST, the Scene described by the EST-specific Scene Descriptors represents an EST-specific snapshot of the environment.
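
As referenced in item 8, here is a minimal sketch of the SD(t) update cycle of items 8-10, assuming Objects are keyed by their Identifier and that detection and Object matching/merging happen elsewhere; all names are hypothetical.

```python
def update_scene(prev_sd: dict, detections: dict, t: float) -> dict:
    """Produce SD(t) from SD(t-Δt) and the detections at time t."""
    sd = {}
    for obj_id, attrs in detections.items():
        if obj_id in prev_sd:
            # Update Objects inherited from the preceding SD (item 8).
            sd[obj_id] = {**prev_sd[obj_id], **attrs, "last_seen": t}
        else:
            # Add and assign attributes to new Objects (item 8).
            sd[obj_id] = {**attrs, "first_seen": t, "last_seen": t}
    # Objects in prev_sd but absent from the detections are not copied
    # into SD(t); per item 10, a real AIM would keep them in its memory
    # with progressively coarser attributes rather than discard them.
    return sd
```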


MPAI has developed specifications for the Audio Object, Visual Object, and Audio-Visual Object, and for Audio Scene Descriptors, Visual Scene Descriptors, and Audio-Visual Scene Descriptors supporting the functional requirements identified above.

The Syntax and Semantics of the Audio-Visual Basic Scene Descriptors, where the Scene is defined as a composition of Objects, are reported here from [13]. Other Scene Descriptors can easily be derived from them.

6.6.8.3       Syntax

This is provided by 5.5.9.3.

6.6.8.4       Semantics

This is provided by 5.5.9.4.

6.6.8.5       Data Formats and Attributes

Traffic Signal Descriptors can be considered as Attributes of the Scene and its Objects.

6.6.8.6       To Respondents

Respondents are:

  1. Invited to comment on the functional requirements identified above and on the MPAI specifications that provide the information identified in 6.6.8.2.
  2. Requested to propose motivated extensions or new technologies.
  3. Requested to propose Traffic Signal Descriptors as Attributes.

6.6.9        Visual Scene Descriptors

The Visual Scene Description AIM:

  1. Receives the Spatial Attitude from MAS.
  2. Retrieves the current Spatial Attitude.
  3. Receives or retrieves a specified subset of a prior Basic Environment Representation.
  4. Provides Visual Scene Descriptors, a machine-readable description of the Visual Scene’s:
    • Spatial Attitudes of the Visual Objects.
    • Visual Objects.

To Respondents

Respondents are requested to propose functional requirements of Visual Scene Descriptors that provide the information identified in 6.6.8.2.

6.6.10    LiDAR Scene Descriptors

The LiDAR Scene Description AIM receives LiDAR Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides LiDAR Scene Descriptors.


To Respondents

Respondents are requested to propose functional requirements of LiDAR Scene Descriptors that provide the information identified in 6.6.8.2.

6.6.11    RADAR Scene Descriptors

The RADAR Scene Description AIM receives RADAR Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides RADAR Scene Descriptors.


To Respondents

Respondents are requested to propose functional requirements of RADAR Scene Descriptors that provide the information identified in 6.6.8.2.

6.6.12    Ultrasound Scene Descriptors

The Ultrasound Scene Description AIM receives Ultrasound Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides Ultrasound Scene Descriptors.


To Respondents

Respondents are requested to propose functional requirements of Ultrasound Scene Descriptors that provide the information identified in 6.6.8.2.

6.6.13    Offline Maps Scene Descriptors

The Offline Map Scene Description AIM receives Offline Map Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides Offline Map Scene Descriptors.


To Respondents

Respondents are requested to propose functional requirements of Offline Map Scene Descriptors that provide the information identified in 6.6.8.2.

6.6.14    Audio Scene Descriptors

The Audio Scene Description AIM receives Audio Data, Spatial Attitude from MAS, and a portion of a prior Basic Environment Representation and provides Audio Scene Descriptors.


To Respondents

Respondents are requested to propose functional requirements of Audio Scene Descriptors that provide the information identified in 6.6.8.2.

6.6.15    Traffic Signal Descriptors

6.6.15.1   Definition

The digital representation of the traffic signalisation used at a U-Location. For the sake of simplicity, it is assumed that Traffic Signal Descriptors are derived from Audio and Visual Scene Descriptors. The content of this Subsection can easily be extended to apply to the Scene Descriptors of other Environment Sensing Technology Data.

6.6.15.2   Functional Requirements

Traffic Signal Descriptors include:

  1. Position and Orientation of the traffic audio and visual signals at the U-Location:
    • Road signs
    • Traffic signs
    • Traffic lights
    • Walkways
    • Lanes
    • Traffic sound
  2. Semantics of the traffic signals.

Traffic Signal Descriptors can be used as Attributes of the MPAI-specified Scene Descriptors.
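
A hypothetical illustration of that use: a Traffic Signal Descriptor carried as an Attribute of a Scene Object. The field names are invented for the example; the normative structure is given by the Semantics below.

```python
# A Visual Object of the Scene with an attached Traffic Signal Attribute.
traffic_light_object = {
    "object_id": "vis-0042",
    "spatial_attitude": {"position": [12.3, -4.1, 5.2],
                         "orientation": [0.0, 0.0, 0.97, 0.26]},
    "attributes": {
        "traffic_signal": {
            "kind": "traffic light",   # road sign, traffic sound, lane, ...
            "state": "red",            # semantics of the signal
            "applies_to_lane": "lane-2",
        }
    },
}
```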

6.6.15.3   Syntax

https://schemas.mpai.community/OSD/V1.1/data/AudioVisualSceneDescriptors.json

6.6.15.4   Semantics

Label | Size | Description
Header | N1 Bytes |
· Standard | 9 Bytes | The characters “CAV-TSD-V”
· Version | N2 Bytes | Major version – 1 or 2 Bytes
· Dot-separator | 1 Byte | The character “.”
· Subversion | N3 Bytes | Minor version – 1 or 2 Bytes
TrafficSignalConfigurationID | N4 Bytes | Identifier of TSD.
TrafficSignalConfigurationData | N5 Bytes |
· AVSceneDescriptors | N6 Bytes | AV Scene Descriptors with added Object semantics (Traffic Signal Descriptors).
DescrMetadata | N7 Bytes | Descriptive Metadata.

6.6.15.5   Data Types and Formats

Traffic Signal Descriptors are Attributes of the Audio-Visual Scene’s Objects.

6.6.15.6   To Respondents

Respondents are requested to:

  1. Comment on, extend, or reformulate the Functional Requirements.
  2. Comment on the use of MPAI Object and Space Descriptors for Traffic Signal Descriptor needs.
  3. Propose alternative Traffic Signal Descriptor solutions.

6.6.16    Basic Environment Representation

6.6.16.1   Definition

The Basic Environment Representation (BER) is the digital representation of the environment traversed by a CAV. The BER results from the integration of all data sensed by the CAV:

  1. Spatial information (e.g., GNSS, odometry).
  2. Audio-Visual Scene Descriptors obtained from the fusion of EST-specific Scene Descriptors.
  3. Road Topology.
  4. Environmental data (e.g., weather, temperature, air pressure, ice and water on the road, wind, fog, etc.).

6.6.16.2   Functional Requirements

The functional requirements of the BER format are:

  1. Includes all available information that enables the Autonomous Motion Subsystem (AMS) to define a Path to be executed in a Decision Horizon Time.
  2. Describes the Environment in terms of Scene Descriptors (including static objects, e.g., from Offline Maps) and Topology (e.g., roads and lanes).
  3. Enables object tracking, inference of motion vectors, etc. by referencing the BERs of sufficiently many prior snapshots.
  4. Describes each Object with the following attributes (a data-structure sketch follows this list):
    • Time of validity, specified as Start and End Time of the Object Description.
    • Object Identifier: an Identifier assigned to an Object and retained until the Object disappears.
    • AIM Identifier: identifies the AIM that provided the initial Data used to represent the Object.
    • Object Format ID: MPAI is identifying a set of Object Format specifications that enable unambiguous reference to an Object Format.
    • Identifiers of parent Objects corresponding to the current Object.
    • Identifier of a parent Object that has spawned more than one current Object.
    • ID of the spatially corresponding Object of a different Type.
    • Spatial Attitude of the Object.
    • Object dimensionality (2D, 2.5D, and 3D); applicable only to Visual Objects.
    • Visual Object shape.
    • Semantic relationship with other Objects, e.g., identification of groups of Objects (platoon). The components of a platoon may broadcast Platooning Information, or a CAV may be able to deduce it by observing the behaviour of a group of CAVs over a period of time.
    • Accuracy of all Object values.
  5. Allows for easy verification of the feasibility of a Trajectory (e.g., the AMS can easily check that the intended Trajectory of the ego CAV, designed to reach the intended point, does not collide with other Visual Objects in the Decision Horizon based on the current state of the BER).
  6. Has a scalable representation, i.e., it allows for:
    • Gradual refinement of a BER when new EST-specific Scene Descriptors are added.
    • Extraction of part of the BER based on a required Level of Detail (e.g., Object bounding boxes and their Spatial Attitudes).
    • Easy addition of new data (e.g., adding the shape of an Object when there was only the bounding box).
    • Fast access to Object metadata, e.g.:
      • Spatial Attitude.
      • Shape (e.g., bounding box for a Visual Object).
    • Selective (read) access to data required by different AIMs, e.g., the RADAR Scene Description AIM accesses the current BER to improve its description.
    • Easy update of Objects and Scene from one snapshot to another.
    • Possibility for a CAV to communicate a subset of its BER to another CAV, e.g., Objects with different degrees of detail, starting from bounding boxes and their Position Attributes, depending on the available bandwidth.
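
As referenced in item 4, here is a sketch of a container for the per-Object attributes listed there; names, types, and optionality are assumptions made for illustration, not a normative format.

```python
from dataclasses import dataclass, field

@dataclass
class BERObject:
    """Hypothetical per-Object record of the Basic Environment Representation."""
    object_id: str                         # retained until the Object disappears
    start_time: float                      # validity of this Object Description
    end_time: float
    aim_id: str                            # AIM that provided the initial Data
    object_format_id: str                  # reference to an MPAI Object Format
    parent_ids: list[str] = field(default_factory=list)
    spawning_parent_id: str | None = None  # parent that spawned several Objects
    corresponding_id: str | None = None    # spatially corresponding Object of another Type
    spatial_attitude: tuple = ()           # position/orientation (and derivatives)
    dimensionality: str | None = None      # "2D" | "2.5D" | "3D" (Visual Objects only)
    shape: object | None = None            # e.g., bounding box or mesh
    group_ids: list[str] = field(default_factory=list)  # e.g., platoon membership
    accuracy: dict = field(default_factory=dict)        # Accuracy of all values
```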

6.6.16.3   Syntax

6.6.16.4   Semantics

6.6.16.5   Data Formats

6.6.16.6   To Respondents

Respondents are requested to:

  1. Explore the use of the MPAI Audio-Visual Scene Descriptors to support the Basic Environment Representation by adding the missing functionalities.
  2. Comment on the Functional Requirements.

6.6.17    Alert

6.6.17.1   Definition

6.6.17.2   Functional Requirements

6.6.17.3   Syntax

6.6.17.4   Semantics

6.6.17.5   Data Formats

6.6.17.6   To Respondents