1       Definition

A Scene is a Data Type representing the outcome of a process that involves:

  1. A specific Environment Sensing Technology (EST).
  2. Sensed data (EST Data).
  3. Processing the EST Data to represent the environment with a Scene.

2       Functional Requirements

To the extent possible, Scenes created from the Data of different ESTs should have compatible formats, to facilitate the fusion of the individual EST-based Scenes into the Basic Environment Representation to be passed to the Autonomous Motion Subsystem.

The operation of the Environment Sensing Subsystem unfolds as follows:

  1. A given EST produces EST Data at discrete Δt time increments that depend on the EST operating frequency. Different ESTs may use different Δt values.
  2. EST-specific Data are passed to the EST-specific Scene Description AIM.
  3. An EST-specific Scene Description AIM produces EST-specific Scene Descriptors. These may have a complex data structure that includes several elementary Data Types, each with its own Data Format.
  4. EST-specific Scene Descriptors enable an object-based, time-dependent, and constantly updated Scene representation that may contain Objects with different resolutions, e.g., an object at 100 m and another object at 10 m may be represented with different spatial and temporal resolutions.
  5. Scene Descriptors#1 produced from EST#1 Data may include Data Types not included in Scene Descriptors#2 produced from EST#2 Data. However, the Environment Sensing Subsystem (ESS) Data Fusion AIM is cognisant of both Data Formats.
  6. Scene Descriptors#1 from EST#1 Data may not represent the environment with the same Accuracy as, or may provide values that conflict with, the environment representation provided by Scene Descriptors#2 from EST#2 Data.
  7. The format of the Offline Maps should allow for the lossless transformation of their EST Data into Scene Descriptors, so as to enable the fusion of those Scene Descriptors into the Basic Environment Representation produced by the ESS Data Fusion AIM.
  8. EST Scene Descriptors SD(t) at time t are obtained by:
    • Using the EST Data sensed at time t and the previously computed Scene Descriptors SD(t-Δt), SD(t-2Δt), etc.
    • Updating the Objects inherited from preceding SDs.
    • Removing Objects present in previous SDs and no longer present in SD(t).
    • Adding and assigning attributes to new Objects, i.e., entirely new Objects, Objects resulting from the merge of two or more Objects, or Objects resulting from the splitting of a previously merged Object.
  9. SD(t) is a list of the Objects detected and confirmed at time t, with their attributes.
  10. EST Scene Description AIMs keep memory of past Scene Descriptors. Recent Objects may retain all attributes, while Objects from a more distant past may have coarser attributes or not be available at all.
  11. EST-specific Scene Descriptors allow for the description of an Object using one of a limited number of MPAI-standardised formats:
    • The coordinates of a centre of gravity of an Object.
    • The Bounding Box of the Object.
    • 2D Scene Objects
      • Static environment:
        • Parametric free space representation, represented as a single object.
        • Alternative representations as individual static objects.
      • Dynamic environment: object-based representation.
    • 2.5D Scene Objects
      • Static components of the scene:
        • Grid-based (elevation maps or Stixel Environment), represented as a single object.
        • Object-based for traffic poles and signals (e.g., Stixel Environment, Multi-level surface map).
      • Object-based for the dynamic parts (e.g., Stixel Environment, Multi-level surface map).
    • 3D (Volumetric) Scene Objects
      • Static components of the scene:
        • Voxel grids, meshes, possibly as a single object.
        • Object-based for traffic poles and signals (voxel grids, meshes).
      • Dynamic components of the scene (point clouds, voxel grids, meshes, …).
  12. An EST-specific Scene can contain Objects with different formats.
  13. At a given time that depends on the operating frequency of a specific EST, the Scene described by the Audio-Visual Scene Descriptors represents an EST-specific snapshot of the environment.
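
The update cycle of item 8 and the memory policy of item 10 can be sketched as follows. This is a minimal illustration and not part of any MPAI specification: the names `SceneObject`, `update_scene_descriptors`, and `coarsen`, the dictionary-based attributes, and the matching of Objects by identifier are all assumptions made for the example. Merging and splitting of Objects are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """Hypothetical Object record; not an MPAI-defined type."""
    object_id: int
    attributes: dict = field(default_factory=dict)


def update_scene_descriptors(previous, detections):
    """Build SD(t) from SD(t - dt) and the detections at time t.

    Both arguments map object_id -> SceneObject. The sketch assumes that
    detections have already been associated with object identifiers.
    """
    current = {}
    for object_id, detected in detections.items():
        if object_id in previous:
            # Update an Object inherited from the preceding SD:
            # newly sensed attributes override the inherited ones.
            merged = {**previous[object_id].attributes, **detected.attributes}
            current[object_id] = SceneObject(object_id, merged)
        else:
            # Add a new Object and assign its attributes.
            current[object_id] = SceneObject(object_id, dict(detected.attributes))
    # Objects present in `previous` but absent from `detections`
    # are removed simply by not being copied into `current`.
    return current


def coarsen(scene, keep=("position",)):
    """Retain only the `keep` attributes of an older SD (assumed policy)."""
    return {
        oid: SceneObject(oid, {k: v for k, v in obj.attributes.items() if k in keep})
        for oid, obj in scene.items()
    }


sd_prev = {
    1: SceneObject(1, {"position": (0.0, 10.0), "speed": 3.0}),
    2: SceneObject(2, {"position": (5.0, 100.0)}),
}
detections = {
    1: SceneObject(1, {"position": (0.0, 9.5)}),
    3: SceneObject(3, {"position": (2.0, 40.0)}),
}
sd_now = update_scene_descriptors(sd_prev, detections)  # Objects 1 and 3
sd_old = coarsen(sd_prev)  # older SD kept with positions only
```

In this sketch Object 1 is updated, Object 2 is removed, and Object 3 is added. An actual Scene Description AIM would also need an association step between sensed data and existing Objects, which is assumed away here.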


MPAI has developed specifications of Audio Object, Visual Object, and Audio-Visual Object, and of Audio Scene Descriptors, Visual Scene Descriptors, and Audio-Visual Scene Descriptors, supporting the functional requirements identified above.

The Syntax and Semantics of the Audio-Visual Basic Scene Descriptors, where the Scene is defined as a composition of Objects, are reported here from [13]. Other Scene Descriptors can be derived from them.
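
As an informal illustration of a Scene defined as a composition of Objects, where individual Objects may use different description formats, the following sketch may help. It does not reproduce the Syntax of [13]: the class names `ObjectDescription` and `BasicScene`, their fields, and the choice of metres as the unit are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # assumed (x, y, z) in metres


@dataclass
class ObjectDescription:
    """Hypothetical Object entry; field names are illustrative only."""
    object_id: int
    centre_of_gravity: Optional[Point] = None
    bounding_box: Optional[Tuple[Point, Point]] = None  # (min corner, max corner)


@dataclass
class BasicScene:
    """A Scene as a composition of Objects at one time instant."""
    time: float
    objects: List[ObjectDescription] = field(default_factory=list)

    def formats_used(self):
        """Return the set of description formats present in this Scene."""
        used = set()
        for obj in self.objects:
            if obj.centre_of_gravity is not None:
                used.add("centre_of_gravity")
            if obj.bounding_box is not None:
                used.add("bounding_box")
        return used


scene = BasicScene(
    time=0.1,
    objects=[
        ObjectDescription(1, centre_of_gravity=(0.0, 10.0, 0.5)),
        ObjectDescription(2, bounding_box=((4.0, 99.0, 0.0), (6.0, 101.0, 2.0))),
    ],
)
```

Here one Object is described by its centre of gravity and another by its Bounding Box, so a single Scene contains Objects with different formats.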

3       Syntax

This is provided by

4       Semantics

This is provided by

5       Data Formats and Attributes

Traffic Signalisation Descriptors can be considered as Attributes of the Scene and its Objects.

6       To Respondents

Respondents are:

  1. Invited to comment on the functional requirements identified above and on the MPAI specifications that provide the information identified in 6.8.2.
  2. Requested to propose motivated extensions or new technologies.
  3. Requested to propose Traffic Signalisation Descriptors as Attributes.