1       Definition

Digital representation of information captured in the visible range of the electromagnetic spectrum by a single camera or an array of cameras.

2       Functional Requirements

A visual scene can be captured by an array of visual sensors characterised by:

  1. Number and position of sensing devices.
  2. Number of horizontal and vertical sensors in a sensing device.
  3. Frame frequency.
  4. Colour space.
  5. Bit-depth information.
  6. Depth (distance from scene pixel) information.
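
As an illustration only, the characteristics listed above can be collected in a small record per sensing device. The sketch below is not part of the specification: the class name, field names, units and the three-component assumption are all hypothetical. It shows how resolution, bit-depth and frame frequency together determine the uncompressed data rate of the array (depth data is not counted).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VisualSensor:
    """One sensing device in the array (all field names are hypothetical)."""
    position: Tuple[float, float, float]  # assumed: (x, y, z) in metres
    width: int               # number of horizontal sensors (pixels per row)
    height: int              # number of vertical sensors (pixels per column)
    frame_rate: float        # frame frequency in frames per second
    colour_space: str        # e.g. "RGB"; allowed values are not defined here
    bit_depth: int           # bits per colour component
    has_depth: bool = False  # whether the device also reports per-pixel depth

    def bytes_per_frame(self, components: int = 3) -> int:
        """Uncompressed colour data in one frame, rounded up to whole bytes."""
        bits = self.width * self.height * components * self.bit_depth
        return (bits + 7) // 8


def raw_data_rate(sensors: List[VisualSensor]) -> float:
    """Uncompressed colour bytes per second produced by the whole array."""
    return sum(s.bytes_per_frame() * s.frame_rate for s in sensors)


front = VisualSensor((0.0, 0.0, 1.5), 1920, 1080, 30.0, "RGB", 8)
rear = VisualSensor((-4.0, 0.0, 1.5), 1920, 1080, 30.0, "RGB", 8)
print(front.bytes_per_frame())       # 6220800
print(raw_data_rate([front, rear]))  # 373248000.0
```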

Captured Data

  1. Provide pixel value, time and potentially depth.
  2. Are used to:
    1. Provide the position and orientation of individual visual objects.
    2. Provide Visual Scene Descriptors.
    3. Enable identification, tracking and representation of relevant visual objects, including humans.
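
To make the link between captured data and object position concrete, the following sketch shows one possible way a pixel carrying a depth value could be turned into a 3D point in the camera frame. It assumes an ideal pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and that depth is measured along the optical axis; the requirements above do not mandate this model, and every name in the block is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PixelSample:
    """A single captured sample: pixel value, time and, optionally, depth."""
    u: int                          # column index of the pixel
    v: int                          # row index of the pixel
    value: Tuple[int, int, int]     # colour components of the pixel
    time_s: float                   # capture time in seconds
    depth_m: Optional[float] = None # assumed: depth along the optical axis, metres


def back_project(sample: PixelSample, fx: float, fy: float,
                 cx: float, cy: float) -> Optional[Tuple[float, float, float]]:
    """Return the (x, y, z) camera-frame point for a sample, or None without depth.

    Assumes an ideal pinhole camera: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point.
    """
    if sample.depth_m is None:
        return None
    z = sample.depth_m
    x = (sample.u - cx) * z / fx
    y = (sample.v - cy) * z / fy
    return (x, y, z)


centre = PixelSample(u=960, v=540, value=(128, 128, 128), time_s=0.0, depth_m=2.0)
print(back_project(centre, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0))  # (0.0, 0.0, 2.0)
```

A sample without depth yields `None`, reflecting that depth is only "potentially" available.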

3       Syntax



4       Semantics

Label                               Size       Description
Header                              N1 Bytes
·  Standard                         9 Bytes    The characters “CAV-IVI-V”.
·  Version                          N2 Bytes   Major version (1 or 2 Bytes).
·  Dot-separator                    1 Byte     The character “.”.
·  Subversion                       N3 Bytes   Minor version (1 or 2 Bytes).
InputVisualID                       N5 Bytes   Identifier of the visual sensor.
InputVisualTimeSpaceAttributes      N6 Bytes   Total duration of Input Visual Data.
InputVisualData                     N7 Bytes
·  InputVisualFormatID              N8 Bytes   Format ID of Input Visual Data.
·  InputVisualDataLength            N9 Bytes   Length of Input Visual Data in Bytes.
·  InputVisualDataURI               N10 Bytes  Location of Input Visual Data.
InputVisualAttributes[]             N11 Bytes
·  InputVisualAttributeID           N12 Bytes  ID of an Attribute of Input Visual Data.
·  InputVisualFormatAttributeID     N13 Bytes  ID of the Attribute Format of Input Visual Data.
·  InputVisualAttributeLength       N14 Bytes  Number of Bytes in the Attribute of Input Visual Data.
·  InputVisualAttributeDataURI      N15 Bytes  URI of the Attribute Data of Input Visual Data.
DescrMetadata                       N16 Bytes  Descriptive Metadata.
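
As a rough sketch of how the Header could be checked, the code below validates a header string made of the characters “CAV-IVI-V”, a major version, the dot separator and a minor version. It assumes that both version numbers are written as one or two ASCII decimal digits and that the header is available as a standalone string; neither assumption is stated in the table, and the `Header` type and `parse_header` function are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumption: major and minor versions are 1 or 2 ASCII decimal digits.
HEADER_RE = re.compile(r"(CAV-IVI-V)([0-9]{1,2})\.([0-9]{1,2})")


class Header(NamedTuple):
    standard: str  # the fixed characters "CAV-IVI-V"
    major: int     # Version field
    minor: int     # Subversion field


def parse_header(text: str) -> Header:
    """Split a header string into its Standard, Version and Subversion parts."""
    match = HEADER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid Input Visual header: {text!r}")
    return Header(match.group(1), int(match.group(2)), int(match.group(3)))


print(parse_header("CAV-IVI-V1.0"))
# Header(standard='CAV-IVI-V', major=1, minor=0)
```

Because the version fields have variable length, a parser working on a longer byte stream would need an additional rule, not given here, to know where the header ends.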

5       Data Types

Input Visual requires the following Formats:

  1. Visual Format
  2. Visual Attribute
  3. Visual Attribute Format

6       To Respondents

Respondents are invited to:

  1. Comment on the Functional Requirements or propose new ones.
  2. Comment on and propose formats (2D, 2D + depth, or 3D visual sensors) for use in the future Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA).