1       Definition

Digital representation of information in the visible range of the electromagnetic spectrum, captured by a single camera or an array of cameras.

2       Functional Requirements

A visual scene can be captured by an array of visual sensors characterised by:

  1. Number and position of sensing devices.
  2. Number of horizontal and vertical sensors in a sensing device.
  3. Frame frequency.
  4. Colour space.
  5. Bit-depth information.
  6. Depth (distance from scene pixel) information.
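The characteristics listed above can be gathered into one record per sensing device. The following is a minimal sketch, not part of the specification: the class name `SensingDevice`, its field names, the metre-based position, and the `raw_bytes_per_second` helper are all hypothetical, and the data-rate estimate assumes uncompressed samples and ignores any depth channel.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SensingDevice:
    """Hypothetical record of the characteristics of one sensing device."""
    position: Tuple[float, float, float]  # assumed: x, y, z in metres
    width: int           # number of horizontal sensors
    height: int          # number of vertical sensors
    frame_rate: float    # frame frequency in frames per second
    colour_space: str    # free-text label, e.g. "sRGB" (assumption)
    bit_depth: int       # bits per colour component
    has_depth: bool = False  # whether depth information is captured

    def raw_bytes_per_second(self, channels: int = 3) -> float:
        """Uncompressed colour data rate; the depth channel is not counted."""
        bits_per_frame = self.width * self.height * channels * self.bit_depth
        return bits_per_frame * self.frame_rate / 8


# An array of sensing devices is then just a list of such records.
sensor_array: List[SensingDevice] = [
    SensingDevice((0.0, 0.0, 0.0), 1920, 1080, 30.0, "sRGB", 8),
    SensingDevice((0.5, 0.0, 0.0), 1920, 1080, 30.0, "sRGB", 8, has_depth=True),
]
```

Under these assumptions, a single 1920 x 1080 device at 30 frames per second with three 8-bit components produces about 187 MB of raw colour data per second, which gives a sense of why the frame frequency and bit-depth need to be stated explicitly.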

Captured Data provide pixel value, time and, potentially, depth in order to:

  1. Provide the position and orientation of individual visual objects.
  2. Provide Visual Scene Descriptors.
  3. Enable identification, tracking and representation of relevant visual objects, including humans.

3       Syntax

https://schemas.mpai.community/OSD/V1.1/data/InputVisual.json

4       Semantics

Label                          Size        Description
Header                         N1 Bytes    Input Visual Header
– Standard                     9 Bytes     The characters “OSD-IVI-V”
– Version                      N2 Bytes    Major version – 1 or 2 characters
– Dot-separator                1 Byte      The character “.”
– Subversion                   N3 Bytes    Minor version – 1 or 2 characters
MInstanceID                    N4 Bytes    ID of the Virtual Space.
InputVisualID                  N5 Bytes    ID of Input Visual.
VisualSensorID                 N6 Bytes    ID of Visual Sensor.
InputVisualData                Ny Bytes    Set of Input Visual Data
– InputVisualQualifier         N12 Bytes   The Qualifier of Input Visual
– InputVisualPayload           N13 Bytes   The Payload of Input Visual
  – InputVisualDataLength      N14 Bytes   The Length of Input Visual Data in Bytes
  – InputVisualDataURI         N15 Bytes   URI of Input Visual Data
DescrMetadata                  N16 Bytes   Descriptive Metadata
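As an illustration of the Semantics table above, the sketch below assembles the Header string and an Input Visual object as a Python dictionary. It is not normative: the JSON schema referenced under Syntax defines the actual structure. The function names `make_header` and `make_input_visual`, the nesting of the dictionary, the value types, the example IDs and URI, and the assumption that the version characters are decimal digits are all hypothetical.

```python
import re

STANDARD = "OSD-IVI-V"  # the 9 characters given in the Semantics table

# Assumption: major and minor versions are 1 or 2 decimal digits each.
HEADER_RE = re.compile(r"^OSD-IVI-V[0-9]{1,2}\.[0-9]{1,2}$")


def make_header(major: int, minor: int) -> str:
    """Build the Header: Standard + Version + Dot-separator + Subversion."""
    if not (0 <= major <= 99 and 0 <= minor <= 99):
        raise ValueError("major and minor must each fit in 1 or 2 digits")
    return f"{STANDARD}{major}.{minor}"


def make_input_visual(m_instance_id: str, input_visual_id: str,
                      visual_sensor_id: str, qualifier: dict,
                      data_length: int, data_uri: str,
                      descr_metadata: str = "") -> dict:
    """Assemble an Input Visual object mirroring the table's labels."""
    return {
        "Header": make_header(1, 1),
        "MInstanceID": m_instance_id,
        "InputVisualID": input_visual_id,
        "VisualSensorID": visual_sensor_id,
        "InputVisualData": {
            "InputVisualQualifier": qualifier,
            "InputVisualPayload": {
                "InputVisualDataLength": data_length,  # length in Bytes
                "InputVisualDataURI": data_uri,
            },
        },
        "DescrMetadata": descr_metadata,
    }


# Hypothetical usage with placeholder identifiers.
example = make_input_visual(
    "mi-0001", "iv-0001", "vs-0001",
    qualifier={}, data_length=186624000,
    data_uri="https://example.com/visual/frame-data.bin",
)
```

Keeping the header construction in its own function makes it easy to check a received Header against `HEADER_RE` before reading the rest of the object.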

5       Data Formats

Visual Qualifiers (Sub-Types, Formats, and Attributes) are required.

6       To Respondents

  1. Comment on the Functional Requirements or propose new ones.
  2. Comment on and propose Qualifiers: Sub-Types, Formats (2D, 2D + depth, or 3D visual sensors), and Attributes.