This is the public page of the Visual Object and Scene Description (MPAI-OSD) standard. See the MPAI-OSD homepage.

The current goal of the project is to collect Use Cases that share the goal of describing visual objects and, in some cases, locating them in space.

By scene description we mean the usual description of objects and their attributes in a scene and the semantic description of the objects.

Below are some of the use cases in need of Visual Object and Scene Description.

New use cases that require new AIMs falling within the MPAI-OSD scope are constantly being identified.

2.1 Audio Tape Irregularity
2.1.1 MPAI-CAE-ARP: Audio Recording Preservation
2.2 Identify an object in a human’s hand
2.2.1 MPAI-MMC-MQA: Multimodal Question Answering
2.2.2 MPAI-CAV-HCI: Human-CAV Interaction
2.2.3 Conversation About a Scene
2.3 Detecting emotion and meaning in a human face
2.3.1 MPAI-MMC-CWE: Conversation with Emotion
2.4 MPAI-CAV
2.4.1 MPAI-MCS: Mixed-reality Collaborative Spaces
2.5 Tracking a video game player’s movements
2.6 Correct Posture
2.7 Integrative genomic/video experiments (animals)


If you wish to participate in this work, you have the following options:

  1. Join MPAI
  2. Participate, by sending an email to the MPAI Secretariat, until the MPAI-SPG Functional Requirements are approved (after that, only MPAI members can participate).
  3. Keep an eye on this page.
