Definition | Functional Requirements | Syntax | Semantics | Conformance Testing | Performance Assessment

1      Definition

Enhanced Visual Scene Descriptors (EVD) provide an enriched representation of a scene described by Visual Scene Descriptors (VSD). EVDs include additional information generated by AI modules such as Visual Object Identification, Depth and Occlusion Estimation, Affordance Inference, and Visual Salience Mapping.

EVDs support downstream processing, orchestration, and interaction without duplicating the semantics or functionality of the generating AI modules.

2      Functional Requirements

The Enhanced Visual Scene Descriptors shall:

  • Provide a mechanism to extend a Visual Scene Descriptors instance without duplicating its structure.
  • Reference a base VSD instance through a unique identifier (BaseVSDID).
  • Allow entity-level enrichment by linking enhanced entities to VSD items.
  • Represent enhanced scene elements as entities with unique identifiers (EntityID).
  • Support enrichment of entities with:
    • visual object descriptors,
    • relative depth information,
    • occlusion state,
    • interaction potential,
    • salience.
  • Support representation of object identification outputs through association with VisualObject descriptors.
  • Support representation of spatial relationships through depth and occlusion information.
  • Support representation of inferred interaction capabilities through affordance descriptors.
  • Support visual affordances describing interaction possibilities including:
    • graspable,
    • pushable,
    • openable,
    • rotatable,
    • liftable,
    • insertable,
    • selectable.
  • Represent feasibility and constraints of interactions.
  • Provide confidence and compliance indicators for inferred affordances.
  • Support the inclusion of interaction potential for entities.
  • Support salience analysis through:
    • ranking of entities,
    • selection of salient entities.
  • Support interaction with execution environments through:
    • VisualCXEDirective,
    • VisualCXEStatus.
  • Support interaction with domain modules through:
    • DomainRequest,
    • DomainResponse.
  • Allow optional inclusion of processing and descriptive metadata.
  • Ensure that enhancements are:
    • consistent with the referenced VSD,
    • non-duplicative of AIM functionality,
    • composable across processing stages,
    • extensible to additional attributes and modalities.
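
The linking and salience requirements above can be illustrated with a minimal sketch. It is not the normative data format: the field names follow the labels used in this specification, but the nesting, the numeric `Salience` and `RelativeDepth` values, the `base_vsd` structure (`VSDID`, `ItemIDs`), and the helper functions `dangling_links`, `rank_entities` and `select_salient` are all assumptions made for illustration.

```python
from typing import Any

# Hypothetical base VSD: only its identifier and item identifiers matter here.
base_vsd = {"VSDID": "vsd-001", "ItemIDs": ["item-1", "item-2", "item-3"]}

# Hypothetical EVD instance. Labels follow this specification; the nesting
# and the value types are assumptions, not the normative JSON Schema.
evd: dict[str, Any] = {
    "Header": "OSD-EVD-V1.5",
    "EnhancedVisualSceneDescriptorsID": "evd-001",
    "BaseVSDID": "vsd-001",
    "Entities": [
        {"EntityID": "e1", "VSDItemID": "item-1", "RelativeDepth": 0.8,
         "OcclusionFlag": False, "Salience": 0.35},
        {"EntityID": "e2", "VSDItemID": "item-2", "RelativeDepth": 0.3,
         "OcclusionFlag": True, "Salience": 0.90},
        {"EntityID": "e3", "VSDItemID": "item-3", "RelativeDepth": 0.5,
         "OcclusionFlag": False, "Salience": 0.60},
    ],
}


def dangling_links(evd: dict, base_vsd: dict) -> list[str]:
    """Return EntityIDs whose VSDItemID is not an item of the base VSD."""
    if evd["BaseVSDID"] != base_vsd["VSDID"]:
        raise ValueError("EVD does not extend this VSD instance")
    known = set(base_vsd["ItemIDs"])
    return [e["EntityID"] for e in evd["Entities"] if e["VSDItemID"] not in known]


def rank_entities(evd: dict) -> list[str]:
    """EntityIDs by descending Salience (assumed numeric, higher = more prominent)."""
    ordered = sorted(evd["Entities"], key=lambda e: e["Salience"], reverse=True)
    return [e["EntityID"] for e in ordered]


def select_salient(evd: dict, threshold: float) -> list[str]:
    """Ranked EntityIDs whose Salience reaches the (assumed) threshold."""
    by_id = {e["EntityID"]: e for e in evd["Entities"]}
    return [eid for eid in rank_entities(evd) if by_id[eid]["Salience"] >= threshold]


evd["RankedEntities"] = rank_entities(evd)
evd["SalientEntities"] = select_salient(evd, threshold=0.5)
```

The EVD carries only identifiers and enrichments. The geometry and structure of the scene stay in the referenced VSD, which is how the instance extends the VSD without duplicating it.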

3      Syntax

https://schemas.mpai.community/OSD/V1.5/data/EnhancedVisualSceneDescriptors.json

4      Semantics

Label                                    Description
Header                                   Identifies the schema version using the pattern “OSD-EVD-Vx.y”.
MInstanceID                              Identifies the virtual space associated with the descriptors.
UEnvironmentID                           Identifies the real space associated with the descriptors.
EnhancedVisualSceneDescriptorsID         Unique identifier of the enhanced descriptor instance.
EnhancedVisualSceneDescriptorsSpaceTime  Spatial and temporal scope of the enhanced descriptors.
BaseVSDID                                Identifier of the Visual Scene Descriptors instance being extended.
Entities                                 Array of enhanced entities derived from visual objects in the base VSD.
EntityID                                 Unique identifier of the enhanced entity.
VSDItemID                                Identifier of the corresponding VSD item being enriched.
VisualObject                             Visual object descriptor associated with the entity.
RelativeDepth                            Relative depth information associated with the entity.
OcclusionFlag                            Indicates whether the entity is occluded.
Affordance                               Describes possible interactions associated with the entity, expressed as an array of visual affordance items.
VisualAffordanceItem                     Describes interaction possibilities based on visual properties (e.g., graspable, pushable).
Tag (Affordance)                         Identifies the type of affordance.
Feasible                                 Indicates whether the affordance can be executed.
Constraints                              Specifies conditions limiting the affordance.
ConstraintItem                           Describes a constraint affecting feasibility (e.g., occluded, out_of_reach, safety_violation).
Severity                                 Indicates the severity of the constraint (info, warning, error).
Referent                                 Identifier of the entity to which the affordance applies.
Confidence                               Degree of confidence in the affordance inference.
Compliance                               Indicates whether the affordance complies with applicable rules.
FallbackApplied                          Indicates whether a fallback action has been used.
FallbackTag                              Indicates the fallback affordance type.
InteractionPotential                     Describes the potential of the entity to support interaction.
Salience                                 Indicates the perceptual prominence of the entity.
RankedEntities                           Array of entity identifiers ordered by salience.
SalientEntities                          Array of entity identifiers selected as most relevant.
VisualCXEDirective                       Directive issued to the execution environment based on visual analysis.
VisualCXEStatus                          Status returned by the execution environment.
DomainRequest                            Request issued to an external domain module.
DomainResponse                           Response returned by the domain module.
DataXMData                               Processing and exchange metadata associated with the descriptors.
DescrMetadata                            Additional descriptive metadata (free text, up to 2048 characters).
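
As an illustration of how the affordance labels above relate to one another, the following sketch models a visual affordance item with its constraints. Several points are assumptions rather than statements of the schema: the class names `Constraint` and `VisualAffordanceItem`, the rule that only an `error`-severity constraint makes an affordance infeasible, deriving `FallbackApplied` from the presence of a `FallbackTag`, and the nesting of the serialised output. `Compliance` is omitted because this section does not state how it is determined.

```python
from dataclasses import dataclass, field
from typing import Optional

SEVERITIES = ("info", "warning", "error")


@dataclass
class Constraint:
    item: str       # ConstraintItem, e.g. "occluded", "out_of_reach"
    severity: str   # Severity: "info", "warning" or "error"

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown Severity: {self.severity!r}")


@dataclass
class VisualAffordanceItem:
    tag: str                       # Tag (Affordance), e.g. "graspable"
    referent: str                  # EntityID the affordance applies to
    confidence: float              # assumed to lie in [0, 1]
    constraints: list[Constraint] = field(default_factory=list)
    fallback_tag: Optional[str] = None

    @property
    def feasible(self) -> bool:
        # Assumption: only an "error"-level constraint blocks execution.
        return all(c.severity != "error" for c in self.constraints)

    def to_dict(self) -> dict:
        """Serialise using the Semantics labels (nesting is assumed)."""
        out = {
            "Tag": self.tag,
            "Feasible": self.feasible,
            "Constraints": [
                {"ConstraintItem": c.item, "Severity": c.severity}
                for c in self.constraints
            ],
            "Referent": self.referent,
            "Confidence": self.confidence,
            "FallbackApplied": self.fallback_tag is not None,
        }
        if self.fallback_tag is not None:
            out["FallbackTag"] = self.fallback_tag
        return out


# An occluded entity cannot be grasped, so a hypothetical fallback is recorded.
grasp = VisualAffordanceItem(
    tag="graspable",
    referent="e2",
    confidence=0.72,
    constraints=[Constraint("occluded", "error")],
    fallback_tag="pushable",
)
# A warning-level constraint is reported but does not block the affordance.
push = VisualAffordanceItem(
    tag="pushable",
    referent="e2",
    confidence=0.81,
    constraints=[Constraint("out_of_reach", "warning")],
)
affordance = [grasp.to_dict(), push.to_dict()]
```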

5     Conformance Testing

A Data instance Conforms with Enhanced Visual Scene Descriptors (OSD-EVD) if:

  1. The Data validates against the Enhanced Visual Scene Descriptors’ JSON Schema.
  2. All Data in the Enhanced Visual Scene Descriptors’ JSON Schema:
    1. Have the specified type.
    2. Validate against their JSON Schemas.
    3. Conform with their Visual Data Qualifiers.
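
A lightweight pre-check can catch obvious problems before full validation. The sketch below is not a substitute for validating against the JSON Schema, and it does not cover Visual Data Qualifiers. It rests on several assumptions: the `EXPECTED_TYPES` subset and its treatment of those labels as required, reading `x.y` in the Header pattern as two decimal integers, and the hypothetical function name `precheck`. The 2048-character limit for `DescrMetadata` comes from the Semantics table.

```python
import re

# Assumption: "x" and "y" in "OSD-EVD-Vx.y" are decimal integers.
HEADER_RE = re.compile(r"OSD-EVD-V\d+\.\d+")

# Hypothetical subset of top-level labels with the Python types expected after
# JSON decoding; the normative types and required fields are in the JSON Schema.
EXPECTED_TYPES = {
    "Header": str,
    "EnhancedVisualSceneDescriptorsID": str,
    "BaseVSDID": str,
    "Entities": list,
}


def precheck(instance: dict) -> list[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for label, expected in EXPECTED_TYPES.items():
        if label not in instance:
            problems.append(f"{label}: missing")
        elif not isinstance(instance[label], expected):
            problems.append(f"{label}: expected {expected.__name__}")

    header = instance.get("Header")
    if isinstance(header, str) and not HEADER_RE.fullmatch(header):
        problems.append("Header: does not match OSD-EVD-Vx.y")

    descr = instance.get("DescrMetadata")
    if isinstance(descr, str) and len(descr) > 2048:
        problems.append("DescrMetadata: longer than 2048 characters")
    return problems


good = {
    "Header": "OSD-EVD-V1.5",
    "EnhancedVisualSceneDescriptorsID": "evd-001",
    "BaseVSDID": "vsd-001",
    "Entities": [],
}
bad = {"Header": "OSD-EVD-1.5", "BaseVSDID": 7, "Entities": []}
```

Here `precheck(good)` returns an empty list, while `precheck(bad)` reports the missing identifier, the wrongly typed `BaseVSDID`, and the malformed Header.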

6     Performance Assessment