MPAI-OSD V1.5 Data Types - Enhanced Visual Scene Descriptors

1 Definition

Enhanced Visual Scene Descriptors (EVD) provide an enriched representation of a scene described by Visual Scene Descriptors (VSD). EVDs include additional information generated by AI modules such as Visual Object Identification, Depth and Occlusion Estimation, Affordance Inference, and Visual Salience Mapping.

EVDs support downstream processing, orchestration, and interaction without duplicating the semantics or functionality of the generating AI modules.

2 Functional Requirements

The Enhanced Visual Scene Descriptors shall:

Provide a mechanism to extend a Visual Scene Descriptors instance without duplicating its structure.
Reference a base VSD instance through a unique identifier (BaseVSDID).
Allow entity-level enrichment by linking enhanced entities to VSD items.
Represent enhanced scene elements as entities with unique identifiers (EntityID).
Support enrichment of entities with:
- visual object descriptors,
- relative depth information,
- occlusion state,
- interaction potential,
- salience.
Support representation of object identification outputs through association with VisualObject descriptors.
Support representation of spatial relationships through depth and occlusion information.
Support representation of inferred interaction capabilities through affordance descriptors.
Support visual affordances describing interaction possibilities including:
- graspable,
- pushable,
- openable,
- rotatable,
- liftable,
- insertable,
- selectable.
Represent feasibility and constraints of interactions.
Provide confidence and compliance indicators for inferred affordances.
Support the inclusion of interaction potential for entities.
Support salience analysis through:
- ranking of entities,
- selection of salient entities.
Support interaction with execution environments through:
- VisualCXEDirective,
- VisualCXEStatus.
Support interaction with domain modules through:
- DomainRequest,
- DomainResponse.
Allow optional inclusion of processing and descriptive metadata.
Ensure that enhancements are:
- consistent with the referenced VSD,
- non-duplicative of AIM functionality,
- composable across processing stages,
- extensible to additional attributes and modalities.
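
The entity-level requirements above can be pictured with a small data model. The following is a minimal Python sketch, not the normative schema. The class names `EnhancedEntity` and `VisualAffordanceItem`, the snake_case field names, the numeric types chosen for depth, confidence, and salience, and the helpers `rank_entities` and `select_salient` (including selection by threshold) are all assumptions made for illustration. Only the affordance tags and the kinds of enrichment come from the requirements.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Affordance tags listed in the functional requirements.
AFFORDANCE_TAGS = {
    "graspable", "pushable", "openable", "rotatable",
    "liftable", "insertable", "selectable",
}


@dataclass
class VisualAffordanceItem:
    """Hypothetical container for one inferred affordance."""
    tag: str
    feasible: bool
    confidence: float  # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        if self.tag not in AFFORDANCE_TAGS:
            raise ValueError(f"unknown affordance tag: {self.tag}")


@dataclass
class EnhancedEntity:
    """Hypothetical enhanced entity linked to one item of the base VSD."""
    entity_id: str
    vsd_item_id: str
    relative_depth: Optional[float] = None  # assumed numeric
    occluded: Optional[bool] = None
    salience: float = 0.0  # assumed numeric, higher means more prominent
    affordances: List[VisualAffordanceItem] = field(default_factory=list)


def rank_entities(entities: List[EnhancedEntity]) -> List[str]:
    """Return entity identifiers ordered by decreasing salience."""
    ordered = sorted(entities, key=lambda e: e.salience, reverse=True)
    return [e.entity_id for e in ordered]


def select_salient(entities: List[EnhancedEntity], threshold: float) -> List[str]:
    """Return identifiers of entities whose salience meets the threshold."""
    return [e.entity_id for e in entities if e.salience >= threshold]


# Illustrative values only.
cup = EnhancedEntity(
    "entity-1", "vsd-item-7",
    relative_depth=0.4, occluded=False, salience=0.9,
    affordances=[VisualAffordanceItem("graspable", True, 0.8)],
)
door = EnhancedEntity("entity-2", "vsd-item-3", salience=0.3)

ranked = rank_entities([door, cup])
salient = select_salient([door, cup], threshold=0.5)
```

Keeping `vsd_item_id` as a plain reference, rather than copying the VSD item into the entity, mirrors the requirement to extend the base VSD without duplicating its structure.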

3 Syntax

https://schemas.mpai.community/OSD/V1.5/data/EnhancedVisualSceneDescriptors.json

4 Semantics

Label	Description
Header	Identifies the schema version using the pattern “OSD-EVD-Vx.y”.
MInstanceID	Identifies the virtual space associated with the descriptors.
UEnvironmentID	Identifies the real space associated with the descriptors.
EnhancedVisualSceneDescriptorsID	Unique identifier of the enhanced descriptor instance.
EnhancedVisualSceneDescriptorsSpaceTime	Spatial and temporal scope of the enhanced descriptors.
BaseVSDID	Identifier of the Visual Scene Descriptors instance being extended.
Entities	Array of enhanced entities derived from visual objects in the base VSD.
EntityID	Unique identifier of the enhanced entity.
VSDItemID	Identifier of the corresponding VSD item being enriched.
VisualObject	Visual object descriptor associated with the entity.
RelativeDepth	Relative depth information associated with the entity.
OcclusionFlag	Indicates whether the entity is occluded.
Affordance	Describes possible interactions associated with the entity, expressed as an array of visual affordance items.
VisualAffordanceItem	Describes interaction possibilities based on visual properties (e.g., graspable, pushable).
Tag (Affordance)	Identifies the type of affordance.
Feasible	Indicates whether the affordance can be executed.
Constraints	Specifies conditions limiting the affordance.
ConstraintItem	Describes a constraint affecting feasibility (e.g., occluded, out_of_reach, safety_violation).
Severity	Indicates the severity of the constraint (info, warning, error).
Referent	Identifier of the entity to which the affordance applies.
Confidence	Degree of confidence in the affordance inference.
Compliance	Indicates whether the affordance complies with applicable rules.
FallbackApplied	Indicates whether a fallback action has been used.
FallbackTag	Indicates the fallback affordance type.
InteractionPotential	Describes the potential of the entity to support interaction.
Salience	Indicates the perceptual prominence of the entity.
RankedEntities	Array of entity identifiers ordered by salience.
SalientEntities	Array of entity identifiers selected as most relevant.
VisualCXEDirective	Directive issued to the execution environment based on visual analysis.
VisualCXEStatus	Status returned by the execution environment.
DomainRequest	Request issued to an external domain module.
DomainResponse	Response returned by the domain module.
DataXMData	Processing and exchange metadata associated with the descriptors.
DescrMetadata	Additional descriptive metadata (free text, up to 2048 characters).
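
To make the labels concrete, here is an illustrative instance built as a Python dictionary and serialized to JSON. The labels come from the table above, but the nesting, the value types, and every value (the identifiers, the numbers, the header string "OSD-EVD-V1.5", and the use of a `ConstraintItem` key inside each constraint) are assumptions. The published JSON Schema remains the authority on the actual structure.

```python
import json

# Hypothetical instance: structure and values are assumed for illustration.
evd = {
    "Header": "OSD-EVD-V1.5",
    "EnhancedVisualSceneDescriptorsID": "evd-0001",
    "BaseVSDID": "vsd-0001",
    "Entities": [
        {
            "EntityID": "entity-1",
            "VSDItemID": "vsd-item-7",
            "RelativeDepth": 0.4,
            "OcclusionFlag": False,
            "Affordance": [
                {
                    "Tag": "graspable",
                    "Feasible": False,
                    "Constraints": [
                        {"ConstraintItem": "out_of_reach", "Severity": "warning"}
                    ],
                    "Referent": "entity-1",
                    "Confidence": 0.8,
                    "Compliance": True,
                    "FallbackApplied": True,
                    "FallbackTag": "pushable",
                }
            ],
            "Salience": 0.9,
        }
    ],
    "RankedEntities": ["entity-1"],
    "SalientEntities": ["entity-1"],
    "DescrMetadata": "Illustrative instance, not produced by a real AI module.",
}

# Serialize to JSON text and parse it back to confirm the round trip.
text = json.dumps(evd, indent=2)
restored = json.loads(text)
```

In this sketch the affordance is marked as not feasible because of an `out_of_reach` constraint, and `FallbackTag` records the alternative affordance that was applied instead.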

5 Conformance Testing

A Data instance Conforms with Enhanced Visual Scene Descriptors (OSD-EVD) if:

The Data validates against the Enhanced Visual Scene Descriptors’ JSON Schema.
All Data in the Enhanced Visual Scene Descriptors’ JSON Schema:
1. Have the specified type.
2. Validate against their JSON Schemas.
3. Conform with their Visual Data Qualifiers.
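
Conformance as defined above requires validating an instance against the published JSON Schema with a schema validator. As a lighter illustration, the sketch below uses only the Python standard library to check a few constraints stated in the Semantics table: the Header pattern, the 2048-character limit on DescrMetadata, uniqueness of EntityID values, and the three Severity values. The function name `precheck_evd`, the assumed nesting of the instance, and the reading of "Vx.y" as two decimal numbers are assumptions. Passing this pre-check does not establish conformance.

```python
import re
from typing import Any, Dict, List

# Assumed reading of the "OSD-EVD-Vx.y" pattern: x and y are decimal numbers.
HEADER_PATTERN = re.compile(r"OSD-EVD-V\d+\.\d+")
SEVERITIES = {"info", "warning", "error"}
MAX_DESCR_LENGTH = 2048


def precheck_evd(evd: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    if not HEADER_PATTERN.fullmatch(str(evd.get("Header", ""))):
        problems.append("Header does not match OSD-EVD-Vx.y")

    if len(evd.get("DescrMetadata", "")) > MAX_DESCR_LENGTH:
        problems.append("DescrMetadata exceeds 2048 characters")

    seen = set()
    for entity in evd.get("Entities", []):
        entity_id = entity.get("EntityID")
        if entity_id in seen:
            problems.append(f"duplicate EntityID: {entity_id}")
        seen.add(entity_id)
        # Assumed nesting: Affordance -> Constraints -> Severity.
        for item in entity.get("Affordance", []):
            for constraint in item.get("Constraints", []):
                severity = constraint.get("Severity")
                if severity is not None and severity not in SEVERITIES:
                    problems.append(f"invalid Severity in entity {entity_id}")

    return problems


# Illustrative inputs only.
good = {"Header": "OSD-EVD-V1.5", "Entities": [{"EntityID": "entity-1"}]}
bad = {
    "Header": "EVD-1.5",
    "DescrMetadata": "x" * 2049,
    "Entities": [{"EntityID": "entity-1"}, {"EntityID": "entity-1"}],
}
```

Calling `precheck_evd(good)` yields an empty list, while `precheck_evd(bad)` reports the header, the metadata length, and the duplicated identifier.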

