| Definition | Functional Requirements | Syntax | Semantics | Conformance Testing | Performance Assessment |
1 Definition
Enhanced Visual Scene Descriptors (EVD) provide an enriched representation of a scene described by Visual Scene Descriptors (VSD). EVDs include additional information generated by AI modules such as Visual Object Identification, Depth and Occlusion Estimation, Affordance Inference, and Visual Salience Mapping.
EVDs support downstream processing, orchestration, and interaction without duplicating the semantics or functionality of the generating AI modules.
2 Functional Requirements
The Enhanced Visual Scene Descriptors shall:
- Provide a mechanism to extend a Visual Scene Descriptors instance without duplicating its structure.
- Reference a base VSD instance through a unique identifier (BaseVSDID).
- Allow entity-level enrichment by linking enhanced entities to VSD items.
- Represent enhanced scene elements as entities with unique identifiers (EntityID).
- Support enrichment of entities with:
  - visual object descriptors,
  - relative depth information,
  - occlusion state,
  - interaction potential,
  - salience.
- Support representation of object identification outputs through association with VisualObject descriptors.
- Support representation of spatial relationships through depth and occlusion information.
- Support representation of inferred interaction capabilities through affordance descriptors.
- Support visual affordances describing interaction possibilities including:
  - graspable,
  - pushable,
  - openable,
  - rotatable,
  - liftable,
  - insertable,
  - selectable.
- Represent feasibility and constraints of interactions.
- Provide confidence and compliance indicators for inferred affordances.
- Support the inclusion of interaction potential for entities.
- Support salience analysis through:
  - ranking of entities,
  - selection of salient entities.
- Support interaction with execution environments through:
  - VisualCXEDirective,
  - VisualCXEStatus.
- Support interaction with domain modules through:
  - DomainRequest,
  - DomainResponse.
- Allow optional inclusion of processing and descriptive metadata.
- Ensure that enhancements are:
  - consistent with the referenced VSD,
  - non-duplicative of AIM functionality,
  - composable across processing stages,
  - extensible to additional attributes and modalities.
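
The salience requirement above (ranking of entities, selection of salient entities) can be illustrated with a minimal Python sketch. It is not part of the specification: it assumes that salience is a number where larger means more prominent, and that the salient set is simply the first `top_k` ranked entities. The function name `rank_and_select`, the `top_k` parameter, and the EntityID values are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_and_select(
    salience_by_entity: Dict[str, float], top_k: int
) -> Tuple[List[str], List[str]]:
    """Return (ranked entity IDs, salient entity IDs) for a salience mapping.

    Assumption: larger salience means more prominent.
    Assumption: the salient set is the first top_k ranked entities.
    """
    # Sort by descending salience; break ties by EntityID so the order is stable.
    ranked = sorted(
        salience_by_entity,
        key=lambda entity_id: (-salience_by_entity[entity_id], entity_id),
    )
    salient = ranked[: max(top_k, 0)]
    return ranked, salient


# Hypothetical EntityID values and salience scores.
scores = {"entity-cup": 0.91, "entity-door": 0.40, "entity-knob": 0.77}
ranked_entities, salient_entities = rank_and_select(scores, top_k=2)
print(ranked_entities)
print(salient_entities)
```

A real Visual Salience Mapping module may select salient entities by a threshold or by task relevance rather than by a fixed count; the sketch only shows how the two outputs relate to each other.
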
3 Syntax
https://schemas.mpai.community/OSD/V1.5/data/EnhancedVisualSceneDescriptors.json
4 Semantics
| Label | Description |
|---|---|
| Header | Identifies the schema version using the pattern “OSD-EVD-Vx.y”. |
| MInstanceID | Identifies the virtual space associated with the descriptors. |
| UEnvironmentID | Identifies the real space associated with the descriptors. |
| EnhancedVisualSceneDescriptorsID | Unique identifier of the enhanced descriptor instance. |
| EnhancedVisualSceneDescriptorsSpaceTime | Spatial and temporal scope of the enhanced descriptors. |
| BaseVSDID | Identifier of the Visual Scene Descriptors instance being extended. |
| Entities | Array of enhanced entities derived from visual objects in the base VSD. |
| EntityID | Unique identifier of the enhanced entity. |
| VSDItemID | Identifier of the corresponding VSD item being enriched. |
| VisualObject | Visual object descriptor associated with the entity. |
| RelativeDepth | Relative depth information associated with the entity. |
| OcclusionFlag | Indicates whether the entity is occluded. |
| Affordance | Describes possible interactions associated with the entity, expressed as an array of visual affordance items. |
| VisualAffordanceItem | Describes interaction possibilities based on visual properties (e.g., graspable, pushable). |
| Tag (Affordance) | Identifies the type of affordance. |
| Feasible | Indicates whether the affordance can be executed. |
| Constraints | Specifies conditions limiting the affordance. |
| ConstraintItem | Describes a constraint affecting feasibility (e.g., occluded, out_of_reach, safety_violation). |
| Severity | Indicates the severity of the constraint (info, warning, error). |
| Referent | Identifier of the entity to which the affordance applies. |
| Confidence | Degree of confidence in the affordance inference. |
| Compliance | Indicates whether the affordance complies with applicable rules. |
| FallbackApplied | Indicates whether a fallback action has been used. |
| FallbackTag | Indicates the fallback affordance type. |
| InteractionPotential | Describes the potential of the entity to support interaction. |
| Salience | Indicates the perceptual prominence of the entity. |
| RankedEntities | Array of entity identifiers ordered by salience. |
| SalientEntities | Array of entity identifiers selected as most relevant. |
| VisualCXEDirective | Directive issued to the execution environment based on visual analysis. |
| VisualCXEStatus | Status returned by the execution environment. |
| DomainRequest | Request issued to an external domain module. |
| DomainResponse | Response returned by the domain module. |
| DataXMData | Processing and exchange metadata associated with the descriptors. |
| DescrMetadata | Additional descriptive metadata (free text, up to 2048 characters). |
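
To make the labels above more concrete, the following Python sketch builds one illustrative instance and checks that every entity reference points at a declared EntityID. The nesting of fields, the value types, the shape of a constraint (an object with `ConstraintItem` and `Severity`), the version in the Header value, and all identifier values are assumptions made for illustration; the JSON Schema referenced in the Syntax section is authoritative. The helper `dangling_references` is hypothetical.

```python
import json

# Illustrative instance only: structure and values are assumptions.
evd = {
    "Header": "OSD-EVD-V1.5",  # version value assumed for illustration
    "EnhancedVisualSceneDescriptorsID": "evd-0001",
    "BaseVSDID": "vsd-0001",
    "Entities": [
        {
            "EntityID": "entity-cup",
            "VSDItemID": "vsd-item-7",
            "RelativeDepth": 0.35,
            "OcclusionFlag": False,
            "Affordance": [
                {"Tag": "graspable", "Feasible": True, "Constraints": [],
                 "Referent": "entity-cup", "Confidence": 0.88, "Compliance": True}
            ],
            "Salience": 0.91,
        },
        {
            "EntityID": "entity-door",
            "VSDItemID": "vsd-item-2",
            "RelativeDepth": 0.80,
            "OcclusionFlag": True,
            "Affordance": [
                {"Tag": "openable", "Feasible": False,
                 "Constraints": [{"ConstraintItem": "occluded", "Severity": "warning"}],
                 "Referent": "entity-door", "Confidence": 0.61, "Compliance": True}
            ],
            "Salience": 0.40,
        },
    ],
    "RankedEntities": ["entity-cup", "entity-door"],
    "SalientEntities": ["entity-cup"],
}


def dangling_references(instance):
    """Return, sorted, the referenced entity IDs that no entity declares."""
    known = {entity["EntityID"] for entity in instance["Entities"]}
    referenced = set(instance.get("RankedEntities", []))
    referenced.update(instance.get("SalientEntities", []))
    for entity in instance["Entities"]:
        for item in entity.get("Affordance", []):
            if "Referent" in item:
                referenced.add(item["Referent"])
    return sorted(referenced - known)


# Show the affordance of the occluded entity, then the consistency check result.
print(json.dumps(evd["Entities"][1]["Affordance"], indent=2))
print(dangling_references(evd))
```

Treating Referent, RankedEntities, and SalientEntities as references to EntityID values follows from their descriptions as entity identifiers, but whether the schema enforces this is not stated here.
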
5 Conformance Testing
A Data instance Conforms with Enhanced Visual Scene Descriptors (OSD-EVD) if:
- The Data validates against the Enhanced Visual Scene Descriptors’ JSON Schema.
- All Data in the Enhanced Visual Scene Descriptors’ JSON Schema:
  - Have the specified type.
  - Validate against their JSON Schemas.
  - Conform with their Visual Data Qualifiers.
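
As a rough illustration of the type checks listed above, the sketch below runs a few hand-written pre-checks in Python using only the standard library. It is not a substitute for validating against the published JSON Schema with a JSON Schema validator. The set of labels treated as required, their expected types, and the reading of "x.y" in the Header pattern as two integers are all assumptions; `precheck` is a hypothetical helper.

```python
import re
from typing import Any, Dict, List

# Assumes x and y in "OSD-EVD-Vx.y" are non-negative integers.
HEADER_PATTERN = re.compile(r"OSD-EVD-V[0-9]+\.[0-9]+")

# Assumed required labels and types; the published JSON Schema is authoritative.
EXPECTED_TYPES = {
    "Header": str,
    "BaseVSDID": str,
    "Entities": list,
}


def precheck(instance: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the pre-check passed."""
    problems: List[str] = []
    for label, expected in EXPECTED_TYPES.items():
        if label not in instance:
            problems.append(f"missing {label}")
        elif not isinstance(instance[label], expected):
            problems.append(f"{label} is not of type {expected.__name__}")

    header = instance.get("Header")
    if isinstance(header, str) and not HEADER_PATTERN.fullmatch(header):
        problems.append("Header does not match OSD-EVD-Vx.y")

    entities = instance.get("Entities")
    if isinstance(entities, list):
        for index, entity in enumerate(entities):
            # Every enhanced entity is expected to carry a string EntityID.
            if not isinstance(entity, dict) or not isinstance(entity.get("EntityID"), str):
                problems.append(f"Entities[{index}] lacks a string EntityID")
    return problems


good = {"Header": "OSD-EVD-V1.0", "BaseVSDID": "vsd-0001", "Entities": [{"EntityID": "e1"}]}
bad = {"Header": "EVD-1", "Entities": [{}]}
print(precheck(good))
print(precheck(bad))
```

Returning a list of problems instead of raising on the first one lets a test report show every failed check for a Data instance at once.
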