(Tentative)


Definition

Visual Spatial Directive (PGM-VSD)

  1. Is produced by the Domain Access (DA) AIM.
  2. Provides referential cues, user intent, and contextual constraints to guide interpretation of visual scenes.
  3. Enables Visual Spatial Reasoning (VSR) to resolve ambiguous entities, assess spatial feasibility, and generate primitives aligned with the User’s interaction goals.

Functional Requirements

Visual Spatial Directive conveys the following main information elements:

Function | Description
Referent Cueing | Includes ambiguous or partially resolved tokens (e.g., “that”, “there”) that require spatial disambiguation; supports linkage to user gestures, gaze, or prior dialogue context.
Intent Anchoring | Includes the current user intent or goal motivating the spatial query; supports alignment with Prompt Creation and User State.
Scene Anchoring | Includes identifiers of the current scene or zone; supports multi-zone and multi-user environments.
Constraint Injection | Includes spatial or interaction constraints relevant to the query; supports filtering out unreachable, occluded, or otherwise non-viable entities.
Resolution Guidance | Includes optional hints or preferences for the resolution strategy (e.g., prioritise gaze over proximity); supports modular resolution logic in VSR.
Traceability | Includes origin metadata and a timestamp; supports audit and replay.
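How Constraint Injection and Resolution Guidance might be consumed can be illustrated with a minimal sketch of a VSR-like step that first filters candidate entities with the constraints and then ranks the survivors using the hints. Everything here is an assumption for illustration only: the `Candidate` fields, the interpretation of `reachable_from`/`visible_from` as precomputed per-entity booleans (so `ConstraintValue` is ignored), and the rule that `prioritise_gaze` ranks by gaze before distance. The standard does not specify VSR's resolution logic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical per-entity facts that a VSR-like module might derive from the scene.
    entity_id: str
    reachable: bool
    occluded: bool
    gaze_score: float   # 0..1, higher means closer to the user's gaze direction
    distance_m: float   # distance from the user, in metres


def resolve_referent(candidates, constraints, hints):
    """Filter candidates with hard constraints, then rank them with soft hints.

    constraints: list of (ConstraintType, ConstraintValue) pairs
    hints: dict mapping HintType to HintValue
    """
    types = {ctype for ctype, _ in constraints}
    viable = []
    for c in candidates:
        # Hard constraints (assumed semantics; ConstraintValue is not used here).
        if "reachable_from" in types and not c.reachable:
            continue
        if "visible_from" in types and c.occluded:
            continue
        # Soft hint that also removes occluded entities when set.
        if hints.get("exclude_occluded") and c.occluded:
            continue
        viable.append(c)
    if hints.get("prioritise_gaze"):
        # Gaze first; distance only breaks ties.
        return sorted(viable, key=lambda c: (-c.gaze_score, c.distance_m))
    # Default: proximity first; gaze only breaks ties.
    return sorted(viable, key=lambda c: (c.distance_m, -c.gaze_score))


# Hypothetical scene: "box" is best gazed at but is not reachable.
cands = [
    Candidate("cup", reachable=True, occluded=False, gaze_score=0.9, distance_m=1.2),
    Candidate("lamp", reachable=True, occluded=False, gaze_score=0.2, distance_m=0.5),
    Candidate("box", reachable=False, occluded=False, gaze_score=0.95, distance_m=3.0),
]
ranked = resolve_referent(cands, [("reachable_from", "user")], {"prioritise_gaze": True})
print([c.entity_id for c in ranked])  # ['cup', 'lamp']
```

Separating hard filtering (constraints) from soft ranking (hints) mirrors the split between Constraint Injection and Resolution Guidance in the table, which is one way to keep the resolution logic modular.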

Syntax

https://schemas.mpai.community/PGM1/V1.0/data/VisualSpatialDirective.json

Semantics

Label | Description
Header | Visual Spatial Directive Header
– Standard-VSD | The characters “PGM-VSD-V”
– Version | Major version – 1 or 2 characters
– Dot-separator | The character “.” separating version components
– Subversion | Minor version – 1 or 2 characters
VisualSpatialDirectiveID | Unique identifier of this Visual Spatial Directive instance
SceneID | Identifier of the scene or zone being queried
Intent | User intent motivating the spatial query
ReferentCues | Tokens requiring spatial disambiguation
├─ Token | Ambiguous expression (e.g., “that”, “there”)
└─ ContextHint | Optional semantic or gestural cue associated with the Token
Constraints | Spatial or interaction constraints
├─ ConstraintType | Type of constraint (e.g., reachable_from, visible_from)
└─ ConstraintValue | Entity or zone to which the constraint refers
ResolutionHints | Optional preferences for the resolution strategy
├─ HintType | Type of hint (e.g., prioritise_gaze, exclude_occluded)
└─ HintValue | Value of the hint: Boolean or string
Trace | Provenance metadata
├─ Origin | Module or subsystem that generated the query
└─ Timestamp | Time at which the directive was created
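The Semantics labels can be mirrored in a minimal Python sketch that builds a directive and serialises it to JSON. The nesting, key spelling, ISO 8601 timestamp format, and Header layout (assumed to be `PGM-VSD-V` followed by `major.minor` with 1 or 2 digits each, matching the PGM-VSD designation above) are inferred from the table and are assumptions; the normative structure is the JSON Schema referenced under Syntax. The class name `TraceInfo`, the helper `make_header`, and all values in the usage example are hypothetical.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Union

# Assumed Header pattern: "PGM-VSD-V" + major (1-2 digits) + "." + minor (1-2 digits).
HEADER_PATTERN = re.compile(r"PGM-VSD-V\d{1,2}\.\d{1,2}")


def make_header(major: int, minor: int) -> str:
    """Build a Header string and reject version components longer than 2 digits."""
    header = f"PGM-VSD-V{major}.{minor}"
    if not HEADER_PATTERN.fullmatch(header):
        raise ValueError(f"invalid Header: {header}")
    return header


@dataclass
class ReferentCue:
    Token: str                          # e.g. "that", "there"
    ContextHint: Optional[str] = None   # optional semantic or gestural cue


@dataclass
class Constraint:
    ConstraintType: str    # e.g. "reachable_from", "visible_from"
    ConstraintValue: str   # entity or zone the constraint refers to


@dataclass
class ResolutionHint:
    HintType: str                  # e.g. "prioritise_gaze", "exclude_occluded"
    HintValue: Union[bool, str]


@dataclass
class TraceInfo:
    Origin: str      # module or subsystem that generated the query
    Timestamp: str   # assumed ISO 8601 string


@dataclass
class VisualSpatialDirective:
    Header: str
    VisualSpatialDirectiveID: str
    SceneID: str
    Intent: str
    Trace: TraceInfo
    ReferentCues: List[ReferentCue] = field(default_factory=list)
    Constraints: List[Constraint] = field(default_factory=list)
    ResolutionHints: List[ResolutionHint] = field(default_factory=list)


vsd = VisualSpatialDirective(
    Header=make_header(1, 0),
    VisualSpatialDirectiveID="vsd-0001",
    SceneID="zone-A",
    Intent="pick_up_object",
    Trace=TraceInfo(Origin="DA", Timestamp=datetime.now(timezone.utc).isoformat()),
    ReferentCues=[ReferentCue(Token="that", ContextHint="pointing_gesture")],
    Constraints=[Constraint("reachable_from", "user")],
    ResolutionHints=[ResolutionHint("prioritise_gaze", True)],
)
# asdict() recursively converts nested dataclasses (including those inside lists).
print(json.dumps(asdict(vsd), indent=2))
```

Before relying on this shape, an instance produced this way should be checked against the published schema, since key names or nesting there may differ from the sketch.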