(Tentative)

Definition Functional Requirements Syntax Semantics

Definition

Visual Spatial Guide (PGM-VSG):

  1. Is produced by the Visual Spatial Reasoning (VSR) AIM
  2. Represents the User-centric visual spatial context derived from scene interpretation
  3. Enriches the User’s spoken or written input with spatial cues – such as object relevance, orientation, proximity, and affordance – to support natural language augmentation and grounding prior to prompt generation

Functional Requirements

Visual Spatial Guide conveys the following main information elements:

Function                    Description
Salient Object Selection    Prioritises visual objects relevant to user intent or focus
Relative Position Mapping   Translates 3D coordinates into user-relative descriptors
Proximity Framing           Classifies objects as near, mid, or far for contextual emphasis
Affordance Labeling         Attaches actionable properties (e.g., “can sit”, “can open”)
Occlusion Awareness         Indicates visibility status of objects
Viewpoint Normalisation     Adjusts spatial descriptors to match user orientation
Output Generation           Produces Visual Adaptive Context Guide for Prompt Creation
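
As an illustration only, Viewpoint Normalisation, Relative Position Mapping, and Proximity Framing could be sketched as below. Nothing in this sketch is normative: the axis convention (z up, +x straight ahead and +y to the left in the user frame), the key names "Azimuth", "Elevation", and "Distance", the function names, and the near/mid thresholds of 1.5 m and 5 m are all assumptions made for the example, since this section does not specify them.

```python
import math

# Hypothetical thresholds in metres; this section gives no numeric values.
NEAR_MAX_M = 1.5
MID_MAX_M = 5.0


def to_user_frame(obj_xyz, user_xyz, user_yaw_rad):
    """Viewpoint normalisation: express a world-frame point in the user's frame.

    Assumed convention: right-handed axes, z up, yaw measured counter-clockwise
    about z. In the resulting frame +x is straight ahead and +y is to the left.
    """
    dx = obj_xyz[0] - user_xyz[0]
    dy = obj_xyz[1] - user_xyz[1]
    dz = obj_xyz[2] - user_xyz[2]
    cos_y, sin_y = math.cos(user_yaw_rad), math.sin(user_yaw_rad)
    # Rotate by -yaw so that the user's heading becomes the +x axis.
    return (cos_y * dx + sin_y * dy, -sin_y * dx + cos_y * dy, dz)


def relative_position(user_frame_xyz):
    """Relative position mapping: Cartesian offset to azimuth, elevation, distance."""
    x, y, z = user_frame_xyz
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))  # positive means to the left
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return {"Azimuth": azimuth, "Elevation": elevation, "Distance": distance}


def proximity(distance_m):
    """Proximity framing: bucket a distance into near, mid, or far."""
    if distance_m <= NEAR_MAX_M:
        return "near"
    if distance_m <= MID_MAX_M:
        return "mid"
    return "far"


# Usage: a chair 2 m along world +y, seen by a user at the origin facing +y.
chair_world = (0.0, 2.0, 0.0)
user_world = (0.0, 0.0, 0.0)
pos = relative_position(to_user_frame(chair_world, user_world, math.pi / 2))
print(pos, proximity(pos["Distance"]))
```

Applying the rotation before computing angles keeps the descriptors user-centric: the same object yields a different azimuth when the user turns, while its distance and proximity class stay the same.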

Syntax

https://schemas.mpai.community/PGM1/V1.0/data/VisualSpatialGuide.json

Semantics

Label                  Description
Header                 Schema header with version tag
├─ Standard-VSG        The characters “PGM-VSG-V”
├─ Version             Major version – 1 or 2 characters
├─ Dot-separator       The character “.” separating version components
└─ Subversion          Minor version – 1 or 2 characters
VisualSpatialGuideID   Unique identifier for this guide instance
MInstanceID            Identifier of M-Instance
MEnvironmentID         Identifier of M-Environment
SalientObjects         List of user-relevant visual objects
├─ ObjectID            Unique ID for each object
├─ Label               Semantic label (e.g., “chair”, “door”)
├─ RelativePosition    Azimuth, elevation, and distance from user
├─ Proximity           Estimated closeness to the User
├─ Affordance          Actionable property (e.g., “can sit”)
├─ Occlusion           Whether the object is partially hidden
└─ NarrativeCue        Optional natural language cue for prompt enrichment
SceneSummary           Summary of visual scene conditions
├─ DominantObject      Most visually prominent object
├─ LightingCondition   Ambient lighting descriptor
└─ SpatialDensity      Density of objects in scene
Trace                  Provenance metadata
├─ Origin              Module that generated the guide
└─ Timestamp           Time of guide creation
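
To make the labels above more concrete, the following sketch builds one possible guide instance and checks its header. It is not derived from the JSON schema referenced under Syntax, which remains the authoritative definition. The nesting, the value types, all sample values, the key names inside RelativePosition, and the reading of "1 or 2 characters" as decimal digits are assumptions for illustration.

```python
import json
import re
from datetime import datetime, timezone

# Assumption: Version and Subversion are each one or two decimal digits.
HEADER_PATTERN = re.compile(r"PGM-VSG-V\d{1,2}\.\d{1,2}")


def is_valid_header(header: str) -> bool:
    """Check Standard-VSG + Version + Dot-separator + Subversion, e.g. PGM-VSG-V1.0."""
    return HEADER_PATTERN.fullmatch(header) is not None


# All identifiers and values below are hypothetical sample data.
guide = {
    "Header": "PGM-VSG-V1.0",
    "VisualSpatialGuideID": "vsg-0001",
    "MInstanceID": "minstance-01",
    "MEnvironmentID": "menvironment-01",
    "SalientObjects": [
        {
            "ObjectID": "obj-17",
            "Label": "chair",
            # Assumed units: degrees for angles, metres for distance;
            # a negative azimuth is taken here to mean "to the user's right".
            "RelativePosition": {"Azimuth": -20.0, "Elevation": -10.0, "Distance": 1.2},
            "Proximity": "near",
            "Affordance": "can sit",
            "Occlusion": False,
            "NarrativeCue": "There is a chair close by, slightly to your right.",
        }
    ],
    "SceneSummary": {
        "DominantObject": "obj-17",
        "LightingCondition": "dim",
        "SpatialDensity": "sparse",
    },
    "Trace": {
        "Origin": "VSR",
        "Timestamp": datetime.now(timezone.utc).isoformat(),
    },
}

assert is_valid_header(guide["Header"])
print(json.dumps(guide, indent=2))
```

A real producer would validate the serialised object against the published schema rather than against this hand-written header check.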