(Tentative)
| Definition | Functional Requirements | Syntax | Semantics |
Definition
Visual Spatial Guide (PGM-VSG)
- Is produced by the Visual Spatial Reasoning (VSR) AIM
- Represents the User-centric visual spatial context derived from scene interpretation
- Enriches the User’s spoken or written input with spatial cues – such as object relevance, orientation, proximity, and affordance – to support natural language augmentation and grounding prior to prompt generation
Functional Requirements
Visual Spatial Guide supports the following main functions:
| Function | Description |
| Salient Object Selection | Prioritises visual objects relevant to the User’s intent or focus |
| Relative Position Mapping | Translates 3D coordinates into User-relative descriptors |
| Proximity Framing | Classifies objects as near, mid, or far for contextual emphasis |
| Affordance Labeling | Attaches actionable properties (e.g., “can sit”, “can open”) |
| Occlusion Awareness | Indicates visibility status of objects |
| Viewpoint Normalisation | Adjusts spatial descriptors to match the User’s orientation |
| Output Generation | Produces Visual Adaptive Context Guide for Prompt Cre |
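As an illustration of Relative Position Mapping, Viewpoint Normalisation, and Proximity Framing, the following Python sketch converts an object's 3D coordinates into azimuth, elevation, and distance relative to the User and then classifies the distance. It is not part of the specification: the function names, the axis convention (z up, yaw measured counter-clockwise from the +x axis), and the near/mid/far thresholds are all assumptions chosen for the example.

```python
import math

# Hypothetical thresholds in metres; the specification does not define them.
NEAR_MAX_M = 1.5
MID_MAX_M = 5.0


def relative_position(obj_xyz, user_xyz, user_yaw_deg):
    """Return (azimuth_deg, elevation_deg, distance_m) of an object as seen by the User.

    Assumed convention: right-handed world frame with z up; yaw is measured
    counter-clockwise from the +x axis. Azimuth 0 means straight ahead and a
    positive azimuth means the object is to the User's left.
    """
    dx = obj_xyz[0] - user_xyz[0]
    dy = obj_xyz[1] - user_xyz[1]
    dz = obj_xyz[2] - user_xyz[2]

    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    horizontal = math.hypot(dx, dy)

    # World-frame bearing of the object, then rotated into the User's frame
    # (viewpoint normalisation) and wrapped into [-180, 180).
    bearing = math.degrees(math.atan2(dy, dx))
    azimuth = (bearing - user_yaw_deg + 180.0) % 360.0 - 180.0
    elevation = math.degrees(math.atan2(dz, horizontal))
    return azimuth, elevation, distance


def proximity(distance_m):
    """Classify a distance as 'near', 'mid', or 'far' using the assumed thresholds."""
    if distance_m <= NEAR_MAX_M:
        return "near"
    if distance_m <= MID_MAX_M:
        return "mid"
    return "far"
```

Under these assumptions, with the User at the origin facing along +x, an object at (1, 1, 0) lies about 45 degrees to the left at roughly 1.41 m and is classified as "near".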
Syntax
https://schemas.mpai.community/PGM1/V1.0/data/VisualSpatialGuide.json
Semantics
| Label | Description |
| Header | Schema header with version tag |
| ├─ Standard-VSG | The characters “PGM-VSG-V” |
| ├─ Version | Major version – 1 or 2 characters |
| ├─ Dot-separator | The character “.” separating version components |
| └─ Subversion | Minor version – 1 or 2 characters |
| VisualSpatialGuideID | Unique identifier for this guide instance |
| MInstanceID | Identifier of the M-Instance |
| MEnvironmentID | Identifier of the M-Environment |
| SalientObjects | List of User-relevant visual objects |
| ├─ ObjectID | Unique ID for each object |
| ├─ Label | Semantic label (e.g., “chair”, “door”) |
| ├─ RelativePosition | Azimuth, elevation, and distance from the User |
| ├─ Proximity | Estimated closeness to the User |
| ├─ Affordance | Actionable property (e.g., “can sit”) |
| ├─ Occlusion | Whether the object is partially hidden |
| └─ NarrativeCue | Optional natural language cue for prompt enrichment |
| SceneSummary | Summary of visual scene conditions |
| ├─ DominantObject | Most visually prominent object |
| ├─ LightingCondition | Ambient lighting descriptor |
| └─ SpatialDensity | Density of objects in scene |
| Trace | Provenance metadata |
| ├─ Origin | Module that generated the guide |
| └─ Timestamp | Time of guide creation |
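To make the table concrete, the sketch below assembles one guide instance as a Python dictionary and checks its header against the pattern described in the Header rows. The JSON Schema referenced under Syntax remains authoritative; the value types, the example values, the single-string form of Header, the restriction of version characters to digits, and the ISO 8601 timestamp are assumptions made for illustration.

```python
import json
import re
from datetime import datetime, timezone

# Assumed header form: "PGM-VSG-V" + 1-2 digit version + "." + 1-2 digit subversion.
HEADER_PATTERN = re.compile(r"PGM-VSG-V[0-9]{1,2}\.[0-9]{1,2}")


def is_valid_header(header: str) -> bool:
    """Return True if the whole string matches the assumed header pattern."""
    return HEADER_PATTERN.fullmatch(header) is not None


# Illustrative instance; every identifier and value below is made up.
guide = {
    "Header": "PGM-VSG-V1.0",
    "VisualSpatialGuideID": "vsg-0001",
    "MInstanceID": "m-instance-01",
    "MEnvironmentID": "m-environment-01",
    "SalientObjects": [
        {
            "ObjectID": "obj-17",
            "Label": "chair",
            "RelativePosition": {"Azimuth": 45.0, "Elevation": 0.0, "Distance": 1.4},
            "Proximity": "near",
            "Affordance": "can sit",
            "Occlusion": False,
            "NarrativeCue": "There is a chair just ahead and to the left.",
        }
    ],
    "SceneSummary": {
        "DominantObject": "obj-17",
        "LightingCondition": "dim",
        "SpatialDensity": "sparse",
    },
    "Trace": {
        "Origin": "VSR",
        "Timestamp": datetime.now(timezone.utc).isoformat(),
    },
}

if __name__ == "__main__":
    assert is_valid_header(guide["Header"])
    print(json.dumps(guide, indent=2))
```

A dictionary like this only shows how the labels nest. Conformance would still need to be checked against the published schema.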