(Tentative)
| Definition | Functional Requirements | Syntax | Semantics |
Definition
Visual Spatial Primitives represent spatial relationships between entities. They encode topological, directional, proximity-based, and functional relations derived from perceptual cues such as geometry, gesture, gaze, and semantic inference. These primitives serve as the low-level substrate for spatial reasoning, behaviour triggering, and multimodal alignment across agents and environments.
Functional Requirements
- Relational Encoding
  - Must encode a spatial relationship using a controlled vocabulary:
    - Topological: inside, outside, contains, etc.
    - Directional: left_of, above, behind, etc.
    - Proximity: near, far_from, adjacent_to, etc.
    - Functional: reachable_from, visible_from, supports, etc.
- Entity Referencing
  - Must include a SubjectID and an ObjectID defining the entities involved in the relation.
  - Must support referencing objects, zones, users, or abstract anchors.
- Confidence Scoring
  - Must include a Confidence value (0.0–1.0) indicating the certainty of the relation.
  - Must be usable to modulate downstream Behaviours, prompt phrasing, or clarification logic.
- Scene Context
  - May include a SceneReference anchoring the relation within a specific zone or environment.
  - Must support multi-zone and multi-user contexts.
- Resolution Provenance
  - Must include a ResolutionMethod indicating how the primitive was derived:
    - GeometryBehaviour, GestureBehaviour, GazeBehaviour, ProximityBehaviour, SemanticInferenceBehaviour
  - Must support traceability and explainability of the perceptual logic.
- Temporal Anchoring
  - Must include a Timestamp marking when the primitive was generated.
  - Must conform to the Time.json schema for consistency.
- Instance-Level Metadata
  - Must include a globally unique VisualSpatialPrimitivesID for the full package.
  - Must include a unique SpatialPrimitiveID for each individual primitive.
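
The fragment below is a minimal, non-normative sketch of a single primitive meeting these requirements. Field names follow the Semantics table below; the identifier values, the SceneReference value, and the ISO 8601 rendering of the Timestamp are illustrative assumptions, with the normative time format defined by the Time.json schema.

```json
{
  "Relation": "left_of",
  "SubjectID": "obj_cup_01",
  "ObjectID": "obj_monitor_02",
  "Confidence": 0.87,
  "SceneReference": "zone_desk_area",
  "ResolutionMethod": "GeometryBehaviour",
  "Timestamp": "2025-01-15T10:32:05Z"
}
```

A Confidence of 0.87 would typically be high enough to drive Behaviours directly, whereas lower values may instead trigger clarification logic or more cautious prompt phrasing.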
Syntax
https://schemas.mpai.community/PGM1/V1.0/data/VisualSpatialPrimitives.json
Semantics
| Label | Description |
| Header | Visual Spatial Primitives Header |
| – Standard-VSP | The characters “PGM-VSP-V” |
| – Version | Major version – 1 or 2 characters |
| – Dot-separator | The character “.” |
| – Subversion | Minor version – 1 or 2 characters |
| MInstanceID | Identifier of M-Instance. |
| MEnvironmentID | Identifier of M-Environment. |
| VisualSpatialPrimitivesID | Unique identifier for this Visual Spatial Primitive instance. Used for traceability, replay, and linking to specific SR or PC cycles. |
| Relation | Type of spatial relationship between entities. Drawn from a controlled vocabulary including: |
| – Topological: | inside, outside, contains, touches, overlaps, disjoint |
| – Directional: | left_of, right_of, above, below, in_front_of, behind |
| – Proximity: | near, far_from, adjacent_to, between |
| – Functional: | reachable_from, visible_from, accessible_via, supports, blocks |
| SubjectID | Identifier of the entity initiating the spatial relation. Typically an object, zone, or User. |
| ObjectID | Identifier of the entity receiving the spatial relation. Defines the spatial target or reference. |
| Confidence | Float (0.0–1.0) indicating certainty of the relationship. Used to modulate prompt phrasing, trigger clarification, or prioritise Behaviours. |
| SceneReference | Optional reference to the scene or zone where the relation was observed. Useful for multi-zone or multi-user environments. |
| ResolutionMethod | Strategy used to derive the primitive. Controlled vocabulary includes: |
| – GeometryBehaviour | Derived from spatial coordinates or bounding boxes |
| – GestureBehaviour | Inferred from user hand pose or motion vector |
| – GazeBehaviour | Inferred from gaze vector and head pose |
| – ProximityBehaviour | Based on distance thresholds |
| – SemanticInferenceBehaviour | Derived from contextual or verbal cues |
| Timestamp | Time at which the primitive was generated, conforming to the Time.json schema. |
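
For orientation, the following non-normative sketch shows how these fields might appear together in a complete Visual Spatial Primitives instance. The Header value concatenates Standard-VSP, Version, Dot-separator, and Subversion as described above; the grouping of individual primitives under a SpatialPrimitives array, the timestamp rendering, and all identifier values are assumptions made for illustration only, and the JSON schema referenced under Syntax remains normative.

```json
{
  "Header": "PGM-VSP-V1.0",
  "MInstanceID": "minstance_01",
  "MEnvironmentID": "menv_01",
  "VisualSpatialPrimitivesID": "vsp_pkg_0001",
  "SpatialPrimitives": [
    {
      "Relation": "reachable_from",
      "SubjectID": "obj_door_handle_03",
      "ObjectID": "user_01",
      "Confidence": 0.72,
      "SceneReference": "zone_entrance",
      "ResolutionMethod": "ProximityBehaviour",
      "Timestamp": "2025-01-15T10:32:06Z"
    }
  ]
}
```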