(Tentative)


Definition

Visual Spatial Primitives represent spatial relationships between entities. They encode topological, directional, proximity-based, and functional relations derived from perceptual cues such as geometry, gesture, gaze, and semantic inference. These primitives serve as the low-level substrate for spatial reasoning, behaviour triggering, and multimodal alignment across agents and environments.
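
As a rough illustration of how geometric cues might yield such relations, the sketch below derives one directional and one proximity relation from two 2D positions. The function name `derive_relations`, the coordinate convention (x increasing to the viewer's right), and the 1.0 distance threshold are assumptions made for illustration; none of them is defined by this specification.

```python
import math


def derive_relations(subject_xy, object_xy, near_threshold=1.0):
    """Return (relation, resolution_method) pairs for a subject relative to an object.

    Hypothetical helper: positions are (x, y) tuples in an assumed shared frame.
    """
    sx, sy = subject_xy
    ox, oy = object_xy
    relations = []

    # Directional relation from the x coordinates (assumed: x grows to the right).
    if sx < ox:
        relations.append(("left_of", "GeometryBehaviour"))
    elif sx > ox:
        relations.append(("right_of", "GeometryBehaviour"))

    # Proximity relation from the Euclidean distance and an assumed threshold.
    distance = math.hypot(sx - ox, sy - oy)
    proximity = "near" if distance <= near_threshold else "far_from"
    relations.append((proximity, "ProximityBehaviour"))

    return relations


print(derive_relations((0.0, 0.0), (0.5, 0.0)))
```

A real implementation would likely work on 3D bounding boxes and a viewer-relative frame; the point here is only that each derived relation can carry the method that produced it.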

Functional Requirements

  1. Relational Encoding
  • Must encode a spatial relationship using a controlled vocabulary:
    • Topological: inside, outside, contains, etc.
    • Directional: left_of, above, behind, etc.
    • Proximity: near, far_from, adjacent_to, etc.
    • Functional: reachable_from, visible_from, supports, etc.
  2. Entity Referencing
  • Must include a SubjectID and an ObjectID to define the entities involved in the relation.
  • Must support referencing objects, zones, users, or abstract anchors.
  3. Confidence Scoring
  • Must include a Confidence value (0.0–1.0) indicating certainty of the relation.
  • Must support modulation of downstream behaviours, phrasing, or clarification logic.
  4. Scene Context
  • May include an optional SceneReference to anchor the relation within a specific zone or environment.
  • Must support multi-zone and multi-user contexts.
  5. Resolution Provenance
  • Must include a ResolutionMethod indicating how the primitive was derived:
    • GeometryBehaviour, GestureBehaviour, GazeBehaviour, ProximityBehaviour, SemanticInferenceBehaviour
  • Must support traceability and explainability of perceptual logic.
  6. Temporal Anchoring
  • Must include a Timestamp marking when the primitive was generated.
  • Must conform to the Time.json schema for consistency.
  7. Instance-Level Metadata
  • Must include a globally unique SpatialPrimitivesID for the full package.
  • Must include a unique SpatialPrimitivesID for each individual primitive.
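
The requirements above can be sketched as a small validated record. This is a minimal sketch under stated assumptions, not the normative data type: the class name `SpatialPrimitive`, the snake_case field names, the helper `next_action`, and the 0.8 and 0.5 thresholds are all hypothetical. The vocabulary is limited to the terms the requirements name explicitly (the lists end in "etc."), and the timestamp is held as a plain string because the layout required by Time.json is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Only the relation terms named in the requirements; the real vocabulary is larger.
RELATIONS = {
    "inside", "outside", "contains",                # Topological
    "left_of", "above", "behind",                   # Directional
    "near", "far_from", "adjacent_to",              # Proximity
    "reachable_from", "visible_from", "supports",   # Functional
}

RESOLUTION_METHODS = {
    "GeometryBehaviour", "GestureBehaviour", "GazeBehaviour",
    "ProximityBehaviour", "SemanticInferenceBehaviour",
}


@dataclass(frozen=True)
class SpatialPrimitive:
    primitive_id: str
    relation: str
    subject_id: str
    object_id: str
    confidence: float
    resolution_method: str
    timestamp: str  # assumed placeholder; the real format is set by Time.json
    scene_reference: Optional[str] = None  # optional scene or zone anchor

    def __post_init__(self):
        # Reject values outside the controlled vocabularies and the 0.0-1.0 range.
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation!r}")
        if self.resolution_method not in RESOLUTION_METHODS:
            raise ValueError(f"unknown resolution method: {self.resolution_method!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def next_action(primitive, act_at=0.8, clarify_at=0.5):
    """Map confidence to a downstream action using assumed thresholds."""
    if primitive.confidence >= act_at:
        return "act"
    if primitive.confidence >= clarify_at:
        return "clarify"
    return "ignore"
```

For example, a primitive with confidence 0.62 would fall in the "clarify" band under these assumed thresholds, which is one way a consumer could modulate clarification logic.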

Syntax

https://schemas.mpai.community/PGM1/V1.0/data/VisualSpatialPrimitives.json

Semantics

| Label | Description |
| --- | --- |
| Header | Visual Spatial Primitives Header |
| – Standard-VSP | The characters “PGM-VSP-V” |
| – Version | Major version – 1 or 2 characters |
| – Dot-separator | The character “.” |
| – Subversion | Minor version – 1 or 2 characters |
| MInstanceID | Identifier of M-Instance. |
| MEnvironmentID | Identifier of M-Environment. |
| VisualSpatialPrimitivesID | Unique identifier for this Visual Spatial Primitive instance. Used for traceability, replay, and linking to specific SR or PC cycles. |
| Relation | Type of spatial relationship between entities. Drawn from a controlled vocabulary including: |
| – Topological | inside, outside, contains, touches, overlaps, disjoint |
| – Directional | left_of, right_of, above, below, in_front_of, behind |
| – Proximity | near, far_from, adjacent_to, between |
| – Functional | reachable_from, visible_from, accessible_via, supports, blocks |
| SubjectID | Identifier of the entity initiating the spatial relation. Typically an object, zone, or User. |
| ObjectID | Identifier of the entity receiving the spatial relation. Defines the spatial target or reference. |
| Confidence | Float (0.0–1.0) indicating certainty of the relationship. Used to modulate prompt phrasing, trigger clarification, or prioritise Behaviours. |
| SceneReference | Optional reference to the scene or zone where the relation was observed. Useful for multi-zone or multi-user environments. |
| ResolutionMethod | Strategy used to derive the primitive. Controlled vocabulary includes: |
| – GeometryBehaviour | Derived from spatial coordinates or bounding boxes |
| – GestureBehaviour | Inferred from user hand pose or motion vector |
| – GazeBehaviour | Inferred from gaze vector and head pose |
| – ProximityBehaviour | Based on distance thresholds |
| – SemanticInferenceBehaviour | Derived from contextual or verbal cues |
| Timestamp | Timestamp marking when the primitive was generated. |
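
To make the labels above concrete, the following sketch checks a Header string and serialises one illustrative instance to JSON. Several things here are assumptions rather than facts taken from the schema: that the version "characters" are decimal digits, that the fields sit in a flat object with the Header as a single string, and that the Timestamp can be shown as an ISO 8601 string. All identifier values are invented placeholders, and `parse_header` is a hypothetical helper. The JSON schema referenced in the Syntax section remains the authoritative definition.

```python
import json
import re

# "PGM-VSP-V" + major (1-2 chars) + "." + minor (1-2 chars); digits are assumed.
HEADER_PATTERN = re.compile(r"PGM-VSP-V(\d{1,2})\.(\d{1,2})")


def parse_header(header):
    """Return (major, minor) from a header string, or raise ValueError."""
    match = HEADER_PATTERN.fullmatch(header)
    if match is None:
        raise ValueError(f"malformed header: {header!r}")
    return int(match.group(1)), int(match.group(2))


# Illustrative instance; the flat layout and all values are placeholders.
example = {
    "Header": "PGM-VSP-V1.0",
    "MInstanceID": "m-instance-01",
    "MEnvironmentID": "m-environment-01",
    "VisualSpatialPrimitivesID": "vsp-0001",
    "Relation": "left_of",
    "SubjectID": "object-cup-01",
    "ObjectID": "object-laptop-01",
    "Confidence": 0.87,
    "SceneReference": "zone-kitchen",
    "ResolutionMethod": "GeometryBehaviour",
    "Timestamp": "2025-01-01T12:00:00Z",  # assumed form; see Time.json
}

major, minor = parse_header(example["Header"])
print(f"header version {major}.{minor}")
print(json.dumps(example, indent=2))
```

Before relying on any such instance, it should be validated against the published schema, since nesting and required fields may differ from this flat sketch.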