PGM-AUA V1.0 AIMs Visual Spatial Reasoning (Tentative)

(Tentative)

Function	Reference Model	Input/Output Data
SubAIMs	JSON Metadata	Profiles

1. Function

The Visual Spatial Reasoning AIM (PGM‑VSR) operates under the Visual Action Directive (PGM‑VAD). It processes Context – produced by Context Capture – that include User State and Visual Scene Descriptors (VSD₀). It reads PGM‑VAD, configures its reasoning pipeline accordingly, then refines and aligns Visual Scene Descriptors through an iterative loop with PGM‑DAC described below. User State is not explicitly mentioned in the iterative loop of Table 1.

Table 1 – Iterative loop of Visual Scene Descriptors

Phase	Inputs	Operation	Outputs	Recipient
Directive intake	PGM‑VAD (Visual Action Directive)	Conform pipeline: select required spatial operations, constraints, and priorities.	Conformance plan (internal)	—
Initial refinement	VSD0 (from OSD‑VSD), conformance plan	Produce VSD1 aligned to directive: descriptor parsing, normalisation, preliminary localisation.	VSD1	PGM‑DAC
Domain enrichment	VSD1	Apply domain‑specific knowledge, resolve ambiguities, add semantic attributes.	VSD2	PGM‑VSR
Directive‑aligned reasoning	VSD2, conformance plan	Execute directive‑scoped spatial reasoning: refined localisation, depth/occlusion, affordance inference, salience mapping.	OSD‑VSD2 (final descriptors)	PGM‑PTC
Status reporting	Conformance plan, execution results	Summarise compliance, coverage, uncertainties, and residual constraints	PGM‑VSR (Visual Action Status)	PGM‑AUC

Processing details

Directive conformance: VSR reads Visual Action Directive (PGM‑VAD), selects and orders spatial operations, sets thresholds and scope, and binds any domain constraints that must be respected downstream.
Initial refinement: VSR transforms VSD₀into VSD₁ under directive constraints, focusing on descriptor parsing, normalisation, and preliminary localisation.
Domain enrichment: Domain Access (PGM‑DAC) augments VSD₁ with domain knowledge, returning VSD₂ for directive‑aligned reasoning.
Spatial reasoning: VSR applies refined localisation, depth and occlusion estimation, affordance inference, and salience mapping, within the directive’s priorities and limits.
Outputs:
- OSD‑VSD₂ (final refined Visual Scene Descriptors): sent to Prompt Creation (PGM‑PRC).
- PGM‑VAS (Visual Action Status): sent to A‑User Control (PGM‑AUC), capturing compliance and residual gaps.

Iterative loop

Feedback: If PGM‑AUC updates PGM‑VAD, VSR re‑conforms the pipeline and re‑runs refinement on the latest descriptors.
Coherence: The loop keeps reasoning spatially coherent and domain‑aware, ensuring downstream AIMs operate with directive‑aligned visual context.

2. Reference Model

Figure 1 gives the Reference Model of the Visual Spatial Reasoning (PGM-VSR) AIM.

Figure 2 – The Reference Model of the Visual Spatial Reasoning (PGM-VSR) AIM

3. Input/Output Data

Table 2 gives the Input and Output Data of PGM-VSR.

Table 2 – Input/Output Data of PGM-VSR

Input	Description
Context	A structured and time-stamped snapshot representing the initial understanding that the A-User achieves of the environment and of the User posture.
Visual Scene Descriptors	A modification of the input Visual Scene Descriptors provided by the Domain Access AIM to help the interpretation of the Visual Scene by injecting constraints, priorities, and refinement logic.
Visual Action Directive	Visual-related actions and process sequences from PGM-AUC.
Output	Description
Visual Scene Descriptors	A structured, analytical representation of the Visual Scene with object geometry, 3D positions, depth, occlusion, and affordance data. It highlights salient objects, normalised positions, proximity, and interaction cues.
Visual Action Status	Visual spatial constraints and scene anchoring from PGM-AUC.

4. SubAIMs

Figure 2 gives the Reference Model of the Visual Spatial Reasoning (PGM-VSR) Composite AIM.

Figure 2 – Reference Model of Visual Spatial Reasoning (PGM-VSR) Composite AIM

Table 3 specifies the Functions performed by PGM-VSP AIM’s SubAIMs in the current example partitioning in SubAIMs.

Table 3 – Functions performed by PGM-VSP AIM’s SubAIMs (example)

	Sub-AIM	Function
VDP	Visual Descriptors Parsing	Decomposes VSD into structured components: Visual Objects (VIO) and Spatial Attitudes (OSA)
DOE	Depth and Occlusion Estimation	Computes relative depth and occlusion flags
AFI	Affordance Inference	Determines actionable properties and interaction potential
VSM	Visual Salience Mapping	Ranks objects by prominence and filters salient entities
VOC	Visual Output Construction	Builds output Visual Scene Descriptors from salient and full descriptors

Table 4 gives the AIMs composing the Visual Spatial Reasoning (PGM-VSR) Composite AIM:

Table 4 – AIMs of the Visual Spatial Reasoning (PGM-VSR) Composite AIM

AIM	AIMs	Names	JSON
PGM-VSR		Visual Spatial Reasoning	Link
	PGM-ADP	Visual Descriptors Parsing	Link
	PGM-DOE	Depth and Occlusion Estimation	Link
	PGM-AFI	Affordance Inference	Link
	PGM-SMP	Visual Salience Mapping	Link
	PGM-VOC	Visual Output Construction	Link

Table 5 gives the input and output data of the PGM-VSR AIM.

Table 5 – Input and output data of the PGM-VSR AIM

AIMs	Input	Output	To
Visual Descriptors Parsing	Visual Scene Descriptors	Visual Objects Spatial Attitude	DOE, AFI, SMP, VOC
Depth and Occlusion Estimation	Visual Objects Spatial Attitude Visual Spatial Directive	Relative Depths Occlusion Flags Visual Spatial Status	SMP, VOC
Affordance Inference	Visual Objects Spatial Attitude Visual Action Directive	Affordance Tags Interaction Potential Visual Spatial Status	SMP, VOC
Salience Mapping	Relative Depths Occlusion Flags Affordance Tags Interaction Potential Visual Action Directive	Relative Depths Occlusion Flags Ranked Visual Objects Affordance Tags Interaction Potential Salient Visual Objects Visual Spatial Status	VOC
Visual Output Construction	Relative Depths Occlusion Flags Ranked Visual Objects Affordance Tags Interaction Potential Salient Visual Objects Visual Spatial Status	Visual Scene Descriptors Visual Spatial Status	—

Table 6 specifies the External and Internal Data Types of the Visual Spatial Reasoning AIM.

Table 6 – External and Internal Data Types identified in Visual Spatial Reasoning AIM

Data Type	Definition
VisualSceneDescriptors	– Final structured output containing all spatialised and semantically enriched visual data (input). – The product of the Composite AIM (output).
UserPointOfView	Contained in component Basic Visual Scene Descriptors.
VisualObjects	Structured list of Visual Objects extracted from the Visual Scene Descriptors.
SpatialAttitudes	Position, Orientation, and their first and second order D spatial attributes of each Visual Object, including .
DepthEstimates	Classification of each object’s relative depth (e.g., foreground, midground, background).
OcclusionFlags	Visibility classification of each object (e.g., fully visible, partially occluded, hidden).
AffordanceProfile	Actionable properties of visual objects (e.g., graspable, tappable, obstructive) and inferred interaction potential.
RankedVisualObject	Ordered list of visual objects prioritized by perceptual salience and interaction relevance.
FilteredSalientObjects	Subset of Ranked Visual Objects selected for inclusion in the Visual Spatial Guide.
VisualSpatialDirective	Dynamic modifier provided by Domain Access AIM. Injects constraints, priorities, and refinement logic into reasoning Sub-AIMs.
VisualSpatialStatus	Structured status report from directive-aware Sub-AIMs. Includes constraint satisfaction, override flags, and anchoring metadata.

Tables 7 and 8 describe the effects of the Visual Spatial Directive on SubAIMs and of the AIMs on the Visual Spatial Status

Table 7 – Effects of VisualActionDirective on PGM-VSR SubAIMs

	Sub-AIM	Directive Effects
DOE	Depth & Occlusion Estimation	– Override depth thresholds (e.g., redefine “foreground” zone). – Prioritise occlusion types (e.g., favour fully visible objects). – Suppress or amplify depth cues based on User State (if the metaverse allows data sharing).
AFI	Affordance Inference	– Inject constraints on actionable properties (e.g., suppress obstructive objects). – Prioritise interaction-relevant affordances. – Refine affordance logic based on User Intent.
VSM	Visual Salience Mapping	– Bias salience scoring toward directive-relevant objects – Filter out objects below directive-defined thresholds – Align ranking logic with User attention or Cognitive State
VOC	Visual Output Construction	– Select output format based on directive (e.g., guide vs full descriptor) – Apply framing constraints (e.g., include only directive-relevant objects) – Normalise output to User viewpoint or temporal focus

Table 8 – Contributions to VisualActionStatus from SubAIMs

	Sub-AIM	Status Contributions
DOE	Depth & Occlusion Estimation	– Flags for directive compliance (e.g., depth match) – Occlusion override trace – Object inclusion/exclusion rationale
AFI	Affordance Inference	– Affordance constraint match status – Interaction potential override trace – Confidence in affordance inference
VSM	Visual Salience Mapping	– Salience override applied – Filtered object count – Ranking logic traceability
VOC	Visual Output Construction	– Final directive compliance summary – Anchoring metadata – Output format trace (e.g., guide vs descriptor)

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/VisualSpatialReasoning.json

6. Profiles

No Profiles

Cookie	Duration	Description
cookielawinfo-checkbox-necessary	1 year	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Technical".
CookieLawInfoConsent	1 year	The cookie is set by the GDPR Cookie Consent plug-in and is used to store whether the user has consented to the use of cookies or not. It does not store any personal data.
viewed_cookie_policy	1 year	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
_pk_id.6.08a8	13 months	Used to store a few details about the user such as the unique visitor ID
_pk_ses.6.08a8	30 minutes	Short lived cookies used to temporarily store data for the visit