1 Functions
The Visual Scene Enhancement (PGM‑VSE) AIM produces the description of a visual scene from the captured Visual Object, deriving the perceptual and semantic visual properties relevant to spatial understanding, interaction, and A‑User‑centric reasoning.
Visual Scene Enhancement:
- Operates on the Visual CXT Directive received from A‑User Control, the Visual Object, and the Visual Domain Response returned by Domain Access in reply to its queries.
- Produces the Enhanced Visual Scene Descriptors and the Visual CXT Status sent to Audio-Visual-User Multiplexing, and the Visual Domain Request when querying Domain Access.
The Enhanced Visual Scene Descriptors carry the perceptual semantics of the scene augmented with derived and semantic information, under the control of the Visual CXT Directive.
2 Reference Model
Figure 1 gives the Reference Model of the Visual Scene Enhancement (PGM‑VSE) AIM.

Figure 1 – Reference Model of the Visual Scene Enhancement (PGM‑VSE) AIM
3 I/O Data
Table 1 gives the Input and Output Data of the Visual Scene Enhancement (PGM‑VSE) AIM.
| Input | Description |
|---|---|
| Visual Object | Captured visual data of the scene. |
| Visual CXT Directive | Control directive specifying scope, depth, or policy constraints for visual enhancement. |
| Visual Domain Response | Domain‑specific knowledge supporting visual interpretation and semantic classification. |

| Output | Description |
|---|---|
| Enhanced Visual Scene Descriptors | Description of the visual scene, carrying perceptual semantics augmented with derived and semantic visual properties. |
| Visual CXT Status | Status information describing the execution and outcome of Visual Scene Enhancement processing. |
| Visual Domain Request | Query to Domain Access for domain‑specific knowledge. |
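
As a non-normative illustration of Table 1, the I/O Data can be pictured as two plain records. The class names, field names, and the use of dictionaries as payloads are assumptions made for this sketch only; the actual data formats are those of the JSON Metadata. Treating the Visual Domain Response and the Visual Domain Request as optional is also an assumption, based on domain knowledge being described as optional in this specification.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VSEInputs:
    """Hypothetical container for the inputs listed in Table 1."""
    visual_object: Any                              # captured visual data of the scene
    visual_cxt_directive: dict                      # scope, depth, or policy constraints
    visual_domain_response: Optional[dict] = None   # assumed absent until Domain Access replies


@dataclass
class VSEOutputs:
    """Hypothetical container for the outputs listed in Table 1."""
    enhanced_visual_scene_descriptors: dict         # description of the visual scene
    visual_cxt_status: dict                         # execution and outcome information
    visual_domain_request: Optional[dict] = None    # assumed present only when a query is issued


# Example: inputs before any Domain Access query has been answered.
inputs = VSEInputs(visual_object=b"", visual_cxt_directive={"scope": "full"})
```

The `"scope"` key in the example directive is a placeholder, not a field defined by the specification.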
4 SubAIMs (informative)
This section is informative. The decomposition into SubAIMs described below illustrates one conformant architecture for producing the normative outputs of PGM‑VSE. Implementations may adopt alternative internal structures provided they satisfy the conformance requirements of Section 8.
4.1 Reference Model
An implementation of the Visual Scene Enhancement (PGM‑VSE) AIM may be based on the architecture of Figure 2.

Figure 2 – Reference Model of the Composite Visual Scene Enhancement (PGM‑VSE) AIM
4.2 Operation
The Visual Scene Enhancement operation derives the Visual Scene Descriptors from the captured Visual Object, computes relative depth and occlusion relationships among visual objects, identifies object types with optional domain knowledge, infers affordances and interaction potential, maps salience with respect to the interaction, and constructs the Enhanced Visual Scene Descriptors together with the execution status.
4.3 Functions of SubAIMs
Table 2 specifies the functions performed by the Visual Scene Enhancement (PGM‑VSE) AIM SubAIMs in the current example.
| SubAIM | Function |
|---|---|
| Visual Scene Description | Produces an initial Visual Scene Description from the Visual Object under the Visual CXT Directive. |
| Depth and Occlusion Estimation | Computes relative depth relationships and occlusion conditions among visual objects. |
| Visual Object Identification | Assigns semantic object‑type labels to visual objects using classification models and optional domain knowledge. |
| Affordance Inference | Infers affordance tags and interaction potential describing the possible interactions with objects and their feasibility, combining object properties, spatial constraints, and domain information. |
| Visual Salience Mapping | Determines the relative relevance of visual objects with respect to user interaction and context. |
| Visual Output Construction | Aggregates perceptual and enriched evidence into the Enhanced Visual Scene Descriptors and emits the execution status. |
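
To make the role of one SubAIM concrete, the sketch below shows how a Visual Salience Mapping step might rank and filter visual objects. The specification does not define a scoring rule: the weighted sum, the occlusion penalty, the value ranges, the threshold, and all names (`ObjectEvidence`, `salience`, `map_salience`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ObjectEvidence:
    """Hypothetical per-object evidence gathered from the preceding SubAIMs."""
    object_id: str
    relative_depth: float         # assumed range: 0.0 = nearest, 1.0 = farthest
    occluded: bool                # occlusion flag
    interaction_potential: float  # assumed range: 0.0 to 1.0


def salience(ev: ObjectEvidence) -> float:
    """Illustrative score: favour near, interactable objects; halve if occluded."""
    score = 0.6 * ev.interaction_potential + 0.4 * (1.0 - ev.relative_depth)
    return score * 0.5 if ev.occluded else score


def map_salience(objects, threshold=0.5):
    """Return (ranked objects, objects whose score reaches the threshold)."""
    ranked = sorted(objects, key=salience, reverse=True)
    filtered = [o for o in ranked if salience(o) >= threshold]
    return ranked, filtered


objects = [
    ObjectEvidence("wall", relative_depth=0.9, occluded=False, interaction_potential=0.1),
    ObjectEvidence("cup", relative_depth=0.2, occluded=False, interaction_potential=0.9),
    ObjectEvidence("door", relative_depth=0.3, occluded=True, interaction_potential=0.8),
]
ranked, filtered = map_salience(objects)
```

With these made-up values the cup ranks first, the occluded door second, and the wall last; only the cup passes the threshold. A real implementation would also take the Visual CXT Directive and the Visual Domain Response into account.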
4.4 I/O Data of SubAIMs
Table 3 specifies the Input and Output Data of the Visual Scene Enhancement (PGM‑VSE) AIM SubAIMs.
| SubAIM | Input | Output |
|---|---|---|
| Visual Scene Description | Visual Object, Visual CXT Directive | Visual Scene Descriptors, Visual CXT Status |
| Depth and Occlusion Estimation | Visual Scene Descriptors | Relative Depths, Occlusion Flags |
| Visual Object Identification | Visual Scene Descriptors, Visual Domain Response | Visual Object Type, Type Confidence |
| Affordance Inference | Visual Scene Descriptors, Relative Depths, Occlusion Flags, Visual Object Type, Visual Domain Response | Affordance Tags, Interaction Potential |
| Visual Salience Mapping | Relative Depths, Occlusion Flags, Visual Object Type, Affordance Tags, Interaction Potential, Visual CXT Directive, Visual Domain Response | Ranked Visual Objects, Filtered Salient Visual Objects |
| Visual Output Construction | Visual Scene Descriptors, Relative Depths, Occlusion Flags, Visual Object Type, Affordance Tags, Interaction Potential, Salience Results | Enhanced Visual Scene Descriptors, Visual CXT Status |
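
The data dependencies in Table 3 imply a partial order in which the SubAIMs can run. The sketch below transcribes the table and derives one valid execution order with Python's standard `graphlib` module. It is an informal aid, not part of the specification. One assumption is made explicit in the code: "Salience Results" is taken to mean the two outputs of Visual Salience Mapping, since no SubAIM produces data under that exact name.

```python
from graphlib import TopologicalSorter

# (inputs, outputs) per SubAIM, transcribed from Table 3.
TABLE_3 = {
    "Visual Scene Description": (
        {"Visual Object", "Visual CXT Directive"},
        {"Visual Scene Descriptors", "Visual CXT Status"},
    ),
    "Depth and Occlusion Estimation": (
        {"Visual Scene Descriptors"},
        {"Relative Depths", "Occlusion Flags"},
    ),
    "Visual Object Identification": (
        {"Visual Scene Descriptors", "Visual Domain Response"},
        {"Visual Object Type", "Type Confidence"},
    ),
    "Affordance Inference": (
        {"Visual Scene Descriptors", "Relative Depths", "Occlusion Flags",
         "Visual Object Type", "Visual Domain Response"},
        {"Affordance Tags", "Interaction Potential"},
    ),
    "Visual Salience Mapping": (
        {"Relative Depths", "Occlusion Flags", "Visual Object Type",
         "Affordance Tags", "Interaction Potential",
         "Visual CXT Directive", "Visual Domain Response"},
        {"Ranked Visual Objects", "Filtered Salient Visual Objects"},
    ),
    "Visual Output Construction": (
        {"Visual Scene Descriptors", "Relative Depths", "Occlusion Flags",
         "Visual Object Type", "Affordance Tags", "Interaction Potential",
         "Salience Results"},
        {"Enhanced Visual Scene Descriptors", "Visual CXT Status"},
    ),
}

# Assumption: "Salience Results" stands for both Visual Salience Mapping outputs.
ALIASES = {
    "Salience Results": {"Ranked Visual Objects", "Filtered Salient Visual Objects"},
}


def execution_order(table=TABLE_3, aliases=ALIASES):
    """Return one SubAIM order in which every input is produced before use."""
    producers = {}
    for name, (_, outputs) in table.items():
        for item in outputs:
            producers.setdefault(item, set()).add(name)

    graph = {}
    for name, (inputs, _) in table.items():
        deps = set()
        for item in inputs:
            for resolved in aliases.get(item, {item}):
                # Items with no producer are external inputs of the AIM.
                deps |= producers.get(resolved, set())
        deps.discard(name)
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())


order = execution_order()
```

Under these dependencies Visual Scene Description always runs first and Visual Output Construction last, while Depth and Occlusion Estimation and Visual Object Identification have no mutual dependency and could run in either order or in parallel.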
4.5 AIMs and JSON Metadata
Table 4 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.
| AIM1 | AIM2 | Name | JSON |
|---|---|---|---|
| PGM‑VSE | | Visual Scene Enhancement | X |
| | PGM-VSD | Visual Scene Description | X |
| | PGM-DOE | Depth and Occlusion Estimation | X |
| | PGM-VOI | Visual Object Identification | X |
| | PGM-AFI | Affordance Inference | X |
| | PGM-VSM | Visual Salience Mapping | X |
| | PGM-VOC | Visual Output Construction | X |
5 JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/VisualSceneEnhancement.json
6 Profiles
No Profiles.
7 Reference Software
Not part of this specification.
8 Conformance Testing
Not part of this specification.
9 Performance Assessment
Not part of this specification.