Go to PGM-AUA V1.0 AI Modules

1 Functions

The Visual Scene Enhancement (PGM‑VSE) AIM produces the description of a visual scene from the captured Visual Object, deriving the perceptual and semantic visual properties relevant to spatial understanding, interaction, and A‑User‑centric reasoning.

The Visual Scene Enhancement AIM:

  1. Operates on the Visual CXT Directive received from A‑User Control, the Visual Object, and the Visual Domain Response returned by Domain Access in reply to its queries;
  2. Produces the Enhanced Visual Scene Descriptors and the Visual CXT Status, both sent to Audio-Visual-User Multiplexing, and the Visual Domain Request used to query Domain Access.

The Enhanced Visual Scene Descriptors carry the perceptual semantics of the scene, augmented with derived and semantic information; their scope and depth are governed by the Visual CXT Directive.
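The exchange described above can be sketched as a single function that consumes the three inputs and returns the three outputs. This is a minimal Python sketch, not part of the specification: the Visual Object is assumed to be pre-segmented into labelled regions, Domain Access is modelled as a plain callable, and all class, field, and function names (`VisualDomainRequest`, `VseOutputs`, `use_domain_knowledge`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the data named in Section 1; all field names are assumptions.
@dataclass
class VisualDomainRequest:
    query: str  # e.g. an object label whose domain knowledge is needed

@dataclass
class VisualDomainResponse:
    query: str
    knowledge: dict

@dataclass
class VseOutputs:
    enhanced_descriptors: list
    cxt_status: dict
    domain_requests: list = field(default_factory=list)

# Domain Access is modelled as a plain callable answering one request at a time.
DomainAccess = Callable[[VisualDomainRequest], VisualDomainResponse]

def visual_scene_enhancement(directive: dict, visual_object: list,
                             domain_access: DomainAccess) -> VseOutputs:
    """Consume the three PGM-VSE inputs and return its three outputs."""
    requests, descriptors = [], []
    for region in visual_object:  # assumption: Visual Object pre-segmented into regions
        # placeholder: a real AIM would derive perceptual and semantic properties here
        descriptor = dict(region)
        if directive.get("use_domain_knowledge", False):
            request = VisualDomainRequest(query=region.get("label", "unknown"))
            requests.append(request)  # emitted as a Visual Domain Request
            descriptor["domain"] = domain_access(request).knowledge
        descriptors.append(descriptor)
    status = {"state": "completed", "described_objects": len(descriptors)}
    return VseOutputs(descriptors, status, requests)
```

When the hypothetical `use_domain_knowledge` flag is absent, no Visual Domain Request is issued and Domain Access is never called, reflecting that domain knowledge is optional for this AIM.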

2 Reference Model

Figure 1 gives the Reference Model of the Visual Scene Enhancement (PGM‑VSE) AIM.

Figure 1 – Reference Model of the Visual Scene Enhancement (PGM‑VSE) AIM

3 I/O Data

Table 1 gives the Input and Output Data of the Visual Scene Enhancement (PGM‑VSE) AIM.

Table 1 – Input and Output Data of the Visual Scene Enhancement (PGM‑VSE) AIM

Input | Description
Visual Object | Captured visual data of the scene.
Visual CXT Directive | Control directive specifying scope, depth, or policy constraints for visual enhancement.
Visual Domain Response | Domain‑specific knowledge supporting visual interpretation and semantic classification.
Output | Description
Enhanced Visual Scene Descriptors | Description of the visual scene, carrying perceptual semantics augmented with derived and semantic visual properties.
Visual CXT Status | Status information describing the execution and outcome of Visual Scene Enhancement processing.
Visual Domain Request | Query to Domain Access for domain‑specific knowledge.
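The Visual CXT Directive is defined only as a set of scope, depth, or policy constraints. The following sketch shows one hypothetical way such a directive could gate processing: scope as a set of object identifiers, depth as the number of enhancement stages to run (ordered as in Section 4.2), and policy as a set of excluded object types. These interpretations and all names are assumptions, not part of the specification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical ordering of enhancement stages, following the Operation in Section 4.2.
STAGES = ("description", "depth_occlusion", "identification", "affordance", "salience")

@dataclass(frozen=True)
class VisualCxtDirective:
    scope: Optional[FrozenSet[str]] = None         # object ids to process; None = whole scene
    depth: int = len(STAGES)                       # how many enhancement stages to run
    excluded_types: FrozenSet[str] = frozenset()   # policy: object types never described

def planned_stages(directive: VisualCxtDirective) -> tuple:
    """Stages enabled by the directive's depth; the initial description always runs."""
    return STAGES[: max(1, min(directive.depth, len(STAGES)))]

def in_scope(obj: dict, directive: VisualCxtDirective) -> bool:
    """True if the object is within the directive's scope and not excluded by policy."""
    if directive.scope is not None and obj["id"] not in directive.scope:
        return False
    return obj.get("type") not in directive.excluded_types
```

For example, `VisualCxtDirective(depth=2, excluded_types=frozenset({"person"}))` would limit processing to the description and depth/occlusion stages and skip any object typed as a person.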

4 SubAIMs (informative)

This section is informative. The decomposition into SubAIMs described below illustrates one conformant architecture for producing the normative outputs of PGM‑VSE. Implementations may adopt alternative internal structures provided they satisfy the conformance requirements of Section 8.

4.1 Reference Model

An implementation of the Visual Scene Enhancement (PGM‑VSE) AIM may be based on the architecture of Figure 2.

Figure 2 – Reference Model of the Composite Visual Scene Enhancement (PGM‑VSE) AIM

4.2 Operation

The Visual Scene Enhancement operation derives the Visual Scene Descriptors from the captured Visual Object, computes relative depth and occlusion relationships among visual objects, identifies object types with optional domain knowledge, infers affordances and interaction potential, maps the salience of visual objects with respect to the user interaction, and finally constructs the Enhanced Visual Scene Descriptors together with the Visual CXT Status.
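The depth and occlusion step can be illustrated with a deliberately simple heuristic. The sketch below assumes each visual object already carries a 2D bounding box and an estimated mean distance from the camera (neither representation is prescribed by this specification, and the `Region` type is hypothetical). Relative depth is the rank by distance, and an object is flagged as occluded when its box overlaps the box of a nearer object.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Region:
    obj_id: str
    box: Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)
    depth: float                            # hypothetical mean distance from the camera

def overlap_area(a, b) -> float:
    """Area of intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)

def relative_depths(regions: List[Region]) -> Dict[str, int]:
    """Rank objects by distance: 0 is the nearest."""
    ordered = sorted(regions, key=lambda r: r.depth)
    return {r.obj_id: rank for rank, r in enumerate(ordered)}

def occlusion_flags(regions: List[Region]) -> Dict[str, bool]:
    """Flag an object as occluded if a nearer object's box overlaps its own."""
    return {
        r.obj_id: any(o.depth < r.depth and overlap_area(o.box, r.box) > 0
                      for o in regions if o is not r)
        for r in regions
    }
```

A production implementation would more likely work from per-pixel depth maps or segmentation masks; the box test only approximates occlusion.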

4.3 Functions of SubAIMs

Table 2 specifies the functions performed by the SubAIMs of the Visual Scene Enhancement (PGM‑VSE) AIM in the example architecture of Figure 2.

Table 2 – Functions of the Visual Scene Enhancement (PGM‑VSE) SubAIMs

SubAIM | Function
Visual Scene Description | Produces an initial Visual Scene Description from the Visual Object under the Visual CXT Directive.
Depth and Occlusion Estimation | Computes relative depth relationships and occlusion conditions among visual objects.
Visual Object Identification | Assigns semantic object‑type labels to visual objects using classification models and optional domain knowledge.
Affordance Inference | Infers affordance tags and interaction potential describing the possible interactions with objects and their feasibility, combining object properties, spatial constraints, and domain information.
Visual Salience Mapping | Determines the relative relevance of visual objects with respect to user interaction and context.
Visual Output Construction | Aggregates perceptual and enriched evidence into the Enhanced Visual Scene Descriptors and emits the execution status.
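As an illustration of the Visual Salience Mapping function, the following sketch combines three of its inputs (Relative Depths, Occlusion Flags, and Interaction Potential) into a weighted score, then returns both a ranked list and a thresholded list, mirroring the Ranked Visual Objects and Filtered Salient Visual Objects outputs. The weights, the threshold, and the scoring formula are hypothetical choices, not part of the specification.

```python
from typing import Dict, List, Tuple

def salience_scores(
    relative_depths: Dict[str, int],
    occlusion_flags: Dict[str, bool],
    interaction_potential: Dict[str, float],
    weights: Tuple[float, float, float] = (0.6, 0.3, 0.1),  # hypothetical weights
) -> Dict[str, float]:
    """Weighted score per object from interaction potential, nearness and visibility."""
    w_interaction, w_nearness, w_visibility = weights
    farthest = max(1, len(relative_depths) - 1)
    scores = {}
    for obj_id, rank in relative_depths.items():
        nearness = 1.0 - rank / farthest  # 1.0 for the nearest object, 0.0 for the farthest
        visibility = 0.0 if occlusion_flags[obj_id] else 1.0
        scores[obj_id] = (w_interaction * interaction_potential[obj_id]
                          + w_nearness * nearness
                          + w_visibility * visibility)
    return scores

def rank_and_filter(scores: Dict[str, float],
                    threshold: float = 0.5) -> Tuple[List[str], List[str]]:
    """Return (ranked objects, salient objects scoring at or above the threshold)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    salient = [obj_id for obj_id in ranked if scores[obj_id] >= threshold]
    return ranked, salient
```

Other inputs of this SubAIM, such as Visual Object Type and the Visual CXT Directive, could adjust the weights per object type or context; they are omitted here for brevity.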

4.4 I/O Data of SubAIMs

Table 3 specifies the Input and Output Data of the SubAIMs of the Visual Scene Enhancement (PGM‑VSE) AIM.

Table 3 – Input and Output Data of the Visual Scene Enhancement (PGM‑VSE) SubAIMs

SubAIM | Input | Output
Visual Scene Description | Visual Object, Visual CXT Directive | Visual Scene Descriptors, Visual CXT Status
Depth and Occlusion Estimation | Visual Scene Descriptors | Relative Depths, Occlusion Flags
Visual Object Identification | Visual Scene Descriptors, Visual Domain Response | Visual Object Type, Type Confidence
Affordance Inference | Visual Scene Descriptors, Relative Depths, Occlusion Flags, Visual Object Type, Visual Domain Response | Affordance Tags, Interaction Potential
Visual Salience Mapping | Relative Depths, Occlusion Flags, Visual Object Type, Affordance Tags, Interaction Potential, Visual CXT Directive, Visual Domain Response | Ranked Visual Objects, Filtered Salient Visual Objects
Visual Output Construction | Visual Scene Descriptors, Relative Depths, Occlusion Flags, Visual Object Type, Affordance Tags, Interaction Potential, Salience Results | Enhanced Visual Scene Descriptors, Visual CXT Status
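Table 3 implicitly defines a dataflow graph among the SubAIMs. The sketch below encodes the table as data and uses Python's standard `graphlib.TopologicalSorter` to derive an execution order in which every SubAIM runs after the SubAIMs that produce its inputs. One assumption is needed: Table 3 lists "Salience Results" as an input of Visual Output Construction, while Visual Salience Mapping outputs Ranked Visual Objects and Filtered Salient Visual Objects; the sketch treats "Salience Results" as an alias for the Ranked Visual Objects output.

```python
from graphlib import TopologicalSorter

# Inputs the composite AIM receives from outside (Table 1); no SubAIM produces them.
AIM_INPUTS = {"Visual Object", "Visual CXT Directive", "Visual Domain Response"}

# Table 3 as data: SubAIM -> (inputs, outputs).
SUBAIMS = {
    "Visual Scene Description": (
        {"Visual Object", "Visual CXT Directive"},
        {"Visual Scene Descriptors", "Visual CXT Status"}),
    "Depth and Occlusion Estimation": (
        {"Visual Scene Descriptors"},
        {"Relative Depths", "Occlusion Flags"}),
    "Visual Object Identification": (
        {"Visual Scene Descriptors", "Visual Domain Response"},
        {"Visual Object Type", "Type Confidence"}),
    "Affordance Inference": (
        {"Visual Scene Descriptors", "Relative Depths", "Occlusion Flags",
         "Visual Object Type", "Visual Domain Response"},
        {"Affordance Tags", "Interaction Potential"}),
    "Visual Salience Mapping": (
        {"Relative Depths", "Occlusion Flags", "Visual Object Type", "Affordance Tags",
         "Interaction Potential", "Visual CXT Directive", "Visual Domain Response"},
        {"Ranked Visual Objects", "Filtered Salient Visual Objects"}),
    "Visual Output Construction": (
        {"Visual Scene Descriptors", "Relative Depths", "Occlusion Flags", "Visual Object Type",
         "Affordance Tags", "Interaction Potential", "Salience Results"},
        {"Enhanced Visual Scene Descriptors", "Visual CXT Status"}),
}

# Assumption: "Salience Results" names the output of Visual Salience Mapping.
ALIASES = {"Salience Results": "Ranked Visual Objects"}

def dependencies(subaims: dict) -> dict:
    """Map each SubAIM to the set of SubAIMs producing the data it consumes."""
    graph = {}
    for name, (inputs, _) in subaims.items():
        needed = {ALIASES.get(item, item) for item in inputs} - AIM_INPUTS
        producers = {other for other, (_, outputs) in subaims.items()
                     if other != name and outputs & needed}
        produced = set().union(*(subaims[p][1] for p in producers))
        missing = needed - produced
        if missing:
            raise ValueError(f"{name}: no producer for {sorted(missing)}")
        graph[name] = producers
    return graph

execution_order = list(TopologicalSorter(dependencies(SUBAIMS)).static_order())
```

The derived order starts with Visual Scene Description, followed by Depth and Occlusion Estimation and Visual Object Identification (which depend only on it and could run in parallel), then Affordance Inference, Visual Salience Mapping, and Visual Output Construction.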

4.5 AIMs and JSON Metadata

Table 4 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.

Table 4 – AIMs and JSON Metadata of the Visual Scene Enhancement (PGM‑VSE)

AIM1 | AIM2 | Name | JSON
PGM‑VSE | | Visual Scene Enhancement | X
 | PGM-VSD | Visual Scene Description | X
 | PGM-DOE | Depth and Occlusion Estimation | X
 | PGM-VOI | Visual Object Identification | X
 | PGM-AFI | Affordance Inference | X
 | PGM-VSM | Visual Salience Mapping | X
 | PGM-VOC | Visual Output Construction | X

5 JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/VisualSceneEnhancement.json

6 Profiles

No Profiles.

7 Reference Software

Not part of this specification.

8 Conformance Testing

Not part of this specification.

9 Performance Assessment

Not part of this specification.