
1. Functions

The Context Capture (PGM‑CXC) AIM is the A‑User’s active perceptual interface to the spatial environment. It collects, fuses, and structures multimodal contextual information (audio, visual, and spatial) and supports runtime reorientation under explicit Human or A-User Control commands.

Context Capture provides initial Audio Scene Descriptors and Visual Scene Descriptors. These include object localisation, User gaze/gesture alignment, and spatial layout information.

Context Capture may be directed by the Human through natural commands (e.g., “look at that corner”, “zoom there”, “follow that object”), which A-User Control sends to the appropriate AIMs for translation into perceptual redirection operations. These operations adjust Context Capture’s capture configuration prior to any semantic interpretation.

Specific functionalities

Multimodal Context Acquisition: The Context Capture AIM continuously captures audio, visual, and spatial data from the surrounding environment in a given Time Expression.

Audio and Visual Scene Descriptor Generation: The Context Capture AIM generates Audio Scene Descriptors and Visual Scene Descriptors that describe object localisation, spatial layout, User gaze/gesture alignment, and other perceptual features of the environment.

Human‑Driven Perceptual Redirection: The Context Capture AIM supports perceptual redirection when instructed by the Human through A-User Control (e.g., reorienting viewpoint, changing focus, zooming, following a referenced object or region).

Runtime Capture Reconfiguration: The Context Capture AIM dynamically reconfigures its capture parameters (e.g., direction, focus, zoom, sampling region) in response to perceptual redirection commands issued by A-User Control.

Perceptual Grounding for Goal Acquisition: The Context Capture AIM provides perceptual descriptors that enable spatial grounding of Human expressions involving referenced objects or regions (e.g., “that corner”, “this object”, “over there”).
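Runtime Capture Reconfiguration can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CaptureConfig` fields, the command names (`reorient`, `zoom`, `follow`), and the `apply_redirection` helper are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CaptureConfig:
    """Hypothetical capture parameters; field names are illustrative only."""
    azimuth_deg: float = 0.0              # viewing direction
    zoom: float = 1.0                     # magnification factor
    tracked_object: Optional[str] = None  # identifier of an object being followed


def apply_redirection(config: CaptureConfig, command: str, **args) -> CaptureConfig:
    """Return a new configuration reflecting one perceptual redirection command."""
    if command == "reorient":
        # Wrap the requested direction into the range [0, 360).
        return replace(config, azimuth_deg=args["azimuth_deg"] % 360.0)
    if command == "zoom":
        factor = args["factor"]
        if factor <= 0:
            raise ValueError("zoom factor must be positive")
        return replace(config, zoom=config.zoom * factor)
    if command == "follow":
        return replace(config, tracked_object=args["object_id"])
    raise ValueError(f"unsupported redirection command: {command}")


# Example: three commands applied in sequence, before any semantic interpretation.
config = CaptureConfig()
config = apply_redirection(config, "reorient", azimuth_deg=450.0)
config = apply_redirection(config, "zoom", factor=2.0)
config = apply_redirection(config, "follow", object_id="object-17")
print(config)
```

The configuration is immutable in this sketch, so each command yields a new configuration and the previous one remains available, for example to report what changed.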

2. Reference Model

Figure 1 gives the Context Capture (PGM-CXC) Reference Model.

Figure 1 – The Reference Model of the Context Capture (PGM-CXC) AIM

3. Input/Output Data

Table 1 – Input/Output Data of the Context Capture (PGM-CXC) AIM

| Input | Description |
|---|---|
| Text Object | User input expressed in structured text form, including written or transcribed utterances. |
| Audio Object | Captured audio signals from the scene, covering speech, environmental sounds, and paralinguistic cues. |
| 3D Model Object | Geometric and spatial data describing the environment, including structures, surfaces, and volumetric features. |
| Visual Object | Visual signals from the scene, encompassing gestures, facial expressions, and environmental imagery. |
| CCX Directive | Control instructions specifying modality prioritisation, acquisition parameters, or framing rules to guide the perceptual processing of an M‑Location. |

| Output | Description |
|---|---|
| Audio Scene Descriptors | Initial Audio Scene Descriptors (no Enhancement). |
| Visual Scene Descriptors | Initial Visual Scene Descriptors (no Enhancement). |
| CCX Status | Scene‑level metadata describing User presence, environmental conditions, and confidence measures for contextual framing. |
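The two control items in Table 1 can be pictured as small structured payloads. The following Python sketch is only one possible shape: the class and field names, the default modality order, and the assumption that confidence lies in [0, 1] are all hypothetical. The normative definitions are those given by the JSON Metadata of this specification.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CCXDirective:
    """Hypothetical layout of a CCX Directive (illustrative field names)."""
    modality_priority: list = field(
        default_factory=lambda: ["visual", "audio", "3d_model"]
    )
    acquisition_parameters: dict = field(default_factory=dict)
    framing_rules: dict = field(default_factory=dict)


@dataclass
class CCXStatus:
    """Hypothetical layout of a CCX Status (illustrative field names)."""
    user_present: bool
    environmental_conditions: dict
    confidence: float

    def __post_init__(self):
        # Assumption: confidence is a normalised score in [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


directive = CCXDirective(acquisition_parameters={"zoom": 2.0})
status = CCXStatus(
    user_present=True,
    environmental_conditions={"lighting": "dim"},
    confidence=0.8,
)

# Serialise both items to JSON for exchange between AIMs.
payload = json.dumps(
    {"directive": asdict(directive), "status": asdict(status)}, indent=2
)
print(payload)
```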

4. SubAIMs

4.1 Reference Model

Figure 2 depicts the Reference Architecture of the Context Capture (PGM-CXC) AIM.

Figure 2 – Reference Architecture of the Context Capture (PGM-CXC) AIM

4.2 Operation

The Context Capture (PGM-CXC) AIM is activated by a CCX Directive from A-User Control and reports on the execution by means of a CCX Status. Of the four input types, Text is passed through, while 3D Model, Audio, and Visual are processed by SubAIMs.
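The operation described above can be sketched in Python as a simple dispatch loop. This is a sketch under stated assumptions, not the reference implementation: the three `describe_*` functions are hypothetical stand-ins for the SubAIMs, and the dictionary returned as status is a placeholder rather than the actual CCX Status format.

```python
from typing import Any, Callable, Dict, Tuple


# Hypothetical stand-ins for the three SubAIMs; real ones would analyse the data.
def describe_3d_model(obj: Any) -> Dict[str, Any]:
    return {"kind": "3d_model_scene_descriptors", "source": obj}


def describe_audio(obj: Any) -> Dict[str, Any]:
    return {"kind": "audio_scene_descriptors", "source": obj}


def describe_visual(obj: Any) -> Dict[str, Any]:
    return {"kind": "visual_scene_descriptors", "source": obj}


SUB_AIMS: Dict[str, Callable[[Any], Dict[str, Any]]] = {
    "3d_model": describe_3d_model,
    "audio": describe_audio,
    "visual": describe_visual,
}


def run_context_capture(
    directive: Dict[str, Any], inputs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Pass Text through, route the other inputs to SubAIMs, and report a status."""
    outputs: Dict[str, Any] = {}
    processed = []
    for modality, payload in inputs.items():
        if modality == "text":
            outputs["text"] = payload  # Text is passed through unchanged
        elif modality in SUB_AIMS:
            outputs[modality] = SUB_AIMS[modality](payload)
            processed.append(modality)
        else:
            raise ValueError(f"unknown input type: {modality}")
    # Placeholder status recording which directive was executed and on what.
    status = {"directive": directive, "processed": processed, "completed": True}
    return outputs, status


outputs, status = run_context_capture(
    {"focus": "corner"},
    {"text": "look at that corner", "audio": b"\x00\x01"},
)
print(status)
```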

4.3 Functions of AI Modules

The functions of the AI Modules are those specified by the 3D Model Scene Description, Audio Scene Description, and Visual Scene Description AIMs.

4.4 I/O Data of AI Modules

Table 3 specifies the Input and Output Data of the AI Modules.

Table 3 – Input and Output Data of the AI Modules

| AI Module | Receives | Produces |
|---|---|---|
| | Text Object | Text Object |
| 3D Model Scene Description | 3D Model Object | 3D Model Scene Descriptors |
| | CCX Directive | CCX Status |
| Audio Scene Description | Audio Object | Audio Scene Descriptors |
| | CCX Directive | CCX Status |
| Visual Scene Description | Visual Object | Visual Scene Descriptors |
| | CCX Directive | CCX Status |

4.5 AIMs and JSON Metadata

Table 4 provides the links to the AIW and AIM specifications and to the JSON schemas. AIMs/1 indicates that the column contains Composite AIMs and AIM2 indicates that the column contains their SubAIMs.

Table 4 – AIMs and JSON Metadata

| AIW | AIMs | Name | JSON |
|---|---|---|---|
| PGM-CXC | | Context Capture | X |
| | OSD-3SD | 3D Model Scene Description | X |
| | OSD-ASD | Audio Scene Description | X |
| | OSD-VSD | Visual Scene Description | X |

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/ContextCapture.json

6. Profiles

No Profiles.

7. Reference Software

8. Conformance Testing

9. Performance Assessment