| Function | Reference Model | Input/Output Data |
| SubAIMs | JSON Metadata | Profiles |
| Reference Software | Conformance Testing | Performance Assessment |
1. Function
The Context Capture (PGM‑CXC) AIM is the A‑User’s active perceptual interface to the spatial environment. It collects, fuses, and structures multimodal contextual information – including audio, visual, spatial, and environmental signals – and supports runtime reorientation under explicit Human or AUC commands.
CXC provides Audio Scene Descriptors and Visual Scene Descriptors, which include the object localisation, User gaze/gesture alignment, and spatial layout information required for Goal Acquisition.
CXC may be directed by the Human through natural commands (e.g., “look at that corner”, “zoom there”, “follow that object”), which AUC translates into perceptual redirection operations. These operations adjust CXC’s capture configuration prior to any semantic interpretation.
Specific functionalities
Multimodal Context Acquisition: The CXC AIM continuously captures audio, visual, spatial, and environmental signals from the surrounding environment.
Audio and Visual Scene Descriptor Generation: The CXC AIM generates Audio Scene Descriptors and Visual Scene Descriptors that describe object localisation, spatial layout, User gaze/gesture alignment, and other perceptual features of the environment.
Human‑Driven Perceptual Redirection: The CXC AIM supports perceptual redirection when instructed by the Human through AUC (e.g., reorienting viewpoint, changing focus, zooming, following a referenced object or region).
Runtime Capture Reconfiguration: The CXC AIM dynamically reconfigures its capture parameters (e.g., direction, focus, zoom, sampling region) in response to perceptual redirection commands issued by AUC.
Perceptual Grounding for Goal Acquisition: The CXC AIM provides perceptual descriptors that enable spatial grounding of Human expressions involving referenced objects or regions (e.g., “that corner”, “this object”, “over there”).
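The redirection flow described above (Human command → AUC directive → CXC capture reconfiguration, prior to any semantic interpretation) can be sketched as follows. This is an illustrative sketch only: the class and field names (`CaptureConfig`, `apply_directive`, `direction_deg`, etc.) are hypothetical, as the specification does not define a programming API.

```python
# Hypothetical sketch of CXC runtime capture reconfiguration.
# All names are illustrative, not normative.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CaptureConfig:
    direction_deg: float = 0.0              # pan angle of the capture viewpoint
    focus: str = "wide"                     # "wide" scene capture or "object" focus
    zoom: float = 1.0                       # zoom factor
    region: tuple = (0.0, 0.0, 1.0, 1.0)    # normalised sampling region (x, y, w, h)

def apply_directive(cfg: CaptureConfig, directive: dict) -> CaptureConfig:
    """Apply an AUC-issued perceptual-redirection directive to the current
    capture configuration, before any semantic interpretation takes place."""
    op = directive.get("operation")
    if op == "reorient":                    # e.g. "look at that corner"
        return replace(cfg, direction_deg=directive["direction_deg"])
    if op == "zoom":                        # e.g. "zoom there"
        return replace(cfg, zoom=directive["factor"], focus="object")
    if op == "sample_region":               # e.g. "follow that object"
        return replace(cfg, region=tuple(directive["region"]))
    return cfg                              # unknown operations leave capture unchanged

# The Human says "zoom there"; AUC translates it into a redirection operation:
cfg = apply_directive(CaptureConfig(), {"operation": "zoom", "factor": 2.5})
```

The configuration is immutable and each directive yields a new configuration, reflecting that redirection adjusts capture parameters rather than interpreting scene content.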
2. Reference Model
Figure 1 gives the Context Capture (PGM-CXC) Reference Model.

Figure 1 – The Reference Model of the Context Capture (PGM-CXC) AIM
3. Input/Output Data
Table 1 – Input and Output Data of the Context Capture (PGM-CXC) AIM
| Input | Description |
| --- | --- |
| Text Object | User input expressed in structured text form, including written or transcribed utterances. |
| Audio Object | Captured audio signals from the scene, covering speech, environmental sounds, and paralinguistic cues. |
| 3D Model Object | Geometric and spatial data describing the environment, including structures, surfaces, and volumetric features. |
| Visual Object | Visual signals from the scene, encompassing gestures, facial expressions, and environmental imagery. |
| CXC Directive | Control instructions specifying modality prioritisation, acquisition parameters, or framing rules to guide the perceptual processing of an M‑Location. |

| Output | Description |
| --- | --- |
| Audio Scene Descriptors | Initial Audio Scene Descriptors (no Enhancement). |
| Visual Scene Descriptors | Initial Visual Scene Descriptors (no Enhancement). |
| CXC Status | Scene‑level metadata describing User presence, environmental conditions, and confidence measures for contextual framing. |
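To make the Directive input and Status output of Table 1 concrete, the sketch below shows hypothetical JSON-style instances. All field names are illustrative assumptions; the normative syntax is given by the AIM's JSON metadata, not by this example.

```python
# Hypothetical JSON-style instances of the Directive input and Status output
# of Table 1. Field names are assumed for illustration, not normative.
import json

directive = {
    "modalityPriority": ["Visual", "Audio"],            # modality prioritisation
    "acquisition": {"sampleRateHz": 48000, "frameRateFps": 30},
    "framing": {"mLocation": "room-01", "rule": "follow-referenced-object"},
}

status = {
    "userPresence": True,                               # User detected in the scene
    "environment": {"lighting": "low", "noiseLevelDb": 42.0},
    "confidence": {"audio": 0.91, "visual": 0.78},      # contextual-framing confidence
}

# Both structures serialise to JSON for exchange with other AIMs in the workflow.
payload = json.dumps({"Directive": directive, "Status": status})
```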
4. SubAIMs
No SubAIMs.
5. JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/ContextCapture.json
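An implementation would typically validate its metadata record against the ContextCapture.json schema referenced above. The minimal sketch below checks a record for a set of required top-level keys; that key list is an assumption for illustration, and the authoritative definition is the published schema file itself.

```python
# Minimal sketch of a metadata sanity check. The required-key list is an
# assumption; the authoritative schema is ContextCapture.json (URL above).
REQUIRED_KEYS = {"Identifier", "APIProfile", "Description", "Ports"}  # assumed, not normative

def missing_metadata_keys(metadata: dict) -> set:
    """Return the set of required top-level keys absent from `metadata`."""
    return REQUIRED_KEYS - metadata.keys()

# A record missing two of the assumed required keys:
missing = missing_metadata_keys({"Identifier": "PGM-CXC", "Description": "Context Capture"})
```

A full implementation would instead validate against the downloaded JSON schema with a JSON Schema validator rather than a hand-rolled key check.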
6. Profiles
No Profiles.