1. Function
The Context Enhancement (PGM-CXE) AIM performs interpretative enrichment and cross‑modal analysis of a captured media scene in order to produce:
- enhanced descriptions of the audio and visual scene, and
- an interpreted description of the User Entity State.
Context Enhancement operates on time‑synchronised perceptual descriptors produced by Context Capture and applies modality‑specific analysis, cross‑modal alignment, and optional domain knowledge to derive evidence and state descriptions suitable for downstream reasoning and control.
2. Reference Model
The Context Enhancement (PGM-CXE) reference model is organised as a multi‑stage interpretative pipeline operating on captured audio-visual descriptors as depicted in Figure 1.

Figure 1 – Reference Model of Context Enhancement (PGM-CXE)
3. I/O Data
Table 1 specifies the Input and Output Data of the Context Enhancement (PGM-CXE) AIM.
Table 1 – Input and Output Data of the Context Enhancement (PGM-CXE) AIM
| Input | Description |
|---|---|
| Audio Scene Descriptors (ASD0) | Perceptual description of the audio scene produced by Context Capture. |
| Visual Scene Descriptors (VSD0) | Perceptual description of the visual scene produced by Context Capture. |
| CXE Directive | Control directives specifying the scope, depth, or policy constraints of CXE processing for Audio, Visual, and User. |
| Domain Response | Domain‑specific knowledge obtained through Domain Access. |
| Output | Description |
| Enhanced Context | Aggregated result combining the enhanced Audio and Visual Scene Descriptors and the User Entity State. |
| Domain Request | Request for domain‑specific knowledge addressed to Domain Access. |
| CXE Status | Status information describing the execution and outcome of CXE processing. |
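As an informative illustration, the I/O Data of Table 1 could be carried in a program as plain typed records. The sketch below is a minimal Python rendering under stated assumptions: the class and field names, the use of dictionaries for descriptor payloads, and the split of the CXE Directive into `audio`, `visual` and `user` parts are hypothetical and are not defined by this specification; the actual data formats are those given by the JSON Metadata of this AIM.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical in-memory rendering of the Table 1 I/O Data; names and types are assumptions.


@dataclass
class CXEInput:
    audio_scene_descriptors: Dict[str, Any]   # ASD0 from Context Capture
    visual_scene_descriptors: Dict[str, Any]  # VSD0 from Context Capture
    # Assumed layout: {"audio": {...}, "visual": {...}, "user": {...}}
    cxe_directive: Dict[str, Any] = field(default_factory=dict)
    domain_response: Optional[Dict[str, Any]] = None  # knowledge returned by Domain Access, if any

    def directive_for(self, modality: str) -> Dict[str, Any]:
        """Return the part of the CXE Directive addressed to 'audio', 'visual' or 'user'."""
        return self.cxe_directive.get(modality, {})


@dataclass
class EnhancedContext:
    enhanced_audio_scene_descriptors: Dict[str, Any]
    enhanced_visual_scene_descriptors: Dict[str, Any]
    user_entity_state: Dict[str, Any]


@dataclass
class CXEOutput:
    enhanced_context: EnhancedContext
    domain_request: Optional[Dict[str, Any]] = None  # query addressed to Domain Access, if any
    cxe_status: Dict[str, Any] = field(default_factory=dict)
```

Keeping the Domain Request and Domain Response optional reflects that domain knowledge is an optional contribution to CXE processing.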
4. SubAIMs (informative)
4.1 Reference Model
Figure 2 depicts the Reference Model of the Context Enhancement (PGM-CXE) Composite AIM.

Figure 2 – Reference Model of the Context Enhancement (PGM-CXE) Composite AIM
4.2 Operation
The CXE operation involves the following SubAIMs:
- Modal (Audio and Visual) Scene Enhancement
  - Independent enhancement of Audio Scene Descriptors and Visual Scene Descriptors.
  - Extraction of derived properties such as salience, interaction potential, object type, depth, occlusion, and acoustic or visual profiles.
  - Interaction with Domain Access for domain‑specific enrichment.
- Audio–Visual Alignment
  - Cross‑modal association between Audio Objects and Visual Objects referring to the same source or entity.
  - Production of Audio‑Visual Scene Geometry expressing correspondence and spatial relations.
- User State Description
  - Interpretation of enhanced descriptors and alignment evidence with respect to the User or other entities.
  - Derivation of User‑centric evidence and state descriptions (User Entity State) under the control of directives.
- Aggregation
  - Consolidation of enhanced scene descriptors and User‑related outputs into a coherent Enhanced Context.
  - Generation of status information describing the outcome of CXE processing.
The reference model explicitly separates capture, modal enhancement, cross‑modal alignment, and user/entity interpretation, ensuring modularity, traceability, and reuse.
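The separation of stages can be illustrated with a minimal Python sketch of the dataflow through the SubAIMs. Every function body is a hypothetical placeholder: real SubAIMs would perform the analyses listed above, Domain Access interaction is omitted, and the object field `source_id` used for matching is an assumption made only so that the example runs.

```python
from typing import Any, Dict, List, Tuple

Descriptors = Dict[str, Any]


def scene_enhancement(scene: Descriptors, directive: Descriptors, modality: str) -> Tuple[Descriptors, Descriptors]:
    """Placeholder for Audio or Visual Scene Enhancement: copies each object and tags it with its modality."""
    objects = [dict(obj, modality=modality) for obj in scene.get("objects", [])]
    return {"objects": objects}, {"stage": f"{modality} enhancement", "ok": True}


def av_alignment(audio: Descriptors, visual: Descriptors) -> Descriptors:
    """Placeholder for Audio-Visual Alignment: matches objects carrying the same hypothetical 'source_id'."""
    visual_ids = {o["source_id"] for o in visual["objects"] if "source_id" in o}
    matched = [a["source_id"] for a in audio["objects"] if a.get("source_id") in visual_ids]
    return {"matched_sources": matched}


def user_state_description(audio: Descriptors, visual: Descriptors, geometry: Descriptors,
                           directive: Descriptors) -> Tuple[Descriptors, Descriptors]:
    """Placeholder for User State Description: treats audio-visually confirmed sources as attended ones.
    The audio, visual and directive arguments are unused here but would feed a real implementation."""
    state = {"attended_sources": list(geometry["matched_sources"])}
    return state, {"stage": "user state description", "ok": True}


def multiplex(audio: Descriptors, visual: Descriptors, user_state: Descriptors,
              statuses: List[Descriptors]) -> Tuple[Descriptors, Descriptors]:
    """Aggregation: bundles the Enhanced Context and folds per-stage statuses into one CXE Status."""
    context = {
        "enhanced_audio_scene_descriptors": audio,
        "enhanced_visual_scene_descriptors": visual,
        "user_entity_state": user_state,
    }
    return context, {"ok": all(s["ok"] for s in statuses), "stages": [s["stage"] for s in statuses]}


def context_enhancement(asd0: Descriptors, vsd0: Descriptors, directive: Descriptors) -> Tuple[Descriptors, Descriptors]:
    # Modal enhancement runs independently per modality, each with its own part of the directive.
    audio, audio_status = scene_enhancement(asd0, directive.get("audio", {}), "audio")
    visual, visual_status = scene_enhancement(vsd0, directive.get("visual", {}), "visual")
    # Cross-modal alignment only sees enhanced descriptors, never the raw capture output.
    geometry = av_alignment(audio, visual)
    user_state, user_status = user_state_description(audio, visual, geometry, directive.get("user", {}))
    return multiplex(audio, visual, user_state, [audio_status, visual_status, user_status])
```

Because each stage only receives the outputs of the stages before it, any placeholder can be replaced by a real SubAIM without touching the others, which is the modularity the reference model aims for.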
4.3 Functions of AI Modules
Table 2 specifies the Functions of the AI Modules.
Table 2 – Functions of Context Enhancement (PGM-CXE) AI Modules
| SubAIM | Function |
|---|---|
| Audio Scene Enhancement | Enhances the description of the Audio Scene. |
| Visual Scene Enhancement | Enhances the description of the Visual Scene. |
| Audio–Visual Alignment | Aligns the objects in the Audio and Visual Scenes. |
| User State Description | Extracts the User Entity State. |
| Audio-Visual-User Multiplexing | Multiplexes all data produced for transfer to PC Prompt Creation. |
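This specification does not prescribe how Audio–Visual Alignment associates Audio Objects with Visual Objects. One simple possibility, shown here purely as an assumption, is to compare the azimuth of each Audio Object with that of each Visual Object and greedily accept the closest one‑to‑one pairs within a tolerance. The representation of objects as ID‑to‑azimuth maps, the 15° default tolerance and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths in degrees (handles wrap-around at 360)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def align_by_azimuth(audio: Dict[str, float], visual: Dict[str, float],
                     max_deg: float = 15.0) -> List[Tuple[str, str, float]]:
    """Greedy one-to-one association of Audio Objects with Visual Objects by azimuth.

    audio and visual map a hypothetical object ID to its azimuth in degrees.
    Returns (audio_id, visual_id, distance) triples, closest pairs first.
    """
    candidates = sorted(
        (angular_distance(az_a, az_v), a_id, v_id)
        for a_id, az_a in audio.items()
        for v_id, az_v in visual.items()
    )
    used_audio, used_visual = set(), set()
    pairs: List[Tuple[str, str, float]] = []
    for dist, a_id, v_id in candidates:
        if dist > max_deg:
            break  # candidates are sorted, so every remaining pair is farther apart
        if a_id in used_audio or v_id in used_visual:
            continue  # each object takes part in at most one association
        used_audio.add(a_id)
        used_visual.add(v_id)
        pairs.append((a_id, v_id, dist))
    return pairs
```

For example, with Audio Objects `{"voice": 10.0, "hum": 200.0}` and Visual Objects `{"person": 5.0, "fan": 350.0}`, only `("voice", "person", 5.0)` is returned; the hum has no visible counterpart within tolerance. Greedy matching is simple but not globally optimal; an optimal one‑to‑one assignment could instead apply the Hungarian method (e.g. SciPy's `scipy.optimize.linear_sum_assignment`) to the same distance matrix.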
4.4 I/O Data of AI Modules
Table 3 gives the Input and Output Data of the Context Enhancement (PGM-CXE) SubAIMs.
Table 3 – Input and Output Data of Context Enhancement (PGM-CXE) SubAIMs
| SubAIM | Input Data | Output Data |
|---|---|---|
| Audio Scene Enhancement | Audio Scene Descriptors, Audio CXE Directive, Audio Domain Response | Audio Domain Request, Audio CXE Status, Enhanced Audio Scene Descriptors |
| Visual Scene Enhancement | Visual Scene Descriptors, Visual CXE Directive, Visual Domain Response | Visual Domain Request, Visual CXE Status, Enhanced Visual Scene Descriptors |
| Audio–Visual Alignment | Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, User Domain Response | User Domain Request, Audio‑Visual Scene Geometry |
| User State Description | Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, User CXE Directive | User Entity Evidence, User Entity State, User CXE Status |
| Audio-Visual-User Multiplexing | Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, User Entity State | Audio CXE Status, Enhanced Audio Scene Descriptors, User Entity State, User CXE Status, Visual CXE Status, Enhanced Visual Scene Descriptors |
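A table of SubAIM inputs and outputs defines a dataflow graph, so an execution order and simple consistency checks can be derived from it mechanically. The sketch below is a simplified, hypothetical excerpt of Table 3: Domain Request/Response and CXE Status items are omitted, User State Description is assumed to consume both enhanced descriptor sets, and "Enhanced Context" as the multiplexer output is taken from Table 1. It uses the standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified excerpt of Table 3: SubAIM -> (inputs, outputs).
WIRING: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Audio Scene Enhancement": ({"Audio Scene Descriptors", "Audio CXE Directive"},
                                {"Enhanced Audio Scene Descriptors"}),
    "Visual Scene Enhancement": ({"Visual Scene Descriptors", "Visual CXE Directive"},
                                 {"Enhanced Visual Scene Descriptors"}),
    "Audio-Visual Alignment": ({"Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors"},
                               {"Audio-Visual Scene Geometry"}),
    "User State Description": ({"Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors",
                                "User CXE Directive"},
                               {"User Entity State"}),
    "Audio-Visual-User Multiplexing": ({"Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors",
                                        "User Entity State"},
                                       {"Enhanced Context"}),
}


def producers(wiring: Dict[str, Tuple[Set[str], Set[str]]]) -> Dict[str, str]:
    """Map each data item to the SubAIM that produces it."""
    return {item: name for name, (_, outs) in wiring.items() for item in outs}


def execution_order(wiring: Dict[str, Tuple[Set[str], Set[str]]]) -> List[str]:
    """Topologically sort SubAIMs so that each one runs after the producers of its inputs."""
    prod = producers(wiring)
    graph = {name: {prod[i] for i in ins if i in prod} for name, (ins, _) in wiring.items()}
    return list(TopologicalSorter(graph).static_order())


def external_inputs(wiring: Dict[str, Tuple[Set[str], Set[str]]]) -> Set[str]:
    """Items consumed by some SubAIM but produced by none: they must come from the Composite AIM's inputs."""
    prod = producers(wiring)
    return {i for ins, _ in wiring.values() for i in ins if i not in prod}


def dangling_outputs(wiring: Dict[str, Tuple[Set[str], Set[str]]], composite_outputs: Set[str]) -> Set[str]:
    """Items produced by some SubAIM that no SubAIM consumes and that are not Composite AIM outputs."""
    consumed = {i for ins, _ in wiring.values() for i in ins}
    produced = {o for _, outs in wiring.values() for o in outs}
    return produced - consumed - composite_outputs
```

On this excerpt, `dangling_outputs(WIRING, {"Enhanced Context"})` returns `{"Audio-Visual Scene Geometry"}`, because no SubAIM in Table 3 lists that item as an input; such a check is a quick way to confirm whether the wiring matches the description of User State Description as interpreting alignment evidence.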
4.5 AIMs and JSON Metadata
Table 4 provides the links to the AIM specifications and to the JSON syntaxes. AIM1 indicates that the column contains Composite AIMs and AIM2 indicates that the column contains their Basic AIMs.
Table 4 – AIMs and JSON Metadata
| AIM1 | AIM2 | Name | JSON |
|---|---|---|---|
| PGM-CXE | | Context Enhancement | X |
| | PGM-ASD | Audio Scene Description | X |
| | OSD-AVA | Audio-Visual Analysis | X |
| | PGM-VSD | Visual Scene Description | X |
| | PGM-USD | User State Description | X |
| | PGM-MUX | Audio-Visual-User Multiplexing | X |
5. JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/ContextEnhancement.json
6. Profiles
7. Reference Software
8. Conformance Testing
9. Performance Assessment