Function
Ref. Model
I/O Data
SubAIMs
JSON MData
Profiles
Ref. Software
Conformance
Performance
1 Functions
The Audio Scene Enhancement (PGM‑ASE) AIM produces the description of an audio scene from the captured Audio Object, deriving the perceptual and semantic audio properties relevant to spatial understanding, interaction, and user‑centric reasoning.
Audio Scene Enhancement
- Operates on the Audio CXT Directive received from A‑User Control, the Audio Object, and the Audio Domain Response resulting from queries made to Domain Access,
- Produces the Enhanced Audio Scene Descriptors and the Audio CXT Status sent to Audio-Visual-User Multiplexing, and the Audio Domain Request when querying Domain Access.
The Enhanced Audio Scene Descriptors carry the perceptual semantics of the scene augmented with derived and semantic information, under CXT Directive control.
2 Reference Model
Figure 1 gives the Reference Model of the Audio Scene Enhancement (PGM‑ASE) AIM.

Figure 1 – Reference Model of the Audio Scene Enhancement (PGM‑ASE) AIM
3 I/O Data
Table 1 gives the Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM.
| Input | Description |
|---|---|
| Audio Object | Captured audio of the scene. |
| Audio CXT Directive | Control directive specifying scope, depth, or policy constraints for audio description. |
| Audio Domain Response | Domain‑specific knowledge supporting audio interpretation and semantic classification. |
| Output | Description |
| Enhanced Audio Scene Descriptors | Description of the audio scene, carrying perceptual semantics augmented with derived and semantic audio properties. |
| Audio CXT Status | Status information describing the execution and outcome of Audio Scene Enhancement processing. |
| Audio Domain Request | Query to Domain Access for domain‑specific knowledge. |
4 SubAIMs (informative)
This section is informative. The decomposition into SubAIMs described below illustrates one conformant architecture for producing the normative outputs of PGM‑ASE. Implementations may adopt alternative internal structures provided they satisfy the conformance requirements of Section 8.
4.1 Reference Model
An implementation of the Audio Scene Enhancement (PGM‑ASE) AIM may be based on the architecture of Figure 2.

Figure 2 – Reference Model of the Audio Scene Enhancement (PGM‑ASE) Composite AIM
4.2 Operation
The Audio Scene Enhancement operation parses the captured Audio Object into audio objects and spatial attributes, analyses their motion, proximity, and acoustic environment, identifies object types with optional domain knowledge, maps salience with respect to the interaction, and constructs the Enhanced Audio Scene Descriptors together with the execution status.
4.3 Functions of SubAIMs
Table 2 specifies the functions performed by the Audio Scene Enhancement (PGM‑ASE) AIM SubAIMs in the current example.
| SubAIM | Function |
|---|---|
| Audio Scene Description | Produces an initial Audio Scene Description based on CXT Directive. |
| Audio Motion and Proximity | Detects the temporal and spatial dynamics of audio objects by tracking their evolution in space and time. |
| Acoustic Environment Analysis | Characterises the acoustic conditions affecting the audio scene using signal‑derived measures. |
| Audio Object Identification | Assigns semantic object‑type labels to audio objects using classification models and optional domain knowledge. |
| Audio Salience Mapping | Determines the relative relevance of audio objects with respect to user interaction and context. |
| Enhanced Audio Multiplexing | Aggregates perceptual and enriched evidence into the Enhanced Audio Scene Descriptors and emits the execution status. |
4.4 I/O Data of SubAIMs
Table 3 specifies the Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM SubAIMs.
4.5 AIMs and JSON Metadata
Table 4 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.
| AIM1 | AIM2 | Name | JSON |
|---|---|---|---|
| PGM‑ASE | Audio Scene Enhancement | X | |
| PGM-ASD | Audio Scene Description | X | |
| PGM-AMP | Audio Motion and Proximity | X | |
| PGM-AEA | Acoustic Environment Analysis | X | |
| PGM-AOI | Audio Object Identification | X | |
| PGM-ASM | Audio Salience Mapping | X | |
| PGM-AOC | Enhanced Audio Multiplexing | X |
5 JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSceneEnhancement.json
6 Profiles
No Profiles.
7 Reference Software
Not part of this specification.
8 Conformance Testing
Not part of this specification.
9 Performance Assessment
Not part of this specification.