Go to PGM-AUA V1.0 AI Modules

Function
Ref. Model
I/O Data
SubAIMs
JSON MData
Profiles
Ref. Software
Conformance
Performance

1 Functions

The Audio Scene Enhancement (PGM‑ASE) AIM produces the description of an audio scene from the captured Audio Object, deriving the perceptual and semantic audio properties relevant to spatial understanding, interaction, and user‑centric reasoning.

Audio Scene Enhancement

  1. Operates on the Audio CXT Directive received from A‑User Control, the Audio Object, and the Audio Domain Response resulting from queries made to Domain Access,
  2. Produces the Enhanced Audio Scene Descriptors and the Audio CXT Status sent to Audio-Visual-User Multiplexing, and the Audio Domain Request when querying Domain Access.

The Enhanced Audio Scene Descriptors carry the perceptual semantics of the scene augmented with derived and semantic information, under CXT Directive control.

2 Reference Model

Figure 1 gives the Reference Model of the Audio Scene Enhancement (PGM‑ASE) AIM.

Figure 1 – Reference Model of the Audio Scene Enhancement (PGM‑ASE) AIM

3 I/O Data

Table 1 gives the Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM.

Table 1 – Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM

Input Description
Audio Object Captured audio of the scene.
Audio CXT Directive Control directive specifying scope, depth, or policy constraints for audio description.
Audio Domain Response Domain‑specific knowledge supporting audio interpretation and semantic classification.
Output Description
Enhanced Audio Scene Descriptors Description of the audio scene, carrying perceptual semantics augmented with derived and semantic audio properties.
Audio CXT Status Status information describing the execution and outcome of Audio Scene Enhancement processing.
Audio Domain Request Query to Domain Access for domain‑specific knowledge.

4 SubAIMs (informative)

This section is informative. The decomposition into SubAIMs described below illustrates one conformant architecture for producing the normative outputs of PGM‑ASE. Implementations may adopt alternative internal structures provided they satisfy the conformance requirements of Section 8.

4.1 Reference Model

An implementation of the Audio Scene Enhancement (PGM‑ASE) AIM may be based on the architecture of Figure 2.

Figure 2 – Reference Model of the Audio Scene Enhancement (PGM‑ASE) Composite  AIM

4.2 Operation

The Audio Scene Enhancement operation parses the captured Audio Object into audio objects and spatial attributes, analyses their motion, proximity, and acoustic environment, identifies object types with optional domain knowledge, maps salience with respect to the interaction, and constructs the Enhanced Audio Scene Descriptors together with the execution status.

4.3 Functions of SubAIMs

Table 2 specifies the functions performed by the Audio Scene Enhancement (PGM‑ASE) AIM SubAIMs in the current example.

Table 2 – Functions of the Audio Scene Enhancement (PGM‑ASE) SubAIMs

SubAIM Function
Audio Scene Description Produces an initial Audio Scene Description based on CXT Directive.
Audio Motion and Proximity Detects the temporal and spatial dynamics of audio objects by tracking their evolution in space and time.
Acoustic Environment Analysis Characterises the acoustic conditions affecting the audio scene using signal‑derived measures.
Audio Object Identification Assigns semantic object‑type labels to audio objects using classification models and optional domain knowledge.
Audio Salience Mapping Determines the relative relevance of audio objects with respect to user interaction and context.
Enhanced Audio Multiplexing Aggregates perceptual and enriched evidence into the Enhanced Audio Scene Descriptors and emits the execution status.

4.4 I/O Data of SubAIMs

Table 3 specifies the Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM SubAIMs.

Table 3 – Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) SubAIMs

SubAIM Input Output
Audio Scene Description Audio Object
Audio CXT Directive
Audio Scene Descriptors
Audio CXT Status
Audio Motion and Proximity Audio Scene Descriptors Motion Flags
Proximity Class
Acoustic Environment Analysis Audio Scene Descriptors Acoustic Profile
Audio Object Identification Audio Scene Descriptors
Audio Domain Response
Audio Object Type
Audio Salience Mapping Motion Flags
Proximity Class
Acoustic Profile
Audio Object Type
Audio CXT Directive
Audio Domain Response
Ranked Audio Objects
Filtered Salient Audio Objects
Enhanced Audio Multiplexing Audio Objects
Motion Flags
Proximity Class
Acoustic Profile
Audio Object Type
Salience Results
Enhanced Audio Scene Descriptors
Audio CXT Status

4.5 AIMs and JSON Metadata

Table 4 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.

Table 4 – AIMs and JSON Metadata of the Audio Scene Enhancement (PGM‑ASE)

AIM1 AIM2 Name JSON
PGM‑ASE Audio Scene Enhancement X
PGM-ASD Audio Scene Description X
PGM-AMP Audio Motion and Proximity X
PGM-AEA Acoustic Environment Analysis X
PGM-AOI Audio Object Identification X
PGM-ASM Audio Salience Mapping X
PGM-AOC Enhanced Audio Multiplexing X

5 JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSceneEnhancement.json

6 Profiles

No Profiles.

7 Reference Software

Not part of this specification.

8 Conformance Testing

Not part of this specification.

9 Performance Assessment

Not part of this specification.

Go to PGM-AUA V1.0 AI Modules