PGM-AUA V1.0 AIMs - Audio Scene Enhancement

Function
Ref. Model
I/O Data
SubAIMs
JSON MData
Profiles
Ref. Software
Conformance
Performance

1 Functions

The Audio Scene Enhancement (PGM‑ASE) AIM produces the description of an audio scene from the captured Audio Object, deriving the perceptual and semantic audio properties relevant to spatial understanding, interaction, and user‑centric reasoning.

Audio Scene Enhancement

Operates on the Audio CXT Directive received from A‑User Control, the Audio Object, and the Audio Domain Response resulting from queries made to Domain Access,
Produces the Enhanced Audio Scene Descriptors and the Audio CXT Status sent to Audio-Visual-User Multiplexing, and the Audio Domain Request when querying Domain Access.

The Enhanced Audio Scene Descriptors carry the perceptual semantics of the scene augmented with derived and semantic information, under CXT Directive control.

2 Reference Model

Figure 1 gives the Reference Model of the Audio Scene Enhancement (PGM‑ASE) AIM.

Figure 1 – Reference Model of the Audio Scene Enhancement (PGM‑ASE) AIM

3 I/O Data

Table 1 gives the Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM.

Table 1 – Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM

Input	Description
Audio Object	Captured audio of the scene.
Audio CXT Directive	Control directive specifying scope, depth, or policy constraints for audio description.
Audio Domain Response	Domain‑specific knowledge supporting audio interpretation and semantic classification.
Output	Description
Enhanced Audio Scene Descriptors	Description of the audio scene, carrying perceptual semantics augmented with derived and semantic audio properties.
Audio CXT Status	Status information describing the execution and outcome of Audio Scene Enhancement processing.
Audio Domain Request	Query to Domain Access for domain‑specific knowledge.

4 SubAIMs (informative)

This section is informative. The decomposition into SubAIMs described below illustrates one conformant architecture for producing the normative outputs of PGM‑ASE. Implementations may adopt alternative internal structures provided they satisfy the conformance requirements of Section 8.

4.1 Reference Model

An implementation of the Audio Scene Enhancement (PGM‑ASE) AIM may be based on the architecture of Figure 2.

Figure 2 – Reference Model of the Audio Scene Enhancement (PGM‑ASE) Composite AIM

4.2 Operation

The Audio Scene Enhancement operation parses the captured Audio Object into audio objects and spatial attributes, analyses their motion, proximity, and acoustic environment, identifies object types with optional domain knowledge, maps salience with respect to the interaction, and constructs the Enhanced Audio Scene Descriptors together with the execution status.

4.3 Functions of SubAIMs

Table 2 specifies the functions performed by the Audio Scene Enhancement (PGM‑ASE) AIM SubAIMs in the current example.

Table 2 – Functions of the Audio Scene Enhancement (PGM‑ASE) SubAIMs

SubAIM	Function
Enhanced Audio Demultiplexing	Produces an initial Audio Scene Description based on CXT Directive.
Acoustic Environment Analysis	Characterises the acoustic conditions affecting the audio scene using signal‑derived measures.
Audio Object Identification	Assigns semantic object‑type labels to audio objects using classification models and optional domain knowledge.
Audio Salience Mapping	Determines the relative relevance of audio objects with respect to user interaction and context.
Enhanced Audio Multiplexing	Aggregates perceptual and enriched evidence into the Enhanced Audio Scene Descriptors and emits the execution status.

4.4 I/O Data of SubAIMs

Table 3 specifies the Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) AIM SubAIMs.

Table 3 – Input and Output Data of the Audio Scene Enhancement (PGM‑ASE) SubAIMs

SubAIM	Input	Output
Enhanced Audio Demultiplexing	Audio CXT Directive Audio Object Audio Domain Response Audio IH Response	Audio CXT Status Audio Objects Audio Domain Request Audio IH Request
Acoustic Environment Analysis	Audio CXT Directive Audio Scene Descriptors Audio Domain Response Audio IH Response	Audio CXT Status Acoustic Profile Audio Domain Request Audio IH Request
Audio Object Identification	Audio CXT Directive Audio Scene Descriptors Audio Domain Response Audio IH Response	Audio CXT Status Audio Object Type Audio Domain Request Audio IH Request
Audio Salience Mapping	Audio Object Type Audio CXT Directive Audio Domain Response Audio IH Response	Audio CXT Status Ranked Audio Objects Audio Domain Request Audio IH Request
Enhanced Audio Multiplexing	Audio CXT Status Audio Objects Acoustic Profile Audio Object Type IDs Audio Domain Request Audio IH Request Ranked Audio Objects	Audio CXT Status Enhanced Audio Scene Descriptors Audio Domain Request Audio IH Request

4.5 AIMs and JSON Metadata

Table 4 provides the links to the AIM specifications and JSON schemas. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.

Table 4 – AIMs and JSON Metadata of the Audio Scene Enhancement (PGM‑ASE)

AIM1	AIM2	Name	JSON
PGM‑ASE		Audio Scene Enhancement	X
	PGM-EAD	Enhanced Audio Demultiplexing	X
	PGM-AMP	Audio Motion and Proximity	X
	PGM-AEA	Acoustic Environment Analysis	X
	PGM-AOI	Audio Object Identification	X
	PGM-ASM	Audio Salience Mapping	X
	PGM-EAM	Enhanced Audio Multiplexing	X

5 JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSceneEnhancement.json

6 Profiles

No Profiles.

7 Reference Software

Not part of this specification.

8 Conformance Testing

Not part of this specification.

9 Performance Assessment

Not part of this specification.

Go to PGM-AUA V1.0 AI Modules

Cookie	Duration	Description
cookielawinfo-checkbox-necessary	1 year	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Technical".
CookieLawInfoConsent	1 year	The cookie is set by the GDPR Cookie Consent plug-in and is used to store whether the user has consented to the use of cookies or not. It does not store any personal data.
viewed_cookie_policy	1 year	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
_pk_id.6.08a8	13 months	Used to store a few details about the user such as the unique visitor ID
_pk_ses.6.08a8	30 minutes	Short lived cookies used to temporarily store data for the visit

PGM-AUA V1.0 AIMs – Audio Scene Enhancement

1 Functions

2 Reference Model

3 I/O Data

4 SubAIMs (informative)

4.1 Reference Model

4.2 Operation

4.3 Functions of SubAIMs

4.4 I/O Data of SubAIMs

4.5 AIMs and JSON Metadata

5 JSON Metadata

6 Profiles

7 Reference Software

8 Conformance Testing

9 Performance Assessment

Notice