PGM-AUA V1.0 AIMs Audio Spatial Reasoning (Tentative)

Function	Reference Model	Input/Output Data
SubAIMs	JSON Metadata	Profiles

1. Function

The Audio Spatial Reasoning (PGM‑ASR) AIM acts as a bridge between raw Audio Scene Descriptors and higher-level reasoning modules: it interprets and refines the spatial audio context to support reasoning and action execution.

AIM-ASR

Receives
1. Audio Action Directive (PGM‑AAD) from A-User Control.
2. Context from Context Capture that includes Entity State, Visual Scene Descriptors, and Audio Scene Descriptors (ASD₀).
Refines and aligns Audio Scene Descriptors through an iterative loop with Domain Access (PGM‑DAC) with the following steps:
1. ASR sends ASD₀ to DAC.
2. DAC sends ASD₁, an enhanced version of ASD₀, to ASR.
3. ASR sends ASD₂, an enhanced version of ASD₁, to Prompt Creation (PGM-PRC).
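
The three-step exchange above can be sketched as follows. This is a minimal illustration under stated assumptions, not the normative behaviour: the `AudioSceneDescriptors` class, its fields, and the functions `dac_enhance`, `asr_enhance`, and `run_exchange` are hypothetical names introduced here, and the "enhancement" is reduced to tagging the data with the module that touched it.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class AudioSceneDescriptors:
    """Hypothetical stand-in for Audio Scene Descriptors (not the MPAI schema)."""
    stage: int                         # 0 = ASD0, 1 = ASD1, 2 = ASD2
    enriched_by: Tuple[str, ...] = ()  # modules that have enhanced the data


def dac_enhance(asd0: AudioSceneDescriptors) -> AudioSceneDescriptors:
    """Placeholder for Domain Access: returns ASD1, an enhanced ASD0."""
    return replace(asd0, stage=1, enriched_by=asd0.enriched_by + ("DAC",))


def asr_enhance(asd1: AudioSceneDescriptors) -> AudioSceneDescriptors:
    """Placeholder for the ASR refinement: returns ASD2, an enhanced ASD1."""
    return replace(asd1, stage=2, enriched_by=asd1.enriched_by + ("ASR",))


def run_exchange(
    asd0: AudioSceneDescriptors,
    dac: Callable[[AudioSceneDescriptors], AudioSceneDescriptors] = dac_enhance,
    send_to_prc: Callable[[AudioSceneDescriptors], None] = print,
) -> AudioSceneDescriptors:
    asd1 = dac(asd0)          # steps 1 and 2: ASD0 goes to DAC, ASD1 comes back
    asd2 = asr_enhance(asd1)  # step 3: ASR enhances ASD1 into ASD2
    send_to_prc(asd2)         # step 3: ASD2 is forwarded to Prompt Creation
    return asd2
```

Passing `dac` and `send_to_prc` as callables keeps the sketch independent of how Domain Access and Prompt Creation are actually reached.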

2. Reference Model

Figure 1 gives the Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM.

Figure 1 – The Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM

3. Input/Output Data

Table 1 gives the Input/Output Data of the Audio Spatial Reasoning AIM.

Table 1 – Input/Output Data of the Audio Spatial Reasoning AIM

Input	Description
Context	A structured and time-stamped snapshot representing the initial understanding that the A-User achieves of the environment and of the User posture.
Audio Spatial Directive	A dynamic modifier provided by the Domain Access AIM that aids interpretation of the Audio Scene by injecting directional constraints, source focus hints, salience maps, and refinement logic.
Audio Action Directive	Audio-related actions and process sequences from PGM-AUC.
Output	Description
Audio Scene Descriptors	The input Audio Scene Descriptors enriched with the results of the reasoning performed on them, such as motion flags, proximity classification, and acoustic characteristics (e.g., reverb, echo, ambient noise).
Audio Action Status	Audio spatial feasibility, occlusion, and reachability flags to PGM-AUC.
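
As an informal aid to reading Table 1, the inputs and outputs can be pictured as typed containers. The sketch below is an assumption throughout: the class and field names (`ASRInputs`, `ASROutputs`, `AudioActionStatus`, `feasible`, `occluded`, `reachable`) are hypothetical, payloads are left as plain dictionaries, and the normative structure is the one given by the JSON Metadata of this AIM.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AudioActionStatus:
    """Hypothetical flags mirroring the Audio Action Status row of Table 1."""
    feasible: bool   # audio spatial feasibility
    occluded: bool   # occlusion flag
    reachable: bool  # reachability flag


@dataclass
class ASRInputs:
    """Hypothetical container for the three inputs of Table 1."""
    context: Dict[str, Any]                 # from Context Capture
    audio_action_directive: Dict[str, Any]  # from PGM-AUC
    # Assumed optional here because it arrives from Domain Access during the loop.
    audio_spatial_directive: Optional[Dict[str, Any]] = None


@dataclass
class ASROutputs:
    """Hypothetical container for the two outputs of Table 1."""
    audio_scene_descriptors: Dict[str, Any]  # enriched descriptors
    audio_action_status: AudioActionStatus   # returned to PGM-AUC
```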

4. SubAIMs

Figure 2 gives the Reference Model of the Composite Audio Spatial Reasoning AIM including an initial set of SubAIMs.

Figure 2 – Reference Model of the Composite Audio Spatial Reasoning (PGM-ASR) AIM

Table 2 specifies the Functions performed by the PGM-ASR AIM’s SubAIMs in the current example.

Table 2 – Functions performed by the PGM-ASR AIM’s SubAIMs (example)

SubAIM	Specification
Object Motion & Proximity	Purpose: Detects movement and proximity of audio objects. Tasks: – Track object trajectories over time. – Classify proximity zones (near, mid, far). – Extract Spatial Attitude. Output: Motion and proximity metadata for each object.
Acoustic Profile Extraction	Purpose: Characterises objects’ Acoustic Profiles. Tasks: – Estimate reverberation (RT60), loudness, timbre, and frequency characteristics. – Identify environmental conditions (e.g., noisy, reverberant). Output: AcousticProfile for each object.
Audio Object Identification	Purpose: Adds preliminary semantic meaning to audio objects. Tasks: – Classify objects into broad categories (speech, music, noise, alarm). – Attach confidence scores. Output: Instance Identifier.
Object Salience Ranking	Purpose: Prioritises audio objects based on relevance. Tasks: – Rank objects using proximity, semantic importance, and A-User Control directives. Output: RankedAudioObjects list.
Audio Scene Description	Purpose: Aggregates all enriched data into ASD₁. Tasks: – Combine spatial, acoustic, semantic, and salience metadata. – Add PointOfView, EnrichmentTime, AIM ID. Output: ASD₁ → passed to DAC for domain-specific enrichment (to be received as ASD₂).
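
Two of the SubAIM tasks in Table 2, proximity classification and salience ranking, can be illustrated with a short sketch. Everything numeric here is an assumption made for illustration: the distance thresholds, the weight tables, the scoring formula, and the dictionary keys (`id`, `proximity`, `category`, `confidence`) are hypothetical and are not taken from the specification.

```python
import math

# Assumed distance thresholds in metres; the specification does not define them.
NEAR_MAX_M = 2.0
MID_MAX_M = 10.0

# Assumed weights used only to make the ranking concrete.
PROXIMITY_WEIGHT = {"near": 1.0, "mid": 0.5, "far": 0.2}
SEMANTIC_WEIGHT = {"alarm": 1.0, "speech": 0.8, "music": 0.4, "noise": 0.1}


def classify_proximity(position, listener=(0.0, 0.0, 0.0)):
    """Map the Euclidean distance from the listener to a near/mid/far zone."""
    distance = math.dist(position, listener)
    if distance <= NEAR_MAX_M:
        return "near"
    if distance <= MID_MAX_M:
        return "mid"
    return "far"


def rank_by_salience(objects, directive_boost=None):
    """Sort objects by a hypothetical salience score, highest first.

    directive_boost maps object ids to an additive bonus, standing in for
    A-User Control directives.
    """
    directive_boost = directive_boost or {}

    def score(obj):
        base = (
            PROXIMITY_WEIGHT[obj["proximity"]]
            * SEMANTIC_WEIGHT.get(obj["category"], 0.0)
            * obj.get("confidence", 1.0)
        )
        return base + directive_boost.get(obj["id"], 0.0)

    return sorted(objects, key=score, reverse=True)
```

A multiplicative score is only one possible design; it is used here because it lets a distant alarm still outrank nearby background music without any special cases.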

Table 3 gives the AIMs composing the Audio Spatial Reasoning (PGM-ASR) Composite AIM:

Table 3 – AIMs of the Audio Spatial Reasoning (PGM-ASR) Composite AIM

#	SubAIM	Input	Output	To
OMP	Object Motion & Proximity	Audio Scene Descriptors	Spatial Attitudes, Proximity Class	ASD
OMP	Object Motion & Proximity	Audio Scene Descriptors	Spatial Attitudes	APE
APE	Acoustic Profile Extraction	Audio Scene Descriptors	Audio Objects, Acoustic Profile	AOI
AOI	Audio Object Identification	Audio Objects, Acoustic Profiles, Motion Flags	Audio Object IDs	OSR
OSR	Object Salience Ranking	Audio Object IDs, Proximity Class	Ranked Audio Objects	ASD
ASD	Audio Scene Description	Spatial Attitudes, Proximity Class, Acoustic Profile, Audio Objects, Audio Object IDs, Ranked Audio Objects	ASD₁	DAC
VSD	Visual Scene Description	Object Audio Characteristics	VDS₁	DAC
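
The "To" column of Table 3 defines a data-flow graph among the SubAIMs, from which a valid execution order can be derived. The sketch below is an illustration only: it encodes the audio rows of the table as edges (the VSD row is left out because it has no connection to the audio SubAIMs), and the names `EDGES` and `execution_order` are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# (source, destination) pairs read from the "#" and "To" columns of Table 3.
EDGES = [
    ("OMP", "ASD"),
    ("OMP", "APE"),
    ("APE", "AOI"),
    ("AOI", "OSR"),
    ("OSR", "ASD"),
    ("ASD", "DAC"),
]


def execution_order(edges):
    """Return the modules in an order where each runs after its data sources."""
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set()).add(src)
    return list(TopologicalSorter(predecessors).static_order())


print(execution_order(EDGES))
```

`TopologicalSorter` raises `graphlib.CycleError` if the edges contain a cycle, which makes the same sketch usable as a quick consistency check on the table.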

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSpatialReasoning.json

6. Profiles

No Profiles
