(Tentative)
| Function | Reference Model | Input/Output Data |
| SubAIMs | JSON Metadata | Profiles |
1. Function
The Audio Spatial Reasoning (PGM-ASR) AIM acts as a bridge between raw Audio Scene Descriptors and higher-level reasoning modules: it interprets and refines the spatial audio context to support reasoning and action execution.
PGM-ASR:
- Receives:
  - Audio Action Directive (PGM-AAD) from A-User Control.
  - Context from Context Capture, which includes Entity State, Visual Scene Descriptors, and Audio Scene Descriptors (ASD0).
- Refines and aligns the Audio Scene Descriptors through an iterative loop with Domain Access (PGM-DAC), in the following steps:
  - ASR sends ASD0 to DAC.
  - DAC sends ASD1, an enhanced version of ASD0, to ASR.
  - ASR sends ASD2, an enhanced version of ASD1, to Prompt Creation (PGM-PRC).
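The exchange described above can be sketched as a single pass through two enhancement steps. This is a minimal illustration only: the `AudioSceneDescriptors` class, the `refine_audio_scene` function, the two stub enhancers, and the `distance_m` field are all hypothetical names chosen for the example and are not taken from the PGM-ASR JSON Metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AudioSceneDescriptors:
    """Hypothetical stand-in for Audio Scene Descriptors (not the MPAI schema)."""
    stage: str                                  # "ASD0", "ASD1" or "ASD2"
    objects: List[Dict] = field(default_factory=list)


Enhancer = Callable[[AudioSceneDescriptors], AudioSceneDescriptors]


def refine_audio_scene(asd0: AudioSceneDescriptors,
                       domain_access: Enhancer,
                       finalise: Enhancer) -> AudioSceneDescriptors:
    """One pass of the ASR/DAC exchange: ASD0 -> ASD1 -> ASD2."""
    asd1 = domain_access(asd0)   # ASR sends ASD0 to DAC, DAC returns ASD1
    asd2 = finalise(asd1)        # ASR derives ASD2, destined for Prompt Creation
    return asd2


def stub_dac(asd: AudioSceneDescriptors) -> AudioSceneDescriptors:
    # Placeholder for Domain Access: tags each object with a domain label.
    objs = [{**o, "domain_label": "unclassified"} for o in asd.objects]
    return AudioSceneDescriptors("ASD1", objs)


def stub_finalise(asd: AudioSceneDescriptors) -> AudioSceneDescriptors:
    # Placeholder for ASR's own reasoning: adds a coarse proximity flag.
    # The 1.5 m threshold is an arbitrary value chosen for the example.
    objs = [{**o, "near": o["distance_m"] < 1.5} for o in asd.objects]
    return AudioSceneDescriptors("ASD2", objs)


asd0 = AudioSceneDescriptors("ASD0", [{"id": "obj-1", "distance_m": 0.8}])
asd2 = refine_audio_scene(asd0, stub_dac, stub_finalise)
print(asd2.stage, asd2.objects[0])
```

Each stub returns a new object rather than modifying its input, so ASD0 remains available unchanged if the exchange has to be repeated.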
2. Reference Model
Figure 1 gives the Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM.

Figure 1 – The Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM
3. Input/Output Data
Table 1 gives the Input/Output Data of the Audio Spatial Reasoning AIM.
Table 1 – Input/Output Data of the Audio Spatial Reasoning AIM
| Input | Description |
| Context | A structured and time-stamped snapshot representing the initial understanding that the A-User achieves of the environment and of the User posture. |
| Audio Spatial Directive | A dynamic modifier provided by the Domain Access AIM to help the interpretation of the Audio Scene by injecting directional constraints, source focus hints, salience maps, and refinement logic. |
| Audio Action Directive | Audio-related actions and process sequences from PGM-AUC. |
| Output | Description |
| Audio Scene Descriptors | The input Audio Scene Descriptors enriched with the results of the reasoning performed on them, such as motion flags, proximity classification, and acoustic characteristics (e.g., reverb, echo, ambient noise). |
| Audio Action Status | Audio spatial feasibility, occlusion, and reachability flags to PGM-AUC. |
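As an informal illustration of the two outputs in Table 1, the sketch below models them as plain Python data classes. All class and field names (`EnrichedAudioObject`, `AudioActionStatus`, `reverb_rt60_s`, and so on) are assumptions made for this example; the normative structure is the one defined by the JSON Metadata, which may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EnrichedAudioObject:
    """Hypothetical per-object enrichment carried in the output descriptors."""
    object_id: str
    moving: bool                       # motion flag
    proximity: str                     # proximity classification, e.g. "near"
    reverb_rt60_s: Optional[float]     # reverberation time in seconds, if known
    ambient_noise_db: Optional[float]  # ambient noise level, if known


@dataclass
class AudioActionStatus:
    """Hypothetical flags returned to PGM-AUC for one Audio Action Directive."""
    feasible: bool
    occluded: bool
    reachable: bool


obj = EnrichedAudioObject("obj-1", moving=True, proximity="near",
                          reverb_rt60_s=0.4, ambient_noise_db=None)
status = AudioActionStatus(feasible=True, occluded=False, reachable=True)
print(asdict(obj))
print(asdict(status))
```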
4. SubAIMs
Figure 2 gives the Reference Model of the Composite Audio Spatial Reasoning AIM including an initial set of SubAIMs.

Figure 2 – The Reference Model of the Composite Audio Spatial Reasoning (PGM-ASR) AIM
Table 2 specifies the Functions performed by the PGM-ASR AIM’s SubAIMs in the current example.
Table 2 – Functions performed by the PGM-ASR AIM’s SubAIMs (example)
| SubAIM | Specification |
| Object Motion & Proximity | Purpose: Detects movement and proximity of audio objects. Tasks: – Track object trajectories over time. – Classify proximity zones (near, mid, far). – Extract Spatial Attitude. Output: Motion and proximity metadata for each object. |
| Acoustic Profile Extraction | Purpose: Characterises objects’ Acoustic Profiles. Tasks: – Estimate reverberation (RT60), loudness, timbre, and frequency characteristics. – Identify environmental conditions (e.g., noisy, reverberant). Output: AcousticProfile for each object. |
| Audio Object Identification | Purpose: Adds preliminary semantic meaning to audio objects. Tasks: – Classify objects into broad categories (speech, music, noise, alarm). – Attach confidence scores. Output: Instance Identifier. |
| Object Salience Ranking | Purpose: Prioritises audio objects based on relevance. Tasks: Rank objects using proximity, semantic importance, and A-User Control directives. Output: RankedAudioObjects list. |
| Audio Scene Description | Purpose: Aggregates all enriched data into ASD₁. Tasks: – Combine spatial, acoustic, semantic, and salience metadata. – Add PointOfView, EnrichmentTime, AIM ID. Output: ASD₁ → passed to DAC for domain-specific enrichment (to be received as ASD₂). |
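Two of the tasks in Table 2, proximity zone classification and salience ranking, lend themselves to a small worked sketch. The distance thresholds, category weights, scoring formula, and the `focus_category` parameter below are illustrative assumptions only; the specification names the inputs to ranking (proximity, semantic importance, A-User Control directives) but does not prescribe how they are combined.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed zone boundaries in metres (not specified by PGM-ASR).
NEAR_MAX_M = 1.5
MID_MAX_M = 5.0

# Assumed weights for the broad categories listed in Table 2.
CATEGORY_WEIGHT = {"alarm": 1.0, "speech": 0.8, "music": 0.4, "noise": 0.1}
PROXIMITY_WEIGHT = {"near": 1.0, "mid": 0.6, "far": 0.3}


@dataclass
class AudioObject:
    object_id: str
    distance_m: float
    category: str       # speech, music, noise or alarm
    confidence: float   # classifier confidence in [0, 1]


def classify_proximity(distance_m: float) -> str:
    """Map a distance to one of the three proximity zones."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= NEAR_MAX_M:
        return "near"
    if distance_m <= MID_MAX_M:
        return "mid"
    return "far"


def salience(obj: AudioObject, focus_category: Optional[str] = None) -> float:
    """Toy score: semantic weight x confidence x proximity weight."""
    score = CATEGORY_WEIGHT.get(obj.category, 0.0) * obj.confidence
    score *= PROXIMITY_WEIGHT[classify_proximity(obj.distance_m)]
    # A directive that focuses on one category boosts matching objects.
    if focus_category is not None and obj.category == focus_category:
        score += 1.0
    return score


def rank_audio_objects(objects: List[AudioObject],
                       focus_category: Optional[str] = None) -> List[AudioObject]:
    """Return the objects sorted from most to least salient."""
    return sorted(objects, key=lambda o: salience(o, focus_category), reverse=True)


scene = [
    AudioObject("a1", distance_m=8.0, category="alarm", confidence=0.9),
    AudioObject("s1", distance_m=1.0, category="speech", confidence=0.9),
    AudioObject("m1", distance_m=3.0, category="music", confidence=0.95),
]
print([o.object_id for o in rank_audio_objects(scene)])
print([o.object_id for o in rank_audio_objects(scene, focus_category="music")])
```

With these assumed weights, nearby speech outranks a distant alarm, and a directive focusing on music moves the music object to the top of the list.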
Table 3 gives the AIMs composing the Audio Spatial Reasoning (PGM-ASR) Composite AIM:
Table 3 – AIMs of the Audio Spatial Reasoning (PGM-ASR) Composite AIM
| # | SubAIM | Input | Output | To |
| OMP | Object Motion & Proximity | Audio Scene Descriptors | Spatial Attitudes, Proximity Class | ASD |
| | | | Spatial Attitudes | APE |
| APE | Acoustic Profile Extraction | Audio Scene Descriptors | Audio Objects, Acoustic Profile | AOI |
| AOI | Audio Object Identification | Audio Objects, Acoustic Profiles, Motion Flags | Audio Object IDs | OSR |
| OSR | Object Salience Ranking | Audio Object IDs, Proximity Class | Ranked Audio Objects | ASD |
| ASD | Audio Scene Description | Spatial Attitudes, Proximity Class, Acoustic Profile, Audio Objects, Audio Object IDs, Ranked Audio Objects | ASD₁ | DAC |
| VSD | Visual Scene Description | Object Audio Characteristics | VDS₁ | DAC |
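The "To" column of Table 3 implies an execution order for the SubAIMs, which can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module and covers only the five SubAIMs that feed one another inside the Composite AIM. It assumes that OMP sends Spatial Attitudes to APE in addition to its output to ASD, and it leaves out the links to DAC, which is outside the Composite AIM. The `PREDECESSORS` mapping and the `execution_order` helper are names made up for this example.

```python
from graphlib import TopologicalSorter

# SubAIM -> set of SubAIMs whose output it consumes, read from the "To" column.
PREDECESSORS = {
    "APE": {"OMP"},          # OMP -> APE (Spatial Attitudes, assumed)
    "AOI": {"APE"},          # APE -> AOI
    "OSR": {"AOI"},          # AOI -> OSR
    "ASD": {"OMP", "OSR"},   # OMP -> ASD and OSR -> ASD
}


def execution_order(predecessors):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(PREDECESSORS)
print(order)  # ['OMP', 'APE', 'AOI', 'OSR', 'ASD']
```

Because the links form a single chain from OMP to ASD, this is the only valid order under the stated assumptions.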
5. JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSpatialReasoning.json
6. Profiles
No Profiles