(Tentative)
| Function | Reference Model | Input/Output Data |
| SubAIMs | JSON Metadata | Profiles |
Function
The Audio Spatial Reasoning AIM (PGM-ASR) receives Audio Scene Descriptors, including source geometry, acoustic metadata, and motion cues. Its primary function is to analyse the spatial, acoustic, and dynamic properties of the Audio Scene and produce structured outputs that support downstream reasoning, prompt generation, and domain-specific action planning.
Internally, PGM-ASR may perform the following operations:
- Descriptor Parsing: Decomposes incoming audio scene descriptors into structured components, identifying sound sources, their spatial attitudes, and associated metadata.
- Source Localisation: Computes the 3D positions and orientations (azimuth/elevation) of sound sources using parsed geometry and spatial cues.
- Motion & Proximity Estimation: Assesses dynamic properties of sources, including movement flags and proximity classification, to infer interaction relevance.
- Acoustic Environment Analysis: Evaluates ambient audio characteristics such as reverberation, echo, and background noise to contextualize source salience.
- Salience Mapping: Integrates motion, proximity, acoustic profile, and spatial orientation to rank sources by perceptual and interaction relevance.
- Output Construction: Synthesises a structured Audio Spatial Output for Domain Access and an Audio Spatial Guide for Prompt Creation, encapsulating salient source data, acoustic context, and spatial layout.
The resulting outputs enable Domain Access and Prompt Creation to operate with full awareness of the audio environment, supporting location-aware interaction, alignment with User perception, and context-sensitive coordination of AIMs.
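The following Python sketch shows one way these operations could be chained end-to-end. It is a minimal, non-normative illustration: the listener-centred coordinate convention, the field names, the proximity thresholds, and the salience weighting are all assumptions, and the normative interfaces remain those defined by the Input/Output Data and JSON Metadata below.

```python
import math

# Illustrative end-to-end pass over parsed sound sources. The listener is assumed
# to sit at the origin with x = right, y = front, z = up; all names, thresholds,
# and weights are illustrative, not normative.

def azimuth_elevation(position):
    """Azimuth and elevation (degrees) of a source relative to the listener."""
    x, y, z = position
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation

def proximity_class(position, near=1.5, far=5.0):
    """Near/mid/far classification from Euclidean distance (thresholds assumed)."""
    distance = math.dist(position, (0.0, 0.0, 0.0))
    return "near" if distance < near else ("mid" if distance <= far else "far")

def salience(source, ambient_noise):
    """Toy salience score combining motion, proximity, and level above ambient noise."""
    proximity_weight = {"near": 1.0, "mid": 0.6, "far": 0.2}[source["proximity"]]
    motion_weight = 1.0 if source["moving"] else 0.5
    return motion_weight * proximity_weight * max(source["level"] - ambient_noise, 0.0)

def audio_spatial_reasoning(sources, acoustics):
    """Build illustrative Audio Spatial Output and Audio Spatial Guide structures."""
    for src in sources:
        src["azimuth"], src["elevation"] = azimuth_elevation(src["position"])
        src["proximity"] = proximity_class(src["position"])
        src["salience"] = salience(src, acoustics["ambient_noise"])
    ranked = sorted(sources, key=lambda s: s["salience"], reverse=True)
    spatial_output = {"sources": ranked, "acoustics": acoustics}           # to Domain Access
    spatial_guide = {"salient_sources": ranked[:3], "ambient": acoustics}  # to Prompt Creation
    return spatial_output, spatial_guide

# Example scene: a moving source to the front-right and a distant static source.
scene = [
    {"id": "s1", "position": (1.0, 2.0, 0.0), "moving": True,  "level": 0.8},
    {"id": "s2", "position": (0.0, 8.0, 1.0), "moving": False, "level": 0.6},
]
output, guide = audio_spatial_reasoning(scene, {"ambient_noise": 0.2, "reverberation_s": 0.4})
print([s["id"] for s in guide["salient_sources"]])  # ['s1', 's2']: s1 is closer and moving
```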
Reference Model
Figure 4 gives the Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM.

Figure 4 – The Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM
The functions performed by the PGM-ASR AIM can be classified as:
- Descriptor Parsing
– Ingest Audio Scene Descriptors
– Extract sound source geometry and attitude
- Source Localisation
– Estimate 3D position of each sound source
– Compute azimuth and elevation
- Motion & Proximity Estimation
– Classify sources as static or moving
– Determine proximity class (near/mid/far)
- Acoustic Environment Analysis
– Analyse reverberation, echo, noise levels
– Infer environmental geometry from acoustics (see the sketch after this list)
- Salience & Relevance Mapping
– Rank sources by intensity and user relevance
– Filter for guide inclusion
- Output Construction
– Build Audio Spatial Output (for DAC)
– Build Audio Spatial Guide (for PRC)
– Build Audio Action Status (for PGM-AUC)
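As a concrete illustration of inferring environmental geometry from acoustics, the sketch below back-calculates an approximate room volume from a measured reverberation time using Sabine's formula, under the assumed simplifications of a roughly cubic room and a single mean absorption coefficient; neither the values nor the assumptions are normative.

```python
def room_volume_from_rt60(rt60_s, mean_absorption=0.2):
    """Rough room-volume estimate (m^3) from reverberation time.

    Uses Sabine's formula RT60 = 0.161 * V / (alpha * S) together with the
    assumption of a roughly cubic room, whose surface area is S = 6 * V**(2/3).
    Solving for V gives V = (6 * alpha * RT60 / 0.161) ** 3. The mean absorption
    coefficient alpha is an assumed, scene-dependent value.
    """
    side = 6.0 * mean_absorption * rt60_s / 0.161   # cube root of the volume (m)
    return side ** 3

# Example: RT60 of 0.5 s with a mean absorption of 0.2 suggests a room of ~52 m^3.
print(round(room_volume_from_rt60(0.5), 1))
```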
Input/Output Data
| Input | Description |
| Context | A structured, time-stamped snapshot representing the A-User's initial understanding of the environment and of the User's posture. |
| Audio Spatial Directive | A dynamic modifier provided by the Domain Access AIM to support the interpretation of the Audio Scene by injecting directional constraints, source focus hints, salience maps, and refinement logic. |
| Audio Action Directive | Audio-related actions and process sequences from PGM-AUC. |
| Output | Description |
| Audio Spatial Output | A structured, analytical representation of the Audio Scene with sound source geometry, 3D positions, azimuth/elevation, motion flags, proximity classification, and acoustic characteristics (e.g. reverb, echo, ambient noise). |
| Audio Spatial Guide | A filtered, User-centred subset of the Audio Scene. It highlights salient sources, normalised directional cues, proximity, and ambient summaries to enrich the Prompt Creation AIM prompts with audio anchors relevant to User focus. |
| Audio Action Status | Audio spatial feasibility, occlusion, and reachability flags to PGM-AUC. |
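For illustration only, the two main outputs could take shapes along the lines of the Python dictionaries below. All field names and values are assumptions; the normative structure is defined by the JSON Metadata referenced at the end of this section.

```python
# Illustrative (non-normative) shapes of the two main outputs.
# Field names are assumptions; see the AIM's JSON Metadata for the normative schema.

audio_spatial_output = {                     # analytical view, sent to Domain Access
    "sources": [
        {
            "id": "s1",
            "position": [1.0, 2.0, 0.0],     # metres, listener-relative
            "azimuth_deg": 26.6,
            "elevation_deg": 0.0,
            "moving": True,
            "proximity": "mid",              # near / mid / far
        }
    ],
    "acoustics": {"reverberation_s": 0.4, "echo": False, "ambient_noise": 0.2},
}

audio_spatial_guide = {                      # user-centred subset, sent to Prompt Creation
    "salient_sources": [
        {"id": "s1", "direction": "front-right", "proximity": "mid", "moving": True}
    ],
    "ambient_summary": "quiet room with moderate reverberation",
}
```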
SubAIMs
Table 11 gives the functions – potentially implementable as SubAIMs – performed by the Audio Spatial Reasoning AIM.
Table 11 – Functions performed by Audio Spatial Reasoning AIM
| # | Function | Inputs | Outputs | Sent to |
| 1 | Descriptor Parsing | Audio Scene Descriptors (source geometry, acoustic metadata, motion cues) | Parsed source list, source Spatial Attitudes | 2, 3, 4, 5, 6 |
| 2 | Source Localisation | Parsed Audio Scene Geometry, Object Spatial Attitude | 3D source Positions, Orientation (azimuth/ elevation) | 3, 5, 6 |
| 3 | Motion & Proximity Estimation | 3D positions, source layout | Motion flags, proximity classification | 5, 6 |
| 4 | Acoustic Environment Analysis | Source metadata, ambient descriptors | Acoustic characteristics (reverberation, echo, ambient noise) | 5, 6 |
| 5 | Salience Mapping | Motion flags, proximity classification, acoustic profile, azimuth/elevation data | Ranked source list, filtered salient sources | 6 |
| 6 | Output Construction | Ranked salient sources, full descriptors, motion, proximity, acoustic profile | Audio Spatial Output (to DAC), Audio Spatial Guide (to PRC) | DAC, PRC |
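The "Sent to" column of Table 11 defines a directed dataflow among the candidate SubAIMs. The sketch below encodes that wiring as a dependency graph and derives a valid execution order; the encoding is illustrative and not part of the specification.

```python
from graphlib import TopologicalSorter

# "Sent to" column of Table 11 encoded as a dependency graph:
# each function maps to the set of functions whose results it consumes.
CONSUMES = {
    1: set(),              # Descriptor Parsing
    2: {1},                # Source Localisation
    3: {1, 2},             # Motion & Proximity Estimation
    4: {1},                # Acoustic Environment Analysis
    5: {1, 2, 3, 4},       # Salience Mapping
    6: {1, 2, 3, 4, 5},    # Output Construction
}

# A valid execution order for the SubAIMs, e.g. (1, 2, 4, 3, 5, 6).
print(tuple(TopologicalSorter(CONSUMES).static_order()))
```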
Table 12 – Components of Audio Spatial Reasoning AIM’s outputs
| Output Name | Composing Functions | Included Data Elements | Destination / Purpose |
| Audio Spatial Output | Descriptor Parsing, Source Localisation, Motion & Proximity Estimation, Acoustic Environment Analysis. | Full list of sound sources, 3D positions, azimuth/elevation, motion flags, proximity class, acoustic characteristics (e.g. reverb, echo). | Sent to Domain Access for spatial audio query formulation and scene refinement. |
| Audio Spatial Guide | Source Localisation, Motion & Proximity Estimation, Acoustic Environment Analysis, Salience & Relevance Mapping. | Filtered salient sources, Orientation (azimuth/elevation), proximity classification, ambient audio characteristics, optional narrative cues. | Sent to Prompt Creation AIM to enrich PC-Prompt with user-relative auditory context. |
The following gives an analysis of the impact of the Audio Spatial Directive on the operation and outputs of the Audio Spatial Reasoning AIM:
Functions using the input
- Source Localisation
- Salience & Relevance Mapping
Directive Elements Used
- Directional constraints (azimuth/elevation bounds)
- Source focus hints
- Salience map
- Refinement instructions
Internal Effects
- Adjusts azimuth/elevation normalisation logic in Source Localisation
- Overrides default salience ranking in Salience Mapping
- Filters or reorders source list based on DAC-specified focus
- Triggers reprocessing of descriptors if refinement is requested
Impact on Outputs
- Audio Spatial Output:
– Source positions may be recalculated
– Salience scores may be updated
– Source list may be reordered or filtered
- Audio Spatial Guide:
– Salient sources may shift
– Directional cues may be adjusted
– Narrative emphasis may change
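The sketch below illustrates how such a directive might be applied to an already-analysed source list, using the directive elements listed above (directional constraints, source focus hints, salience map). The field names and the simple blending rule are assumptions for illustration only.

```python
# Illustrative application of an Audio Spatial Directive to an analysed source list.
# The directive fields and the blending rule below are assumptions, not normative.

def apply_directive(sources, directive):
    """Filter and re-rank sources according to DAC-provided constraints and hints."""
    lo, hi = directive.get("azimuth_bounds_deg", (-180.0, 180.0))
    focus = set(directive.get("focus_sources", []))
    salience_map = directive.get("salience_map", {})

    kept = []
    for src in sources:
        # Directional constraint: drop sources outside the requested azimuth window.
        if not (lo <= src["azimuth_deg"] <= hi):
            continue
        # Salience override: DAC-specified scores take precedence over local ranking,
        # and focus hints add a fixed boost (boost value assumed).
        score = salience_map.get(src["id"], src["salience"])
        if src["id"] in focus:
            score += 1.0
        kept.append({**src, "salience": score})

    return sorted(kept, key=lambda s: s["salience"], reverse=True)

# Example: restrict attention to the frontal hemisphere and focus on source "s2".
sources = [
    {"id": "s1", "azimuth_deg": 26.6, "salience": 0.36},
    {"id": "s2", "azimuth_deg": -40.0, "salience": 0.04},
    {"id": "s3", "azimuth_deg": 150.0, "salience": 0.50},
]
directive = {"azimuth_bounds_deg": (-90.0, 90.0), "focus_sources": ["s2"]}
print([s["id"] for s in apply_directive(sources, directive)])  # ['s2', 's1']; 's3' filtered out
```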
JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSpatialReasoning.json
Profiles
No Profiles