(Tentative)

Function | Reference Model | Input/Output Data | SubAIMs | JSON Metadata | Profiles

1. Function

The Audio Spatial Reasoning (PGM-ASR) AIM acts as a bridge between raw Audio Scene Descriptors and higher-level reasoning modules: it interprets and refines the spatial audio context to support reasoning and action execution.

PGM-ASR

  1. Receives
    1. Audio Action Directive (PGM‑AAD) from A-User Control.
    2. Context from Context Capture that includes Entity State, Visual Scene Descriptors, and Audio Scene Descriptors (ASD0).
  2. Refines and aligns Audio Scene Descriptors through an iterative loop with Domain Access (PGM-DAC), in the following steps:
    1. ASR sends ASD0 to DAC.
    2. DAC sends ASD1, an enhanced version of ASD0, to ASR.
    3. ASR sends ASD2, an enhanced version of ASD1, to Prompt Creation (PGM-PRC).
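
The three-step exchange above can be sketched as a small message flow. This is only an illustration under stated assumptions: the `AudioSceneDescriptors` class, its `stage` and `annotations` fields, and the `domain_access` stub are hypothetical names introduced here, not part of the specification, and the real enrichment logic is not shown.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class AudioSceneDescriptors:
    """Hypothetical stand-in for ASD; the real format is defined elsewhere."""
    stage: int                          # 0 = ASD0, 1 = ASD1, 2 = ASD2
    annotations: Tuple[str, ...] = ()   # illustrative enrichment labels


def domain_access(asd0: AudioSceneDescriptors) -> AudioSceneDescriptors:
    """Stub for DAC (step 2): returns ASD1, an enhanced version of ASD0."""
    return replace(asd0, stage=1, annotations=asd0.annotations + ("domain",))


def audio_spatial_reasoning(
    asd0: AudioSceneDescriptors,
    dac: Callable[[AudioSceneDescriptors], AudioSceneDescriptors],
) -> AudioSceneDescriptors:
    """Stub for ASR: runs the exchange and returns ASD2 for Prompt Creation."""
    asd1 = dac(asd0)  # steps 1 and 2: send ASD0 to DAC, receive ASD1
    # Step 3: ASR enhances ASD1 into ASD2 (placeholder enrichment only).
    return replace(asd1, stage=2, annotations=asd1.annotations + ("spatial",))


asd2 = audio_spatial_reasoning(AudioSceneDescriptors(stage=0), domain_access)
print(asd2)
```

Passing the DAC as a callable keeps the ASR stub independent of how Domain Access is actually reached, which is left open in this sketch.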

2. Reference Model

Figure 1 gives the Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM.

Figure 1 – The Reference Model of the Audio Spatial Reasoning (PGM-ASR) AIM

3. Input/Output Data

Table 1 gives the Input/Output Data of the Audio Spatial Reasoning AIM.

Table 1 – Input/Output Data of the Audio Spatial Reasoning AIM

| Input | Description |
|---|---|
| Context | A structured and time-stamped snapshot representing the initial understanding that the A-User achieves of the environment and of the User posture. |
| Audio Spatial Directive | A dynamic modifier provided by the Domain Access AIM to help interpret the Audio Scene by injecting directional constraints, source focus hints, salience maps, and refinement logic. |
| Audio Action Directive | Audio-related actions and process sequences from PGM-AUC. |

| Output | Description |
|---|---|
| Audio Scene Descriptors | The input Audio Scene Descriptors enriched with the results of the reasoning performed on them, such as motion flags, proximity classification, and acoustic characteristics (e.g., reverb, echo, ambient noise). |
| Audio Action Status | Audio spatial feasibility, occlusion, and reachability flags to PGM-AUC. |
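
As an illustration of the outputs in Table 1, the enrichment fields and status flags could be modelled as plain data classes. All class and field names below are assumptions made for this sketch; the normative format is the JSON Metadata referenced later in this page.

```python
from dataclasses import dataclass, field
from typing import Optional

PROXIMITY_ZONES = ("near", "mid", "far")  # assumed labels for proximity classes


@dataclass
class AcousticCharacteristics:
    """Hypothetical container for the acoustic characteristics of Table 1."""
    reverb_rt60_s: Optional[float] = None     # reverberation time, seconds
    echo: Optional[bool] = None
    ambient_noise_db: Optional[float] = None


@dataclass
class EnrichedAudioObject:
    """Hypothetical per-object enrichment carried by the output descriptors."""
    object_id: str
    in_motion: bool                            # motion flag
    proximity: str                             # one of PROXIMITY_ZONES
    acoustics: AcousticCharacteristics = field(
        default_factory=AcousticCharacteristics
    )

    def __post_init__(self) -> None:
        if self.proximity not in PROXIMITY_ZONES:
            raise ValueError(f"unknown proximity zone: {self.proximity!r}")


@dataclass
class AudioActionStatus:
    """Hypothetical flags reported back to PGM-AUC."""
    feasible: bool
    occluded: bool
    reachable: bool


obj = EnrichedAudioObject(object_id="obj-1", in_motion=True, proximity="near")
status = AudioActionStatus(feasible=True, occluded=False, reachable=True)
```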

4. SubAIMs

Figure 2 gives the Reference Model of the Composite Audio Spatial Reasoning AIM including an initial set of SubAIMs.

 

Figure 2 – The Reference Model of the Composite Audio Spatial Reasoning (PGM-ASR) AIM

Table 2 specifies the Functions performed by the PGM-ASR AIM’s SubAIMs in the current example.

Table 2 – Functions performed by the PGM-ASR AIM’s SubAIMs (example)

SubAIM: Object Motion & Proximity
Purpose: Detects movement and proximity of audio objects.
Tasks:
– Track object trajectories over time.
– Classify proximity zones (near, mid, far).
– Extract Spatial Attitude.
Output: Motion and proximity metadata for each object.

SubAIM: Acoustic Profile Extraction
Purpose: Characterises objects’ Acoustic Profiles.
Tasks:
– Estimate reverberation (RT60), loudness, timbre, and frequency characteristics.
– Identify environmental conditions (e.g., noisy, reverberant).
Output: AcousticProfile for each object.

SubAIM: Audio Object Identification
Purpose: Adds preliminary semantic meaning to audio objects.
Tasks:
– Classify objects into broad categories (speech, music, noise, alarm).
– Attach confidence scores.
Output: Instance Identifier.

SubAIM: Object Salience Ranking
Purpose: Prioritises audio objects based on relevance.
Tasks:
– Rank objects using proximity, semantic importance, and A-User Control directives.
Output: RankedAudioObjects list.

SubAIM: Audio Scene Description
Purpose: Aggregates all enriched data into ASD₁.
Tasks:
– Combine spatial, acoustic, semantic, and salience metadata.
– Add PointOfView, EnrichmentTime, AIM ID.
Output: ASD₁, passed to DAC for domain-specific enrichment (to be received as ASD₂).
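
Two of the tasks in Table 2, proximity classification and salience ranking, lend themselves to a short sketch. The distance thresholds, weight tables, and the `focus_ids` boost standing in for A-User Control directives are all assumptions chosen for illustration; the specification does not define them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

NEAR_MAX_M = 2.0    # assumed upper bound of the "near" zone, metres
MID_MAX_M = 10.0    # assumed upper bound of the "mid" zone, metres

# Assumed weights; only the category names come from Table 2.
PROXIMITY_WEIGHT = {"near": 1.0, "mid": 0.5, "far": 0.2}
CATEGORY_WEIGHT = {"alarm": 1.0, "speech": 0.8, "music": 0.4, "noise": 0.1}


@dataclass
class AudioObject:
    """Hypothetical audio object with the fields this sketch needs."""
    object_id: str
    distance_m: float
    category: str       # speech, music, noise, alarm
    confidence: float   # classifier confidence in [0, 1]


def classify_proximity(distance_m: float) -> str:
    """Map a distance to one of the near/mid/far zones."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= NEAR_MAX_M:
        return "near"
    if distance_m <= MID_MAX_M:
        return "mid"
    return "far"


def salience(obj: AudioObject, focus_ids: FrozenSet[str] = frozenset()) -> float:
    """Combine proximity, semantic importance, and a directive boost."""
    score = (
        PROXIMITY_WEIGHT[classify_proximity(obj.distance_m)]
        * CATEGORY_WEIGHT.get(obj.category, 0.1)
        * obj.confidence
    )
    if obj.object_id in focus_ids:   # stand-in for an A-User Control directive
        score += 1.0
    return score


def rank_objects(
    objects: Iterable[AudioObject], focus_ids: FrozenSet[str] = frozenset()
) -> List[AudioObject]:
    """Return objects sorted from most to least salient."""
    return sorted(objects, key=lambda o: salience(o, focus_ids), reverse=True)


scene = [
    AudioObject("alarm-1", 20.0, "alarm", 0.9),
    AudioObject("voice-1", 1.0, "speech", 0.9),
    AudioObject("hum-1", 5.0, "noise", 1.0),
]
ranked_ids = [o.object_id for o in rank_objects(scene)]
```

A multiplicative score is only one possible design; a weighted sum or a learned ranker would fit the same interface.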

Table 3 gives the AIMs composing the Audio Spatial Reasoning (PGM-ASR) Composite AIM.

Table 3 – AIMs of the Audio Spatial Reasoning (PGM-ASR) Composite AIM

| Acronym | SubAIM | Input | Output | To |
|---|---|---|---|---|
| OMP | Object Motion & Proximity | Audio Scene Descriptors | Spatial Attitudes, Proximity Class | ASD |
| | | | Spatial Attitudes | APE |
| APE | Acoustic Profile Extraction | Audio Scene Descriptors | Audio Objects, Acoustic Profile | AOI |
| AOI | Audio Object Identification | Audio Objects, Acoustic Profiles, Motion Flags | Audio Object IDs | OSR |
| OSR | Object Salience Ranking | Audio Object IDs, Proximity Class | Ranked Audio Objects | ASD |
| ASD | Audio Scene Description | Spatial Attitudes, Proximity Class, Acoustic Profile, Audio Objects, Audio Object IDs, Ranked Audio Objects | ASD₁ | DAC |
| VSD | Visual Scene Description | Object Audio Characteristics | VDS₁ | DAC |
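
The routing in Table 3 can be read as a directed graph, from which a valid execution order follows. The sketch below assumes that the "To" column lists each SubAIM's downstream consumers, leaves out the VSD row to keep the audio chain on its own, and uses Python's standard `graphlib` module (Python 3.9 or later); the `ROUTES` mapping and `execution_order` helper are names introduced here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Source -> destinations, transcribed from the "To" column of Table 3.
ROUTES: Dict[str, List[str]] = {
    "OMP": ["ASD", "APE"],
    "APE": ["AOI"],
    "AOI": ["OSR"],
    "OSR": ["ASD"],
    "ASD": ["DAC"],
}


def execution_order(routes: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every SubAIM runs after its producers."""
    predecessors: Dict[str, Set[str]] = {}
    for source, destinations in routes.items():
        predecessors.setdefault(source, set())
        for destination in destinations:
            predecessors.setdefault(destination, set()).add(source)
    # static_order() raises graphlib.CycleError if the routing has a cycle.
    return list(TopologicalSorter(predecessors).static_order())


order = execution_order(ROUTES)
print(" -> ".join(order))
```

Because the routes form a single chain from OMP to DAC, only one ordering satisfies them; a routing table with independent branches would admit several.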

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/AudioSpatialReasoning.json

6. Profiles

No Profiles