(Tentative)


Definition

Audio Spatial Guide (PGM-ASG):

  1. Is produced by the Audio Spatial Reasoning (ASR) AIM
  2. Represents a User-centric view of the spatial audio context derived from scene interpretation
  3. Enriches the User’s spoken or written input with spatial cues – such as sound source relevance, directionality, and proximity – prior to prompt generation

Functional Requirements

Audio Spatial Guide conveys the following main information elements:

Function                      Description
Salient Source Selection      Prioritises audio sources relevant to user intent or focus
Directional Cue Mapping       Translates azimuth and elevation into user-relative descriptors
Proximity Framing             Classifies sources as near, mid, or far for contextual emphasis
Semantic Labeling             Attaches meaningful labels (e.g., “alarm”, “voice”, “music”)
Acoustic Environment Summary  Provides high-level descriptors of ambient audio context
Viewpoint Normalisation       Adjusts spatial descriptors to match user orientation
Output Generation             Produces Audio Adaptive Context Guide for Prompt Creation AIM
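
As a rough illustration of Viewpoint Normalisation, Directional Cue Mapping, and Proximity Framing, the Python sketch below turns a source azimuth into a user-relative descriptor and a distance into a near/mid/far class. Everything specific in it is an assumption made for illustration and is not defined by this page: the function names, the convention that azimuth is in degrees and increases clockwise (to the user's right), the 45°/135° sector boundaries, the ±30° elevation bands, and the 2 m and 10 m distance thresholds.

```python
def normalise_azimuth(source_azimuth_deg: float, user_heading_deg: float) -> float:
    """Return the source azimuth relative to the user's heading, in (-180, 180].

    Assumption: both angles are in degrees and increase clockwise.
    """
    relative = (source_azimuth_deg - user_heading_deg) % 360.0  # now in [0, 360)
    if relative > 180.0:
        relative -= 360.0
    return relative


def direction_descriptor(relative_azimuth_deg: float, elevation_deg: float = 0.0) -> str:
    """Map a user-relative azimuth/elevation to a coarse verbal descriptor.

    Assumption: sector boundaries at 45 and 135 degrees, elevation bands at +/-30.
    """
    a = relative_azimuth_deg
    if -45.0 <= a <= 45.0:
        horizontal = "ahead"
    elif 45.0 < a < 135.0:
        horizontal = "to the right"
    elif -135.0 < a < -45.0:
        horizontal = "to the left"
    else:
        horizontal = "behind"

    if elevation_deg > 30.0:
        return f"{horizontal}, above"
    if elevation_deg < -30.0:
        return f"{horizontal}, below"
    return horizontal


def proximity_class(distance_m: float, near_m: float = 2.0, far_m: float = 10.0) -> str:
    """Classify a distance as near, mid, or far (thresholds are hypothetical)."""
    if distance_m < near_m:
        return "near"
    if distance_m < far_m:
        return "mid"
    return "far"


# Usage: a source at absolute azimuth 100 degrees while the user faces 350 degrees.
relative = normalise_azimuth(source_azimuth_deg=100.0, user_heading_deg=350.0)
print(direction_descriptor(relative), "/", proximity_class(1.5))
```

Normalising to the user's heading before choosing a descriptor keeps the sector logic independent of how the scene itself is oriented; an actual implementation might use finer sectors or hysteresis to avoid descriptors flickering at the boundaries.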

Syntax

https://schemas.mpai.community/PGM1/V1.0/data/AudioSpatialGuide.json

Semantics

Label                  Description
Header                 Schema header with version tag
├─ Standard-ASG        The characters “PGM-ASG-V”
├─ Version             Major version – 1 or 2 characters
├─ Dot-separator       The character “.” separating version components
└─ Subversion          Minor version – 1 or 2 characters
AudioSpatialGuideID    Unique identifier for this guide instance
MInstanceID            Identifier of M-Instance
MEnvironmentID         Identifier of M-Environment
SalientSources         List of user-relevant sound sources
├─ SourceID            Unique ID for each source
├─ Label               Semantic label (e.g., “voice”, “alarm”)
├─ RelativeDirection   Azimuth and elevation relative to user
├─ Proximity           Estimated closeness to the User
├─ Motion              Whether the source is static or moving
└─ NarrativeCue        Optional natural language cue for prompt enrichment
AmbientAudioContext    Summary of ambient audio conditions
├─ NoiseLevel          Background noise classification
├─ Reverberation       Echo profile of the environment
└─ DominantSource      Most prominent sound source
Trace                  Provenance metadata
├─ Origin              Module that generated the guide
└─ Timestamp           Time of guide creation
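
To make the labels above more concrete, the following Python sketch assembles one hypothetical Audio Spatial Guide instance and checks its header. Only the key names come from the Semantics table. The value types and formats are assumptions and may differ from the JSON schema referenced under Syntax: the Header as a single string, the `Azimuth`/`Elevation` sub-keys, the enumerated strings, the ID formats, and the ISO 8601 timestamp. The sketch also assumes that the version "characters" are decimal digits. `parse_header` is an illustrative helper, not part of the specification.

```python
import json
import re

# Assumption: Header is one string, "PGM-ASG-V" + Version + "." + Subversion,
# where Version and Subversion are each 1 or 2 decimal digits.
HEADER_PATTERN = re.compile(r"PGM-ASG-V(\d{1,2})\.(\d{1,2})")


def parse_header(header: str) -> tuple[int, int]:
    """Return (major, minor) from a header string, or raise ValueError."""
    match = HEADER_PATTERN.fullmatch(header)
    if match is None:
        raise ValueError(f"not a PGM-ASG header: {header!r}")
    return int(match.group(1)), int(match.group(2))


# Hypothetical instance: key names follow the Semantics table, values are invented.
guide = {
    "Header": "PGM-ASG-V1.0",
    "AudioSpatialGuideID": "asg-0001",
    "MInstanceID": "m-instance-01",
    "MEnvironmentID": "m-environment-01",
    "SalientSources": [
        {
            "SourceID": "src-01",
            "Label": "alarm",
            "RelativeDirection": {"Azimuth": 110.0, "Elevation": 0.0},
            "Proximity": "near",
            "Motion": "static",
            "NarrativeCue": "An alarm is sounding close by, to the right.",
        }
    ],
    "AmbientAudioContext": {
        "NoiseLevel": "moderate",
        "Reverberation": "low",
        "DominantSource": "src-01",
    },
    "Trace": {
        "Origin": "ASR",
        "Timestamp": "2025-01-01T00:00:00Z",
    },
}

major, minor = parse_header(guide["Header"])
print(f"version {major}.{minor}")
print(json.dumps(guide, indent=2))
```

For conformance checking, an instance like this would be validated against the published schema itself rather than against the hand-written pattern used here.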