Definition
Functional Requirements
Syntax
Semantics
Conformance Testing
Performance Assessment

1      Definition

Enhanced Audio Scene Descriptors (EAD) provide an enriched representation of a scene described by Audio Scene Descriptors (ASD). EADs include additional information generated by AI modules such as Audio Scene Enhancement, Audio Motion and Proximity analysis, Acoustic Environment Analysis, Audio Salience Mapping, and Domain Access enrichment. EADs support downstream processing, orchestration, and interaction without duplicating the semantics or functionality of the generating AI modules.

2      Functional Requirements

The Enhanced Audio Scene Descriptors shall:

  • Provide a mechanism to extend an Audio Scene Descriptors instance without duplicating its structure.
  • Reference a base ASD instance through a unique identifier (BaseASDID).
  • Represent enhanced audio scene elements as entities with unique identifiers (EntityID).
  • Support enrichment of the scene with:
    • audio object descriptions,
    • spatial attitudes,
    • motion characteristics,
    • proximity classification,
    • acoustic environment information.
  • Support representation of spatial relationships through proximity and spatial relation information.
  • Support representation of motion through motion flags associated with entities.
  • Support representation of acoustic environment properties through AcousticProfile.
  • Support salience analysis through:
    • ranking of entities,
    • selection of salient entities.
  • Support filtering of entities based on perceptual relevance.
  • Support representation of relations between entities, including:
    • proximity,
    • masking,
    • dominance,
    • spatial relationships.
  • Support interaction with execution environments through AudioCXEStatus.
  • Support interaction with domain modules through:
    • DomainRequest,
    • DomainResponse.
  • Allow optional inclusion of processing and descriptive metadata.
  • Ensure that enhancements are:
    • consistent with the referenced ASD,
    • non-duplicative of AIM functionality,
    • composable across processing stages,
    • extensible to additional attributes and modalities.
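The salience-related requirements above (ranking of entities and filtering by perceptual relevance) can be illustrated with a minimal sketch. This specification does not define how salience is computed, so the per-entity scores, the threshold, and the function name rank_and_filter are hypothetical. The sketch also assumes that RankedEntities lists all entities, including those that end up in FilteredEntities.

```python
from typing import Dict, List, Tuple


def rank_and_filter(
    salience: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (RankedEntities, FilteredEntities) from hypothetical salience scores.

    salience maps EntityID -> score (higher means more salient).
    Entities scoring below `threshold` are listed as filtered (excluded).
    """
    # Sort by descending score; ties are broken by EntityID for determinism.
    ranked = sorted(salience, key=lambda eid: (-salience[eid], eid))
    # Assumption: filtered entities are those below the salience threshold.
    filtered = [eid for eid in ranked if salience[eid] < threshold]
    return ranked, filtered


scores = {"E1": 0.9, "E2": 0.2, "E3": 0.6}  # illustrative values only
ranked_entities, filtered_entities = rank_and_filter(scores, threshold=0.5)
print(ranked_entities)    # ['E1', 'E3', 'E2']
print(filtered_entities)  # ['E2']
```

An AI module producing these lists would apply its own salience model; only the resulting EntityID lists are carried in the descriptors.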

3      Syntax

https://schemas.mpai.community/OSD/V1.5/data/EnhancedAudioSceneDescriptors.json

4      Semantics

Label                                   Description
Header                                  Identifies the schema version using the pattern OSD-EAD-Vx.y.
MInstanceID                             Identifies the virtual space associated with the descriptors.
UEnvironmentID                          Identifies the real space associated with the descriptors.
EnhancedAudioSceneDescriptorsID         Unique identifier of the enhanced descriptor instance.
EnhancedAudioSceneDescriptorTime        Timestamp of the enhancement processing.
BaseASDID                               Identifier of the Audio Scene Descriptors instance being extended.
EnhancedAudioSceneDescriptorsSpaceTime  Spatial and temporal scope of the enhanced descriptors.
Entities                                Array of entities representing audio objects derived from the base Audio Scene Descriptors.
   – EntityID                           Unique identifier of the entity within this Enhanced Audio Scene.
   – ASDItemID                          Identifier of the corresponding item in the base Audio Scene Descriptors.
   – AudioObject                        Optional full Audio Object data associated with the entity.
Relations                               Array describing relationships between entities.
   – RelationType                       Type of relation: Proximity, Masking, Dominance, or SpatialRelation.
   – SourceID                           EntityID of the source entity of the relation.
   – TargetID                           EntityID of the target entity of the relation.
   – Weight                             Strength or relevance of the relation.
ProximityClasses                        Classification of entities according to proximity to the listener Point of View.
MotionFlags                             Motion state indicators for dynamic entities in the scene.
RankedEntities                          Ordered list of EntityIDs ranked by salience or domain relevance.
FilteredEntities                        List of EntityIDs excluded from further processing based on domain rules or salience thresholds.
SceneState                              Container of scene-level properties derived from analysis.
   – AcousticProfile                    Acoustic characteristics of the scene environment.
   – AudioCXEStatus                     Processing status reported by the Audio Scene Enhancement AIM.
DomainRequest                           Structured request for domain-specific interpretation sent to Domain Access.
DomainResponse                          Domain-specific response received from Domain Access enriching the scene interpretation.
DataXMData                              Processing and exchange metadata associated with the descriptors.
DescrMetadata                           Additional descriptive metadata (free text, up to 2048 characters).
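To make the labels above concrete, the following sketch builds a partial instance as a Python dictionary and checks two properties stated in the table: the Header pattern and the set of RelationType values. The JSON Schema referenced in the Syntax section remains authoritative. The identifier values, the use of strings for IDs and a number for Weight, the flat-list shape of RankedEntities and FilteredEntities, and the reading of "x.y" as two runs of decimal digits are assumptions made for illustration. Fields whose structure cannot be inferred from the table are omitted.

```python
import re

# Assumption: "Vx.y" means digits, a dot, digits (e.g. "V1.5").
HEADER_PATTERN = re.compile(r"OSD-EAD-V[0-9]+\.[0-9]+")

# Relation types listed in the Semantics table.
RELATION_TYPES = {"Proximity", "Masking", "Dominance", "SpatialRelation"}

# Illustrative, partial instance; all identifier values are hypothetical.
example_ead = {
    "Header": "OSD-EAD-V1.5",
    "EnhancedAudioSceneDescriptorsID": "ead-0001",
    "BaseASDID": "asd-0001",
    "Entities": [
        {"EntityID": "E1", "ASDItemID": "item-1"},
        {"EntityID": "E2", "ASDItemID": "item-2"},
    ],
    "Relations": [
        {"RelationType": "Masking", "SourceID": "E1",
         "TargetID": "E2", "Weight": 0.7},
    ],
    "RankedEntities": ["E1", "E2"],
    "FilteredEntities": ["E2"],
}


def header_is_well_formed(instance: dict) -> bool:
    """True if Header matches the assumed OSD-EAD-Vx.y pattern exactly."""
    header = instance.get("Header", "")
    return isinstance(header, str) and HEADER_PATTERN.fullmatch(header) is not None


def relation_types_are_known(instance: dict) -> bool:
    """True if every relation uses one of the four listed RelationType values."""
    return all(
        rel.get("RelationType") in RELATION_TYPES
        for rel in instance.get("Relations", [])
    )


print(header_is_well_formed(example_ead))     # True
print(relation_types_are_known(example_ead))  # True
```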

5      Conformance Testing

A Data instance Conforms with Enhanced Audio Scene Descriptors (OSD-EAD) if:

  1. The Data validates against the Enhanced Audio Scene Descriptors’ JSON Schema.
  2. All Data in the Enhanced Audio Scene Descriptors’ JSON Schema:
    1. Have the specified type.
    2. Validate against their JSON Schemas.
    3. Conform with their Audio Data Qualifiers.
  3. BaseASDID references a valid Basic or Full Audio Scene Descriptors instance.
  4. All EntityID values referenced in Relations, RankedEntities, and FilteredEntities are present in Entities.
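The referential check on EntityID values in the last condition above can be sketched as follows. This is not a conformance tool: it covers only that condition, and it assumes an instance already parsed into a Python dictionary in which RankedEntities and FilteredEntities are flat lists of EntityID strings. The function name find_dangling_entity_ids and the sample identifiers are hypothetical.

```python
from typing import Dict, List


def find_dangling_entity_ids(instance: dict) -> Dict[str, List[str]]:
    """Map each referencing field to the EntityIDs it uses that are absent from Entities.

    An empty result means every referenced EntityID is present in Entities.
    """
    known = {entity["EntityID"] for entity in instance.get("Entities", [])}

    referenced = {
        # Relations reference entities through SourceID and TargetID.
        "Relations": {
            rel[key]
            for rel in instance.get("Relations", [])
            for key in ("SourceID", "TargetID")
            if key in rel
        },
        "RankedEntities": set(instance.get("RankedEntities", [])),
        "FilteredEntities": set(instance.get("FilteredEntities", [])),
    }

    # Keep only the fields that actually contain unknown identifiers.
    return {
        field: sorted(ids - known)
        for field, ids in referenced.items()
        if ids - known
    }


sample = {
    "Entities": [{"EntityID": "E1"}, {"EntityID": "E2"}],
    "Relations": [{"RelationType": "Proximity", "SourceID": "E1", "TargetID": "E3"}],
    "RankedEntities": ["E1", "E2"],
    "FilteredEntities": ["E4"],
}
print(find_dangling_entity_ids(sample))
# {'Relations': ['E3'], 'FilteredEntities': ['E4']}
```

Schema validation (condition 1) and resolution of BaseASDID (condition 3) would need the published JSON Schema and access to the base Audio Scene Descriptors, so they are outside this sketch.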

6      Performance Assessment