Definition
Functional Requirements
Syntax
Semantics
Conformance Testing
Performance Assessment
1 Definition
Enhanced Audio Scene Descriptors (EAD) provide an enriched representation of a scene described by Audio Scene Descriptors (ASD). EADs include additional information generated by AI modules such as Audio Scene Enhancement, Audio Motion and Proximity analysis, Acoustic Environment Analysis, Audio Salience Mapping, and Domain Access enrichment. EADs support downstream processing, orchestration, and interaction without duplicating the semantics or functionality of the generating AI modules.
2 Functional Requirements
The Enhanced Audio Scene Descriptors shall:
- Provide a mechanism to extend an Audio Scene Descriptors instance without duplicating its structure.
- Reference a base ASD instance through a unique identifier (BaseASDID).
- Represent enhanced audio scene elements as entities with unique identifiers (EntityID).
- Support enrichment of the scene with:
  - audio object descriptions,
  - spatial attitudes,
  - motion characteristics,
  - proximity classification,
  - acoustic environment information.
- Support representation of spatial relationships through proximity and spatial relation information.
- Support representation of motion through motion flags associated with entities.
- Support representation of acoustic environment properties through AcousticProfile.
- Support salience analysis through:
  - ranking of entities,
  - selection of salient entities.
- Support filtering of entities based on perceptual relevance.
- Support representation of relations between entities, including:
  - proximity,
  - masking,
  - dominance,
  - spatial relationships.
- Support interaction with execution environments through AudioCXEStatus.
- Support interaction with domain modules through:
  - DomainRequest,
  - DomainResponse.
- Allow optional inclusion of processing and descriptive metadata.
- Ensure that enhancements are:
  - consistent with the referenced ASD,
  - non-duplicative of AIM functionality,
  - composable across processing stages,
  - extensible to additional attributes and modalities.
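The salience-related requirements above (ranking, selection, and filtering of entities) can be illustrated with a minimal Python sketch. The salience scores, the threshold, and the function name `rank_and_filter` are hypothetical and are not defined by this specification. Keeping the ranked and filtered lists disjoint is also an assumption made for the example.

```python
from typing import Dict, List, Tuple


def rank_and_filter(
    salience: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split entities into a ranked list and a filtered (excluded) list.

    `salience` maps EntityID -> score. Both the scores and `threshold`
    are hypothetical inputs, not fields defined by the specification.
    """
    # Entities scoring below the threshold are excluded from further processing.
    filtered = sorted(eid for eid, score in salience.items() if score < threshold)
    # The remaining entities are ordered by descending salience; ties are
    # broken by EntityID so the result is deterministic.
    kept = [eid for eid, score in salience.items() if score >= threshold]
    ranked = sorted(kept, key=lambda eid: (-salience[eid], eid))
    return ranked, filtered


scores = {"entity-1": 0.9, "entity-2": 0.2, "entity-3": 0.6}
ranked_entities, filtered_entities = rank_and_filter(scores, threshold=0.5)
print(ranked_entities)    # ['entity-1', 'entity-3']
print(filtered_entities)  # ['entity-2']
```

In this sketch the two outputs would populate RankedEntities and FilteredEntities respectively; an actual AIM may derive them from domain rules rather than a single numeric threshold.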
3 Syntax
https://schemas.mpai.community/OSD/V1.5/data/EnhancedAudioSceneDescriptors.json
4 Semantics
| Label | Description |
|---|---|
| Header | Identifies the schema version using the pattern OSD-EAD-Vx.y. |
| MInstanceID | Identifies the virtual space associated with the descriptors. |
| UEnvironmentID | Identifies the real space associated with the descriptors. |
| EnhancedAudioSceneDescriptorsID | Unique identifier of the enhanced descriptor instance. |
| EnhancedAudioSceneDescriptorTime | Timestamp of the enhancement processing. |
| BaseASDID | Identifier of the Audio Scene Descriptors instance being extended. |
| EnhancedAudioSceneDescriptorsSpaceTime | Spatial and temporal scope of the enhanced descriptors. |
| Entities | Array of entities representing audio objects derived from the base Audio Scene Descriptors. |
| – EntityID | Unique identifier of the entity within this Enhanced Audio Scene. |
| – ASDItemID | Identifier of the corresponding item in the base Audio Scene Descriptors. |
| – AudioObject | Optional full Audio Object data associated with the entity. |
| Relations | Array describing relationships between entities. |
| – RelationType | Type of relation: Proximity, Masking, Dominance, or SpatialRelation. |
| – SourceID | EntityID of the source entity of the relation. |
| – TargetID | EntityID of the target entity of the relation. |
| – Weight | Strength or relevance of the relation. |
| ProximityClasses | Classification of entities according to proximity to the listener Point of View. |
| MotionFlags | Motion state indicators for dynamic entities in the scene. |
| RankedEntities | Ordered list of EntityIDs ranked by salience or domain relevance. |
| FilteredEntities | List of EntityIDs excluded from further processing based on domain rules or salience thresholds. |
| SceneState | Container of scene-level properties derived from analysis. |
| – AcousticProfile | Acoustic characteristics of the scene environment. |
| – AudioCXEStatus | Processing status reported by the Audio Scene Enhancement AIM. |
| DomainRequest | Structured request for domain-specific interpretation sent to Domain Access. |
| DomainResponse | Domain-specific response received from Domain Access enriching the scene interpretation. |
| DataXMData | Processing and exchange metadata associated with the descriptors. |
| DescrMetadata | Additional descriptive metadata (free text, up to 2048 characters). |
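As an illustration of the labels in the table above, the following Python sketch builds a small, partial instance and checks two of the stated constraints: the Header pattern and the set of RelationType values. All identifier values, the version string in Header, the numeric type of Weight, and the helper function names are assumptions made for the example. The normative types are those of the JSON Schema referenced in the Syntax section.

```python
import json
import re

# Hypothetical, partial instance: field names follow the Semantics table,
# but every value and value type here is an assumption for illustration.
ead = {
    "Header": "OSD-EAD-V1.5",
    "EnhancedAudioSceneDescriptorsID": "ead-0001",
    "BaseASDID": "asd-0001",
    "Entities": [
        {"EntityID": "entity-1", "ASDItemID": "asd-item-7"},
        {"EntityID": "entity-2", "ASDItemID": "asd-item-9"},
    ],
    "Relations": [
        {
            "RelationType": "Masking",
            "SourceID": "entity-1",
            "TargetID": "entity-2",
            "Weight": 0.8,
        }
    ],
    "RankedEntities": ["entity-1", "entity-2"],
    "FilteredEntities": [],
}

# One reading of the "OSD-EAD-Vx.y" pattern: x and y are decimal integers.
HEADER_PATTERN = re.compile(r"OSD-EAD-V[0-9]+\.[0-9]+")
# Relation types listed in the RelationType row of the table.
RELATION_TYPES = {"Proximity", "Masking", "Dominance", "SpatialRelation"}


def header_is_well_formed(instance: dict) -> bool:
    """Return True if Header is a string matching the assumed pattern."""
    header = instance.get("Header")
    return isinstance(header, str) and HEADER_PATTERN.fullmatch(header) is not None


def relation_types_are_known(instance: dict) -> bool:
    """Return True if every relation uses one of the listed RelationType values."""
    return all(
        relation.get("RelationType") in RELATION_TYPES
        for relation in instance.get("Relations", [])
    )


print(header_is_well_formed(ead))     # True
print(relation_types_are_known(ead))  # True
print(json.dumps(ead, indent=2)[:40])  # start of the serialized instance
```

Each entity carries only its own EntityID and the ASDItemID of the base item, which reflects the requirement to extend the base Audio Scene Descriptors by reference rather than by copying its structure.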
5 Conformance Testing
A Data instance Conforms with Enhanced Audio Scene Descriptors (OSD-EAD) if:
- The Data validates against the Enhanced Audio Scene Descriptors’ JSON Schema.
- All Data in the Enhanced Audio Scene Descriptors’ JSON Schema:
  - Have the specified type.
  - Validate against their JSON Schemas.
  - Conform with their Audio Data Qualifiers.
- BaseASDID references a valid Basic or Full Audio Scene Descriptors instance.
- All EntityID values referenced in Relations, RankedEntities, and FilteredEntities are present in Entities.
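The last condition above is a referential-integrity check that JSON Schema validation alone does not typically cover. A minimal Python sketch follows, assuming the instance has already been parsed into a dictionary, that RankedEntities and FilteredEntities are flat lists of EntityID strings, and that each relation carries SourceID and TargetID. The function name `dangling_entity_ids` is hypothetical.

```python
from typing import Any, Dict, List


def dangling_entity_ids(ead: Dict[str, Any]) -> List[str]:
    """Return referenced EntityIDs that are missing from Entities, sorted."""
    # EntityIDs actually declared in the instance.
    known = {entity["EntityID"] for entity in ead.get("Entities", [])}

    # EntityIDs referenced from the ranking and filtering lists.
    referenced = set(ead.get("RankedEntities", []))
    referenced.update(ead.get("FilteredEntities", []))

    # EntityIDs referenced as source or target of a relation.
    for relation in ead.get("Relations", []):
        referenced.add(relation["SourceID"])
        referenced.add(relation["TargetID"])

    return sorted(referenced - known)


# Hypothetical instance with two dangling references.
example = {
    "Entities": [{"EntityID": "entity-1"}, {"EntityID": "entity-2"}],
    "Relations": [{"SourceID": "entity-1", "TargetID": "entity-3"}],
    "RankedEntities": ["entity-2", "entity-1"],
    "FilteredEntities": ["entity-4"],
}
print(dangling_entity_ids(example))  # ['entity-3', 'entity-4']
```

An empty result means this particular condition holds; the other conditions (schema validation, Audio Data Qualifiers, and the validity of BaseASDID) need separate checks.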