1 Definition
Audio Qualifier is a set of Data providing additional information on Audio Data for potential use by a machine.
Audio Object includes Audio Qualifier in addition to Audio Data. It is specified by CAE-USC V2.2.
2 Functional Requirements
Audio Qualifier must allow the expression of the following Qualifier elements:
- Formats
- Content
- Transport
- Attributes
- Source
- Metadata
- Spatial Attributes
- Device
Users needing support for other entries in MPAI-TFA should submit a documented request to the MPAI Secretariat to consider adding such entries.
3 Syntax
https://schemas.mpai.community/TFA/V1.0/data/AudioQualifier.json
4 Semantics
Sub-Types
Formats
- Content
- Definition: the type of data arrangement used to digitally represent audio.
- Types:
- Raw Audio
- Definition: the type of data arrangement used to digitally represent samples or their transform coefficients.
- Types:
- Sample Space
- Definition: the representation with samples having the meaning of Audio level.
- Characteristics
- Sampling frequency: Number expressing kHz
- Sample precision: Integer expressing bits/sample
- Transform Space
- Definition: the characteristics of the representation with samples having the meaning of Spatial Fourier Transform coefficients.
- Characteristics
- Sequence
- Sequential
- Interleaved
- Precision
- float32
- float64
- Spherical Harmonic Decomposition
- Definition: the characteristics of the representation with samples having the meaning of Spherical Fourier Transform coefficients.
- Characteristics
- Sequence
- Sequential
- Interleaved
- Precision
- float32
- float64
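The Sequence and Precision characteristics shared by Transform Space and Spherical Harmonic Decomposition can be illustrated with a minimal sketch. The text does not define the two orderings, so this assumes that "Sequential" stores all coefficients of one stream before the next, while "Interleaved" stores coefficient k of every stream before coefficient k + 1; it also assumes float32/float64 mean IEEE 754 single/double precision and picks little-endian byte order. The function names are hypothetical.

```python
import struct
from typing import List, Sequence

def to_sequential(streams: Sequence[Sequence[float]]) -> List[float]:
    # Assumed "Sequential": all coefficients of stream 0, then all of stream 1, ...
    return [value for stream in streams for value in stream]

def to_interleaved(streams: Sequence[Sequence[float]]) -> List[float]:
    # Assumed "Interleaved": coefficient k of every stream, then coefficient k + 1, ...
    length = len(streams[0])
    if any(len(stream) != length for stream in streams):
        raise ValueError("all streams must hold the same number of coefficients")
    return [stream[k] for k in range(length) for stream in streams]

def pack(values: Sequence[float], precision: str) -> bytes:
    # float32 -> IEEE 754 single (4 bytes), float64 -> double (8 bytes);
    # little-endian byte order is an assumption, not stated in the text.
    codes = {"float32": "f", "float64": "d"}
    if precision not in codes:
        raise ValueError(f"unsupported precision: {precision}")
    return struct.pack(f"<{len(values)}{codes[precision]}", *values)

# Two streams of three coefficients each (illustrative values).
coeffs = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(to_sequential(coeffs))   # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(to_interleaved(coeffs))  # [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]
print(len(pack(to_interleaved(coeffs), "float32")))  # 24
print(len(pack(to_interleaved(coeffs), "float64")))  # 48
```

Both orderings carry the same values; only the memory layout differs, and the chosen precision doubles or halves the byte size.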
- Ambisonics
- Definition: the types of full-sphere surround sound format covering the horizontal plane as well as the space above and below it.
- Types
- 1st Order
- 2nd Order
- 3rd Order
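The Ambisonics orders and the Sample Space characteristics combine into a simple data-rate calculation. Full-sphere Ambisonics of order N carries (N + 1)^2 components, so orders 1, 2 and 3 need 4, 9 and 16 channels. The sketch below assumes each component is stored as Sample Space audio; the channel count is an extra input that is not one of the listed Sample Space characteristics, and the 48 kHz / 24 bits/sample values are illustrative only. Function names are hypothetical.

```python
def ambisonic_channels(order: int) -> int:
    # Full-sphere Ambisonics of order N carries (N + 1) ** 2 components.
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

def pcm_bit_rate(sampling_frequency_khz: float, bits_per_sample: int, channels: int = 1) -> float:
    # Uncompressed Sample Space bit rate in bit/s; the channel count is an
    # extra input, not one of the listed Sample Space characteristics.
    return sampling_frequency_khz * 1000 * bits_per_sample * channels

# Illustrative 48 kHz, 24 bits/sample stream for each Ambisonics order.
for order in (1, 2, 3):
    n = ambisonic_channels(order)
    print(f"order {order}: {n} channels, {pcm_bit_rate(48, 24, n):.0f} bit/s")
```

For 1st Order this gives 4 channels and 4,608,000 bit/s; the rate grows quadratically with the order.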
- Compression Formats
- Definition: the type of data arrangement used to reduce the number of bits required to represent an Audio instance.
- Types
- MP3-1: (ISO/IEC 11172-3:1993)
- MP3-2: (ISO/IEC 13818-3:1998)
- AAC-2: (ISO/IEC 13818-7:2006)
- AAC-4: (ISO/IEC 14496-3:2019)
- Transport
- Definition: the type of data arrangement used to transport an Audio Data Type instance.
- Types
- File
- WAV
- Core Audio Format
- RF64
- MP4 (ISO/IEC 14496-12:2022)
- Stream
- DASH (ISO/IEC 23009-1:2022)
- HTTP Live Streaming
Attributes
- Source Type
- Definition: the types of source of an Audio instance.
- Types:
- Vocal
- Real
- Synthetic
- Music
- Real
- Synthetic
- Sound effects
- Real
- Synthetic
- Noise
- Real
- Synthetic
- Metadata
- Definition: the type of data arrangement used to attach information to an instance of Audio Data Type.
- Types
- General
- Dublin Core
- ID3
- IPTC Photo Metadata
- Object Identity
- Definition: the ID of an object in an Audio Data Type instance.
- Spatial Attributes
- Definition: Attributes that define the Audio Data Type instance in space, such as direction, distance, and orientation.
- Types
- Binaural Cues
- Definition: Cues that provide information on the direction of a sound in the horizontal plane by relying on differences in sounds received by the two ears.
- Types
- Interaural level difference (ILD): Array of frequencies and associated level differences (Nx2)
- Interaural time delay (ITD): Array of frequencies and associated time delays (Nx2)
- Interaural phase difference (IPD): Array of frequencies and associated phase differences (Nx2)
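A minimal sketch of how the three Nx2 Binaural Cue arrays could be derived from complex left- and right-ear spectra sampled at N frequencies. The conventions are assumptions, not fixed by the text: differences are taken left relative to right, ILD is in dB, IPD is in radians, and ITD is derived from the phase as IPD / (2πf), which is only unambiguous at low frequencies. The function name is hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (frequency in Hz, cue value)

def binaural_cues(freqs_hz: Sequence[float],
                  left: Sequence[complex],
                  right: Sequence[complex]) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (ILD, ITD, IPD), each an N x 2 list of (frequency, value) rows.

    Assumed conventions: left relative to right; frequencies > 0 and
    non-zero magnitudes; ITD = IPD / (2 * pi * f).
    """
    ild, itd, ipd = [], [], []
    for f, l, r in zip(freqs_hz, left, right):
        level_db = 20 * math.log10(abs(l) / abs(r))
        phase = cmath.phase(l * r.conjugate())  # wrapped to [-pi, pi]
        ild.append((f, level_db))
        ipd.append((f, phase))
        itd.append((f, phase / (2 * math.pi * f)))
    return ild, itd, ipd

# Illustrative single-frequency example at 500 Hz.
ild, itd, ipd = binaural_cues([500.0], [2 + 0j], [1j])
print(ild, itd, ipd)
```

Keeping the frequency in the first column of each row matches the Nx2 shape listed for ILD, ITD and IPD.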
- Spectral Cues
- Definition: Cues that contribute to the resolution of front/back confusions when different sound sources create the same interaural cues, and are critical for accurate localization of elevation in the median plane where interaural cues are negligible.
- Type
- Array of frequencies and associated frequency spectra (Nx5):
– 1 for frequency
– 2 for left ear (real and imaginary)
– 2 for right ear (real and imaginary)
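The Nx5 Spectral Cue array can be sketched as one row per frequency holding the frequency followed by the real and imaginary parts of the left- and right-ear spectra. The column order used here (frequency, Re left, Im left, Re right, Im right) is an assumption, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def spectral_cue_matrix(freqs_hz: Sequence[float],
                        left: Sequence[complex],
                        right: Sequence[complex]) -> List[List[float]]:
    # One row per frequency: [f, Re(L), Im(L), Re(R), Im(R)] (assumed column order).
    if not (len(freqs_hz) == len(left) == len(right)):
        raise ValueError("frequencies and spectra must have the same length N")
    return [[f, l.real, l.imag, r.real, r.imag] for f, l, r in zip(freqs_hz, left, right)]

def split_spectral_cues(matrix: Sequence[Sequence[float]]) -> Tuple[List[float], List[complex], List[complex]]:
    # Inverse mapping: recover frequencies and complex left/right spectra.
    freqs = [row[0] for row in matrix]
    left = [complex(row[1], row[2]) for row in matrix]
    right = [complex(row[3], row[4]) for row in matrix]
    return freqs, left, right

m = spectral_cue_matrix([100.0, 200.0], [1 + 2j, 3 - 1j], [0.5j, -1 + 0j])
print(m[0])  # [100.0, 1.0, 2.0, 0.0, 0.5]
```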
- Interchannel Differences
- Definition: Cues that define the differences between pairs of audio channels with respect to pressure level and time.
- Types
- Interchannel level difference (ICLD): Array of frequencies and associated level differences (NxM(M-1)/2), where M = number of channels
- Interchannel time difference (ICTD): Array of frequencies and associated time differences (NxM(M-1)/2), where M = number of channels
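The M(M-1)/2 factor in the Interchannel Differences arrays is the number of unordered channel pairs. A minimal sketch for ICLD, assuming complex per-channel spectra on a shared set of N frequency bins, level differences in dB, and pairs (i, j) with i < j in `itertools.combinations` order (all assumptions; the function name is hypothetical). ICTD would use the same pair layout with time instead of level differences.

```python
import math
from itertools import combinations
from typing import List, Sequence

def icld_matrix(spectra: Sequence[Sequence[complex]]) -> List[List[float]]:
    """spectra[m][n] is the complex spectrum of channel m at frequency bin n.

    Returns N rows, each with M(M-1)/2 level differences in dB, one per
    channel pair (i, j), i < j, in itertools.combinations order (assumed).
    """
    m = len(spectra)
    n_bins = len(spectra[0])
    pairs = list(combinations(range(m), 2))  # M(M-1)/2 pairs
    return [[20 * math.log10(abs(spectra[i][n]) / abs(spectra[j][n])) for i, j in pairs]
            for n in range(n_bins)]

# Three channels, two frequency bins (illustrative values).
rows = icld_matrix([[1 + 0j, 1 + 0j], [10 + 0j, 1 + 0j], [100 + 0j, 1 + 0j]])
print(rows)
```

With M = 3 channels each row holds 3 values, for the pairs (0, 1), (0, 2) and (1, 2).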
- Device
- Definition: characteristics of the device that captured the Audio instance.
- Features
- Device ID
- Definition: an identifier of the device that captured the Audio instance, typically a string.
- Device Geometry
- Definition: the type of description of the spatial arrangement of audio sensors in an audio device.
- Types
- Device Location
- Definition: the type of description of the position and orientation of the device that captured an Audio instance in a real or virtual space.
- Types
- Sensor characteristics
- Definition: features of a single microphone sensor that affect the captured Audio Data Type instance, specifically its directivity pattern and frequency response.
- Directivity pattern
- Cardioid
- Supercardioid
- Hypercardioid
- Omnidirectional
- Parametric
- Definition: a directivity pattern that can be represented by the coefficients of its trigonometric polynomial.
- Features:
- Degree
- Coefficients
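A Parametric directivity pattern described by a Degree and a list of Coefficients can be evaluated as a trigonometric polynomial. This sketch assumes an axially symmetric sensor, so only cosine terms appear: D(θ) = Σ a_k cos(kθ) for k = 0..Degree, with Degree = number of coefficients minus 1. The first-order coefficients shown are the textbook forms of the omnidirectional, cardioid and hypercardioid patterns; the function and dictionary names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def directivity(coefficients: Sequence[float], theta: float) -> float:
    # D(theta) = sum_k a_k * cos(k * theta), k = 0..degree (cosine-only form is an assumption).
    return sum(a * math.cos(k * theta) for k, a in enumerate(coefficients))

# Textbook first-order patterns (degree 1): D(theta) = a0 + a1 * cos(theta).
FIRST_ORDER: Dict[str, List[float]] = {
    "Omnidirectional": [1.0, 0.0],
    "Cardioid": [0.5, 0.5],
    "Hypercardioid": [0.25, 0.75],
}

for name, coeffs in FIRST_ORDER.items():
    front = directivity(coeffs, 0.0)
    rear = directivity(coeffs, math.pi)
    print(f"{name}: degree {len(coeffs) - 1}, front {front:.2f}, rear {rear:.2f}")
```

The negative rear value of the hypercardioid reflects its rear lobe of opposite polarity.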
- Frequency response
- Definition: the sensitivity of a microphone sensor expressed as an array of complex numbers at a discrete number of frequencies.
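Because the Frequency response is given only at a discrete number of frequencies, a consumer may need values in between. A minimal sketch, assuming ascending frequencies in Hz and naive linear interpolation of the complex values, clamped outside the listed range (the interpolation choice and the function names are assumptions, not part of the specification):

```python
import bisect
import math
from typing import Sequence

def response_at(freqs_hz: Sequence[float], response: Sequence[complex], f: float) -> complex:
    # Linear interpolation between the two nearest listed frequencies;
    # values below/above the listed range are clamped to the end points.
    if f <= freqs_hz[0]:
        return response[0]
    if f >= freqs_hz[-1]:
        return response[-1]
    i = bisect.bisect_right(freqs_hz, f)
    f0, f1 = freqs_hz[i - 1], freqs_hz[i]
    t = (f - f0) / (f1 - f0)
    return response[i - 1] * (1 - t) + response[i] * t

def magnitude_db(h: complex) -> float:
    # Sensitivity magnitude in dB relative to a unit response.
    return 20 * math.log10(abs(h))

freqs = [100.0, 1000.0, 10000.0]
resp = [0.5 + 0j, 1 + 0j, 0.5j]
h = response_at(freqs, resp, 550.0)
print(h, magnitude_db(h))
```

Interpolating magnitude and phase separately would avoid the magnitude dip that linear complex interpolation produces between bins with very different phases.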