1 Definition
A Data Type whose instances represent analogue signals in the human-audible range (16 Hz – 20 kHz), or are rendered to be perceived in that range.
2 Functional Requirements
An Audio Qualifier must allow the expression of the following Elements:
- Sub-Types
- Formats
- Content
- Transport
- Attributes
- Source
- Metadata
- Spatial Attributes
- Device
3 Syntax
https://schemas.mpai.community/TFA/V1.0/data/AudioQualifier.json
4 Semantics
- Sub-Types
- No Sub-Types
- Formats
- Content
- Raw Audio
- Definition: the method used to digitally represent samples or their transform coefficients.
- Methods
- Sample Space
- Definition: the representation with samples having the meaning of Audio level.
- Characteristics
- Sampling frequency: Number expressing kHz
- Sample precision: Integer expressing bits/sample
- Transform Space
- Definition: the representation with samples having the meaning of Spatial Fourier Transform coefficients.
- Characteristics
- Sequence
- Sequential
- Interleaved
- Precision
- float32
- float64
- Spherical Harmonic Decomposition
- Definition: the representation with samples having the meaning of Spherical Fourier Transform coefficients.
- Characteristics
- Sequence
- Sequential
- Interleaved
- Precision
- float32
- float64
- Ambisonics
- Definition: the full-sphere surround sound format covering the horizontal plane, and above and below.
- Types
- 1st Order
- 2nd Order
- 3rd Order
- Compression Formats
- Definition: the method used to reduce the number of bits required to represent an Audio instance.
- Methods
- MP3-1: (ISO/IEC 11172-3:1993)
- MP3-2: (ISO/IEC 13818-3:1998)
- AAC-2: (ISO/IEC 13818-7:2006)
- AAC-4: (ISO/IEC 14496-3:2019)
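The Raw Audio options above can be illustrated with a minimal Python sketch. It rests on two assumptions that the text does not state: that "Sequential" means all values of one channel are stored before the next channel, and that "Interleaved" means one value per channel is stored for each frame in turn. The function names are hypothetical. The channel count for full-sphere Ambisonics of a given order, (order + 1)^2, is the standard relation.

```python
def ambisonics_channels(order: int) -> int:
    """Number of full-sphere Ambisonics channels for a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def interleave(channels: list[list[float]]) -> list[float]:
    """Assumed 'Sequential' layout (channel by channel) -> 'Interleaved' (frame by frame)."""
    if len({len(c) for c in channels}) > 1:
        raise ValueError("all channels must have the same length")
    return [value for frame in zip(*channels) for value in frame]


def deinterleave(values: list[float], num_channels: int) -> list[list[float]]:
    """Assumed 'Interleaved' layout -> 'Sequential' layout, one list per channel."""
    if num_channels <= 0 or len(values) % num_channels:
        raise ValueError("length must be a multiple of the channel count")
    return [values[c::num_channels] for c in range(num_channels)]


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(f"Ambisonics order {order}: {ambisonics_channels(order)} channels")
    print(interleave([[1.0, 2.0], [3.0, 4.0]]))
```

Under these assumptions, a 1st Order Ambisonics instance carries 4 channels, a 2nd Order instance 9, and a 3rd Order instance 16.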
- Transport
- Definition: the method used to transport an Audio Data Type instance
- Methods
- File
- WAV (https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2088-1-201910-I!!PDF-E.pdf)
- Core Audio Format (https://developer.apple.com/library/archive/documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html)
- RF64 (https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2088-1-201910-I!!PDF-E.pdf)
- MP4 (ISO/IEC 14496-12:2022)
- Stream
- DASH (ISO/IEC 23009-1:2022)
- HTTP Live Streaming (https://datatracker.ietf.org/doc/html/rfc8216)
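As an illustration of how a File transport relates to the Raw Audio characteristics, the following sketch reads the sampling frequency (in kHz) and the sample precision (in bits/sample) from a WAV file using Python's standard `wave` module. The helper names and the dictionary keys are hypothetical and are not taken from the Audio Qualifier JSON Schema.

```python
import io
import wave


def raw_audio_characteristics(wav_bytes: bytes) -> dict:
    """Read sampling frequency (kHz) and sample precision (bits) from WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return {
            "sampling_frequency_khz": wav.getframerate() / 1000,
            "sample_precision_bits": wav.getsampwidth() * 8,
        }


def make_test_wav(rate_hz: int = 48000, bits: int = 16,
                  channels: int = 2, frames: int = 480) -> bytes:
    """Build a small in-memory PCM WAV file of zero-valued samples for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits // 8)
        wav.setframerate(rate_hz)
        wav.writeframes(bytes(frames * channels * (bits // 8)))
    return buffer.getvalue()


if __name__ == "__main__":
    print(raw_audio_characteristics(make_test_wav()))
```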
- Attributes
- Source Type
- Definition: the types of an Audio instance
- Types:
- Vocal
- Real
- Synthetic
- Music
- Real
- Synthetic
- Sound effects
- Real
- Synthetic
- Noise
- Real
- Synthetic
- Metadata
- Definition: the method used to attach information to an instance of the Audio Data Type.
- Methods
- General
- Dublin Core (https://www.dublincore.org/specification_status/recommendation/)
- ID3 (https://id3.org/)
- IPTC Photo Metadata (https://www.iptc.org/std/photometadata/specification/IPTC-PhotoMetadata)
- Object Identity
- Definition: the ID of an object in an Audio Data Type instance
- Methods
- Instance Identifier (https://mpai.community/standards/mpai-osd/v1-1/data-types/instance-identifier/)
- Spatial Attributes
- Definition: Attributes that define the Audio Data Type instance in space such as direction, distance, and orientation.
- Types
- Binaural Cues
- Definition: Cues that provide information on the direction of a sound in the horizontal plane by relying on differences in sounds received by the two ears.
- Types
- Interaural level difference (ILD): Array of frequencies and associated level differences (Nx2)
- Interaural time delay (ITD): Array of frequencies and associated time delays (Nx2)
- Interaural phase difference (IPD): Array of frequencies and associated phase differences (Nx2)
- Spectral Cues
- Definition: Cues that contribute to the resolution of front/back confusions when different sound sources create the same interaural cues, and are critical for accurate localization of elevation in the median plane where interaural cues are negligible.
- Type
- Array of frequencies and associated frequency spectra (Nx5)
- 1 for frequency
- 2 for left ear (real and imaginary)
- 2 for right ear (real and imaginary)
- Interchannel Differences
- Definition: Cues that define the differences between pairs of audio channels with respect to pressure level and time.
- Types
- Interchannel level difference (ICLD): Array of frequencies and associated level differences (NxM(M-1)/2, M = number of channels)
- Interchannel time difference (ICTD): Array of frequencies and associated time differences (NxM(M-1)/2, M = number of channels)
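The array shapes listed under Spatial Attributes can be made concrete with a short sketch. It assumes a Spectral Cues row holds a frequency followed by the real and imaginary parts of the left-ear and right-ear spectra, and derives ILD and IPD rows from it using common conventions (level ratio in dB, left phase minus right phase). These conventions, the units, and the function names are assumptions, not part of this specification. The pair count M(M-1)/2 is the number of unordered channel pairs used by the Interchannel Differences.

```python
import cmath
import math


def binaural_cues(spectral_cues):
    """Derive (frequency, ILD in dB) and (frequency, IPD in rad) rows.

    Each input row is assumed to be
    (frequency, left_re, left_im, right_re, right_im), with non-zero spectra.
    """
    ild, ipd = [], []
    for freq, l_re, l_im, r_re, r_im in spectral_cues:
        left, right = complex(l_re, l_im), complex(r_re, r_im)
        # Level difference: left magnitude relative to right magnitude, in dB.
        ild.append((freq, 20 * math.log10(abs(left) / abs(right))))
        # Phase difference: phase of left minus phase of right, in (-pi, pi].
        ipd.append((freq, cmath.phase(left * right.conjugate())))
    return ild, ipd


def channel_pair_count(num_channels: int) -> int:
    """Number of unordered channel pairs, M(M-1)/2."""
    return num_channels * (num_channels - 1) // 2


if __name__ == "__main__":
    rows = [(1000.0, 2.0, 0.0, 1.0, 0.0), (2000.0, 0.0, 1.0, 1.0, 0.0)]
    ild, ipd = binaural_cues(rows)
    print(ild)
    print(ipd)
    print(channel_pair_count(5))
```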
- Device
- Definition: features of the device that captured the Audio instance.
- Features
- Device ID
- Definition: an identifier of the device that captured the Audio instance, typically a string.
- Device Geometry
- Definition: the method of describing the spatial arrangement of audio sensors in an audio device.
- Methods
- Microphone Array Geometry (https://mpai.community/standards/mpai-cae/usc/v2-2/data-types/microphone-array-geometry/)
- Device Location
- Definition: the method to define position and orientation of the device that captured an Audio instance in a real or virtual space.
- Methods
- Point of View (https://mpai.community/standards/mpai-osd/v1-1/data-types/point-of-view/)
- Sensor characteristics
- Definition: features of a single microphone sensor having an impact on the captured Audio Data Type instance, specifically directivity pattern and frequency response
- Directivity pattern
- Cardioid
- Supercardioid
- Hypercardioid
- Omnidirectional
- Parametric
- Definition: a directivity pattern which can be represented as coefficients of its trigonometric polynomial.
- Features:
- Degree
- Coefficients
- Frequency response
- Definition: the sensitivity of a microphone sensor expressed as an array of complex numbers at a discrete number of frequencies.
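The Parametric directivity pattern can be sketched as the evaluation of a trigonometric polynomial from its Degree and Coefficients. The sketch assumes a cosine-only polynomial, the sum over k of a_k cos(k·theta), with theta measured from the sensor axis. The specification does not state the basis, so this choice is an assumption. The named first-order patterns use commonly quoted coefficient values, and the supercardioid values are approximate.

```python
import math


def directivity(coefficients: list[float], theta: float) -> float:
    """Evaluate sum_k a_k * cos(k * theta); the degree is len(coefficients) - 1."""
    return sum(a * math.cos(k * theta) for k, a in enumerate(coefficients))


# Commonly quoted first-order patterns of the form a0 + a1 * cos(theta).
FIRST_ORDER_PATTERNS = {
    "omnidirectional": [1.0, 0.0],
    "cardioid": [0.5, 0.5],
    "supercardioid": [0.37, 0.63],  # approximate values
    "hypercardioid": [0.25, 0.75],
}


if __name__ == "__main__":
    for name, coeffs in FIRST_ORDER_PATTERNS.items():
        front = directivity(coeffs, 0.0)
        rear = directivity(coeffs, math.pi)
        print(f"{name}: front={front:.2f}, rear={rear:.2f}")
```

For degree 1 the cosine-only form coincides with a polynomial in cos(theta), so the named patterns do not depend on the assumed basis; for higher degrees the two forms differ.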