1 Definition

A Data Type whose instances represent analogue signals in the human-audible range (16 Hz – 20 kHz), or are rendered to be perceived in that range.

2 Functional Requirements

An Audio Qualifier must allow the expression of the following Elements:

  1. Sub-Types
  2. Formats
    1. Content
    2. Transport
  3. Attributes
    1. Source
    2. Metadata
    3. Spatial Attributes
    4. Device

3 Syntax

https://schemas.mpai.community/TFA/V1.0/data/AudioQualifier.json

4 Semantics

  1. Sub-Types

    1. No Sub-Types
  2. Formats

    1. Content
      1. Raw Audio
        1. Definition: the method used to digitally represent samples or their transform coefficients.
        2. Methods
          1. Sample Space
            1. Definition: the representation with samples having the meaning of Audio level.
            2. Characteristics
              1. Sampling frequency: Number expressing kHz
              2. Sample precision: Integer expressing bits/sample
          2. Transform Space
            1. Definition: the representation with samples having the meaning of Spatial Fourier Transform coefficients.
            2. Characteristics
              1. Sequence
                1. Sequential
                2. Interleaved
              2. Precision
                1. float32
                2. float64
          3. Spherical Harmonic Decomposition
            1. Definition: the representation with samples having the meaning of Spherical Fourier Transform coefficients.
            2. Characteristics
              1. Sequence
                1. Sequential
                2. Interleaved
              2. Precision
                1. float32
                2. float64
          4. Ambisonics
            1. Definition: the full-sphere surround sound format covering the horizontal plane, and above and below.
            2. Types
              1. 1st Order
              2. 2nd Order
              3. 3rd Order
      2. Compression Formats
        1. Definition: the method used to reduce the number of bits required to represent an Audio instance.
        2. Methods
          1. MP3-1: (ISO/IEC 11172-3:1993)
          2. MP3-2: (ISO/IEC 13818-3:1998)
          3. AAC-2: (ISO/IEC 13818-7:2006)
          4. AAC-4: (ISO/IEC 14496-3:2019)
    2. Transport
      1. Definition: the method used to transport an Audio Data Type instance.
      2. Methods
        1. File
          1. WAV (https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2088-1-201910-I!!PDF-E.pdf)
          2. Core Audio Format (https://developer.apple.com/library/archive/documentation/MusicAudio/Reference/CAFSpec/CAF_spec/CAF_spec.html)
          3. RF64 (https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2088-1-201910-I!!PDF-E.pdf)
          4. MP4 (ISO/IEC 14496-12:2022)
        2. Stream
          1. DASH (ISO/IEC 23009-1:2022)
          2. HTTP Live Streaming (https://datatracker.ietf.org/doc/html/rfc8216)
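Two of the Raw Audio characteristics above can be illustrated with a short sketch. It assumes that "Sequential" means all coefficients of one channel are stored before those of the next, and that "Interleaved" means the channels alternate frame by frame; the specification text does not spell this out, so treat this reading as an assumption. The function names are hypothetical and are not part of the AudioQualifier schema. The channel count for a full-sphere Ambisonics order n is the standard (n + 1)^2.

```python
from __future__ import annotations


def ambisonic_channel_count(order: int) -> int:
    """Number of spherical-harmonic channels for a full-sphere Ambisonics order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def to_interleaved(sequential: list[float], num_channels: int) -> list[float]:
    """Reorder channel-by-channel data into frame-by-frame (interleaved) data.

    Assumed input layout: [ch0 f0, ch0 f1, ..., ch1 f0, ch1 f1, ...].
    Output layout:        [f0 ch0, f0 ch1, ..., f1 ch0, f1 ch1, ...].
    """
    if num_channels <= 0 or len(sequential) % num_channels != 0:
        raise ValueError("data length must be a multiple of num_channels")
    frames = len(sequential) // num_channels
    return [
        sequential[ch * frames + f]
        for f in range(frames)
        for ch in range(num_channels)
    ]


def to_sequential(interleaved: list[float], num_channels: int) -> list[float]:
    """Inverse of to_interleaved: regroup the values channel by channel."""
    if num_channels <= 0 or len(interleaved) % num_channels != 0:
        raise ValueError("data length must be a multiple of num_channels")
    frames = len(interleaved) // num_channels
    return [
        interleaved[f * num_channels + ch]
        for ch in range(num_channels)
        for f in range(frames)
    ]
```

Under these assumptions, the 1st, 2nd, and 3rd Order types listed above carry 4, 9, and 16 channels respectively, and converting between the two sequences is a pure reordering that loses no information.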
  3. Attributes

    1. Source Type
      1. Definition: the type of the source of an Audio instance.
      2. Types:
        1. Vocal
          1. Real
          2. Synthetic
        2. Music
          1. Real
          2. Synthetic
        3. Sound effects
          1. Real
          2. Synthetic
        4. Noise
          1. Real
          2. Synthetic
    2. Metadata
      1. Definition: the method used to attach information to an instance of the Audio Data Type.
      2. Methods
        1. General
          1. Dublin Core (https://www.dublincore.org/specification_status/recommendation/)
          2. ID3 (https://id3.org/)
          3. IPTC Photo Metadata (https://www.iptc.org/std/photometadata/specification/IPTC-PhotoMetadata)
          4. Object Identity
            1. Definition: the ID of an object in an Audio Data Type instance.
            2. Methods
              1. Instance Identifier (https://mpai.community/standards/mpai-osd/v1-1/data-types/instance-identifier/)
      3. Spatial Attributes
        1. Definition: Attributes that define the Audio Data Type instance in space such as direction, distance, and orientation.
        2. Types
          1. Binaural Cues
            1. Definition: Cues that provide information on the direction of a sound in the horizontal plane by relying on differences in sounds received by the two ears.
            2. Types
              1. Interaural level difference (ILD): Array of frequencies and associated level differences (Nx2)
              2. Interaural time delay (ITD): Array of frequencies and associated time delays (Nx2)
              3. Interaural phase difference (IPD): Array of frequencies and associated phase differences (Nx2)
          2. Spectral Cues
            1. Definition: Cues that contribute to the resolution of front/back confusions when different sound sources create the same interaural cues, and are critical for accurate localization of elevation in the median plane where interaural cues are negligible.
            2. Type
              1. Array of frequencies and associated frequency spectra (Nx5), each row holding:
                – 1 column for the frequency
                – 2 columns for the left ear (real and imaginary)
                – 2 columns for the right ear (real and imaginary)
          3. Interchannel Differences
            1. Definition: Cues that define the differences between pairs of audio channels with respect to pressure level and time.
            2. Types
              1. Interchannel level difference (ICLD): Array of frequencies and associated level differences (NxM(M-1)/2), where M is the number of channels
              2. Interchannel time difference (ICTD): Array of frequencies and associated time differences (NxM(M-1)/2), where M is the number of channels
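The M(M-1)/2 dimension of the Interchannel Differences is the number of unordered channel pairs. The sketch below enumerates those pairs and builds one row of level differences per frequency bin. The pair ordering, the sign convention (first channel minus second), and the function names are assumptions made for illustration; the specification does not define them.

```python
from __future__ import annotations

from itertools import combinations


def channel_pairs(num_channels: int) -> list[tuple[int, int]]:
    """All unordered channel pairs (i, j) with i < j; there are M(M-1)/2 of them."""
    return list(combinations(range(num_channels), 2))


def icld_row(levels_db: list[float]) -> list[float]:
    """Level differences for one frequency bin, one value per channel pair.

    levels_db holds one level per channel; the sign convention is assumed to be
    level[i] - level[j] for each pair (i, j).
    """
    return [levels_db[i] - levels_db[j] for i, j in channel_pairs(len(levels_db))]


def icld_matrix(levels_per_bin: list[list[float]]) -> list[list[float]]:
    """N rows (one per frequency bin), each with M(M-1)/2 level differences."""
    return [icld_row(levels) for levels in levels_per_bin]
```

For example, a 4-channel signal yields 6 pairs, so an ICLD array over N frequency bins has N rows of 6 values each under this reading.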
      4. Device
        1. Definition: features of the device that captured the Audio instance.
        2. Features
          1. Device ID
            1. Definition: an identifier of the device that captured the Audio instance, typically a string.
          2. Device Geometry
            1. Definition: the method of describing the spatial arrangement of audio sensors in an audio device.
            2. Methods
              1. Microphone Array Geometry (https://mpai.community/standards/mpai-cae/usc/v2-2/data-types/microphone-array-geometry/)
          3. Device Location
            1. Definition: the method to define position and orientation of the device that captured an Audio instance in a real or virtual space.
            2. Methods
              1. Point of View (https://mpai.community/standards/mpai-osd/v1-1/data-types/point-of-view/)
          4. Sensor characteristics
            1. Definition: features of a single microphone sensor having an impact on the captured Audio Data Type instance, specifically directivity pattern and frequency response.
            2. Directivity pattern
              1. Cardioid
              2. Supercardioid
              3. Hypercardioid
              4. Omnidirectional
              5. Parametric
                1. Definition: a directivity pattern which can be represented as coefficients of its trigonometric polynomial.
                2. Features:
                  1. Degree
                  2. Coefficients
            3. Frequency response
              1. Definition: the sensitivity of a microphone sensor expressed as an array of complex numbers at a discrete number of frequencies.
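The Parametric directivity pattern can be sketched as follows. This assumes the trigonometric polynomial has the cosine form D(θ) = a_0 + a_1 cos(θ) + ... + a_n cos(nθ), with Degree n and Coefficients a_0..a_n; the specification does not state the exact form, so this is one plausible reading and not a normative definition. The named presets use the commonly quoted first-order coefficients for these patterns, and all names in the code are hypothetical.

```python
from __future__ import annotations

import math

# Commonly quoted first-order coefficients [a_0, a_1] (assumed presets).
OMNIDIRECTIONAL = [1.0]
CARDIOID = [0.5, 0.5]
HYPERCARDIOID = [0.25, 0.75]


def degree(coefficients: list[float]) -> int:
    """Degree of the polynomial: index of the highest-order coefficient."""
    return len(coefficients) - 1


def directivity(theta: float, coefficients: list[float]) -> float:
    """Evaluate D(theta) = sum_k a_k * cos(k * theta), theta in radians."""
    return sum(a * math.cos(k * theta) for k, a in enumerate(coefficients))
```

With these presets a cardioid has unit gain on axis (θ = 0) and a null directly behind (θ = π), while an omnidirectional pattern returns 1 at every angle.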