1 Definition

Audio Qualifier is a set of Data providing additional information on Audio Data for potential use by a machine.

An Audio Object includes an Audio Qualifier in addition to Audio Data. The Audio Object is specified by CAE-USC V2.2.

2 Functional Requirements

Audio Qualifier must allow the expression of the following Qualifier elements:

  1. Formats
    1. Content
    2. Transport
  2. Attributes
    1. Source
    2. Metadata
    3. Spatial Attributes
    4. Device

Users needing support of other entries in MPAI-TFA should make a documented request to the MPAI Secretariat to consider addition of such entries.

3 Syntax

https://schemas.mpai.community/TFA/V1.0/data/AudioQualifier.json
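As a rough illustration of how the elements listed in the Functional Requirements could be carried in a single record, the sketch below builds a qualifier as a Python dictionary and serialises it to JSON. All key names and values are assumptions made for this example; the normative structure is the JSON Schema at the URL above, which may name and nest fields differently.

```python
import json

# Hypothetical Audio Qualifier instance. Key names are illustrative only
# and are NOT taken from the AudioQualifier.json schema.
audio_qualifier = {
    "Formats": {
        "Content": {
            "RawAudio": {
                "SampleSpace": {
                    "SamplingFrequency": 48.0,  # kHz
                    "SamplePrecision": 16,      # bits/sample
                }
            }
        },
        "Transport": {"File": "WAV"},
    },
    "Attributes": {
        "Source": {"Type": "Vocal", "Origin": "Real"},
        "Metadata": {"General": "Dublin Core"},
        "Device": {"DeviceID": "mic-array-01"},
    },
}


def top_level_elements(qualifier):
    """Return the sorted top-level element names of a qualifier."""
    return sorted(qualifier)


# Round-trip through JSON, the serialisation used by the schema.
serialised = json.dumps(audio_qualifier, indent=2)
restored = json.loads(serialised)
print(top_level_elements(restored))
```

An instance meant for interchange would additionally need to be validated against the published schema, which this sketch does not attempt.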

4 Semantics

  1. Sub-Types

  2. Formats

    1. Content
      1. Definition: the type of data arrangement used to digitally represent audio.
      2. Types:
        1. Raw Audio
          1. Definition: the type of data arrangement used to digitally represent samples or their transform coefficients.
          2. Types:
            1. Sample Space
              1. Definition: the representation with samples having the meaning of Audio level.
              2. Characteristics
                1. Sampling frequency: Number expressing kHz
                2. Sample precision: Integer expressing bits/sample
            2. Transform Space
              1. Definition: the characteristics of the representation with samples having the meaning of Spatial Fourier Transform coefficients.
              2. Characteristics
                1. Sequence
                  1. Sequential
                  2. Interleaved
                2. Precision
                  1. float32
                  2. float64
            3. Spherical Harmonic Decomposition
              1. Definition: the characteristics of the representation with samples having the meaning of Spherical Fourier Transform coefficients.
              2. Characteristics
                1. Sequence
                  1. Sequential
                  2. Interleaved
                2. Precision
                  1. float32
                  2. float64
            4. Ambisonics
              1. Definition: the types of full-sphere surround sound format covering the horizontal plane, above, and below.
              2. Types
                1. 1st Order
                2. 2nd Order
                3. 3rd Order
        2. Compression Formats
          1. Definition: the type of data arrangement used to reduce the number of bits required to represent an Audio instance.
          2. Types
            1. MP3-1: (ISO/IEC 11172-3:1993)
            2. MP3-2: (ISO/IEC 13818-3:1998)
            3. AAC-2: (ISO/IEC 13818-7:2006)
            4. AAC-4: (ISO/IEC 14496-3:2019)
    2. Transport
      1. Definition: the type of data arrangement used to transport an Audio Data Type instance.
      2. Types
        1. File
          1. WAV
          2. Core Audio Format
          3. RF64
          4. MP4 (ISO/IEC 14496-12:2022)
        2. Stream
          1. DASH (ISO/IEC 23009-1:2022)
          2. HTTP Live Streaming
  3. Attributes

    1. Source Type
      1. Definition: the type of source of an Audio instance.
      2. Types:
        1. Vocal
          1. Real
          2. Synthetic
        2. Music
          1. Real
          2. Synthetic
        3. Sound effects
          1. Real
          2. Synthetic
        4. Noise
          1. Real
          2. Synthetic
    2. Metadata
      1. Definition: the type of data arrangement used to attach information to an instance of Audio Data Type.
      2. Types
        1. General
          1. Dublin Core
          2. ID3
          3. IPTC Photo Metadata
          4. Object Identity
            1. Definition: the ID of an object in an Audio Data Type instance.
              1. Instance Identifier
    3. Spatial Attributes
      1. Definition: Attributes that define the Audio Data Type instance in space such as direction, distance, and orientation.
      2. Types
        1. Binaural Cues
          1. Definition: Cues that provide information on the direction of a sound in the horizontal plane by relying on differences in sounds received by the two ears.
          2. Types
            1. Interaural level difference (ILD): Array of frequencies and associated level differences (Nx2)
            2. Interaural time delay (ITD): Array of frequencies and associated time delays (Nx2)
            3. Interaural phase difference (IPD): Array of frequencies and associated phase differences (Nx2)
        2. Spectral Cues
          1. Definition: Cues that contribute to the resolution of front/back confusions when different sound sources create the same interaural cues, and are critical for accurate localization of elevation in the median plane where interaural cues are negligible.
          2. Type
            1. Array of frequencies and associated frequency spectra (Nx5)
              – 2 for left ear (real and imaginary)
              – 2 for right ear (real and imaginary)
        3. Interchannel Differences
          1. Definition: Cues that define the differences between pairs of audio channels with respect to pressure level and time.
          2. Types
            1. Interchannel level difference (ICLD): Array of frequencies and associated level differences (NxM(M-1)/2)
              M = number of channels
            2. Interchannel time difference (ICTD): Array of frequencies and associated time differences (NxM(M-1)/2)
              M = number of channels
    4. Device
      1. Definition: characteristics of the device that captured the Audio instance.
      2. Features
        1. Device ID
          1. Definition: an identifier of the device that captured the Audio instance, typically a string.
        2. Device Geometry
          1. Definition: the type of description of the spatial arrangement of audio sensors in an audio device.
          2. Types
            1. Microphone Array Geometry
        3. Device Location
          1. Definition: the type of description of the position and orientation of the device that captured an Audio instance in a real or virtual space.
          2. Types
            1. Point of View
        4. Sensor characteristics
          1. Definition: features of a single microphone sensor having an impact on the captured Audio Data Type instance, specifically directivity pattern and frequency response.
          2. Directivity pattern
            1. Cardioid
            2. Supercardioid
            3. Hypercardioid
            4. Omnidirectional
            5. Parametric
              1. Definition: a directivity pattern which can be represented as coefficients of its trigonometric polynomial.
              2. Features:
                1. Degree
                2. Coefficients
          3. Frequency response
            1. Definition: the sensitivity of a microphone sensor expressed as an array of complex numbers at a discrete number of frequencies.
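
To make the Parametric directivity pattern more concrete, the sketch below evaluates a pattern from its Degree and Coefficients. It assumes a cosine-only trigonometric polynomial, r(theta) = a_0 + a_1 cos(theta) + ... + a_D cos(D theta), which suits axisymmetric microphones. This document does not state the exact polynomial form, and the function name `directivity_gain` is hypothetical.

```python
import math


def directivity_gain(coefficients, theta):
    """Evaluate r(theta) = sum_k a_k * cos(k * theta).

    coefficients: [a_0, ..., a_D]; the Degree D is len(coefficients) - 1.
    theta: angle from the microphone axis, in radians.
    Assumption: cosine-only form; not specified by the source document.
    """
    return sum(a * math.cos(k * theta) for k, a in enumerate(coefficients))


# Two of the named patterns written as low-degree cases of the same form.
omnidirectional = [1.0]      # degree 0: equal sensitivity in all directions
cardioid = [0.5, 0.5]        # degree 1: 0.5 + 0.5 * cos(theta)

for name, coeffs in [("omnidirectional", omnidirectional), ("cardioid", cardioid)]:
    front = directivity_gain(coeffs, 0.0)
    rear = directivity_gain(coeffs, math.pi)
    print(f"{name}: degree={len(coeffs) - 1} front={front:.2f} rear={rear:.2f}")
```

Under this assumption, the named patterns are special cases of the Parametric one, so a receiver could handle all of them with a single evaluation routine.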