1 Definition

Speech Qualifier is a set of Data providing additional information on Speech Data for potential use by a machine.

The combination of Speech Data and a Speech Qualifier is called a Speech Object, as specified by MPAI-OSD V1.3.

2 Functional Requirements

A Speech Qualifier must allow the expression of the following Elements:

  1. Formats
    1. Content
    2. Transport
  2. Attributes
    1. Source
    2. Metadata
    3. Spatial Attributes
    4. Device
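
The element hierarchy above can be sketched as a nested data structure. The following Python fragment is only an illustration, not the normative schema: the class and field names (`SpeechQualifier`, `Formats`, `Attributes`, `spatial_attributes`, and so on) are assumptions chosen to mirror the list, and every leaf is left as an untyped placeholder.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Optional


@dataclass
class Formats:
    # Hypothetical field names mirroring the "Content" and "Transport" Elements.
    content: Optional[Any] = None
    transport: Optional[Any] = None


@dataclass
class Attributes:
    # Hypothetical field names mirroring the four Attribute Elements.
    source: Optional[Any] = None
    metadata: Optional[Any] = None
    spatial_attributes: Optional[Any] = None
    device: Optional[Any] = None


@dataclass
class SpeechQualifier:
    # Each group starts out empty; an application fills in what it knows.
    formats: Formats = field(default_factory=Formats)
    attributes: Attributes = field(default_factory=Attributes)


def element_names(qualifier):
    """Return the Element names of each group, sorted alphabetically."""
    nested = asdict(qualifier)
    return {group: sorted(members) for group, members in nested.items()}
```

Calling `element_names(SpeechQualifier())` lists the two Format Elements and the four Attribute Elements, which is a quick way to confirm that a structure covers every Element required above.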

Users needing additional entries in the Speech Qualifier or support of new Qualifiers should make a documented request to the MPAI Secretariat. Requests will be considered by the appropriate MPAI committee.

3 Syntax

https://schemas.mpai.community/TFA/V1.3/formats/SpeechQualifiers.json

4 Semantics

  1. Formats

    1. Content
      1. Definition: the type of data arrangement used to digitally represent speech.
      2. Types:
        1. Raw Speech
          1. Definition: the type of data arrangement used to digitally represent samples.
          2. Types:
            1. Sampling Frequency: Number expressing kHz.
            2. Sample Precision: Number expressing bits/sample.
        2. Speech Compression Formats
          1. Definition: the type of data arrangement used to reduce the number of bits needed to represent Speech.
          2. Types:
            1. G711A
            2. G711mu
            3. MP3 (ISO/IEC 11172-3:1993)
            4. AAC (ISO/IEC 14496-3:2019)
    2. Transport
      1. Definition: the type of data arrangement used to transport Speech.
      2. Types:
        1. File
          1. Definition: the type of data arrangement used to statically transport Speech by files.
          2. Types:
            1. WAV
            2. MP4 (ISO/IEC 14496-12:2022)
        2. Stream
          1. Definition: the type of data arrangement used to dynamically transport Speech by stream.
          2. Types:
            1. DASH (ISO/IEC 23009-1:2022)
            2. HTTP Live Streaming
  2. Attributes

    1. Source Type
      1. Definition: the origin of a Speech instance, i.e., whether it is real or synthetic.
      2. Types:
        1. Real
        2. Synthetic
    2. Metadata
      1. Definition: the type of data arrangement used to attach information to a Speech instance.
      2. Types:
        1. Language
          1. Definition: the type of data arrangement used to indicate the Language used by a Speech instance.
          2. Types:
            1. ISO 639-1
            2. ISO 639-2
            3. ISO 639-3
        2. Speaker Identity
          1. Definition: the type of data arrangement used to identify a speaker.
          2. Type:
            1. MPAI Instance Identifier
        3. Content Description
          1. Definition: the type of data arrangement used to describe the content of a Speech instance.
          2. Types:
            1. ASCII
            2. UTF-8
            3. UTF-16
            4. UTF-32
        4. Entity Internal Status
          1. Definition: the type of data arrangement used to describe the internal status of an Entity, such as its cognitive state, emotion, and social attitude.
          2. Type:
            1. MPAI Personal Status
    3. Device
      1. Definition: the characteristics of the device that captured the Speech.
      2. Characteristics:
        1. Device ID
          1. Definition: an identifier of the device.
          2. Identifier:
            1. String
        2. Device Location
          1. Definition: the position and orientation of the device in a real or virtual space.
          2. Types:
            1. MPAI Point of View
        3. Sensor Characteristics
          1. Definition: sensor features having an impact on the captured Speech.
          2. Sensor features:
            1. Omnidirectional
            2. Figure of eight
            3. Cardioid
            4. Supercardioid
            5. Hypercardioid
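
To make the Semantics more concrete, the following is a minimal sketch of how an application might check a Speech Qualifier against the enumerated values listed above. The dictionary keys (`compression`, `file`, `stream`, `source_type`, `language_standard`, `text_encoding`, `sensor_pattern`) and the function names are hypothetical; the normative names are those of the JSON schema referenced in the Syntax section. Only the allowed values are taken from this section. The bit-rate helper additionally assumes a single channel, which this section does not specify.

```python
# Allowed values copied from the Semantics section.
# The key names are hypothetical and do not come from the JSON schema.
ALLOWED = {
    "compression": {"G711A", "G711mu", "MP3", "AAC"},
    "file": {"WAV", "MP4"},
    "stream": {"DASH", "HTTP Live Streaming"},
    "source_type": {"Real", "Synthetic"},
    "language_standard": {"ISO 639-1", "ISO 639-2", "ISO 639-3"},
    "text_encoding": {"ASCII", "UTF-8", "UTF-16", "UTF-32"},
    "sensor_pattern": {
        "Omnidirectional",
        "Figure of eight",
        "Cardioid",
        "Supercardioid",
        "Hypercardioid",
    },
}


def check_qualifier(qualifier):
    """Return the (key, value) pairs whose value is not an allowed Type."""
    problems = []
    for key, value in qualifier.items():
        if key in ALLOWED and value not in ALLOWED[key]:
            problems.append((key, value))
    return problems


def raw_bitrate_kbps(sampling_khz, bits_per_sample):
    """Bit rate of Raw Speech, assuming one channel: kHz x bits/sample = kbit/s."""
    return sampling_khz * bits_per_sample


# A hypothetical qualifier for a G711A-coded WAV file captured by a cardioid microphone.
example = {
    "compression": "G711A",
    "file": "WAV",
    "source_type": "Real",
    "language_standard": "ISO 639-3",
    "text_encoding": "UTF-8",
    "sensor_pattern": "Cardioid",
}
```

Here `check_qualifier(example)` returns an empty list, while a value outside the lists, such as `{"source_type": "Recorded"}`, is reported back to the caller. Keys that are not in `ALLOWED` are ignored rather than rejected, so free-form entries such as a Device ID string pass through unchecked.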