1 Definition

Speech Qualifier is a set of Data that provides additional information about Speech Data for potential use by a machine.

Speech Object, which includes Speech Qualifier in addition to Speech Data, is specified by MPAI-MMC V2.2.

2 Functional Requirements

A Speech Qualifier must allow the expression of the following Elements:

  1. Formats
    1. Content
    2. Transport
  2. Attributes
    1. Source
    2. Metadata
    3. Spatial Attributes
    4. Device

Users who need support for other entries in MPAI-TFA should submit a documented request to the MPAI Secretariat asking that such entries be considered for addition.
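The Element tree above can be made concrete as nested Python dataclasses. This is a minimal sketch. The class and field names (`SpeechQualifier`, `Formats`, `Attributes`, `SpeechObject`, `spatial_attributes`, `device`) are hypothetical. They mirror the Element names listed above, not the property names defined in SpeechQualifiers.json. The plain `str` and `dict` field types stand in for the richer types given in the Semantics section.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical model: names mirror the Functional Requirements Elements,
# not the properties of the normative JSON schema.
@dataclass
class Formats:
    content: Optional[str] = None    # e.g. "Raw Speech", "AAC"
    transport: Optional[str] = None  # e.g. "WAV", "DASH"


@dataclass
class Attributes:
    source: Optional[str] = None                            # "Real" or "Synthetic"
    metadata: dict = field(default_factory=dict)            # e.g. {"language": "en"}
    spatial_attributes: dict = field(default_factory=dict)  # left open in this sketch
    device: dict = field(default_factory=dict)              # e.g. {"device_id": "mic-01"}


@dataclass
class SpeechQualifier:
    formats: Formats = field(default_factory=Formats)
    attributes: Attributes = field(default_factory=Attributes)


@dataclass
class SpeechObject:
    # Per the Definition, a Speech Object carries Speech Qualifier
    # in addition to Speech Data.
    speech_data: bytes
    qualifier: SpeechQualifier


obj = SpeechObject(
    speech_data=b"\x00\x01",  # placeholder payload
    qualifier=SpeechQualifier(
        formats=Formats(content="Raw Speech", transport="WAV"),
        attributes=Attributes(source="Real", metadata={"language": "en"}),
    ),
)
print(obj.qualifier.formats.transport)  # WAV
```

The sketch keeps Formats and Attributes as separate objects to follow the two top-level groups of the requirements. A real implementation would validate instances against the published schema rather than rely on these illustrative types.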

3 Syntax

https://schemas.mpai.community/TFA/V1.0/formats/SpeechQualifiers.json

4 Semantics

  1. Sub-Types

    1. No Sub-Types
  2. Formats

    1. Content
      1. Definition: the type of data arrangement used to digitally represent speech.
      2. Types:
        1. Raw Speech
          1. Definition: the type of data arrangement used to digitally represent speech samples.
          2. Types:
            1. Sampling Frequency: Number expressing the sampling frequency in kHz.
            2. Sample Precision: Number expressing the number of bits per sample.
        2. Speech Compression Formats
          1. Definition: the type of data arrangement used to reduce the number of bits needed to represent speech.
          2. Types:
            1. G711A (ITU-T G.711, A-law)
            2. G711mu (ITU-T G.711, μ-law)
            3. MP3 (ISO/IEC 11172-3:1993)
            4. AAC (ISO/IEC 14496-3:2019)
    2. Transport
      1. Definition: the type of data arrangement used to transport Speech.
      2. Types:
        1. File
          1. Definition: the type of data arrangement used to transport Speech statically as files.
          2. Types:
            1. WAV
            2. MP4 (ISO/IEC 14496-12:2022)
        2. Stream
          1. Definition: the type of data arrangement used to transport Speech dynamically as a stream.
          2. Types:
            1. DASH (ISO/IEC 23009-1:2022)
            2. HTTP Live Streaming (IETF RFC 8216)
  3. Attributes

    1. Source Type
      1. Definition: the type of source of the Speech instance.
      2. Types:
        1. Real
        2. Synthetic
    2. Metadata
      1. Definition: the type of data arrangement used to attach information to a Speech instance.
      2. Types:
        1. Language
          1. Definition: the type of data arrangement used to indicate the Language used by a Speech instance.
          2. Types:
            1. ISO 639-1
            2. ISO 639-2
            3. ISO 639-3
        2. Speaker Identity
          1. Definition: the type of data arrangement used to identify a speaker.
          2. Type:
            1. MPAI Instance Identifier
        3. Content Description
          1. Definition: the type of data arrangement used to describe the content of a Speech instance.
          2. Types:
            1. ASCII
            2. UTF-8
            3. UTF-16
            4. UTF-32
        4. Entity Internal Status
          1. Definition: the type of data arrangement used to describe the internal status of an Entity, such as its cognitive state, emotion, and social attitude.
          2. Type:
            1. MPAI Personal Status
    3. Device
      1. Definition: characteristics of the device that captured the speech.
      2. Characteristics:
        1. Device ID
          1. Definition: an identifier of the device.
          2. Identifier:
            1. String
        2. Device Location
          1. Definition: the position and orientation of the device in a real or virtual space.
          2. Types:
            1. MPAI Point of View
        3. Sensor Characteristics
          1. Definition: sensor features that affect the captured speech.
          2. Sensor features (polar patterns):
            1. Omnidirectional
            2. Figure of eight
            3. Cardioid
            4. Supercardioid
            5. Hypercardioid
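Raw Speech is characterised only by Sampling Frequency (kHz) and Sample Precision (bits/sample), so its uncompressed bit rate follows directly: kHz × bits/sample gives kbit/s per channel. The sketch below assumes linear PCM. The helper names and the `channels` parameter are illustrative assumptions, since the Raw Speech Types do not include a channel count. The size estimate also ignores any container header, such as the WAV header.

```python
def raw_bit_rate_kbps(sampling_frequency_khz: float,
                      sample_precision_bits: int,
                      channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kbit/s.

    kHz means thousands of samples per second, so multiplying by bits
    per sample (and by the assumed channel count) yields kbit/s.
    """
    return sampling_frequency_khz * sample_precision_bits * channels


def raw_size_bytes(sampling_frequency_khz: float,
                   sample_precision_bits: int,
                   duration_s: float,
                   channels: int = 1) -> float:
    """Payload size in bytes for a given duration (no container header)."""
    kbps = raw_bit_rate_kbps(sampling_frequency_khz, sample_precision_bits, channels)
    return kbps * 1000 * duration_s / 8


print(raw_bit_rate_kbps(16, 16))   # 256 (kbit/s) for 16 kHz, 16 bits, mono
print(raw_size_bytes(16, 16, 10))  # 320000.0 (bytes) for 10 s
print(raw_bit_rate_kbps(8, 8))     # 64 (kbit/s)
```

For comparison, 8 kHz at 8 bits/sample gives 64 kbit/s. That is the bit rate of G.711 (the G711A and G711mu entries), whose samples are 8-bit companded values taken at 8 kHz.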
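The Language Metadata names three ISO 639 code sets, and they differ in code length. ISO 639-1 uses two-letter codes, while ISO 639-2 and ISO 639-3 use three-letter codes. The following is a minimal well-formedness check. It assumes lowercase codes and uses hypothetical names (`is_well_formed_language`, `_CODE_PATTERNS`). It checks only the shape of a code and does not look the code up in the ISO registries.

```python
import re

# Expected shape of a code under each scheme (lowercase assumed).
# Shape only: no registry lookup is performed.
_CODE_PATTERNS = {
    "ISO 639-1": re.compile(r"[a-z]{2}"),
    "ISO 639-2": re.compile(r"[a-z]{3}"),
    "ISO 639-3": re.compile(r"[a-z]{3}"),
}


def is_well_formed_language(code: str, scheme: str) -> bool:
    pattern = _CODE_PATTERNS.get(scheme)
    if pattern is None:
        raise ValueError(f"unsupported language scheme: {scheme!r}")
    return pattern.fullmatch(code) is not None


print(is_well_formed_language("it", "ISO 639-1"))   # True
print(is_well_formed_language("ita", "ISO 639-2"))  # True
print(is_well_formed_language("it", "ISO 639-3"))   # False
```

ISO 639-2 and ISO 639-3 codes have the same shape, so a three-letter code alone does not reveal its scheme. This is one reason it helps to record the scheme alongside the code.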
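The Content Description encodings determine how a receiver must decode the description bytes. The sketch below maps the listed encoding names to Python codec names and shows that the same text yields byte sequences of different lengths. The mapping table and helper names are hypothetical. Python's `utf-16` and `utf-32` codecs prepend a byte-order mark, and this section does not say whether a BOM is expected.

```python
# Hypothetical mapping from the Content Description encoding names
# to Python codec names.
_CODECS = {
    "ASCII": "ascii",
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",  # Python writes a 2-byte BOM
    "UTF-32": "utf-32",  # Python writes a 4-byte BOM
}


def encode_description(text: str, encoding: str) -> bytes:
    return text.encode(_CODECS[encoding])


def decode_description(data: bytes, encoding: str) -> str:
    return data.decode(_CODECS[encoding])


for name in _CODECS:
    raw = encode_description("Ciao", name)
    assert decode_description(raw, name) == "Ciao"  # round trip
    print(name, len(raw))
# ASCII 4, UTF-8 4, UTF-16 10, UTF-32 20
```

ASCII cannot represent non-ASCII characters. Encoding "caffè" as ASCII raises `UnicodeEncodeError`, which is why the UTF encodings are listed alongside it.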
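The Sensor Characteristics listed above are microphone polar patterns. As an illustration, the textbook first-order model gives the relative sensitivity at angle θ from the front axis as r(θ) = a + (1 - a)·cos θ. The coefficients below are the conventional values for that model, with the supercardioid value rounded. They are not taken from this specification, and the names (`PATTERN_COEFFICIENT`, `relative_sensitivity`) are hypothetical.

```python
import math

# Conventional first-order coefficients a in r(theta) = a + (1 - a) * cos(theta).
# Illustrative textbook values, not defined by MPAI.
PATTERN_COEFFICIENT = {
    "Omnidirectional": 1.0,
    "Cardioid": 0.5,
    "Supercardioid": 0.366,  # approx. (sqrt(3) - 1) / 2
    "Hypercardioid": 0.25,
    "Figure of eight": 0.0,
}


def relative_sensitivity(pattern: str, angle_deg: float) -> float:
    """Sensitivity relative to on-axis (0 degrees); negative means inverted polarity."""
    a = PATTERN_COEFFICIENT[pattern]
    return a + (1.0 - a) * math.cos(math.radians(angle_deg))


for name in PATTERN_COEFFICIENT:
    print(f"{name:16s} front={relative_sensitivity(name, 0):+.2f} "
          f"side={relative_sensitivity(name, 90):+.2f} "
          f"rear={relative_sensitivity(name, 180):+.2f}")
```

In this model, a Cardioid microphone is insensitive to sound arriving from directly behind (r = 0 at 180°). A Figure of eight microphone rejects sound arriving from the sides. Effects of this kind on the captured speech are what the Sensor Characteristics entry records.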