1 Definition

A Data Type whose instance represents, or is rendered to be perceived as, an analogue signal with vocal characteristics.

2 Functional Requirements

A Speech Qualifier must allow the expression of the following Elements:

  1. Sub-Types
  2. Formats
    1. Content
    2. Transport
  3. Attributes
    1. Source
    2. Metadata
    3. Spatial Attributes
    4. Device

3 Syntax

https://schemas.mpai.community/TFA/V1.0/data/SpeechQualifier.json
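For illustration only, the sketch below assembles a Speech Qualifier instance as a Python dictionary and serialises it to JSON. Every key name (`SubTypes`, `Formats`, `SamplingFrequency`, and so on) and the device identifier `mic-01` are assumptions chosen to mirror the Elements listed in Section 2; the normative names and structure are those defined by the JSON Schema referenced above, which may differ.

```python
import json

# Hypothetical key names mirroring the Elements of Section 2.
# They are NOT taken from the normative JSON Schema.
speech_qualifier = {
    "SubTypes": None,  # Speech defines no Sub-Types
    "Formats": {
        "Content": {
            "Method": "PCM",
            "SamplingFrequency": 16,  # kHz
            "SamplePrecision": 16,    # bits/sample
        },
        "Transport": {"Method": "File", "Container": "WAV"},
    },
    "Attributes": {
        "Source": "Real",
        "Metadata": {
            "Language": "en",  # two-letter language code (ISO 639-1)
            "ContentDescription": "Greeting recorded at reception",
        },
        "Device": {
            "DeviceID": "mic-01",  # hypothetical identifier string
            "SensorCharacteristics": "Cardioid",
        },
    },
}

# Serialise the instance; None becomes JSON null.
serialized = json.dumps(speech_qualifier, indent=2)
print(serialized)
```

A dictionary is used here only because it maps directly onto JSON; an implementation would validate the result against the published schema before exchanging it.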

4 Semantics

  1. Sub-Types

    1. No Sub-Types
  2. Formats

    1. Content
      1. Definition: The method used to digitally represent speech.
      2. Methods
        1. PCM
          1. Definition: the digital representation of speech using samples.
          2. Characteristics:
            1. Sampling Frequency: Number expressing kHz.
            2. Sample Precision: Number expressing bits/sample.
        2. Compression Formats:
          1. Definition: the method used to reduce the number of bits required to represent a Speech instance.
          2. Methods
            1. G711A (https://www.itu.int/rec/dologin_pub.asp?lang=f&id=T-REC-G.711-198811-I!!PDF-E&type=items)
            2. G711mu (https://www.itu.int/rec/dologin_pub.asp?lang=f&id=T-REC-G.711-198811-I!!PDF-E&type=items)
            3. MP3 (ISO/IEC 11172-3:1993)
            4. AAC2 (ISO/IEC 13818-7:2006)
            5. AAC4 (ISO/IEC 14496-3:2019)
    2. Transport
      1. Definition: the method used to transport Speech.
      2. Methods
        1. File
          1. Definition: the container of Speech.
          2. Containers
            1. WAV (https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2088-1-201910-I!!PDF-E.pdf)
            2. MP4 (ISO/IEC 14496-12:2022)
        2. Stream
          1. Definition: the method to move Content across the network.
          2. Methods
            1. DASH (ISO/IEC 23009-1:2022)
            2. HTTP Live Streaming (https://datatracker.ietf.org/doc/html/rfc8216)
  3. Attributes

    1. Source
      1. Definition: the type of Speech instance.
      2. Types
        1. Real
        2. Synthetic
    2. Metadata
      1. Definition: the descriptive Data attached to a Speech instance.
      2. Descriptions
        1. Language
          1. Definition: the method used to indicate the Language used by a Speech instance.
          2. Methods
            1. ISO 639-1
            2. ISO 639-2
            3. ISO 639-3
        2. Speaker Identity
          1. Definition: the method used to identify a speaker.
          2. Methods
            1. Instance Identifier (https://mpai.community/standards/mpai-osd/v1-1/data-types/instance-identifier/)
        3. Content Description
          1. Definition: the method used to describe the content of a Speech instance in words.
          2. Methods
            1. ASCII
            2. Unicode (ISO/IEC 10646)
        4. Entity Internal Status
          1. Definition: the method used to describe the internal status of the Entity, such as cognitive state, emotion, and social attitude.
          2. Methods
            1. Personal Status (https://mpai.community/standards/mpai-mmc/v2-2/data-types/personal-status/)
    3. Device
      1. Definition: elements of the device that captured the Speech instance.
      2. Elements
        1. Device ID
          1. Definition: an identifier of the device that captured the Speech instance.
          2. Methods
            1. A string
        2. Device Location
          1. Definition: the method used to define the position and orientation, in a real or virtual space, of the device that captured the Speech instance.
          2. Methods
            1. Point of View (https://mpai.community/standards/mpai-osd/v1-1/data-types/point-of-view/)
        3. Sensor Characteristics
          1. Definition: sensor features having an impact on the captured Speech instance.
          2. Sensor Features
            1. Omnidirectional
            2. Figure of eight
            3. Cardioid
            4. Supercardioid
            5. Hypercardioid
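The two PCM Characteristics listed under Content, Sampling Frequency in kHz and Sample Precision in bits/sample, together determine the uncompressed bit rate of a Speech instance. The following is a minimal sketch of that arithmetic; the function name `pcm_bit_rate_kbps` and the `channels` parameter (defaulting to a single channel) are assumptions introduced for illustration and are not part of this specification.

```python
def pcm_bit_rate_kbps(sampling_frequency_khz: float,
                      sample_precision_bits: int,
                      channels: int = 1) -> float:
    """Return the raw PCM bit rate in kbit/s.

    kHz (thousand samples per second) multiplied by bits per sample
    gives thousand bits per second directly.
    """
    if sampling_frequency_khz <= 0 or sample_precision_bits <= 0 or channels <= 0:
        raise ValueError("all arguments must be positive")
    return sampling_frequency_khz * sample_precision_bits * channels


# 16 kHz speech at 16 bits/sample, single channel
wideband = pcm_bit_rate_kbps(16.0, 16)

# 8 kHz speech at 8 bits/sample, the rate at which G.711 operates
narrowband = pcm_bit_rate_kbps(8.0, 8)

print(wideband, narrowband)
```

Comparing the result with the bit rate of a Compression Format gives a rough measure of how much the compression method reduces the number of bits needed to represent the same Speech instance.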