1      Definition
2      Functional Requirements
3      Syntax
4      Semantics
5      Conformance Testing
6      Performance Assessment

1      Definition

A Data Type representing characteristic elements extracted from the input speech over a period of time, specifically Pitch, Intensity, Tempo, Personal Status, and NNSpeechFeatures.

2      Functional Requirements

Speech Descriptors may include Neural Network Descriptors.

3      Syntax

https://schemas.mpai.community/MMC/V2.3/data/SpeechDescriptors.json

4      Semantics

Label                          | Size      | Description
Header                         | N1 Bytes  | Speech Descriptors Header
– Standard – SpeechDescriptors | 9 Bytes   | The characters “MMC-SPD-V”
– Version                      | N2 Bytes  | Major version – 1 or 2 characters
– Dot-separator                | 1 Byte    | The character “.”
– Subversion                   | N3 Bytes  | Minor version – 1 or 2 characters
MInstanceID                    | N4 Bytes  | ID of the Metaverse Instance.
SpeechDescriptorsID            | N5 Bytes  | ID of Speech Descriptors.
SpeechDescriptorsData          | N7 Bytes  | Data associated with Input Text.
NNSpeechFeatures               | N8 Bytes  | The output vector of a neural network using Speech as input.
Duration                       | N9 Bytes  | The Time in which the Speech Descriptors are computed.
Pitch                          | N10 Bytes | Real number measuring the fundamental frequency of Speech in Hz (hertz).
Intensity                      | N11 Bytes | Real number measuring the Energy of Speech in dB (decibels).
Tempo                          | N12 Bytes | Real number measuring the rate at which specified linguistic units (Phonemes, Syllables, or Words) are produced.
Personal Status                | N13 Bytes | The Speech Personal Status carried by the input speech.
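
The Header layout in the table above (the characters “MMC-SPD-V”, a major version, a dot, a minor version) can be checked mechanically. The following is a minimal sketch in Python, not part of the specification. The function name parse_header is hypothetical, and the sketch assumes that the version characters are decimal digits, which the table does not state explicitly.

```python
import re

# Layout taken from the Semantics table:
#   "MMC-SPD-V" + major version (1 or 2 characters) + "." + minor version
#   (1 or 2 characters).
# Assumption (not stated in the table): version characters are decimal digits.
HEADER_RE = re.compile(r"MMC-SPD-V(\d{1,2})\.(\d{1,2})")


def parse_header(header: str) -> tuple[int, int]:
    """Return (major, minor) for a well-formed header, else raise ValueError."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a Speech Descriptors header: {header!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_header("MMC-SPD-V2.3"))  # (2, 3)
```

Using fullmatch rather than match rejects headers with trailing characters, such as a third version component.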

5     Conformance Testing

A Data instance Conforms with MPAI-MMC V2.3 Speech Descriptors (MMC-SPD) if:

  1. The Data validates against the Speech Descriptors’ JSON Schema.
  2. All Data in the Speech Descriptors’ JSON Schema:
    1. Have the specified type.
    2. Validate against their JSON Schemas.
    3. Conform with their Data Qualifiers, if present.
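
The order of the checks above can be sketched as a small Python function. This is an illustration under stated assumptions, not the normative test procedure. The function conforms, its parameters, and the field names in the usage example are all hypothetical. Schema validation itself is delegated to caller-supplied callables, so any JSON Schema validator can be plugged in. The sketch also reads "if present" as "a Data Qualifier check has been registered for that field".

```python
from typing import Any, Callable, Mapping

Check = Callable[[Any], bool]


def conforms(
    instance: Mapping[str, Any],
    validates_top_schema: Check,
    expected_types: Mapping[str, Any],
    sub_schema_checks: Mapping[str, Check],
    qualifier_checks: Mapping[str, Check],
) -> bool:
    """Apply the conformance conditions in the order listed above."""
    # Condition 1: the whole instance validates against the top-level schema.
    if not validates_top_schema(instance):
        return False
    for name, value in instance.items():
        # Condition 2.1: the Data has the specified type.
        expected = expected_types.get(name)
        if expected is not None and not isinstance(value, expected):
            return False
        # Condition 2.2: the Data validates against its own JSON Schema.
        sub_check = sub_schema_checks.get(name)
        if sub_check is not None and not sub_check(value):
            return False
        # Condition 2.3: the Data conforms with its Data Qualifier, if present.
        qualifier_check = qualifier_checks.get(name)
        if qualifier_check is not None and not qualifier_check(value):
            return False
    return True


# Toy usage: the field names and the stand-in checks are illustrative only.
toy = {"Pitch": 220.0, "Intensity": -20.5}
types = {"Pitch": float, "Intensity": float}
print(conforms(toy, lambda d: isinstance(d, dict), types, {}, {}))  # True
```

Passing the checks in as callables keeps the sketch independent of any particular validator library. In practice, the top-level check would wrap a validator loaded with the schema referenced in the Syntax section.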

6     Performance Assessment