1     Definition 2     Functional Requirements 3     Syntax
4     Semantics 5    Conformance Testing 6     Performance Assessment

1      Definition

A Data Type representing characteristic elements extracted from the input speech, specifically Pitch, Intensity, Tempo (speech rate), Personal Status, and NNSpeechFeatures.

2      Functional Requirements

Speech Descriptors may include Neural Network Descriptors.

3      Syntax

https://schemas.mpai.community/MMC/V2.3/data/SpeechDescriptors.json

4      Semantics

| Label | Size | Description |
|---|---|---|
| Header | N1 Bytes | Speech Descriptors Header |
| – Standard – SpeechDescriptors | 9 Bytes | The characters “MMC-SPD-V” |
| – Version | N2 Bytes | Major version – 1 or 2 characters |
| – Dot-separator | 1 Byte | The character “.” |
| – Subversion | N3 Bytes | Minor version – 1 or 2 characters |
| MInstanceID | N4 Bytes | ID of the Metaverse Instance. |
| SpeechDescriptorsID | N5 Bytes | ID of Speech Descriptors. |
| SpeechDescriptorsData | N7 Bytes | Data associated with Input Text. |
| NNSpeechFeatures | N8 Bytes | Characteristic elements extracted from the input speech by a Neural Network. |
| Pitch | N9 Bytes | Fundamental frequency of Speech, expressed as a real number in hertz (Hz). |
| Intensity | N10 Bytes | Energy of Speech, expressed as a real number in decibels (dB). |
| Tempo | N11 Bytes | Speech Rate, expressed as a real number of specified linguistic units (Phonemes, Syllables, or Words) per second. |
| Personal Status | N12 Bytes | Personal Status that the input speech carries. |
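
The Header layout described above (the fixed characters “MMC-SPD-V”, a major version, a dot, and a minor version) can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name `parse_header` is hypothetical, and it assumes that the version characters are decimal digits, which the table does not state explicitly.

```python
import re

# Pattern derived from the Semantics table: the 9 characters "MMC-SPD-V",
# a 1- or 2-character major version, the character ".", and a
# 1- or 2-character minor version.
# Assumption: version characters are ASCII decimal digits.
HEADER_RE = re.compile(r"MMC-SPD-V([0-9]{1,2})\.([0-9]{1,2})")


def parse_header(header: str) -> tuple[int, int]:
    """Return (major, minor) for a well-formed header, else raise ValueError."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a Speech Descriptors header: {header!r}")
    return int(match.group(1)), int(match.group(2))
```

Under these assumptions, `parse_header("MMC-SPD-V2.3")` returns `(2, 3)`, while a string with a different prefix or a three-digit version raises `ValueError`.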

5     Conformance Testing

A Data instance Conforms with MPAI-MMC V2.3 Speech Descriptors (MMC-SPD) if:

  1. The Data validates against the Speech Descriptors’ JSON Schema.
  2. All Data in the Speech Descriptors’ JSON Schema:
    1. Have the specified type.
    2. Validate against their JSON Schemas.
    3. Conform with their Data Qualifiers, if present.
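
As an illustration of the "Have the specified type" condition only, the sketch below checks that the numeric elements named in the Semantics section are real numbers. It is an assumption-laden example rather than a conformance tool: the mapping `EXPECTED_TYPES`, the function `type_errors`, and the use of the Semantics labels as flat property names are all hypothetical. The authoritative names, nesting, and types are those of the published JSON Schema, and validation against that schema would normally be delegated to a JSON Schema validator.

```python
from numbers import Real

# Hypothetical mapping from Semantics labels to expected Python types.
# The published JSON Schema, not this table, is authoritative.
EXPECTED_TYPES = {
    "Pitch": Real,      # fundamental frequency in Hz
    "Intensity": Real,  # energy in dB
    "Tempo": Real,      # linguistic units per second
}


def type_errors(data: dict) -> list[str]:
    """Return one message per element whose value has an unexpected type."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in data:
            continue  # absent elements are not treated as type errors here
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(
                f"{name}: expected a real number, got {type(value).__name__}"
            )
    return errors
```

An empty list means no type problems were found in the elements this sketch knows about. It says nothing about schema validity or Data Qualifiers, which are separate conditions in the list above.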

6     Performance Assessment