1 Definition
A Data Type representing features of a Speech Segment, including speaker identity, prosody, and additional vocal qualities such as tension, whispery quality, or creaky voice.
2 Functional Requirements
Speech Descriptors may include Neural Network Descriptors.
3 Syntax
https://schemas.mpai.community/MMC/V2.2/data/SpeechDescriptors.json
4 Semantics
| Label | Size | Description |
| --- | --- | --- |
| Header | N1 Bytes | Speech Descriptors Header |
| – Standard | 9 Bytes | The characters “MMC-SPD-V” |
| – Version | N2 Bytes | Major version – 1 or 2 characters |
| – Dot-separator | 1 Byte | The character “.” |
| – Subversion | N3 Bytes | Minor version – 1 or 2 characters |
| MInstanceID | N4 Bytes | ID of the Metaverse Instance. |
| SpeechDescriptorsID | N5 Bytes | ID of Speech Descriptors. |
| SpeechDescriptorsData | N7 Bytes | Data associated with the Speech Descriptors. |
| SpeechFeatures | N8 Bytes | Indicates characteristic elements extracted from the input speech, specifically pitch, tone, intonation, intensity, speed, emotion, and NNSpeechFeatures. |
| NNSpeechFeatures | N9 Bytes | Indicates characteristic elements extracted from the input speech by a Neural Network. |
| pitch | N10 Bytes | Indicates the fundamental frequency of Speech, expressed as a real number in Hz (hertz). |
| tone | N11 Bytes | Variation in the pitch of the voice while speaking, expressed in human-readable words as in Table 48. |
| ToneType | N12 Bytes | Indicates the Tone that the input speech carries. |
| intonation | N13 Bytes | A variation of the pitch, intensity and speed within a time period measured in seconds. |
| intensity | N14 Bytes | Energy of Speech, expressed as a real number in dB (decibels). |
| speed | N7 Bytes | Indicates the Speech Rate as a real number of specified linguistic units (e.g., Phonemes, Syllables, or Words) per second. |
| emotion | N15 Bytes | Indicates the Emotion that the input speech carries. |
| EmotionType | N16 Bytes | Indicates the Emotion that the input speech carries. |
| toneName | N17 Bytes | Specifies the name of a Tone. |
| toneSetName | N18 Bytes | Name of the Tone set which contains the Tone. Tone set is used as a baseline, but other sets are possible. |
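
The Header layout in the table above (the 9 characters "MMC-SPD-V", a 1 or 2 character major version, a dot, and a 1 or 2 character minor version) can be checked with a short parser. The following Python sketch is illustrative only and is not part of the specification: the names `HEADER_RE`, `parse_header`, and `SpeechFeatures` are hypothetical, the version characters are assumed to be decimal digits (the table only says "characters"), and the numeric values are made-up sample data. The authoritative structure is the JSON Schema referenced under Syntax.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumption: Version and Subversion hold decimal digits.
# The table only specifies "1 or 2 characters" for each.
HEADER_RE = re.compile(r"MMC-SPD-V(\d{1,2})\.(\d{1,2})")


def parse_header(header: str) -> Tuple[int, int]:
    """Return (major, minor) parsed from a header such as 'MMC-SPD-V2.2'."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a Speech Descriptors header: {header!r}")
    return int(match.group(1)), int(match.group(2))


@dataclass
class SpeechFeatures:
    """Hypothetical container for three of the scalar features in the table."""
    pitch: float      # fundamental frequency, in Hz
    intensity: float  # energy of Speech, in dB
    speed: float      # linguistic units (e.g., Syllables) per second


if __name__ == "__main__":
    major, minor = parse_header("MMC-SPD-V2.2")
    print(major, minor)

    # Illustrative values only; they do not come from any real measurement.
    features = SpeechFeatures(pitch=220.0, intensity=60.0, speed=4.5)
    print(features)
```

Using `fullmatch` rather than `match` rejects headers with trailing characters, which matches the fixed-layout reading of the Header row; a deployment that accepts non-digit version characters would need a looser pattern.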