1     Definition

Data representing various features of a Speech Segment, including speaker identity, prosody, and additional vocal elements such as tension, whispery quality, or creaky voice.

2     Functional Requirements

Speech Descriptors include Neural Network Descriptors.

3     Syntax

https://schemas.mpai.community/MMC/V2.2/data/SpeechDescriptors.json
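
As a rough illustration of what a JSON instance might look like, the sketch below builds a Python dictionary from the labels listed under Semantics and round-trips it through JSON. The nesting, the placeholder IDs, the tone names, and all numeric values are assumptions made for illustration; the authoritative structure is the schema at the URL above. Fields whose layout is not described here (intonation, emotion, NNSpeechFeatures) are left out.

```python
import json

# Hypothetical instance: field names follow the Semantics labels, but the
# nesting and every value below are assumptions, not taken from the schema.
speech_descriptors = {
    "Header": "MMC-SPD-V2.2",
    "MInstanceID": "example-m-instance",            # placeholder ID
    "SpeechDescriptorsID": "example-descriptors",   # placeholder ID
    "SpeechDescriptorsData": {
        "SpeechFeatures": {
            "pitch": 182.5,       # fundamental frequency in Hz
            "intensity": 62.0,    # energy of the speech in dB
            "speed": 4.1,         # linguistic units (e.g. Syllables) per second
            "tone": {
                "toneName": "example-tone",         # placeholder name
                "toneSetName": "example-tone-set",  # placeholder name
            },
        }
    },
}

# Serialize to JSON text and parse it back to confirm nothing is lost.
encoded = json.dumps(speech_descriptors, indent=2)
decoded = json.loads(encoded)

features = decoded["SpeechDescriptorsData"]["SpeechFeatures"]
print(features["pitch"], "Hz,", features["intensity"], "dB")
```

A real producer would validate the instance against the published schema before exchanging it; that step is omitted here because it needs a third-party validator and network access.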

4     Semantics

| Label | Size | Description |
| --- | --- | --- |
| Header | N1 Bytes | Speech Descriptors Header |
| – Standard | 9 Bytes | The characters “MMC-SPD-V” |
| – Version | N2 Bytes | Major version, 1 or 2 characters |
| – Dot-separator | 1 Byte | The character “.” |
| – Subversion | N3 Bytes | Minor version, 1 or 2 characters |
| MInstanceID | N4 Bytes | ID of the Metaverse Instance. |
| SpeechDescriptorsID | N5 Bytes | ID of Speech Descriptors. |
| SpeechDescriptorsData | N7 Bytes | Data associated with Speech Descriptors. |
| SpeechFeatures | N8 Bytes | Indicates characteristic elements extracted from the input speech, specifically pitch, tone, intonation, intensity, speed, emotion, and NNSpeechFeatures. |
| NNSpeechFeatures | N9 Bytes | Indicates characteristic elements extracted from the input speech by a Neural Network. |
| pitch | N10 Bytes | Indicates the fundamental frequency of Speech, expressed as a real number in Hz (hertz). |
| tone | N11 Bytes | Variation in the pitch of the voice while speaking, expressed as human-readable words as in Table 48. |
| ToneType | N12 Bytes | Indicates the Tone that the input speech carries. |
| intonation | N13 Bytes | Variation of the pitch, intensity, and speed within a time period measured in seconds. |
| intensity | N14 Bytes | Energy of Speech, expressed as a real number in dB (decibels). |
| speed | N7 Bytes | Indicates the Speech Rate as a real number of specified linguistic units (e.g., Phonemes, Syllables, or Words) per second. |
| emotion | N15 Bytes | Indicates the Emotion that the input speech carries. |
| EmotionType | N16 Bytes | Indicates the Emotion that the input speech carries. |
| toneName | N17 Bytes | Specifies the name of a Tone. |
| toneSetName | N18 Bytes | Name of the Tone set that contains the Tone. The Tone set is used as a baseline, but other sets are possible. |
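
The Header rows describe a small fixed grammar: the 9-character tag “MMC-SPD-V”, a major version of 1 or 2 characters, a “.” separator, and a minor version of 1 or 2 characters. A minimal sketch of a parser for that grammar follows. It assumes the version characters are decimal digits, which the table does not state, and the names `parse_header` and `HeaderVersion` are hypothetical.

```python
import re
from typing import NamedTuple

# "MMC-SPD-V" + major version (1-2 chars) + "." + minor version (1-2 chars).
# Assumption: the version characters are the decimal digits 0-9.
HEADER_RE = re.compile(r"MMC-SPD-V([0-9]{1,2})\.([0-9]{1,2})")


class HeaderVersion(NamedTuple):
    major: int
    minor: int


def parse_header(header: str) -> HeaderVersion:
    """Return the (major, minor) version encoded in a Speech Descriptors header."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a Speech Descriptors header: {header!r}")
    return HeaderVersion(int(match.group(1)), int(match.group(2)))


print(parse_header("MMC-SPD-V2.2"))  # HeaderVersion(major=2, minor=2)
```

Using `fullmatch` rather than `match` rejects trailing characters, so a string such as “MMC-SPD-V2.2x” is reported as invalid instead of being silently truncated.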