MPAI-MMC V2.2 Data Types Speech Descriptors

1 Definition

A Data Type representing features of a Speech Segment, including speaker identity, prosody, and other vocal qualities such as tension, whispery quality, or creaky voice.

2 Functional Requirements

Speech Descriptors may include Neural Network Descriptors.

3 Syntax

https://schemas.mpai.community/MMC/V2.2/data/SpeechDescriptors.json

4 Semantics

Label	Size	Description
Header	N1 Bytes	Speech Descriptors Header
– Standard	9 Bytes	The characters “MMC-SPD-V”
– Version	N2 Bytes	Major version – 1 or 2 characters
– Dot-separator	1 Byte	The character “.”
– Subversion	N3 Bytes	Minor version – 1 or 2 characters
MInstanceID	N4 Bytes	ID of the Metaverse Instance.
SpeechDescriptorsID	N5 Bytes	ID of Speech Descriptors.
SpeechDescriptorsData	N7 Bytes	Data associated with Speech Descriptors.
SpeechFeatures	N8 Bytes	Indicates characteristic elements extracted from the input speech, specifically pitch, tone, intonation, intensity, speed, emotion, and NNSpeechFeatures.
NNSpeechFeatures	N9 Bytes	Indicates characteristic elements extracted from the input speech by a Neural Network.
pitch	N10 Bytes	Indicates the fundamental frequency of Speech, expressed as a real number in Hz (hertz).
tone	N11 Bytes	A variation in the pitch of the voice while speaking, expressed as human-readable words as in Table 48.
ToneType	N12 Bytes	Indicates the Tone that the input speech carries.
intonation	N13 Bytes	A variation of the pitch, intensity and speed within a time period measured in seconds.
intensity	N14 Bytes	Energy of Speech, expressed as a real number in dB (decibels).
speed	N7 Bytes	Indicates the Speech Rate as a real number of specified linguistic units (e.g., Phonemes, Syllables, or Words) per second.
emotion	N15 Bytes	Indicates the Emotion that the input speech carries.
EmotionType	N16 Bytes	Indicates the Emotion that the input speech carries.
toneName	N17 Bytes	Specifies the name of a Tone.
toneSetName	N18 Bytes	Name of the Tone set which contains the Tone. Tone set is used as a baseline, but other sets are possible.
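
The Header rows above describe a string made of the characters “MMC-SPD-V”, a major version, the character “.”, and a minor version. The following is a minimal Python sketch of how such a header might be checked and produced. The names parse_header, format_header, and SpeechDescriptorsHeader are hypothetical and not part of the specification, and restricting the version characters to decimal digits is an assumption, since the table only says "1 or 2 characters".

```python
import re
from typing import NamedTuple

# Assumed layout: "MMC-SPD-V" + major (1-2 chars) + "." + minor (1-2 chars).
# Limiting the version characters to decimal digits is an assumption.
HEADER_RE = re.compile(r"MMC-SPD-V([0-9]{1,2})\.([0-9]{1,2})")


class SpeechDescriptorsHeader(NamedTuple):
    """Hypothetical container for the two version fields of the Header."""
    major: int
    minor: int


def parse_header(header: str) -> SpeechDescriptorsHeader:
    """Split a header string into its major and minor version numbers."""
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a Speech Descriptors header: {header!r}")
    return SpeechDescriptorsHeader(int(match.group(1)), int(match.group(2)))


def format_header(major: int, minor: int) -> str:
    """Build a header string and reject versions that do not fit the layout."""
    header = f"MMC-SPD-V{major}.{minor}"
    parse_header(header)  # raises ValueError if a field is not 1-2 digits
    return header
```

For example, format_header(2, 2) yields the string "MMC-SPD-V2.2", and parse_header rejects any string that does not start with the “MMC-SPD-V” characters.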
