1   Definition

A Data Type that can be used by a Text-To-Speech AI Module to generate a Speech Object from a Text Object.

2   Functional Requirements

The generated Speech Object can be perceived as, for example:

  1. Having been generated by a specific human.
  2. Having the intonation of a specific language or dialect.
  3. Not having any specific connotation.

The Speech Model can be implemented as a Neural Network trained to generate utterances with specific features. Each Neural Network has its own set of parameters, which define its Format; each Format is identified by an ID.

3   Syntax

https://schemas.mpai.community/MMC/V2.2/data/SpeechModel.json
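
The schema itself is not reproduced on this page. As a rough illustration only, the sketch below builds a hypothetical Speech Model instance in Python and serialises it to JSON. It assumes that the JSON keys and their nesting mirror the labels in the Semantics table; every value (identifiers, length, URI) is a placeholder, and the schema at the URL above remains the authoritative definition.

```python
import json

# Hypothetical instance: key names and nesting are ASSUMED to follow the
# labels in the Semantics table; the published schema may differ.
speech_model = {
    "Header": "MMC-SML-V2.2",
    "MInstanceID": "m-instance-001",        # placeholder identifier
    "SpeechModelID": "speech-model-001",    # placeholder identifier
    "SpeechModelData": {
        "SpeechModelQualifier": "model-format-id",  # placeholder Format ID
        "SpeechModelPayload": {
            "SpeechModelDataLength": 1048576,       # length in bytes (placeholder)
            "SpeechModelDataURI": "https://example.com/speech-model-001.bin",
        },
    },
    "DescrMetadata": "Illustrative entry, not a real model",
}


def to_json(instance: dict) -> str:
    """Serialise the instance as indented JSON text."""
    return json.dumps(instance, indent=2)


def from_json(text: str) -> dict:
    """Parse JSON text back into a dictionary."""
    return json.loads(text)
```

The payload carries only the length and location of the model data, so the instance stays small even when the underlying Neural Network is large.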

4   Semantics

Label                          Size       Description
Header                         N1 Bytes
  - Standard-Speech Model      9 Bytes    The characters “MMC-SML-V”
  - Version                    N2 Bytes   Major version: 1 or 2 characters
  - Dot-separator              1 Byte     The character “.”
  - Subversion                 N3 Bytes   Minor version: 1 or 2 characters
MInstanceID                    N4 Bytes   Identifier of the M-Instance
SpeechModelID                  N5 Bytes   Identifier of the Speech Model
SpeechModelData                N6 Bytes   Standard set of Model Data
  - SpeechModelQualifier       N7 Bytes   Model Format ID
  - SpeechModelPayload         N8 Bytes   Set of Data Length and URI
    - SpeechModelDataLength    N9 Bytes   Model Data Length in Bytes
    - SpeechModelDataURI       N10 Bytes  Model Data URI
DescrMetadata                  N11 Bytes  Descriptive Metadata
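
The Header layout above (a fixed 9-character prefix, a major version, a dot, and a minor version) can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function and type names are hypothetical, and the code assumes that the "1 or 2 characters" of each version field are decimal digits.

```python
import re
from typing import NamedTuple

# Fixed 9-character prefix taken from the Semantics table.
STANDARD_PREFIX = "MMC-SML-V"

# ASSUMPTION: version and subversion are 1 or 2 ASCII decimal digits.
_HEADER_RE = re.compile(r"MMC-SML-V(\d{1,2})\.(\d{1,2})", re.ASCII)


class Header(NamedTuple):
    """Parsed major and minor version numbers (hypothetical helper type)."""
    version: int
    subversion: int


def build_header(version: int, subversion: int) -> str:
    """Return the header string for the given major and minor versions."""
    for name, value in (("version", version), ("subversion", subversion)):
        if not 0 <= value <= 99:
            raise ValueError(f"{name} must fit in 1 or 2 digits, got {value}")
    return f"{STANDARD_PREFIX}{version}.{subversion}"


def parse_header(text: str) -> Header:
    """Split a header string into its version fields, or raise ValueError."""
    match = _HEADER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Speech Model header: {text!r}")
    return Header(int(match.group(1)), int(match.group(2)))
```

For example, `build_header(2, 2)` yields `MMC-SML-V2.2`, and `parse_header` applied to that string recovers the pair (2, 2). With 1 or 2 digits per field, the total header length N1 ranges from 12 to 14 bytes under these assumptions.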

5   Data Formats

IDs required:

  1. ModelFormat
  2. ModelAttribute
  3. ModelAttributeFormat

6   To Respondents

Respondents are requested to:

  1. Comment on the characterisation of the Speech Model.
  2. Propose Speech Model Formats.