1       Entity Context Understanding (HMC-ECU)
2       Entity Dialogue Processing (MMC-EDP)
3       Natural Language Understanding (MMC-NLU)
4       Personal Status Extraction (MMC-PSE)
5       Text and Speech Translation (MMC-TST)
6       Audio-Visual Scene Rendering (PAF-AVR)
7       Personal Status Display (PAF-PSD)
8       Text-to-Speech (MMC-TTS)

1        Entity Context Understanding (HMC-ECU)

1.1        Definition

HMC-ECU

  • Receives Audio-Visual Scene Descriptors.
  • Processes the Descriptors to enable the Machine to understand the information conveyed by an Entity and its Context.
  • Produces
    • Personal Status
    • Refined and Translated Text
    • Meaning
    • An Audio Instance ID
    • A Visual Instance ID
    • The Geometry of the Scene that contains the Audio and Visual Objects.
  • Enables the downstream Entity Dialogue Processing AIM to produce a pertinent Communication Item as a Portable Avatar.

1.2        Specification

Entity and Context Understanding is specified.

1.3        Attributes

An ECU AIM Profile is determined by whether the AIM uses one or more of the following attributes:

Attribute                       Code  Function
Body Descriptors                BDD   HMC-ECU receives Body Descriptors.
Face Descriptors                FCD   HMC-ECU receives Face Descriptors.
Speech Object                   SPO   HMC-ECU receives Speech Object.
Text Object                     TXO   HMC-ECU receives Text Object.
Visual Object                   VIO   HMC-ECU receives Visual Object.
Audio Object                    AUO   HMC-ECU receives Audio Object.
Audio-Visual Scene Descriptors  AVS   HMC-ECU receives Audio-Visual Scene Descriptors.
Audio-Visual Scene Geometry     AVG   HMC-ECU receives Audio-Visual Scene Geometry.
Translation                     TRN   HMC-ECU translates Text Object.
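
As a rough illustration of how a Profile can be read as a selection of Attribute codes, the sketch below models a Profile as a validated set of codes taken from the HMC-ECU Attributes listed above. The `AimProfile` class and the `make_profile` helper are hypothetical names introduced only for this example; the normative representation of a Profile may differ.

```python
from dataclasses import dataclass

# Attribute codes copied from the HMC-ECU Attributes listed above.
ECU_ATTRIBUTES = {
    "BDD": "Body Descriptors",
    "FCD": "Face Descriptors",
    "SPO": "Speech Object",
    "TXO": "Text Object",
    "VIO": "Visual Object",
    "AUO": "Audio Object",
    "AVS": "Audio-Visual Scene Descriptors",
    "AVG": "Audio-Visual Scene Geometry",
    "TRN": "Translation",
}


@dataclass(frozen=True)
class AimProfile:
    """Hypothetical in-memory view of a Profile: an AIM plus the codes it uses."""
    aim: str
    codes: frozenset


def make_profile(aim, codes, known_codes):
    """Build a profile, rejecting codes that the AIM's attribute list lacks."""
    unknown = set(codes) - set(known_codes)
    if unknown:
        raise ValueError(f"unknown attribute codes for {aim}: {sorted(unknown)}")
    return AimProfile(aim=aim, codes=frozenset(codes))


# Example: an ECU that handles speech and text input and also translates.
profile = make_profile("HMC-ECU", ["SPO", "TXO", "TRN"], ECU_ATTRIBUTES)
print(sorted(profile.codes))
```

Using a set rather than a list reflects the assumption that a Profile depends only on which Attributes are used, not on their order.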

2        Entity Dialogue Processing (MMC-EDP)

2.1        Definition

MMC-EDP

  • Receives:
    • Text Object
    • Object Instance ID
    • Input Personal Status
    • Text Descriptors
    • AV Scene Geometry
    • Speaker ID
    • Face ID
    • Memory
  • Processes the information received:
    • Handling one Speech Object at a time.
    • Taking past Speech Objects into account.
  • Produces elements of the Machine Response to the data issued by the Entity in its Context in the form of:
    • Text
    • Personal Status.
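
The two processing rules above (one input at a time, with earlier inputs taken into account) can be pictured as a session object that keeps a history. This is only a sketch: `DialogueSession` and the `respond` callback are hypothetical, and the reply logic is a stand-in rather than anything this document prescribes.

```python
class DialogueSession:
    """Hypothetical wrapper that feeds each input plus prior inputs to a responder."""

    def __init__(self, respond):
        self._respond = respond   # callable(current_input, history) -> reply
        self._history = []        # prior inputs of this dialogue session

    def handle(self, current_input):
        # Process exactly one input, giving the responder a read-only history.
        reply = self._respond(current_input, tuple(self._history))
        self._history.append(current_input)
        return reply


# Stand-in responder: reports how many earlier inputs it could see.
def echo_with_context(current_input, history):
    return f"{current_input} (after {len(history)} earlier inputs)"


session = DialogueSession(echo_with_context)
print(session.handle("hello"))
print(session.handle("where is the red cup?"))
```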

2.2        Specification

MMC-EDP is specified.

2.3        Attributes

A Profile is determined by whether the AIM uses one or more of the following attributes:

Attribute              Code  Function
Text Object            TXO   MMC-EDP receives Text (directly from human or through NLU).
Object Instance ID     OII   MMC-EDP receives the ID of an A/V/AV Instance referenced in the dialogue.
Input Personal Status  IPS   MMC-EDP receives Personal Status.
Text Descriptors       TXD   MMC-EDP receives Meaning.
AV Scene Geometry      AVG   MMC-EDP receives AV Scene Geometry to enable it to locate the Object.
Speaker ID             SPI   MMC-EDP receives Speaker ID.
Face ID                FCI   MMC-EDP receives Face ID.
Memory                 MEM   MMC-EDP takes into account prior Input Data of the dialogue session.

3        Natural Language Understanding (MMC-NLU)

3.1        Definition

MMC-NLU

  • Receives
    • Text Object directly input by the Entity.
    • Recognised Text from the Automatic Speech Recognition AIM.
    • An ID of an Instance.
    • The Audio-Visual Scene Geometry containing the Instance.
  • Refines the input Text if it comes from an Automatic Speech Recognition AIM, and extracts the Meaning (Text Descriptors) from the Recognised Text or from a Text Object provided by the Entity.
  • Produces
    • Refined Text.
    • Text Descriptors (Meaning).
  • Enables the Personal Status Display to produce a Portable Avatar.
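
The conditional refinement described above can be sketched as follows. The function name `understand` and the two callbacks are hypothetical, the toy callbacks perform no real language processing, and the Instance ID and Scene Geometry inputs are left out for brevity. The sketch also assumes that Meaning is extracted from the refined text.

```python
def understand(text, from_asr, refine, extract_meaning):
    """Return (refined_text, text_descriptors) for one input text.

    `refine` and `extract_meaning` are caller-supplied stand-ins.
    """
    # Only recognised (ASR) text is refined; directly entered text passes through.
    refined = refine(text) if from_asr else text
    return refined, extract_meaning(refined)


# Toy stand-ins, not real language processing.
def toy_refine(text):
    return " ".join(text.split()).capitalize()


def toy_meaning(text):
    return {"tokens": text.lower().split()}


print(understand("  turn  on the light ", True, toy_refine, toy_meaning))
```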

3.2        Specification

MMC-NLU is specified.

3.3        Attributes

A Profile is determined by whether the AIM uses one or more of the following attributes:

Attribute                    Code  Function
Text Object                  TXO   MMC-NLU receives Text directly from human.
Recognised Text              TXR   MMC-NLU receives text from ASR.
Object Instance ID           OII   MMC-NLU receives Object Instance ID.
Audio-Visual Scene Geometry  AVG   MMC-NLU receives AV Geometry.
Text Descriptors             TXD   MMC-NLU produces Text Descriptors (Meaning).

4        Personal Status Extraction (MMC-PSE)

4.1        Definition

MMC-PSE

  • Receives
    • Text information
      • Text Selector informs about availability of Text Descriptors
      • Text Object
      • Text Descriptors
    • Speech information
      • Speech Selector informs about availability of Speech Descriptors
      • Speech Object
      • Speech Descriptors
    • Face information
      • Face Selector informs about availability of Face Descriptors
      • Face Object
      • Face Descriptors
    • Body information
      • Gesture Selector informs about availability of Gesture Descriptors
      • Body Object
      • Gesture Descriptors
  • Processes the received information
    • Computing the Modality (Text, Speech, Face, and Gesture) Descriptors for Cognitive State, Emotion, and Social Attitude if the Modality Selector signals that they are not already available.
    • Interpreting the Descriptors to produce the Personal Statuses of the Modalities.
    • Multiplexing the Personal Statuses of the Modalities into the Personal Status.
  • Produces Personal Status.
  • Enables the Entity Dialogue Processing to improve its ability to respond.
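
The three processing steps above (compute Descriptors when the Selector says they are missing, interpret them, multiplex the results) can be sketched as below. All names are hypothetical, the compute and interpret callbacks are toy stand-ins, and multiplexing is assumed here to be a simple mapping from Modality to its Personal Status.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ModalityInput:
    descriptors_available: bool   # plays the role of the Modality Selector
    obj: Any = None               # Text, Speech, Face, or Body Object
    descriptors: Any = None       # used only when descriptors_available is True


def extract_personal_status(inputs, compute, interpret):
    """inputs, compute, and interpret are dicts keyed by Modality name."""
    per_modality = {}
    for name, item in inputs.items():
        # Step 1: compute Descriptors unless the Selector says they were supplied.
        if item.descriptors_available:
            descriptors = item.descriptors
        else:
            descriptors = compute[name](item.obj)
        # Step 2: interpret the Descriptors into that Modality's Personal Status.
        per_modality[name] = interpret[name](descriptors)
    # Step 3: multiplex; here simply a mapping from Modality to its status.
    return per_modality


inputs = {
    "text": ModalityInput(False, obj="I am fine"),
    "speech": ModalityInput(True, descriptors={"pitch": "high"}),
}
compute = {"text": lambda obj: {"words": obj.split()}}
interpret = {
    "text": lambda d: {"emotion": "neutral", "n_words": len(d["words"])},
    "speech": lambda d: {"emotion": "excited" if d["pitch"] == "high" else "calm"},
}
status = extract_personal_status(inputs, compute, interpret)
print(status)
```

In this example no compute function is needed for speech, because its Selector flag indicates that Speech Descriptors were already supplied.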

4.2        Specification

MMC-PSE is specified.

4.3        Attributes

A PSE AIM Profile is determined by whether the AIM uses one or more of the following Attributes:

Attribute      Code  Function
Text Object    TXO   MMC-PSE receives Text.
Speech Object  SPO   MMC-PSE receives Speech.
Face Object    FCO   MMC-PSE receives Face.
Body Object    BDO   MMC-PSE receives Gesture.

When an MMC-PSE is used as a component AIM in a Composite AIM as in the case of HMC-ECU, the MMC-PSE Attributes become Sub-Attributes of the Composite AIM.

 

5        Text and Speech Translation (MMC-TST)

5.1        Definition

MMC-TST

  • Receives:
    • Selector to inform whether
      • The AIM output should be Text or Speech.
      • The output Speech should retain the Features of the input Speech.
    • Language Preferences in the form of requested input and output language.
    • Personal Status.
    • Text.
    • Speech.
  • Performs (a subset of) the following:
    • Converts input Speech into Text using Personal Status.
    • Translates the Text to the target language
    • Extracts the Features from Speech.
    • Converts Text into Speech adding the Speech Features.
  • Produces:
    • Translated Text
    • Translated Speech
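
Because the AIM performs only a subset of the listed steps, depending on the Selector and on which inputs are present, the control flow can be sketched as below. Every name is hypothetical: `steps` holds caller-supplied stand-ins for speech recognition, translation, feature extraction, and speech synthesis, and the two Selector choices are modelled as booleans.

```python
def text_and_speech_translation(*, steps, source_lang, target_lang,
                                text=None, speech=None, personal_status=None,
                                output_speech=False, retain_features=False):
    """Run only the steps that the inputs and the Selector flags require."""
    features = None
    if speech is not None:
        # Extract Speech Features only if they will be reused in the output.
        if output_speech and retain_features:
            features = steps["extract_features"](speech)
        # Convert input Speech into Text, using Personal Status.
        text = steps["asr"](speech, personal_status)
    if text is None:
        raise ValueError("either text or speech must be provided")
    translated = steps["translate"](text, source_lang, target_lang)
    if output_speech:
        # Convert Text into Speech, adding the Speech Features if any.
        return {"translated_speech": steps["tts"](translated, features)}
    return {"translated_text": translated}


# Toy stand-ins; none of these perform real recognition or translation.
steps = {
    "asr": lambda speech, ps: speech["transcript"],
    "translate": lambda text, src, dst: f"[{src}->{dst}] {text}",
    "extract_features": lambda speech: speech["voice"],
    "tts": lambda text, features: {"utterance": text, "voice": features},
}
out = text_and_speech_translation(
    steps=steps, source_lang="eng", target_lang="ita",
    speech={"transcript": "good morning", "voice": "alice"},
    output_speech=True, retain_features=True,
)
print(out)
```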

5.2        Specification

MMC-TST is specified.

5.3        Attributes

An MMC-TST AIM Profile is determined by whether the AIM uses one or more of the following Attributes:

Attribute             Code  Function
Language Preferences  LGP   MMC-TST receives information on input and output languages.
Text Object           TXO   MMC-TST receives Text.
Speech Object         SPO   MMC-TST receives Speech.
Speech Descriptors    SPD   MMC-TST uses Speech Descriptors.
Personal Status       IPS   MMC-TST receives Personal Status.

When an MMC-TST is used as a component AIM in a Composite AIM, as in the case of HMC-ECU, the LGP (Language Preferences) Attribute of MMC-TST becomes a Sub-Attribute of the Composite AIM, with the languages represented as the 3-letter codes of [6], Part 3.

6        Audio-Visual Scene Rendering (PAF-AVR)

6.1        Definition

PAF-AVR

  • Receives
    • Audio-Visual Scene Descriptors or a Portable Avatar.
    • A Point of View.
  • Transforms the Portable Avatar into Audio-Visual Scene Descriptors.
  • Produces
    • Text included in the Portable Avatar.
    • Output Audio, the result of rendering the Audio Scene Descriptors from the Point of View.
    • Output Visual, the result of rendering the Visual Scene Descriptors from the Point of View.

6.2        Specification

PAF-AVR is specified.

6.3        Attributes

A PAF-AVR AIM Profile is determined by whether the AIM uses one or more of the following attributes:

Attribute                       Code  Function
Point of View                   POV   PAF-AVR is informed to provide Output Audio and/or Output Visual as perceived from a Point of View.
Portable Avatar                 PAV   PAF-AVR receives a Portable Avatar and produces an Audio-Visual Scene from the Point of View.
Audio-Visual Scene Descriptors  AVS   PAF-AVR receives Audio-Visual Scene Descriptors and produces an Audio-Visual Scene from the Point of View.
Output Text                     TXO   PAF-AVR produces Text Object.
Output Audio                    AUO   PAF-AVR produces Audio Object.
Output Visual                   VIO   PAF-AVR produces Visual Object.

7        Personal Status Display (PAF-PSD)

7.1        Definition

PAF-PSD

  • Receives
    • Text Object
    • Personal Status
    • Avatar Model
    • Speech Model
    • NN Format
  • Uses
    • Text and PS-Speech to produce the Machine Speech.
    • Machine Speech, Avatar Model, and PS-Face to produce Machine Face Descriptors.
    • Machine Text, Avatar Model, and PS-Gesture to produce Machine Body Descriptors.
  • Produces Portable Avatar.
  • Enables PAF-AVR to render the Portable Avatar produced by PAF-PSD.
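
The data flow in the "Uses" list above can be sketched as three dependent steps whose results are bundled into one output. The function, the `steps` callbacks, and the dictionary standing in for a Portable Avatar are all hypothetical; the Speech Model and NN Format inputs are assumed to be hidden inside the speech step.

```python
def personal_status_display(text, personal_status, avatar_model, steps):
    """Combine the three production steps into one stand-in Portable Avatar."""
    # Text + PS-Speech -> Machine Speech
    speech = steps["speech"](text, personal_status["speech"])
    # Machine Speech + Avatar Model + PS-Face -> Machine Face Descriptors
    face = steps["face"](speech, avatar_model, personal_status["face"])
    # Text + Avatar Model + PS-Gesture -> Machine Body Descriptors
    body = steps["body"](text, avatar_model, personal_status["gesture"])
    # A plain dict stands in for the Portable Avatar (assumed contents).
    return {
        "text": text,
        "speech": speech,
        "avatar_model": avatar_model,
        "face_descriptors": face,
        "body_descriptors": body,
    }


# Toy stand-ins that only record which inputs each step received.
steps = {
    "speech": lambda text, ps: f"speech({text}, {ps})",
    "face": lambda speech, model, ps: f"face({model}, {ps})",
    "body": lambda text, model, ps: f"body({model}, {ps})",
}
avatar = personal_status_display(
    "hello",
    {"speech": "cheerful", "face": "smiling", "gesture": "waving"},
    "model-A",
    steps,
)
print(avatar["speech"], avatar["face_descriptors"], avatar["body_descriptors"])
```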

7.2        Specification

PAF-PSD is specified.

 

7.3        Attributes

A PAF-PSD AIM Profile is determined by whether the AIM uses one or more of the following Attributes:

Attribute        Code  Function
Text Object      TXO   PAF-PSD receives Text and produces Speech.
Personal Status  IPS   PAF-PSD receives Personal Status.
Speech Model     SPM   PAF-PSD receives a Speech Model.
Avatar Model     AVM   PAF-PSD receives an Avatar Model.

8        Text-to-Speech (MMC-TTS)

8.1        Definition

MMC-TTS

  • Receives
    • Text Object.
    • Personal Status.
    • Speech Model.
  • Feeds Text Object and Personal Status to Speech Model.
  • Produces an utterance.

8.2        Specification

MMC-TTS is specified.

8.3        Attributes

An MMC-TTS AIM Profile is determined by whether the AIM uses one or more of the following Attributes:

Attribute        Code  Function
Text Object      TXO   MMC-TTS receives Text Object.
Personal Status  IPS   MMC-TTS receives Personal Status.
Speech Model     SPM   MMC-TTS receives NN Speech Model.
