Terms beginning with a capital letter have the meaning defined in Table 1. Terms beginning with a lowercase letter have the meaning commonly defined for the context in which they are used. For instance, Table 1 defines Object and Scene but does not define object and scene.
A dash “-” preceding a Term in Table 1 indicates the following readings according to the font:
- Normal font: the Term in the table without a dash and preceding the one with a dash should be read before that Term. For example, “Avatar” and “- Model” will yield “Avatar Model.”
- Italic font: the Term in Table 1 without a dash and preceding the one with a dash should be read after that Term. For example, “Avatar” and “- Portable” will yield “Portable Avatar.”
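The dash-reading rule above can be sketched as a small routine that expands each dashed row into its full Term. This is a minimal illustration, not part of the standard: the `Row` type, its `dashed` and `italic` flags, and the `full_terms` helper are hypothetical names. The `italic` flag stands in for the font information, which a plain-text copy of Table 1 does not preserve.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row of Table 1 (hypothetical representation)."""
    term: str
    dashed: bool = False  # the Term is preceded by a dash
    italic: bool = False  # the dashed Term is printed in italic font


def full_terms(rows: list[Row]) -> list[str]:
    """Expand dashed rows into full Terms using the nearest preceding undashed Term."""
    parent = None
    result = []
    for row in rows:
        if not row.dashed:
            parent = row.term
            result.append(row.term)
        elif parent is None:
            raise ValueError(f"dashed Term {row.term!r} has no preceding Term")
        elif row.italic:
            # Italic: the undashed Term is read after the dashed one.
            result.append(f"{row.term} {parent}")
        else:
            # Normal font: the undashed Term is read before the dashed one.
            result.append(f"{parent} {row.term}")
    return result
```

With the two examples given in the rule, `Avatar` followed by a normal-font `Model` and an italic `Portable` expands to Avatar, Avatar Model, and Portable Avatar.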
Table 1 – Table of terms and definitions
Term | Definition |
Attitude | |
– Social | The coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”. |
– Spatial | Position and Orientation, and their velocities and accelerations, of an Audio or Visual Object in a Virtual Environment. |
Audio | Digital representation of an analogue audio signal sampled at a frequency between 8 and 192 kHz, with between 8 and 32 bits/sample, and with non-linear or linear quantisation. |
– Object | Coded representation of Audio information with its metadata. An Audio Object can be a combination of Audio Objects. |
– Scene | The Audio Objects of an Environment with Object location metadata. |
Audio-Visual Object | Coded representation of Audio-Visual information with its metadata. An Audio-Visual Object can be a combination of Audio-Visual Objects. |
Audio-Visual Scene | (AV Scene) The Audio-Visual Objects of an Environment with Object location metadata. |
Avatar | An animated 3D object representing a real or fictitious person in a Virtual Space. |
– Model | An inanimate avatar exposing interfaces enabling animation. |
Cognitive State | The coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”. |
Colour (of speech) | The timbre of an identifiable voice, independent of the current Personal Status and language. |
Connected Autonomous Vehicle | (CAV) A vehicle able to autonomously reach an assigned geographical position by: 1. understanding human utterances, 2. planning a route, 3. sensing and interpreting the Environment, 4. exchanging information with other CAVs, and 5. acting on the CAV’s motion actuation subsystem. |
Context | Information surrounding an Entity and providing additional information about the communication emitted by the Entity. |
Data | Information in digital form. |
– Format | The standard digital representation of Data. |
– Type | An instance of Data with a specific Data Format. |
Descriptor | Coded representation of a text, audio, speech, or visual feature. |
Digital Representation | Data corresponding to and representing a real entity. |
Emotion | The coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”. |
Entity | A real or Digital Human. |
Environment | A Virtual Space containing a Scene. |
Face | The portion of a 2D or 3D digital representation corresponding to the face of a human. |
Factor | One of Emotion, Cognitive State and Attitude. |
Gesture | A movement of the body or part of it, such as the head, arm, hand, and finger, often a complement to a vocal utterance. |
Grade | The intensity of a Factor. |
Human | A human being in a real space. |
– Digital | A Digitised or a Virtual Human in a Virtual Space. |
– Digitised | An Object in a Virtual Space that has the appearance of a specific human when rendered. |
– Virtual | An Object in a Virtual Space created by a computer that has a human appearance when rendered but is not a Digitised Human. |
Identifier | The label uniquely associated with a human, an avatar, or an object. |
Instance | An element of a set of entities (Objects, users, etc.) belonging to some level of a hierarchical classification (taxonomy). |
Intention | The result of the analysis of the goal of an input question. |
Manifestation | The manner of showing the Personal Status, or a subset of it, in any one of Speech, Face, and Gesture. |
Meaning | Information extracted from Text such as syntactic and semantic information, Personal Status, and other information, such as an Object Identifier. |
Modality | One of Text, Speech, Face, or Gesture. |
Object Descriptor | An individual attribute of the coded representation of an object in a Scene, including its Spatial Attitude. |
Orientation | The set of the 3 roll, pitch, and yaw angles indicating the rotation of an Object around its principal axis (x), its y axis being at 90° counterclockwise (right-to-left) from the x axis and its z axis pointing up toward the viewer. |
Personal Status | The ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude. |
Pitch | The fundamental frequency of Speech. Pitch is the attribute that makes it possible to judge sounds as “higher” or “lower”. |
Point of View | The Spatial Attitude of a human or avatar looking at an Environment. |
Portable Avatar | A Data Type representing an Avatar and its Context. |
Position | The 3 coordinates (x, y, z) of a representative point of an object in a Real or Virtual Space. |
Refined Text | The Text resulting from the analysis, by Natural Language Understanding, of the Text produced by Automatic Speech Recognition. |
Scene | A structured composition of Objects. |
Speech | Digital representation of analogue speech sampled at a frequency between 8 kHz and 96 kHz, with 8, 16, or 24 bits/sample, and with non-linear or linear quantisation. |
– Features | Aspects of a speech segment that enable its description and reproduction, e.g., degree of vocal tension and Pitch, and that can be automatically recognised and extracted for speech synthesis or other related purposes. |
– Rate | The number of Speech Units per second. |
– Unit | Phoneme, syllable, or word as a segment of Speech. |
Summary | An abridged outline of the content of the utterance(s) of one or more Users, possibly including their Personal Statuses. |
Text | A sequence of characters drawn from a finite alphabet. |
Visual Object | Coded representation of Visual information with its metadata. A Visual Object can be a combination of Visual Objects. |
Vocal Gesture | An utterance such as a cough, laugh, or hesitation. Lexical elements are excluded. |
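The Spatial Attitude, Position, and Orientation entries of Table 1 can be sketched as a plain data structure. This is a hypothetical illustration: the glossary fixes neither units, field names, nor the order in which roll, pitch, and yaw are composed, so the sketch assumes radians and the common z-y-x convention (R = Rz(yaw) Ry(pitch) Rx(roll)), with roll about x, pitch about y, and yaw about z in the right-handed frame described under Orientation. The `predict_position` method, which extrapolates Position under constant acceleration, is likewise an assumption, included only to show why velocities and accelerations are carried alongside Position and Orientation.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def rotation_matrix(roll: float, pitch: float, yaw: float) -> list[list[float]]:
    """Return Rz(yaw) @ Ry(pitch) @ Rx(roll) for angles in radians.

    The z-y-x composition order is an assumption; the glossary names the
    three angles but does not say how they are combined.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def rotate(m: list[list[float]], v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    x, y, z = (sum(m[i][j] * v[j] for j in range(3)) for i in range(3))
    return (x, y, z)


@dataclass
class SpatialAttitude:
    """Position and Orientation with their first and second time derivatives."""
    position: Vec3 = (0.0, 0.0, 0.0)           # x, y, z
    orientation: Vec3 = (0.0, 0.0, 0.0)        # roll, pitch, yaw (rad)
    velocity: Vec3 = (0.0, 0.0, 0.0)           # d(position)/dt
    orientation_rate: Vec3 = (0.0, 0.0, 0.0)   # d(orientation)/dt
    acceleration: Vec3 = (0.0, 0.0, 0.0)       # d2(position)/dt2
    orientation_accel: Vec3 = (0.0, 0.0, 0.0)  # d2(orientation)/dt2

    def predict_position(self, dt: float) -> Vec3:
        """Extrapolate Position after dt seconds, assuming constant acceleration."""
        p, v, a = self.position, self.velocity, self.acceleration
        return tuple(p[i] + v[i] * dt + 0.5 * a[i] * dt * dt for i in range(3))

    def heading(self) -> Vec3:
        """Direction of the Object's principal (x) axis after applying Orientation."""
        return rotate(rotation_matrix(*self.orientation), (1.0, 0.0, 0.0))
```

For example, a yaw of π/2 turns the x axis onto the y axis, consistent with the y axis lying 90° counterclockwise from x when seen from the viewer along z.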
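The Personal Status, Factor, Grade, and Manifestation entries can likewise be sketched as types. This too is hypothetical: the enum and class names, the reading of "Attitude" in Factor as Social Attitude, and the normalisation of Grade to the range [0, 1] are assumptions, since the glossary only says that a Grade is the intensity of a Factor. The one constraint taken directly from Table 1 is that a Manifestation uses Speech, Face, or Gesture, which is why Text is rejected there even though it is a Modality.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Factor(Enum):
    EMOTION = "Emotion"
    COGNITIVE_STATE = "Cognitive State"
    SOCIAL_ATTITUDE = "Social Attitude"  # assumed to be the "Attitude" Factor


class Modality(Enum):
    TEXT = "Text"
    SPEECH = "Speech"
    FACE = "Face"
    GESTURE = "Gesture"


@dataclass(frozen=True)
class FactorValue:
    """A Factor label such as "Angry" or "Confused" together with its Grade."""
    label: str
    grade: float  # intensity; the [0, 1] scale is an assumption

    def __post_init__(self) -> None:
        if not 0.0 <= self.grade <= 1.0:
            raise ValueError(f"Grade {self.grade} outside assumed range [0, 1]")


@dataclass
class PersonalStatus:
    """Emotion, Cognitive State, and Social Attitude; any of them may be absent."""
    values: dict[Factor, FactorValue] = field(default_factory=dict)

    def strongest(self) -> Optional[Factor]:
        """Return the Factor with the highest Grade, or None if there is none."""
        if not self.values:
            return None
        return max(self.values, key=lambda f: self.values[f].grade)


@dataclass
class Manifestation:
    """A Personal Status, or a subset of it, shown in a single Modality."""
    modality: Modality
    status: PersonalStatus

    def __post_init__(self) -> None:
        # Table 1 limits a Manifestation to Speech, Face, and Gesture.
        if self.modality is Modality.TEXT:
            raise ValueError("a Manifestation uses Speech, Face, or Gesture")
```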