Terms beginning with a capital letter have the meaning defined in Table 1. Terms beginning with a lowercase letter have the meaning commonly associated with the context in which they are used. For instance, Table 1 defines Object and Scene but does not define object and scene.
A dash “-” preceding a Term in Table 1 indicates the following readings, depending on the font:
- Normal font: the undashed Term preceding the dashed Term in the table is read before it. For example, “Avatar” and “- Model” yield “Avatar Model.”
- Italic font: the undashed Term preceding the dashed Term in the table is read after it. For example, “Avatar” and “- Portable” yield “Portable Avatar.”
Table 1 – Table of terms and definitions

| Term | Definition |
| --- | --- |
| Attitude | The coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”. |
| *- Spatial* | Position and Orientation, and their velocities and accelerations, of an Audio or Visual Object in a Virtual Environment. |
| Audio | Digital representation of an analogue audio signal sampled at a frequency between 8 kHz and 192 kHz, with 8 to 32 bits/sample and linear or non-linear quantisation. |
| - Object | Coded representation of Audio information with its metadata. An Audio Object can be a combination of Audio Objects. |
| - Scene | The Audio Objects of an Environment with Object location metadata. |
| Audio-Visual Object | Coded representation of Audio-Visual information with its metadata. An Audio-Visual Object can be a combination of Audio-Visual Objects. |
| Audio-Visual Scene (AV Scene) | The Audio-Visual Objects of an Environment with Object location metadata. |
| Avatar | An animated 3D object representing a real or fictitious person in a Virtual Space. |
| - Model | An inanimate avatar exposing interfaces that enable animation. |
| Cognitive State | The coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”. |
| Colour (of speech) | The timbre of an identifiable voice, independent of the current Personal Status and language. |
| Connected Autonomous Vehicle (CAV) | A vehicle able to autonomously reach an assigned geographical position by: (1) understanding human utterances; (2) planning a route; (3) sensing and interpreting the Environment; (4) exchanging information with other CAVs; and (5) acting on its motion actuation subsystem. |
| Context | Information surrounding an Entity that provides additional information about the communication emitted by the Entity. |
| Data | Information in digital form. |
| - Format | The standard digital representation of Data. |
| - Type | An instance of Data with a specific Data Format. |
| Descriptor | Coded representation of a text, audio, speech, or visual feature. |
| Digital Representation | Data corresponding to and representing a real entity. |
| Emotion | The coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”. |
| Entity | A real or Digital Human. |
| Environment | A Virtual Space containing a Scene. |
| Face | The portion of a 2D or 3D digital representation corresponding to the face of a human. |
| Factor | One of Emotion, Cognitive State, and Attitude. |
| Gesture | A movement of the body or of part of it, such as the head, arm, hand, or fingers, often complementing a vocal utterance. |
| Grade | The intensity of a Factor. |
| Human | A human being in a real space. |
| *- Digital* | A Digitised or a Virtual Human in a Virtual Space. |
| *- Digitised* | An Object in a Virtual Space that, when rendered, has the appearance of a specific human. |
| *- Virtual* | A computer-created Object in a Virtual Space that has a human appearance when rendered but is not a Digitised Human. |
| Identifier | The label uniquely associated with a human, an avatar, or an object. |
| Instance | An element of a set of entities – Objects, users, etc. – belonging to a level in a hierarchical classification (taxonomy). |
| Intention | The result of the analysis of the goal of an input question. |
| Manifestation | The manner of showing the Personal Status, or a subset of it, in any one of Speech, Face, and Gesture. |
| Meaning | Information extracted from Text, such as syntactic and semantic information, Personal Status, and other information such as an Object Identifier. |
| Modality | One of Text, Speech, Face, or Gesture. |
| Object | An individual attribute of the coded representation of an object in a Scene, including its Spatial Attitude. |
| Orientation | The set of the three roll, pitch, and yaw angles indicating the rotation around the principal (x) axis of an Object, whose y axis is at 90˚ counterclockwise (right-to-left) from the x axis and whose z axis points up toward the viewer. |
| Personal Status | The ensemble of information internal to a person, including Emotion, Cognitive State, and Attitude. |
| Portable Avatar | A Data Type representing an Avatar and its Context. |
| Pitch | The fundamental frequency of Speech; the attribute that makes it possible to judge sounds as “higher” or “lower”. |
| Point of View | The Spatial Attitude of a human or avatar looking at an Environment. |
| Position | The 3 coordinates (x, y, z) of a representative point of an object in a Real or Virtual Space. |
| Refined Text | The Text resulting from the analysis by Natural Language Understanding of the Text produced by Automatic Speech Recognition. |
| Scene | A structured composition of Objects. |
| Speech | Digital representation of analogue speech sampled at a frequency between 8 kHz and 96 kHz, with 8, 16, or 24 bits/sample and linear or non-linear quantisation. |
| - Features | Aspects of a speech segment that enable its description and reproduction, e.g., degree of vocal tension, Pitch, etc., and that can be automatically recognised and extracted for speech synthesis or other purposes. |
| - Rate | The number of Speech Units per second. |
| - Unit | A phoneme, syllable, or word as a segment of Speech. |
| Summary | An abridged outline of the content of the utterance(s) of one or more Users, possibly including their Personal Statuses. |
| Text | A sequence of characters drawn from a finite alphabet. |
| Visual Object | Coded representation of Visual information with its metadata. A Visual Object can be a combination of Visual Objects. |
| | A non-lexical utterance, such as a cough, laugh, or hesitation. |
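Several of the defined terms are structured Data Types built from one another: a Spatial Attitude combines a Position and an Orientation with their velocities and accelerations; a Personal Status combines the three Factors (Emotion, Cognitive State, Attitude), each of which can carry a Grade; and a Speech Rate counts Speech Units per second. The sketch below illustrates how these definitions compose. All class and field names are hypothetical illustrations chosen for this sketch, not the normative Data Formats defined by the standard.

```python
from dataclasses import dataclass

# NOTE: all names below are illustrative only; the normative Data Formats
# are specified by the standard itself, not by this sketch.

@dataclass
class Position:
    """The 3 coordinates (x, y, z) of a representative point of an Object."""
    x: float
    y: float
    z: float

@dataclass
class Orientation:
    """The roll, pitch, and yaw angles of an Object."""
    roll: float
    pitch: float
    yaw: float

@dataclass
class SpatialAttitude:
    """Position and Orientation of an Object plus their velocities
    and accelerations."""
    position: Position
    orientation: Orientation
    linear_velocity: tuple = (0.0, 0.0, 0.0)
    angular_velocity: tuple = (0.0, 0.0, 0.0)
    linear_acceleration: tuple = (0.0, 0.0, 0.0)
    angular_acceleration: tuple = (0.0, 0.0, 0.0)

@dataclass
class PersonalStatus:
    """The three Factors of a Personal Status, each with a Grade
    (intensity), here assumed to lie in [0, 1]."""
    emotion: str                  # e.g., "Angry"
    emotion_grade: float          # Grade of the Emotion Factor
    cognitive_state: str          # e.g., "Confused"
    cognitive_state_grade: float  # Grade of the Cognitive State Factor
    attitude: str                 # e.g., "Respectful"
    attitude_grade: float         # Grade of the Attitude Factor

def speech_rate(speech_units: int, duration_s: float) -> float:
    """Speech Rate: the number of Speech Units (phonemes, syllables,
    or words) per second of Speech."""
    return speech_units / duration_s
```

For example, a segment containing 12 Speech Units over 4 seconds has a Speech Rate of 3.0 Speech Units per second.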