
Terms beginning with a capital letter have the meaning defined in Table 1. Terms beginning with a small letter have the meaning commonly defined for the context in which they are used. For instance, Table 1 defines Object and Scene but does not define object and scene.

A dash “-” preceding a Term in Table 1 indicates the following readings according to the font:

  1. Normal font: the Term in the table without a dash and preceding the one with a dash should be read before that Term. For example, “Avatar” and “- Model” will yield “Avatar Model.”
  2. Italic font: the Term in the table without a dash and preceding the one with a dash should be read after that Term. For example, “Avatar” and “- Portable” will yield “Portable Avatar.”
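The two reading rules can be sketched in Python as follows. The function name `compose_term` and the `italic` flag are hypothetical, introduced only for illustration; the flag stands in for the font of the dashed entry, which Table 1 conveys typographically.

```python
def compose_term(parent: str, dashed: str, italic: bool) -> str:
    """Combine a parent Term with a dashed sub-Term from Table 1.

    Hypothetical helper, not part of any MPAI specification.
    """
    # Drop the leading dash (hyphen or en dash) and surrounding whitespace.
    sub = dashed.lstrip("-\u2013 \t").strip()
    # Normal font: the parent Term is read first.
    # Italic font: the dashed Term is read first.
    return f"{sub} {parent}" if italic else f"{parent} {sub}"


print(compose_term("Avatar", "- Model", italic=False))    # Avatar Model
print(compose_term("Avatar", "- Portable", italic=True))  # Portable Avatar
```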

The full set of Terms and Definitions relevant to all MPAI Technical Specifications, including MPAI-HMC, can be accessed online.

Table 1 – General MPAI-HMC terms

Attitude
–       Social The coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”.
–       Spatial Position and Orientation and their velocities and accelerations of an Object in a Real or Virtual Environment.
Audio Digital representation of an analogue audio signal sampled at a frequency between 8 and 192 kHz with a number of bits/sample between 8 and 32, and non-linear or linear quantisation. Data with characteristics of Audio may be synthetically produced.
Audio Block A set of consecutive Audio samples.
Audio Channel A sequence of Audio Blocks.
Avatar An Object rendered to represent a Human or a Machine in a virtual space.
–       Model An inanimate Avatar exposing animation interfaces.
–       Portable A Data Type including Avatar ID, Time, Visual Environment, Spatial Attitude, Avatar Model, Body Descriptors, Face Descriptors, Language Preference, Speech Coding, Speech Data, Text, and Personal Status [8].
Body A digital representation of a human body, head included, face excluded.
Centre Point The point of an Object selected to have Local Coordinates (0,0,0).
Cognitive State The coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”.
Communication Item An element generated by a Machine communicating with an Entity expressed with a Portable Avatar.
Context Information surrounding an Entity and providing additional insight into the information the Entity communicates.
Coordinate System A coordinate system where the position of a point is specified by three numbers.
–       Cartesian A coordinate system where the three numbers are the signed distances from the point to three mutually perpendicular planes.
–       Spherical A coordinate system where the three numbers are: (1) the radial distance of the point from a fixed origin; (2) the polar angle measured from a fixed zenith direction; (3) the azimuthal angle of the point's orthogonal projection on a reference plane.

Culture The collection of language and customs governing the way a human or a group of humans expresses their internal statuses.
Data Information in digital form.
–       Format The standard digital representation of Data.
–       Type An instance of Data with a specific Data Format.
Descriptor The Digital Representation of a feature of an Object.
–       Body A Data Type including the digital representation of the features of the body of a real or digital human.
–       Face A Data Type including the digital representation of a feature of the face of a real or digital human.
–       Speech A Data Type including the digital representation of a feature of speech of a real or digital human, such as degree of vocal tension, pitch, etc.
–       Text A Data Type including the digital representation of a feature of text.
Digital Representation Data corresponding to and representing a physical entity.
Emotion The coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”.
Entity A human in a real environment, or a human digitally represented as a Digitised Human in a Virtual Environment, or a Virtual Human in a Virtual Environment.
Environment A Virtual Space that may be null or may include an Audio-Visual Scene.
Experience The state of an Entity whose senses/sensors are continuously affected for a meaningful period.
Face A digital representation of a human face.
Factor One of Cognitive State, Emotion, and Social Attitude.
Gesture A movement of a Digital Human or part of it, such as the head, arm, hand, and finger, often a complement to a vocal utterance.
Human A human being in a real space.
–       Digital A Digitised or a Virtual Human in a Virtual Space.
–       Digitised An Object in a Virtual Space that has the appearance of a specific human when rendered.
–       Virtual An Object in a Virtual Space created by a computer that has a human appearance when rendered but is not a Digitised Human.
Identifier The label uniquely associated with a human or an Object.
Instance An element of a set of entities – Objects, Digital Humans etc. – belonging to some levels in a hierarchical classification (taxonomy).
–       Audio The instance of an Audio Object.
–       Visual The instance of a Visual Object.
Machine An Implementation of MPAI-MMC.
Meaning Information extracted from Text such as syntactic and semantic information, Personal Status, and other information, such as an Object Identifier.
Microphone Array A microphone system that uses multiple microphones arranged in a specific pattern to capture audio in an audio space.
–       Geometry A Data Type representing the spatial arrangement of the microphones in a Microphone Array.
Modality One of Text, Speech, Face, or Gesture.
Object A data structure that can be rendered to cause an Experience.
–       Audio An Object described by Audio Descriptors.
–       Audio-Visual An Object described by Audio-Visual Descriptors.
–       Body A digital representation of the body of a Human or a Machine.
–       Descriptor The digital representation of a feature of an Object.
–       Digital A Digitised or a Virtual Object.
–       Digitised The digital representation of a real object.
–       Face The digital representation of the face of a Human or a Machine.
–       Speech An Object described by Speech Descriptors.
–       Text A string of Text.
–       Virtual An Object not representing an object in the real environment.
–       Visual An Object described by Visual Descriptors.
Orientation The 3 Euler angles of an Object in a Virtual Space.
Personal Status A Data Type including three Factors (Cognitive State, Emotion, and Social Attitude) conveyed by four Modalities (Text, Speech, Face, and Gesture), and providing standard extensible labels for the three Factors [6].
–       Face The Cognitive State, Emotion, and Social Attitude conveyed by a Face Object.
–       Gesture The Cognitive State, Emotion, and Social Attitude conveyed by the Gesture of a Body Object.
–       Speech The Cognitive State, Emotion, and Social Attitude conveyed by a Speech Object.
–       Text The Cognitive State, Emotion, and Social Attitude conveyed by a Text Object.
Portable Avatar A Data Type representing an Avatar and its Context.
Position The coordinates of a representative point for an object in a Virtual Space with respect to a set of coordinate axes.
Principal Axis The x axis of an Object.
Rendering The process of instantiating a Virtual Space as a human-perceptible entity.
Scene A composition of Objects located according to a Scene Geometry.
–       Audio A Scene composed of Audio Objects.
–       Audio-Visual A Scene composed of Audio Objects, Visual Objects and co-located Audio-Visual Objects.
–       Multichannel A data structure containing at least 2 time-aligned interleaved Audio Channels.
–       Visual A Scene composed of Visual Objects.
Scene Descriptors The digital representation of a feature of a scene.
–       Audio A Data Type including the digital representation of the audio features of a real or digital scene.
–       Audio-Visual A Data Type combining the Audio and Visual Scene Descriptors.
–       Visual A Data Type including the digital representation of the visual features of a real or digital scene.
Scene Geometry The digital representation of the Object arrangement of a Scene.
–       Audio A Data Type describing the spatial arrangement of the Audio Objects of a Scene.
–       Audio-Visual A Data Type describing the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of a Scene.
–       Visual A Data Type describing the spatial arrangement of the Visual Objects of a Scene.
Selector Input Data having the goal to set a parameter (e.g., use of Text vs Speech or Language Preference) or an operating mode of a Machine.
Speech Digital representation of analogue speech sampled at a frequency between 8 kHz and 96 kHz with a number of bits/sample of 8, 16, or 24, and non-linear or linear quantisation, or in compressed form. Data with characteristics of Speech may be synthetically produced.
Text A sequence of characters represented according to [12].
–       Recognised The Text at the output of an Automatic Speech Recognition AIM.
–       Refined The Text at the output of a Natural Language Understanding AIM.
–       Translated The Text at the output of a Natural Language Translation AIM.
Virtual Space A space generated and maintained by a computing platform that can be rendered.
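
The relation between the Cartesian and Spherical Coordinate Systems defined in Table 1 can be illustrated with a short conversion sketch. Table 1 does not fix the axes, so the following assumes the common physics convention: the zenith direction is the +z axis, the reference plane is the x-y plane, the azimuthal angle is measured from the +x axis, and angles are in radians. The function names are hypothetical.

```python
import math


def spherical_to_cartesian(r, polar, azimuth):
    """Convert (radial distance, polar angle, azimuthal angle) to (x, y, z).

    Assumption: zenith = +z axis, reference plane = x-y plane,
    azimuth measured from the +x axis, angles in radians.
    """
    x = r * math.sin(polar) * math.cos(azimuth)
    y = r * math.sin(polar) * math.sin(azimuth)
    z = r * math.cos(polar)
    return (x, y, z)


def cartesian_to_spherical(x, y, z):
    """Inverse conversion under the same assumed convention."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined at the origin; zero is returned by choice.
        return (0.0, 0.0, 0.0)
    # Clamp to guard against rounding pushing the ratio outside [-1, 1].
    polar = math.acos(max(-1.0, min(1.0, z / r)))
    azimuth = math.atan2(y, x)
    return (r, polar, azimuth)


# Round trip for an arbitrary point.
point = spherical_to_cartesian(2.0, 0.5, 1.0)
print(point)
print(cartesian_to_spherical(*point))
```

A different axis convention (for example, zenith along +y) would change only the assignment of the three expressions to x, y, and z.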
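
The Personal Status definition in Table 1 (three Factors, each conveyed by up to four Modalities) can be pictured as a small nested mapping. The sketch below is illustrative only: the class and method names are hypothetical and do not reproduce the normative Personal Status format referenced as [6]. The labels used are the examples quoted in Table 1.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Factor(Enum):
    COGNITIVE_STATE = "Cognitive State"
    EMOTION = "Emotion"
    SOCIAL_ATTITUDE = "Social Attitude"


class Modality(Enum):
    TEXT = "Text"
    SPEECH = "Speech"
    FACE = "Face"
    GESTURE = "Gesture"


@dataclass
class PersonalStatus:
    """Hypothetical container: labels[modality][factor] -> label string."""

    labels: Dict[Modality, Dict[Factor, str]] = field(default_factory=dict)

    def set_label(self, modality: Modality, factor: Factor, label: str) -> None:
        # Create the per-Modality mapping on first use, then store the label.
        self.labels.setdefault(modality, {})[factor] = label

    def by_modality(self, modality: Modality) -> Dict[Factor, str]:
        # Return a copy so that callers cannot mutate the internal state.
        return dict(self.labels.get(modality, {}))


status = PersonalStatus()
status.set_label(Modality.FACE, Factor.EMOTION, "Angry")
status.set_label(Modality.SPEECH, Factor.COGNITIVE_STATE, "Confused")
status.set_label(Modality.SPEECH, Factor.SOCIAL_ATTITUDE, "Respectful")
print(status.by_modality(Modality.SPEECH))
```

Keying by Modality first mirrors the Face, Gesture, Speech, and Text sub-entries of Personal Status in Table 1, each of which groups the three Factors conveyed by one Modality.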

