Terms beginning with a capital letter have the meaning defined in Table 1. Terms beginning with a small letter have the meaning commonly defined for the context in which they are used. For instance, Table 1 defines Object and Scene but does not define object and scene.
A dash “-” preceding a Term in Table 1 indicates the following readings according to the font:
- Normal font: the dashed Term is read after the nearest preceding Term in the table that has no dash. For example, “Avatar” followed by “- Model” yields “Avatar Model.”
- Italic font: the dashed Term is read before the nearest preceding Term in the table that has no dash. For example, “Avatar” followed by “- Portable” yields “Portable Avatar.”
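The reading rule above can be sketched in a few lines of Python. This is only an illustration: the function name `expand_term` and the `italic` flag are hypothetical and are not part of any MPAI specification. The sample Terms are the two examples given above.

```python
def expand_term(parent: str, dashed: str, italic: bool) -> str:
    """Combine a parent Term with a dashed Term from Table 1.

    parent -- the preceding Term without a dash, e.g. "Avatar"
    dashed -- the dashed entry, e.g. "- Model"
    italic -- True if the dashed entry is set in italic font
    (All three parameter names are assumptions made for this sketch.)
    """
    # Strip the leading dash (hyphen or en dash) and any surrounding spaces.
    qualifier = dashed.lstrip("-\u2013 ").strip()
    # Normal font: the parent is read first. Italic font: the parent is read last.
    return f"{qualifier} {parent}" if italic else f"{parent} {qualifier}"


print(expand_term("Avatar", "- Model", italic=False))    # Avatar Model
print(expand_term("Avatar", "- Portable", italic=True))  # Portable Avatar
```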
The full set of Terms and Definitions relevant to all MPAI Technical Specifications, including MPAI-HMC, can be accessed online.
Table 1 – General MPAI-HMC terms
|The coded representation of the internal state related to the way a human or avatar intends to position vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”.
|Position and Orientation and their velocities and accelerations of an Object in a Real or Virtual Environment.
|Digital representation of an analogue audio signal sampled at a frequency between 8 kHz and 192 kHz with a number of bits/sample between 8 and 32, using non-linear or linear quantisation. Data with characteristics of Audio may be synthetically produced.
|A set of consecutive Audio samples.
|A sequence of Audio Blocks.
|An Object rendered to represent a Human or a Machine in a virtual space.
|An inanimate Avatar exposing animation interfaces.
|A Data Type including Avatar ID, Time, Visual Environment, Spatial Attitude, Avatar Model, Body Descriptors, Face Descriptors, Language Preference, Speech Coding, Speech Data, Text, and Personal Status.
|A digital representation of a human body, head included, face excluded.
|The point of an Object selected to have Local Coordinates (0,0,0).
|The coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”.
|An element generated by a Machine communicating with an Entity expressed with a Portable Avatar.
|Information surrounding an Entity and providing additional insight into the information the Entity communicates.
|A coordinate system where the position of a point is specified by three numbers.
|A coordinate system where the three numbers are the signed distances from the point to three mutually perpendicular planes.
|A coordinate system where the three numbers are:
– the radial distance of that point from a fixed origin.
– the polar angle measured from a fixed zenith direction.
– the azimuthal angle of its orthogonal projection on a reference plane.
|The collection of language and customs governing the way a human or a group of humans expresses their internal statuses.
|Information in digital form.
|The standard digital representation of Data.
|An instance of Data with a specific Data Format.
|The Digital Representation of a feature of an Object.
|A Data Type including the digital representation of the features of the body of a real or digital human.
|A Data Type including the digital representation of a feature of the face of a real or digital human.
|A Data Type including the digital representation of a feature of speech of a real or digital human, such as degree of vocal tension, pitch, etc.
|A Data Type including the digital representation of a feature of text.
|Data corresponding to and representing a physical entity.
|The coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”.
|A human in a real environment or digitally represented as a Digitised Human in a Virtual Environment, or a Digital or a Virtual Human in a Virtual Environment.
|A Virtual Space that may be null or may include an Audio-Visual Scene.
|The state of an Entity whose senses/sensors are continuously affected for a meaningful period.
|A digital representation of a human face.
|One of Emotion, Cognitive State, and Attitude.
|A movement of a Digital Human or part of it, such as the head, arm, hand, and finger, often a complement to a vocal utterance.
|A human being in a real space.
|A Digitised or a Virtual Human in a Virtual Space.
|An Object in a Virtual Space that has the appearance of a specific human when rendered.
|An Object in a Virtual Space created by a computer that has a human appearance when rendered but is not a Digitised Human.
|The label uniquely associated with a human or an Object.
|An element of a set of entities – Objects, Digital Humans etc. – belonging to some levels in a hierarchical classification (taxonomy).
|The instance of an Audio Object.
|The instance of a Visual Object.
|An Implementation of MPAI-MMC.
|Information extracted from Text such as syntactic and semantic information, Personal Status, and other information, such as an Object Identifier.
|A microphone system that uses multiple microphones arranged in a specific pattern to capture audio in an audio space.
|A Data Type representing the spatial arrangement of the microphones in a Microphone Array.
|One of Text, Speech, Face, or Gesture.
|A data structure that can be rendered to cause an Experience.
|An Object described by Audio Descriptors.
|An Object described by Audio-Visual Descriptors.
|A digital representation of the body of a Human or a Machine.
|The digital representation of the feature of an Object.
|A Digitised or a Virtual Object.
|The digital representation of a real object.
|The digital representation of the face of a Human or a Machine.
|An Object described by Speech Descriptors.
|A string of Text.
|An Object not representing an object in the real environment.
|An Object described by Visual Descriptors.
|The 3 Euler angles of an Object in a Virtual Space.
|A Data Type including three Factors (Cognitive State, Emotion, and Social Attitude) conveyed by four Modalities (Text, Speech, Face, and Gesture), and providing standard extensible labels for the three Factors.
|The Cognitive State, Emotion, and Social Attitude conveyed by a Face Object.
|The Cognitive State, Emotion, and Social Attitude conveyed by the Gesture of a Body Object.
|The Cognitive State, Emotion, and Social Attitude conveyed by a Speech Object.
|The Cognitive State, Emotion, and Social Attitude conveyed by a Text Object.
|A Data Type representing an Avatar and its Context.
|The coordinates of a representative point for an object in a Virtual Space with respect to a set of coordinate axes.
|The x axis of an Object.
|The process of instantiating a Virtual Space as a human-perceptible entity.
|A composition of Objects located according to a Scene Geometry.
|A Scene composed of Audio Objects.
|A Scene composed of Audio Objects, Visual Objects and co-located Audio-Visual Objects.
|A data structure containing at least 2 time-aligned interleaved Audio Channels.
|A Scene composed of Visual Objects.
|The digital representation of a feature of a scene.
|A Data Type including the digital representation of the audio features of a real or digital scene.
|A Data Type combining the Audio and Visual Scene Descriptors.
|A Data Type including the digital representation of the visual features of a real or digital scene.
|The digital representation of the Object arrangement of a Scene.
|A Data Type describing the spatial arrangement of the Audio Objects of a Scene.
|A Data Type describing the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of a Scene.
|A Data Type describing the spatial arrangement of the Visual Objects of a Scene.
|Input Data having the goal to set a parameter (e.g., use of Text vs Speech or Language Preference) or an operating mode of a Machine.
|Digital representation of analogue speech sampled at a frequency between 8 kHz and 96 kHz with 8, 16, or 24 bits/sample, using non-linear or linear quantisation, or compressed. Data with characteristics of Speech may be synthetically produced.
|A sequence of characters represented according to .
|The Text at the output of an Automatic Speech Recognition AIM.
|– Refined Text |The Text at the output of a Natural Language Understanding AIM.
|– Translated Text |The Text at the output of a Natural Language Translation AIM.
|A space generated and maintained by a computing platform that can be rendered.
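The Cartesian and Spherical coordinate systems defined in Table 1 are related by a standard conversion, sketched below in Python. Table 1 does not fix the axis conventions, so the sketch assumes the zenith direction is the +z axis, the reference plane is the x-y plane, the azimuth is measured from the +x axis, and angles are in radians. The function names are hypothetical and not part of any MPAI specification.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """Convert (radial distance, polar angle, azimuthal angle) to (x, y, z).

    Assumed convention: theta is measured from the +z zenith direction,
    phi is measured in the x-y reference plane from the +x axis.
    """
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Convert (x, y, z) to (radial distance, polar angle, azimuthal angle)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined at the origin; zero is returned by choice here.
        return 0.0, 0.0, 0.0
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    return r, theta, phi


# Round trip for an arbitrary point.
point = (1.0, 2.0, 3.0)
r, theta, phi = cartesian_to_spherical(*point)
print(spherical_to_cartesian(r, theta, phi))
```

The round trip returns the original point up to floating-point rounding. A different zenith or reference plane would only permute the roles of the axes in the formulas.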