MPAI-HMC Definitions

Terms beginning with a capital letter have the meaning defined in Table 1. Terms beginning with a lowercase letter have the meaning commonly defined for the context in which they are used. For instance, Table 1 defines Object and Scene but does not define object and scene.

A dash “–” preceding a Term in Table 1 means that the Term is combined with the nearest preceding Term without a dash, in an order that depends on the font:

Normal font: the Term without a dash is read before the dashed Term. For example, “Avatar” and “– Model” yield “Avatar Model”.
Italic font: the Term without a dash is read after the dashed Term. For example, “Avatar” and “– Portable” yield “Portable Avatar”.
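A minimal sketch of the dash-reading rule, assuming the font is passed in explicitly as a boolean flag (`italic`). The function name `compose_term` is hypothetical and not part of MPAI-HMC.

```python
def compose_term(parent: str, child: str, italic: bool) -> str:
    """Combine an undashed Term (parent) with a dashed Term (child).

    Normal font: the parent is read first ("Avatar" + "Model").
    Italic font: the parent is read last ("Portable" + "Avatar").
    """
    return f"{child} {parent}" if italic else f"{parent} {child}"


print(compose_term("Avatar", "Model", italic=False))    # Avatar Model
print(compose_term("Avatar", "Portable", italic=True))  # Portable Avatar
```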

The full set of Terms and Definitions relevant to all MPAI Technical Specifications, including MPAI-HMC, can be accessed online.

Table 1 – General MPAI-HMC terms

Term	Definition
Attitude
– Social	The coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”.
– Spatial	The Position and Orientation of an Object in a Real or Virtual Environment, together with their velocities and accelerations.
Audio	Digital representation of an analogue audio signal sampled at a frequency between 8 and 192 kHz, with between 8 and 32 bits/sample and linear or non-linear quantisation. Data with characteristics of Audio may be synthetically produced.
Audio Block	A set of consecutive Audio samples.
Audio Channel	A sequence of Audio Blocks.
Avatar	An Object rendered to represent a Human or a Machine in a Virtual Space.
– Model	An inanimate Avatar exposing animation interfaces.
– Portable	A Data Type including Avatar ID, Time, Visual Environment, Spatial Attitude, Avatar Model, Body Descriptors, Face Descriptors, Language Preference, Speech Coding, Speech Data, Text, and Personal Status [8].
Body	A digital representation of a human body, head included, face excluded.
Centre Point	The point of an Object selected to have Local Coordinates (0,0,0).
Cognitive State	The coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”.
Communication Item	An element, expressed as a Portable Avatar, that is generated by a Machine communicating with an Entity.
Context	Information surrounding an Entity and providing additional insight into the information the Entity communicates.
Coordinate System	A coordinate system where the position of a point is specified by three numbers.
– Cartesian	A coordinate system where the three numbers are the signed distances from the point to three mutually perpendicular planes.
– Spherical	A coordinate system where the three numbers are the radial distance of the point from a fixed origin, the polar angle measured from a fixed zenith direction, and the azimuthal angle of the point's orthogonal projection on a reference plane.
Culture	The collection of language and customs that govern the way a human or a group of humans express their internal states.
Data	Information in digital form.
– Format	The standard digital representation of Data.
– Type	An instance of Data with a specific Data Format.
Descriptor	The Digital Representation of a feature of an Object.
– Body	A Data Type including the digital representation of the features of the body of a real or digital human.
– Face	A Data Type including the digital representation of a feature of the face of a real or digital human.
– Speech	A Data Type including the digital representation of a feature of speech of a real or digital human, such as degree of vocal tension, pitch, etc.
– Text	A Data Type including the digital representation of a feature of text.
Digital Representation	Data corresponding to and representing a physical entity.
Emotion	The coded representation of the internal state resulting from the interaction of a human or avatar with the Environment or subsets of it, such as “Angry”, “Sad”, “Determined”.
Entity	A human in a real environment, a human digitally represented as a Digitised Human in a Virtual Environment, or a Virtual Human in a Virtual Environment.
Environment	A Virtual Space that may be null or may include an Audio-Visual Scene.
Experience	The state of an Entity whose senses/sensors are continuously affected for a meaningful period.
Face	A digital representation of a human face.
Factor	One of Cognitive State, Emotion, or Social Attitude.
Gesture	A movement of a Digital Human or part of it, such as the head, arm, hand, and finger, often a complement to a vocal utterance.
Human	A human being in a real space.
– Digital	A Digitised or a Virtual Human in a Virtual Space.
– Digitised	An Object in a Virtual Space that has the appearance of a specific human when rendered.
– Virtual	An Object in a Virtual Space created by a computer that has a human appearance when rendered but is not a Digitised Human.
Identifier	The label uniquely associated with a human or an Object.
Instance	An element of a set of entities (Objects, Digital Humans, etc.) belonging to some level of a hierarchical classification (taxonomy).
– Audio	The instance of an Audio Object.
– Visual	The instance of a Visual Object.
Machine	An Implementation of MPAI-MMC.
Meaning	Information extracted from Text such as syntactic and semantic information, Personal Status, and other information, such as an Object Identifier.
Microphone Array	A microphone system that uses multiple microphones arranged in a specific pattern to capture audio in an audio space.
– Geometry	A Data Type representing the spatial arrangement of the microphones in a Microphone Array.
Modality	One of Text, Speech, Face, or Gesture.
Object	A data structure that can be rendered to cause an Experience.
– Audio	An Object described by Audio Descriptors.
– Audio-Visual	An Object described by Audio-Visual Descriptors.
– Body	A digital representation of the body of a Human or a Machine.
– Descriptor	The digital representation of a feature of an Object.
– Digital	A Digitised or a Virtual Object.
– Digitised	The digital representation of a real object.
– Face	The digital representation of the face of a Human or a Machine.
– Speech	An Object described by Speech Descriptors.
– Text	A string of Text.
– Virtual	An Object not representing an object in the real environment.
– Visual	An Object described by Visual Descriptors.
Orientation	The 3 Euler angles of an Object in a Virtual Space.
Personal Status	A Data Type including three Factors (Cognitive State, Emotion, and Social Attitude) conveyed by four Modalities (Text, Speech, Face, and Gesture), and providing standard extensible labels for the three Factors [6].
– Face	The Cognitive State, Emotion, and Social Attitude conveyed by a Face Object.
– Gesture	The Cognitive State, Emotion, and Social Attitude conveyed by the Gesture of a Body Object.
– Speech	The Cognitive State, Emotion, and Social Attitude conveyed by a Speech Object.
– Text	The Cognitive State, Emotion, and Social Attitude conveyed by a Text Object.
Portable Avatar	A Data Type representing an Avatar and its Context.
Position	The coordinates of a representative point of an Object in a Virtual Space with respect to a set of coordinate axes.
Principal Axis	The x axis of an Object.
Rendering	The process of instantiating a Virtual Space as a human-perceptible entity.
Scene	A composition of Objects located according to a Scene Geometry.
– Audio	A Scene composed of Audio Objects.
– Audio-Visual	A Scene composed of Audio Objects, Visual Objects and co-located Audio-Visual Objects.
– Multichannel	A data structure containing at least 2 time-aligned interleaved Audio Channels.
– Visual	A Scene composed of Visual Objects.
Scene Descriptors	The digital representation of a feature of a scene.
– Audio	A Data Type including the digital representation of the audio features of a real or digital scene.
– Audio-Visual	A Data Type combining the Audio and Visual Scene Descriptors.
– Visual	A Data Type including the digital representation of the visual features of a real or digital scene.
Scene Geometry	The digital representation of the Object arrangement of a Scene.
– Audio	A Data Type describing the spatial arrangement of the Audio Objects of a Scene.
– Audio-Visual	A Data Type describing the spatial arrangement of the Audio, Visual, and Audio-Visual Objects of a Scene.
– Visual	A Data Type describing the spatial arrangement of the Visual Objects of a Scene.
Selector	Input Data having the goal to set a parameter (e.g., use of Text vs Speech or Language Preference) or an operating mode of a Machine.
Speech	Digital representation of analogue speech sampled at a frequency between 8 kHz and 96 kHz, with 8, 16, or 24 bits/sample and linear or non-linear quantisation, or in compressed form. Data with characteristics of Speech may be synthetically produced.
Text	A sequence of characters represented according to [12].
– Recognised	The Text at the output of an Automatic Speech Recognition AIM.
– Refined	The Text at the output of a Natural Language Understanding AIM.
– Translated	The Text at the output of a Natural Language Translation AIM.
Virtual Space	A space generated and maintained by a computing platform that can be rendered.
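The Spherical and Cartesian Coordinate System entries in Table 1 can be related by a conversion between the two. This is a minimal sketch under conventions that this section does not fix and that are therefore assumptions: the zenith is the +z axis, the reference plane is the x-y plane, the azimuth is measured from the +x axis, and angles are in radians. The function names are hypothetical.

```python
import math

# Assumed conventions (not fixed by Table 1): zenith = +z axis, reference
# plane = x-y plane, azimuth measured from the +x axis, angles in radians.


def spherical_to_cartesian(r: float, polar: float, azimuth: float) -> tuple:
    """Map (radial distance, polar angle, azimuthal angle) to (x, y, z)."""
    sin_polar = math.sin(polar)
    return (
        r * sin_polar * math.cos(azimuth),
        r * sin_polar * math.sin(azimuth),
        r * math.cos(polar),
    )


def cartesian_to_spherical(x: float, y: float, z: float) -> tuple:
    """Map (x, y, z) back to (radial distance, polar angle, azimuthal angle)."""
    r = math.hypot(x, y, z)  # three-argument hypot needs Python 3.8+
    if r == 0.0:
        return 0.0, 0.0, 0.0  # angles are undefined at the origin; use 0
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    polar = math.acos(max(-1.0, min(1.0, z / r)))
    azimuth = math.atan2(y, x)  # counter-clockwise from the +x axis
    return r, polar, azimuth
```

For example, `cartesian_to_spherical(1.0, 1.0, math.sqrt(2))` gives a radial distance of 2 with both angles equal to π/4, up to floating-point rounding.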

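The Audio Block, Audio Channel, and Multichannel entries in Table 1 describe a layering of samples, blocks, and channels that can be sketched as follows. The sketch assumes frame-major interleaving (one sample per channel, repeated) and fixed-size blocks whose last block may be shorter; Table 1 specifies neither, and the function names `deinterleave` and `to_blocks` are hypothetical.

```python
from typing import List, Sequence


def deinterleave(samples: Sequence[float], num_channels: int) -> List[List[float]]:
    """Split frame-interleaved samples (ch0, ch1, ..., ch0, ch1, ...) into channels."""
    if num_channels < 2:
        raise ValueError("a Multichannel stream has at least 2 Audio Channels")
    if len(samples) % num_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel c holds every num_channels-th sample, starting at offset c.
    return [list(samples[c::num_channels]) for c in range(num_channels)]


def to_blocks(channel: Sequence[float], block_size: int) -> List[List[float]]:
    """Cut one Audio Channel into consecutive Audio Blocks (last may be short)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(channel[i:i + block_size]) for i in range(0, len(channel), block_size)]


left, right = deinterleave([0, 10, 1, 11, 2, 12, 3, 13, 4, 14], 2)
print(left)                # [0, 1, 2, 3, 4]
print(right)               # [10, 11, 12, 13, 14]
print(to_blocks(left, 2))  # [[0, 1], [2, 3], [4]]
```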