Capitalised Terms have the meaning defined in Table 1. Non-capitalised terms have the meaning commonly defined for the context in which they are used. For instance, Table 1 defines Object and Scene but does not define object and scene. Therefore, in a sentence, the terms object and Object refer to a real object and to its digital representation, respectively.
A dash “–” preceding a Term in Table 1 indicates one of the following readings, depending on whether the dashed Term is set in:
- Normal font: the nearest preceding Term without a dash is read before the dashed Term. For example, “Avatar” and “– Model” yield “Avatar Model”.
- Italic font: the nearest preceding Term without a dash is read after the dashed Term. For example, “Avatar” and “– Portable” yield “Portable Avatar”.
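The reading rule above can be expressed as a small function. This is a minimal sketch: the name `full_term` and the boolean `italic` flag are hypothetical, standing in for the font distinction that Table 1 conveys typographically.

```python
def full_term(parent: str, sub_term: str, italic: bool) -> str:
    """Compose a full Term from a dash-less parent Term and a dashed sub-Term.

    italic=False (Normal font): the parent is read before the sub-Term.
    italic=True  (Italic font): the parent is read after the sub-Term.
    """
    # Drop the leading dash (hyphen or en dash) and any surrounding spaces.
    sub = sub_term.lstrip("-– ").strip()
    return f"{sub} {parent}" if italic else f"{parent} {sub}"


print(full_term("Avatar", "– Model", italic=False))    # Avatar Model
print(full_term("Avatar", "– Portable", italic=True))  # Portable Avatar
```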
The full set of Terms and Definitions relevant to all MPAI Technical Specifications, including MPAI-HMC, can be accessed online.
Table 1 – General MPAI-HMC terms
Terms | Definitions |
Attitude | |
– Social | The coded representation of the internal state related to the way a human or avatar intends to position itself vis-à-vis the Environment or subsets of it, e.g., “Respectful”, “Confrontational”, “Soothing”. |
– Spatial | The Position and Orientation of an Object in a real or Virtual Environment, together with their velocities and accelerations. |
Audio | A Data Type an instance of which represents analogue signals – or is rendered to be perceived – in the human-audible range (16 Hz – 20 kHz). |
Avatar | A Data Type including the 3D Model of an Avatar and the Face and Body Descriptors. |
– Model | An inanimate Avatar exposing animation interfaces. |
– Portable | A Data Type including Avatar ID, Time, Avatar, Language, Speech, Text, Speech Model, Personal Status, Audio-Visual Scene Descriptors, and potentially an input Portable Avatar. |
Body | A digital representation of a human body, head included, face excluded. |
Centre Point | The point of an Object selected to have Local Coordinates (0,0,0). |
Cognitive State | The coded representation of the internal state reflecting the way a human or avatar understands the Environment, such as “Confused”, “Dubious”, “Convinced”. |
Communication Item | An element, expressed as a Portable Avatar, generated by a Machine communicating with an Entity. |
Context | Additional information about a communication emitted by an Entity, such as language, culture, etc. |
Coordinate System | A system where the position of a point is specified by three numbers. |
– Cartesian | A coordinate system where the three numbers are the signed distances from the point to three mutually perpendicular planes. |
– Spherical | A coordinate system where the three numbers are: (1) the radial distance of the point from a fixed origin; (2) the polar angle measured from a fixed zenith direction; (3) the azimuthal angle of the orthogonal projection of the point on a reference plane. |
Culture | The collection of language and customs governing the way a human, or a group of humans, communicates. |
Data | Information in digital form. |
– Format | A specific digital representation of Data. |
– Media | Data representing Text, Speech, Audio, Visual, 3D Model, LiDAR, RADAR, or Ultrasound information. |
– Object | A Data Type including Data of a given Data Type and the Qualifier of that Data Type. |
– Type | A recognised instance of Data. |
Descriptors | The Digital Representation of the features of an entity, e.g., an object or a scene. |
– Audio | A Data Type including the digital representation of the features of an audio instance. |
– Audio-Visual | A Data Type including the digital representation of the features of an audio-visual instance. |
– Body | A Data Type including the digital representation of the features of the body of an Entity. |
– Face | A Data Type including the digital representation of a feature of the face of an Entity. |
– Speech | A Data Type including the digital representation of a feature of speech of an Entity, such as degree of vocal tension, pitch, etc. |
– Text | A Data Type including the digital representation of a feature of text. |
– Visual | A Data Type including the digital representation of the features of a visual instance. |
Digital Representation | Data corresponding to and representing a physical entity. |
Emotion | The coded representation of the internal state resulting from the interaction of a human or avatar with an environment or subsets of it, such as “Angry”, “Sad”, “Determined”. |
Entity | One of: a human in a real environment; a Digitised Human in a Virtual Environment rendered in a real environment; or a Virtual Human in a Virtual Environment. |
Environment | A Virtual Space that may be null or may include one or more Audio-Visual Scenes. |
Experience | The state of an Entity whose senses/sensors are continuously affected for a meaningful period. |
Face | The digital representation of a human face. |
Factor | One of Emotion, Cognitive State, and Attitude. |
Gesture | A movement of a Digital Human or part of it, such as the head, arm, hand, and finger, often a complement to a vocal utterance. |
Human | A human being in a real space. |
– Digital | A Digitised or a Virtual Human. |
– Digitised | A Data Type representing a human. |
– Virtual | Data created by a computer that has a human appearance when rendered but is not a Digitised Human. |
Identifier | The label uniquely associated with a human or a Data Instance. |
Instance | An element of a set of entities belonging to some level of a hierarchical classification (taxonomy). |
– Audio | The instance of an Audio Object. |
– Visual | The instance of a Visual Object. |
Machine | An Implementation of MPAI-HMC. |
Meaning | Information extracted from Text such as syntactic and semantic information, Personal Status, and other information, such as an Object Identifier. |
Microphone Array | A microphone system that uses multiple microphones arranged in a specific pattern to capture audio in an audio space. |
– Geometry | A Data Type representing the spatial arrangement of the microphones in a Microphone Array. |
Modality | One of Text, Speech, Face, or Gesture. |
Object | A Data Type including Media Data and an optional Qualifier. |
– 3D Model | A Data Type including 3D Model Data and Qualifier. |
– Audio | A Data Type including Audio Data and Qualifier. |
– Audio-Visual | A Data Type including Audio-Visual Data and Qualifier. |
– Digital | A Digitised or a Virtual Object. |
– Digitised | Data representing a real object. |
– Speech | A Data Type including Speech Data and Qualifier. |
– Text | A Data Type including Text Data and Qualifier. |
– Visual | A Data Type including Visual Data and Qualifier. |
Orientation | The 3 Euler angles of an object. |
Personal Status | A Data Type including three Factors – Cognitive State, Emotion, and Social Attitude – conveyed by four Modalities – Text, Speech, Face, and Gesture – and providing standard extensible labels for the three Factors [6]. |
– Face | The Cognitive State, Emotion, and Social Attitude conveyed by a Face. |
– Gesture | The Cognitive State, Emotion, and Social Attitude conveyed by the Gesture of a Body. |
– Speech | The Cognitive State, Emotion, and Social Attitude conveyed by Speech Data. |
– Text | The Cognitive State, Emotion, and Social Attitude conveyed by Text Data. |
Position | The coordinates of a representative point for an object in a Virtual Space with respect to a set of coordinate axes. |
Rendering | The process of instantiating Data or a Virtual Space as a human-perceptible entity. |
Scene | A composition of Media arranged according to a Scene Geometry. |
– 3D Model | A Scene composed of 3D Model Objects. |
– Audio | A Scene composed of Audio Objects. |
– Audio-Visual | A Scene composed of Speech and Audio Objects, Visual and 3D Model Objects and co-located Audio-Visual Objects. |
– Speech | A Scene composed of Speech Objects. |
– Visual | A Scene composed of Visual Objects. |
Scene Descriptors | A Data Type including the Media Objects and their spatial arrangement in a Scene. |
– 3D Model | A Data Type including a 3D Model Scene’s 3D Model Objects and Sub-Scenes, and their spatial arrangement. |
– Audio | A Data Type including an Audio Scene’s Audio Objects and Sub-Scenes, and their spatial arrangement. |
– Audio-Visual | A Data Type including an Audio-Visual Scene’s Speech, Audio, Visual, 3D Model, and Audio-Visual Objects, and their spatial arrangement. |
– Visual | A Data Type including a Visual Scene’s Visual Objects and Sub-Scenes and their spatial arrangement. |
Scene Geometry | A Data Type including the spatial arrangement of the Media Objects in a Scene. |
– 3D Model | A Data Type describing the spatial arrangement of the 3D Model Objects in a Scene. |
– Audio | A Data Type describing the spatial arrangement of the Audio Objects of a Scene. |
– Audio-Visual | A Data Type describing the spatial arrangement of the Speech, Audio, Visual, 3D Model, and Audio-Visual Objects of a Scene. |
– Speech | A Data Type describing the spatial arrangement of the Speech Objects of a Scene. |
– Visual | A Data Type describing the spatial arrangement of the Visual Objects of a Scene. |
Selector | Input Data having the goal to set a parameter (e.g., use of Text vs Speech or Language Preference) or an operating mode of a Machine. |
Speech | Digital representation of analogue speech sampled at a frequency between 8 kHz and 96 kHz with 8, 16, or 24 bits/sample and linear or non-linear quantisation, or in compressed form. Data with the characteristics of Speech may be synthetically produced. |
Text | A series of characters drawn from a finite alphabet of a character set. |
– Recognised | The Text at the output of an Automatic Speech Recognition AIM. |
– Refined | The Text at the output of a Natural Language Understanding AIM. |
– Translated | The Text at the output of a Natural Language Translation AIM. |
Virtual Space | A space generated and maintained by a computing platform that can be rendered. |
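The Cartesian and Spherical Coordinate Systems in Table 1 describe the same point with different triples. As an illustration only (Table 1 prescribes no axis assignment, angle units, or conversion), the sketch below assumes the common convention in which the zenith is the +z axis, the reference plane is the x-y plane, the azimuth is measured from the +x axis, and angles are in radians. The function names are hypothetical.

```python
import math


def spherical_to_cartesian(r: float, polar: float, azimuth: float) -> tuple[float, float, float]:
    """Convert (radial distance, polar angle, azimuthal angle) to (x, y, z).

    Assumed convention: polar angle measured from the +z (zenith) axis,
    azimuth measured in the x-y reference plane from the +x axis, radians.
    """
    x = r * math.sin(polar) * math.cos(azimuth)
    y = r * math.sin(polar) * math.sin(azimuth)
    z = r * math.cos(polar)
    return x, y, z


def cartesian_to_spherical(x: float, y: float, z: float) -> tuple[float, float, float]:
    """Inverse conversion under the same assumed convention."""
    r = math.sqrt(x * x + y * y + z * z)
    # The angles are undefined at the origin; 0.0 is returned there by choice.
    polar = math.acos(z / r) if r > 0 else 0.0
    azimuth = math.atan2(y, x)  # result in (-pi, pi]
    return r, polar, azimuth


print(spherical_to_cartesian(2.0, math.pi / 2, 0.0))  # approximately (2.0, 0.0, 0.0)
```

Other conventions (e.g., azimuth measured clockwise, or elevation instead of polar angle) change the formulas, so an Implementation would need to state which one it uses.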
Capitalised Terms used in this standard that are not already included in Table 1 are defined in Table 2.
Term | Definition |
Access | Static or slowly changing data that are required by an application such as domain knowledge data, data models, etc. |
AI Framework (AIF) | The environment where AIWs are executed. |
AI Model (AIM) | A data processing element receiving AIM-specific Inputs and producing AIM-specific Outputs according to its Function. An AIM may be an aggregation of AIMs. |
– Attribute | An input Data or an output Data or a functionality, such as the ability to translate or retain memory of past operations. |
– Basic | An AIM that does not aggregate other AIMs. |
– Composite | An AIM that aggregates other AIMs. |
– Profile | The label that uniquely identifies a set of Attributes of an AIM. |
AI Workflow (AIW) | A structured aggregation of AIMs implementing a Use Case receiving AIW-specific inputs and producing AIW-specific outputs according to the AIW Function. |
Application Standard | An MPAI Standard designed to enable a particular application domain. |
Channel | A connection between an output port of an AIM and an input port of an AIM. The term “connection” is also used as a synonym. |
Communication | The infrastructure that implements message passing between AIMs. |
Component | One of the 7 AIF elements: Access, Communication, Controller, Internal Storage, Global Storage, Store, and User Agent. |
Composite AIM | An AIM aggregating more than one AIM. |
Conformance | The attribute of an Implementation of being a correct technical Implementation of a Technical Specification. |
– Testing | The normative document specifying the Means to Test the Conformance of an Implementation. |
– Testing Means | Procedures, tools, data sets and/or data set characteristics to Test the Conformance of an Implementation. |
Connection | A channel connecting an output port of an AIM and an input port of an AIM. |
Controller | A Component that manages and controls the AIMs in the AIF, so that they execute in the correct order and at the time when they are needed. |
Data | Information in digital form. |
– Format | The standard digital representation of Data. |
– Type | An instance of Data with a specific Data Format. |
– Semantics | The meaning of Data. |
Descriptor | Coded representation of a text, audio, speech, or visual feature. |
Digital Representation | Data corresponding to and representing a physical entity. |
Ecosystem | The ensemble of actors making it possible for a User to execute an application composed of an AIF, one or more AIWs, each with one or more AIMs potentially sourced from independent implementers. |
Explainability | The ability to trace the output of an Implementation back to the inputs that have produced it. |
Fairness | The attribute of an Implementation whose extent of applicability can be assessed by making the training set and/or network open to testing for bias and unanticipated results. |
Function | The operations effected by an AIW or an AIM on input data. |
Global Storage | A Component to store data shared by AIMs. |
AIM/AIW Storage | A Component to store data of the individual AIMs. |
Identifier | A name that uniquely identifies an Implementation. |
Implementation | 1. An embodiment of the MPAI-AIF Technical Specification, or 2. An AIW or AIM of a particular Level (1-2-3) conforming with a Use Case of an MPAI Application Standard. |
Implementer | A legal entity implementing MPAI Technical Specifications. |
ImplementerID (IID) | A unique name assigned by the ImplementerID Registration Authority to an Implementer. |
ImplementerID Registration Authority (IIDRA) | The entity appointed by MPAI to assign ImplementerIDs to Implementers. |
Instance ID | Instance of a class of Objects and the Group of Objects the Instance belongs to. |
Interoperability | The ability to functionally replace an AIM with another AIM having the same Interoperability Level. |
– Level | The attribute of an AIW and its AIMs to be executable in an AIF Implementation and to: (1) be proprietary (Level 1); (2) pass the Conformance Testing of an Application Standard (Level 2); (3) pass the Performance Testing of an Application Standard (Level 3). |
Knowledge Base | Structured and/or unstructured information made accessible to AIMs via MPAI-specified interfaces. |
Message | A sequence of Records transported by Communication through Channels. |
Normativity | The set of attributes of a technology or a set of technologies specified by the applicable parts of an MPAI standard. |
Performance | The attribute of an Implementation of being Reliable, Robust, Fair and Replicable. |
– Assessment | The normative document specifying the Means to Assess the Grade of Performance of an Implementation. |
– Assessment Means | Procedures, tools, data sets and/or data set characteristics to Assess the Performance of an Implementation. |
– Assessor | An entity Assessing the Performance of an Implementation. |
Profile | A particular subset of the technologies used in MPAI-AIF or an AIW of an Application Standard and, where applicable, the classes, other subsets, options and parameters relevant to that subset. |
Record | Data with a specified structure. |
Reference Model | The AIMs and their Connections in an AIW. |
Reference Software | A technically correct software implementation of a Technical Specification containing source code, or source and compiled code. |
Reliability | The attribute of an Implementation that performs as specified by the Application Standard, profile, and version the Implementation refers to, e.g., within the application scope, stated limitations, and for the period of time specified by the Implementer. |
Replicability | The attribute of an Implementation whose Performance, as Assessed by a Performance Assessor, can be replicated, within an agreed level, by another Performance Assessor. |
Robustness | The attribute of an Implementation that copes with data outside of the stated application scope with an estimated degree of confidence. |
Scope | The domain of applicability of an MPAI Application Standard. |
Service Provider | An entrepreneur who offers an Implementation as a service (e.g., a recommendation service) to Users. |
Standard | A set of Technical Specification, Reference Software, Conformance Testing, Performance Assessment, and Technical Report of an MPAI Application Standard. |
Technical Specification | The normative specification of the set of AIWs belonging to an application domain, along with the AIMs required to Implement them, including: (1) the formats of the Input/Output data of the AIWs; (2) the Connections of the AIMs of each AIW; (3) the formats of the Input/Output data of the AIMs belonging to each AIW. |
Testing Laboratory | A laboratory accredited to Assess the Grade of Performance of Implementations. |
Time Base | The protocol specifying how Components can access timing information. |
Topology | The set of AIM Connections of an AIW. |
Use Case | A particular instance of the application domain targeted by an Application Standard. |
User | A user of an Implementation. |
User Agent | The Component interfacing the User with an AIF through the Controller. |
Version | A revision or extension of a Standard or of one of its elements. |
Zero Trust | A cybersecurity model primarily focused on data and service protection that assumes no implicit trust. |
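Several Table 2 Terms (AIM, Basic and Composite AIM, AI Workflow, Channel/Connection, Topology) describe one data structure. The sketch below is a hypothetical, non-normative model of it: the class names, port names, and the `check_topology` function are illustrative and are not an MPAI-AIF API. It assumes a Connection is valid only when it joins an output port of an AIM in the AIW to an input port of an AIM in the same AIW.

```python
from dataclasses import dataclass, field


@dataclass
class AIM:
    """An AI Model with named input and output ports (names are illustrative)."""
    name: str
    inputs: set[str]
    outputs: set[str]
    sub_aims: list["AIM"] = field(default_factory=list)  # non-empty => Composite AIM

    @property
    def is_composite(self) -> bool:
        return len(self.sub_aims) > 0


@dataclass(frozen=True)
class Connection:
    """A Channel from an output port of one AIM to an input port of an AIM."""
    src_aim: str
    src_port: str
    dst_aim: str
    dst_port: str


@dataclass
class AIW:
    """An AI Workflow: its AIMs plus the set of their Connections (its Topology)."""
    name: str
    aims: dict[str, AIM]
    topology: set[Connection] = field(default_factory=set)


def check_topology(aiw: AIW) -> list[str]:
    """Return a list of problems; an empty list means every Connection is valid."""
    problems = []
    # Sort for a deterministic report order, since the Topology is a set.
    for c in sorted(aiw.topology, key=lambda c: (c.src_aim, c.src_port, c.dst_aim, c.dst_port)):
        src = aiw.aims.get(c.src_aim)
        dst = aiw.aims.get(c.dst_aim)
        if src is None or c.src_port not in src.outputs:
            problems.append(f"{c.src_aim}.{c.src_port} is not an output port in {aiw.name}")
        if dst is None or c.dst_port not in dst.inputs:
            problems.append(f"{c.dst_aim}.{c.dst_port} is not an input port in {aiw.name}")
    return problems


# Hypothetical two-AIM workflow: Automatic Speech Recognition feeding Natural Language Understanding.
asr = AIM("ASR", inputs={"Speech"}, outputs={"RecognisedText"})
nlu = AIM("NLU", inputs={"RecognisedText"}, outputs={"RefinedText"})
demo = AIW("Demo", {"ASR": asr, "NLU": nlu},
           {Connection("ASR", "RecognisedText", "NLU", "RecognisedText")})
print(check_topology(demo))  # []
```

Modelling the Topology as a set of frozen (hashable) Connections mirrors the definition “the set of AIM Connections of an AIW”; a real Controller would also need execution order and timing, which this sketch omits.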