Definition
The Multimodal Input Harmonisation (MIH) Data Type:
- Is produced by the Multimodal Input Harmonisation internal component of the Entity State Description (ESD) AIM.
- Represents a time-bounded, entity-centred harmonisation of multimodal perceptual inputs.
- Correlates visual objects, audio objects, and textual elements that are contemporaneous and refer to the same entity or entities.
- Does not introduce semantic interpretation, inference, or state attribution.
MIH provides a precisely aligned evidential substrate for downstream Linguistic–Paralinguistic Analysis, Behavioural/Expressive Analysis, cross‑modal interpretation, and Entity State construction.
Functional Requirements
MIH conveys the following main information elements:
| Function | Description |
|---|---|
| Multimodal Correlation | Establishes explicit correspondences between Visual Objects, Audio Objects, and Text segments that coexist within a common temporal window. |
| Temporal Anchoring | Provides a harmonisation time reference to ensure that subsequent reasoning operates on co‑temporal evidence only. |
| Entity Referencing | Identifies which multimodal evidence items relate to the same logical entity (e.g. the User or another entity in the scene). |
| Modal Integrity | Preserves the original structure and semantics of Visual and Audio Scene Descriptors without duplication or modification. |
| Referential Transparency | Uses object‑or‑objectID constructs to ensure that all references are explicit, inspectable, and verifiable. |
| Interpretation Neutrality | Explicitly refrains from performing affective, intentional, or cognitive inference. |
| Reasoning Substrate | Serves as the mandatory input substrate for Linguistic–Paralinguistic Analysis, Behavioural / Expressive Analysis, and subsequent Entity State construction. |
| Auditability | Includes Data Exchange Metadata to support provenance, traceability, and confidence assessment. |
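The Temporal Anchoring function can be illustrated with a short, non-normative sketch: evidence items are retained only if their time stamps fall inside the harmonisation window. The field names (`start`, `end`) and the use of plain numeric time stamps are assumptions for illustration, not part of the normative MIH syntax.

```python
from dataclasses import dataclass

@dataclass
class TimeWindow:
    # Assumed representation: seconds on a shared clock; the normative
    # MIH time reference format is defined elsewhere in the specification.
    start: float
    end: float

def co_temporal(evidence_times, window):
    """Keep only evidence whose time stamp lies within the harmonisation
    window, so that downstream reasoning operates on co-temporal evidence
    only (Temporal Anchoring)."""
    return [t for t in evidence_times if window.start <= t <= window.end]

window = TimeWindow(start=10.0, end=12.5)
print(co_temporal([9.8, 10.4, 11.9, 13.0], window))  # -> [10.4, 11.9]
```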
Syntax
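The normative JSON syntax is not reproduced here. The following non-normative sketch shows how an MIH instance could be laid out using the labels defined under Semantics; all identifier values, time formats, and the choice of ID-only references are invented for illustration.

```python
import re

# Illustrative MIH instance built from the Semantics labels.
# Every concrete value below is an invented example, not a normative one.
mih_instance = {
    "Header": "MMC-MIH-V1.0",              # format MMC-MIH-Vx.y
    "MInstanceID": "minstance-0001",       # M-Instance producing the data
    "MIHID": "mih-0001",                   # unique instance identifier
    "HarmonisationTime": {"start": "2024-01-01T12:00:00Z",
                          "end":   "2024-01-01T12:00:02Z"},
    # Object-or-objectID constructs: ID references are used here.
    "VisualContext": [{"VisualObjectID": "vo-01"}],
    "AudioContext":  [{"AudioObjectID": "ao-01"}],
    "TextContext":   [{"TextOrTextID": "hello there",
                       "SpaceTime": "2024-01-01T12:00:01Z"}],
    "EntityContext": [{"EntityID": "user-01",
                       "VisualRefs": ["vo-01"],
                       "AudioRefs":  ["ao-01"],
                       "TextRefs":   ["hello there"]}],
    "DataXMData": {},    # Data Exchange Metadata (provenance, confidence, ...)
    "DescrMetadata": {}, # human-readable descriptive metadata
}

# Sanity check: the Header follows the MMC-MIH-Vx.y pattern.
assert re.fullmatch(r"MMC-MIH-V\d+\.\d+", mih_instance["Header"])
```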
Semantics
| Label | Description |
|---|---|
| Header | MIH header identifying the data type and version, formatted as MMC-MIH-Vx.y. |
| MInstanceID | Identifier of the M‑Instance producing the MIH data. |
| MIHID | Unique identifier of the Multimodal Input Harmonisation instance. |
| HarmonisationTime | Time reference identifying the temporal window within which multimodal evidence is harmonised. |
| VisualContext | Set of visual entities relevant to harmonisation. Each item SHALL be either a VisualObject or a VisualObjectID referencing a Visual Scene Descriptor produced upstream. |
| AudioContext | Set of audio entities relevant to harmonisation. Each item SHALL be either an AudioObject or an AudioObjectID referencing an Audio Scene Descriptor produced upstream. |
| TextContext | Set of textual elements derived from Automatic Speech Recognition (ASR). Each item binds a recognised text segment or a TextSegmentID to a temporal anchor. |
| TextContext.TextOrTextID | Either the recognised text string or an identifier referencing a text segment produced by an ASR AIM. |
| TextContext.SpaceTime | Temporal anchor indicating when the text segment was uttered. |
| EntityContext | Entity‑centric correspondence structure grouping visual, audio, and textual evidence that refers to the same logical entity. |
| EntityContext.EntityID | Identifier of the logical entity (e.g. User or other actor) to which the referenced evidence relates. |
| EntityContext.VisualRefs | Visual evidence associated with the entity. Each item SHALL be either a VisualObject or a VisualObjectID. |
| EntityContext.AudioRefs | Audio evidence associated with the entity. Each item SHALL be either an AudioObject or an AudioObjectID. |
| EntityContext.TextRefs | Textual evidence associated with the entity. Each item SHALL be either a recognised text segment or a TextSegmentID. |
| DataXMData | Data Exchange Metadata providing provenance, source AIM identification, confidence, legality, and rights information. |
| DescrMetadata | Human‑readable descriptive metadata associated with the MIH instance. |
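The Referential Transparency requirement implies that every reference cited in an EntityContext entry can be resolved against the declared context sets. A non-normative consistency check is sketched below; field names follow the Semantics labels, while the dictionary layout and concrete ID values are assumptions made for illustration.

```python
def check_entity_refs(mih: dict) -> bool:
    """Return True iff every VisualRefs/AudioRefs/TextRefs item in each
    EntityContext entry resolves to an item declared in the corresponding
    context set (Referential Transparency)."""
    visual_ids = {v.get("VisualObjectID") for v in mih.get("VisualContext", [])}
    audio_ids = {a.get("AudioObjectID") for a in mih.get("AudioContext", [])}
    text_ids = {t.get("TextOrTextID") for t in mih.get("TextContext", [])}
    pools = (("VisualRefs", visual_ids),
             ("AudioRefs", audio_ids),
             ("TextRefs", text_ids))
    return all(ref in pool
               for entity in mih.get("EntityContext", [])
               for field, pool in pools
               for ref in entity.get(field, []))

# Minimal invented example: one entity citing one item per modality.
mih = {
    "VisualContext": [{"VisualObjectID": "vo-01"}],
    "AudioContext": [{"AudioObjectID": "ao-01"}],
    "TextContext": [{"TextOrTextID": "ts-01"}],
    "EntityContext": [{"EntityID": "user-01",
                       "VisualRefs": ["vo-01"],
                       "AudioRefs": ["ao-01"],
                       "TextRefs": ["ts-01"]}],
}
print(check_entity_refs(mih))  # -> True
```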