1. Function

The User State Description (PGM-USD) AIM derives a structured representation of the User Entity State from multimodal perceptual and enhanced contextual inputs.

USD operates on Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, and an aligned multimodal context (AV Scene Geometry), and produces a User Entity State representation capturing the cognitive, behavioural, expressive, and interactional aspects of the observed User.

USD performs a multi-stage interpretative process comprising multimodal harmonisation, linguistic and paralinguistic analysis, behavioural and expressive inference, and cross-modal interpretation. It does not perform deliberative reasoning or goal-directed decision-making.

All processing is executed under the control of the User-USD Directive issued by A‑User Control and is supported by domain knowledge acquired through Domain Access.

2. Reference Model

Figure 1 gives the Reference Model of the User State Description (PGM-USD) AIM.

Figure 1 – The Reference Model of the User State Description (PGM-USD) AIM

3. Input/Output Data

Table 1 gives the Input and Output Data of the User State Description (PGM-USD) AIM.

Table 1 – Input/Output Data of User State Description (PGM-USD) AIM

| Input | Description |
|---|---|
| Enhanced Audio Scene Descriptors | Audio descriptors enriched by Audio Scene Enhancement, representing speech, paralinguistic cues, and environmental audio context. |
| Enhanced Visual Scene Descriptors | Visual descriptors enriched by Visual Scene Enhancement, representing user posture, gestures, and visual interaction context. |
| AV Scene Geometry | Aligned multimodal representation, generated by Audio-Visual Alignment, linking audio and visual sources. |
| User-CXE Directive | Control directives specifying scope, depth, and policy constraints for user state derivation. |
| User Domain Response | Responses to domain knowledge requests, supporting interpretation and inference. |

| Output | Description |
|---|---|
| User Entity State | Structured representation of the user's cognitive, behavioural, expressive, and interactional state. |
| User-CXE Status | Status information describing the execution progress and outcome of USD processing. |
| User Domain Request | Domain knowledge requests issued during the interpretation stages. |
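
The data listed in Table 1 can be pictured as two plain records. The Python sketch below is illustrative only: the class names, the field names, and the use of untyped dictionaries are assumptions and are not part of this specification; the normative formats are those referenced by the JSON Metadata. The sketch treats Domain Responses as inbound and Domain Requests as outbound, matching how the SubAIMs consume and issue them.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical container types; all names are assumptions made for illustration.
Payload = dict[str, Any]


@dataclass
class USDInputs:
    enhanced_audio_scene_descriptors: Payload
    enhanced_visual_scene_descriptors: Payload
    av_scene_geometry: Payload
    directive: Payload  # scope, depth, and policy constraints
    domain_responses: list[Payload] = field(default_factory=list)


@dataclass
class USDOutputs:
    user_entity_state: Payload
    status: Payload  # execution progress and outcome
    domain_requests: list[Payload] = field(default_factory=list)
```

Keeping the directive as a separate field, rather than merging it into the descriptors, mirrors the separation between data inputs and control inputs in the table.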

4. SubAIMs (Informative)

4.1 Reference Model

Figure 2 depicts the Reference Model of the User State Description (PGM-USD) Composite AIM.

Figure 2 – Reference Model of User State Description (PGM-USD) Composite AIM

4.2 Operation

The User State Description AI Module operates by progressively transforming multimodal enhanced descriptors into a structured User Entity State through a sequence of internal SubAIMs.

The effective inputs are Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, AV Scene Geometry, Domain Responses, and the User-USD Directive from A‑User Control.

The Multimodal Input Harmonisation SubAIM aligns temporal, structural, and referential aspects of multimodal inputs, producing a Harmonised Multimodal Context suitable for subsequent analysis.

The Linguistic–Paralinguistic Analysis SubAIM processes text and speech descriptors to derive linguistic content and paralinguistic features such as prosody, emphasis, and rhythm, producing Linguistic–PL Evidence.

The Behavioural and Expressive Analysis SubAIM interprets the Linguistic–PL Evidence and the Harmonised Multimodal Context to derive Behavioural and Expressive Indicators reflecting interaction patterns, communicative intent, and expressive signals.

The Cross-Modal Interpretation SubAIM integrates linguistic, behavioural, and multimodal evidence to produce Cross-Modal Interpretation Evidence, a coherent interpretation of user behaviour and interaction context.

The Entity State Construction SubAIM aggregates all derived evidence into a consistent User Entity State representation, ensuring alignment across modalities and preserving traceability to input evidence.

USD processing produces intermediate and final status information reflecting progress and outcomes of each stage.
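
The sequence of SubAIMs described above can be sketched as a simple staged pipeline. This is a minimal illustration, not the reference software: the function names, dictionary keys, and placeholder stage bodies are all assumptions, and Domain Access and directive handling are omitted for brevity. It shows only the control flow, namely that each stage adds its evidence to a shared context and that a status entry is recorded per stage.

```python
from typing import Any, Callable

Context = dict[str, Any]
Stage = Callable[[Context], Context]


def harmonise(ctx: Context) -> Context:
    # Placeholder: a real stage would align timing, structure, and references.
    return {"harmonised_context": {
        "audio": ctx["enhanced_asd"],
        "visual": ctx["enhanced_vsd"],
        "geometry": ctx["av_scene_geometry"],
    }}


def analyse_linguistic(ctx: Context) -> Context:
    return {"linguistic_pl_evidence": {"from": ctx["harmonised_context"]["audio"]}}


def analyse_behaviour(ctx: Context) -> Context:
    return {"behavioural_expressive_indicators": {
        "from": [ctx["linguistic_pl_evidence"], ctx["harmonised_context"]["visual"]],
    }}


def interpret_cross_modal(ctx: Context) -> Context:
    return {"cross_modal_evidence": {
        "linguistic": ctx["linguistic_pl_evidence"],
        "behavioural": ctx["behavioural_expressive_indicators"],
    }}


def construct_entity_state(ctx: Context) -> Context:
    keys = ["linguistic_pl_evidence", "behavioural_expressive_indicators",
            "cross_modal_evidence"]
    return {"user_entity_state": {
        "evidence": {k: ctx[k] for k in keys},
        # Traceability back to the inputs the evidence was derived from.
        "derived_from": ["enhanced_asd", "enhanced_vsd", "av_scene_geometry"],
    }}


STAGES: list[tuple[str, Stage]] = [
    ("Multimodal Input Harmonisation", harmonise),
    ("Linguistic-Paralinguistic Analysis", analyse_linguistic),
    ("Behavioural and Expressive Analysis", analyse_behaviour),
    ("Cross-Modal Interpretation", interpret_cross_modal),
    ("Entity State Construction", construct_entity_state),
]


def run_usd(inputs: Context) -> tuple[Context, list[dict[str, str]]]:
    """Run the stages in order, recording one status entry per stage."""
    ctx = dict(inputs)
    status: list[dict[str, str]] = []
    for name, stage in STAGES:
        try:
            ctx.update(stage(ctx))
            status.append({"stage": name, "outcome": "completed"})
        except Exception as exc:  # stop at the first failing stage
            status.append({"stage": name, "outcome": f"failed: {exc}"})
            break
    return ctx.get("user_entity_state", {}), status
```

A call such as `run_usd({"enhanced_asd": {...}, "enhanced_vsd": {...}, "av_scene_geometry": {...}})` returns the constructed state together with five status entries; if an input is missing, the run stops at the first stage that needs it and the returned state is empty.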

4.3 Functions of AI Modules

Table 2 – Functions of User State Description (PGM-USD) AIM’s SubAIMs

| SubAIM | Purpose |
|---|---|
| Multimodal Input Harmonisation | Aligns multimodal descriptors into a coherent temporal and structural context. |
| Linguistic–Paralinguistic Analysis | Extracts linguistic meaning and paralinguistic cues from speech and text. |
| Behavioural and Expressive Analysis | Derives behavioural patterns and expressive indicators from multimodal evidence. |
| Cross-Modal Interpretation | Integrates multimodal evidence into a unified interpretation. |
| Entity State Construction | Produces the final structured User Entity State and associated status information. |

4.4 I/O Data of AI Modules

Table 3 – I/O Data of User State Description (PGM-USD) AIM’s SubAIMs

| SubAIM | Input Data | Output Data |
|---|---|---|
| Multimodal Input Harmonisation | Enhanced ASD, Enhanced VSD, AV Scene Geometry, User-CXE Directive | Harmonised Multimodal Context |
| Linguistic–Paralinguistic Analysis | Harmonised Multimodal Context, Speech Descriptors, User-CXE Directive, Domain Response | Linguistic–PL Evidence, Domain Request |
| Behavioural and Expressive Analysis | Linguistic–PL Evidence, Harmonised Multimodal Context, Domain Response, User-CXE Directive | Behavioural & Expressive Indicators, Domain Request |
| Cross-Modal Interpretation | Linguistic–PL Evidence, Behavioural & Expressive Indicators, Harmonised Multimodal Context, Domain Response | Cross-Modal Interpretation Evidence |
| Entity State Construction | Cross-Modal Interpretation Evidence, Linguistic–PL Evidence, Behavioural & Expressive Indicators, Harmonised Multimodal Context, Domain Response, User-CXE Directive | User Entity State, User-CXE Status |
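
The I/O data in Table 3 imply an execution order: a SubAIM can run only after the SubAIMs that produce its inputs. The Python sketch below derives that order with the standard-library `graphlib` module. It is an illustration under stated assumptions: abbreviated data names in the table are normalised to their full forms, and Speech Descriptors are treated as externally supplied because the table does not state their source.

```python
from graphlib import TopologicalSorter

# Data assumed to arrive from outside the Composite AIM.
EXTERNAL = {
    "Enhanced ASD", "Enhanced VSD", "AV Scene Geometry",
    "User-CXE Directive", "Domain Response",
    "Speech Descriptors",  # assumption: source not stated in Table 3
}

# SubAIM -> (inputs, outputs), transcribed from Table 3 with full data names.
TABLE_3 = {
    "Multimodal Input Harmonisation": (
        {"Enhanced ASD", "Enhanced VSD", "AV Scene Geometry", "User-CXE Directive"},
        {"Harmonised Multimodal Context"},
    ),
    "Linguistic-Paralinguistic Analysis": (
        {"Harmonised Multimodal Context", "Speech Descriptors",
         "User-CXE Directive", "Domain Response"},
        {"Linguistic-PL Evidence", "Domain Request"},
    ),
    "Behavioural and Expressive Analysis": (
        {"Linguistic-PL Evidence", "Harmonised Multimodal Context",
         "Domain Response", "User-CXE Directive"},
        {"Behavioural & Expressive Indicators", "Domain Request"},
    ),
    "Cross-Modal Interpretation": (
        {"Linguistic-PL Evidence", "Behavioural & Expressive Indicators",
         "Harmonised Multimodal Context", "Domain Response"},
        {"Cross-Modal Interpretation Evidence"},
    ),
    "Entity State Construction": (
        {"Cross-Modal Interpretation Evidence", "Linguistic-PL Evidence",
         "Behavioural & Expressive Indicators", "Harmonised Multimodal Context",
         "Domain Response", "User-CXE Directive"},
        {"User Entity State", "User-CXE Status"},
    ),
}


def execution_order(table, external):
    """Return the SubAIMs in an order that satisfies every data dependency."""
    # Map each data item to a SubAIM that produces it. Domain Request has two
    # producers, which is harmless here because no SubAIM consumes it.
    producer = {out: name for name, (_, outs) in table.items() for out in outs}
    graph = {}
    for name, (ins, _) in table.items():
        internal = ins - external
        missing = internal - set(producer)
        if missing:
            raise ValueError(f"{name}: no producer for {sorted(missing)}")
        graph[name] = {producer[item] for item in internal}
    return list(TopologicalSorter(graph).static_order())
```

With the table as transcribed, the dependencies form a single chain, so the computed order coincides with the row order of Table 3. The same check raises an error if an input is neither external nor produced by another SubAIM.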

4.5 AIMs and JSON Metadata

Table 4 – AIMs and JSON Metadata

| AIM | SubAIM | Name | JSON |
|---|---|---|---|
| PGM-USD | | User State Description | Link |
| | PGM-MIH | Multimodal Input Harmonisation | Link |
| | PGM-LPA | Linguistic–Paralinguistic Analysis | Link |
| | PGM-BEA | Behavioural and Expressive Analysis | Link |
| | PGM-CMI | Cross-Modal Interpretation | Link |
| | PGM-ESC | Entity State Construction | Link |

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/UserStateDescription.json

6. Profiles

No Profiles