1. Function
The User State Description (PGM-USD) AIM derives a structured representation of the User Entity State from multimodal perceptual and enhanced contextual inputs.
USD operates on Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, and an aligned multimodal context (AV Scene Geometry). It produces a User Entity State representation that captures the cognitive, behavioural, expressive, and interactional aspects of the observed User.
USD performs a multi-stage interpretative process including multimodal harmonisation, linguistic and paralinguistic analysis, behavioural and expressive inference, and cross-modal interpretation, without performing deliberative reasoning or goal-directed decision-making.
All processing is executed under User-USD Directive control issued by A‑User Control and supported by domain knowledge acquired through Domain Access.
2. Reference Model
Figure 1 gives the Reference Model of the User State Description (PGM-USD) AIM.

Figure 1 – The Reference Model of the User State Description (PGM-USD) AIM
3. Input/Output Data
Table 1 gives the Input and Output Data of the User State Description (PGM-USD) AIM.
Table 1 – Input/Output Data of User State Description (PGM-USD) AIM
| Input | Description |
|---|---|
| Enhanced Audio Scene Descriptors | Audio descriptors enriched by Audio Scene Enhancement representing speech, paralinguistic cues, and environmental audio context. |
| Enhanced Visual Scene Descriptors | Visual descriptors enriched by Visual Scene Enhancement representing user posture, gestures, and visual interaction context. |
| AV Scene Geometry | Aligned multimodal representation linking audio and visual sources generated by Audio-Visual Alignment. |
| User-CXE Directive | Control directives specifying scope, depth, and policy constraints for user state derivation. |
| User Domain Response | Responses to domain knowledge requests supporting interpretation and inference. |

| Output | Description |
|---|---|
| User Entity State | Structured representation of user cognitive, behavioural, expressive, and interactional state. |
| User-CXE Status | Status information describing execution progress and outcome of USD processing. |
| User Domain Request | Domain knowledge requests issued during interpretation stages. |
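As an illustration of how an implementation might carry the data of Table 1, the following Python sketch groups them into two containers. The class, field, and function names (`USDInputs`, `USDOutputs`, `empty_user_entity_state`) are hypothetical and not defined by this specification, and the payloads are left as untyped dictionaries because their formats are not given here. Domain responses are placed on the input side and domain requests on the output side, following the list of effective inputs in the Operation subsection; that placement is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# The four aspects of the User Entity State named in this section.
STATE_ASPECTS = ("cognitive", "behavioural", "expressive", "interactional")


@dataclass
class USDInputs:
    """Hypothetical container for the data consumed by PGM-USD."""
    enhanced_audio_scene_descriptors: Dict[str, Any]
    enhanced_visual_scene_descriptors: Dict[str, Any]
    av_scene_geometry: Dict[str, Any]
    directive: Dict[str, Any]  # scope, depth, and policy constraints
    domain_responses: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class USDOutputs:
    """Hypothetical container for the data produced by PGM-USD."""
    user_entity_state: Dict[str, Any]
    status: Dict[str, Any]  # execution progress and outcome
    domain_requests: List[Dict[str, Any]] = field(default_factory=list)


def empty_user_entity_state() -> Dict[str, Dict[str, Any]]:
    """Return a state with one empty entry per aspect."""
    return {aspect: {} for aspect in STATE_ASPECTS}


inputs = USDInputs(
    enhanced_audio_scene_descriptors={},
    enhanced_visual_scene_descriptors={},
    av_scene_geometry={},
    directive={"scope": "full", "depth": 1},  # illustrative values only
)
outputs = USDOutputs(
    user_entity_state=empty_user_entity_state(),
    status={"outcome": "completed"},
)
```

Keeping the inputs and outputs in separate containers makes the direction of each data item explicit, which is useful when the same module both issues domain requests and consumes the matching responses.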
4. SubAIMs (Informative)
4.1 Reference Model
Figure 2 depicts the Reference Model of the User State Description (PGM-USD) Composite AIM.

Figure 2 – Reference Model of User State Description (PGM-USD) Composite AIM
4.2 Operation
The User State Description AI Module operates by progressively transforming multimodal enhanced descriptors into a structured User Entity State through a sequence of internal SubAIMs.
The effective inputs are Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, AV Scene Geometry, Domain Responses, and the User-USD Directive from A‑User Control.
The Multimodal Input Harmonisation SubAIM aligns temporal, structural, and referential aspects of multimodal inputs, producing a Harmonised Multimodal Context suitable for subsequent analysis.
The Linguistic–Paralinguistic Analysis SubAIM processes text and speech descriptors to derive linguistic content and paralinguistic features such as prosody, emphasis, and rhythm, producing Linguistic–PL Evidence.
The Behavioural and Expressive Analysis SubAIM interprets the Linguistic–PL Evidence and the Harmonised Multimodal Context to derive Behavioural and Expressive Indicators reflecting interaction patterns, communicative intent, and expressive signals.
The Cross-Modal Interpretation SubAIM integrates linguistic, behavioural, and multimodal evidence to produce a coherent cross-modal interpretation of user behaviour and interaction context.
The Entity State Construction SubAIM aggregates all derived evidence into a consistent User Entity State representation, ensuring alignment across modalities and preserving traceability to input evidence.
USD processing produces intermediate and final status information reflecting progress and outcomes of each stage.
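The sequence of SubAIMs described above can be sketched as a simple staged pipeline. This is a minimal illustration under stated assumptions, not the reference implementation: the function names, the dictionary keys (`enhanced_asd`, `harmonised_context`, and so on), the shared "board" dictionary, and the shape of the status records are all hypothetical, and every stage body is a placeholder that only shows which earlier results it reads.

```python
from typing import Any, Callable, Dict, List, Tuple

Board = Dict[str, Any]  # shared store of inputs and intermediate evidence


def harmonise(board: Board) -> Dict[str, Any]:
    # Placeholder: bundle the three enhanced inputs into one context.
    return {
        "audio": board["enhanced_asd"],
        "visual": board["enhanced_vsd"],
        "geometry": board["av_scene_geometry"],
    }


def analyse_linguistic(board: Board) -> Dict[str, Any]:
    # Placeholder: a real stage would derive prosody, emphasis, rhythm.
    return {"context_parts": sorted(board["harmonised_context"])}


def analyse_behaviour(board: Board) -> Dict[str, Any]:
    # Reads the linguistic evidence and the harmonised context.
    return {"uses": [board["linguistic_pl_evidence"], board["harmonised_context"]]}


def interpret_cross_modal(board: Board) -> Dict[str, Any]:
    # Reads linguistic and behavioural evidence produced upstream.
    return {"uses": [board["linguistic_pl_evidence"], board["behavioural_indicators"]]}


def construct_state(board: Board) -> Dict[str, Any]:
    # Keep a trace of the evidence items the state was built from.
    trace = ["harmonised_context", "linguistic_pl_evidence",
             "behavioural_indicators", "cross_modal_evidence"]
    missing = [key for key in trace if key not in board]
    if missing:
        raise KeyError(f"missing evidence: {missing}")
    return {"cognitive": {}, "behavioural": {}, "expressive": {},
            "interactional": {}, "trace": trace}


# (SubAIM name, key under which its output is stored, stage function)
STAGES: List[Tuple[str, str, Callable[[Board], Dict[str, Any]]]] = [
    ("Multimodal Input Harmonisation", "harmonised_context", harmonise),
    ("Linguistic–Paralinguistic Analysis", "linguistic_pl_evidence", analyse_linguistic),
    ("Behavioural and Expressive Analysis", "behavioural_indicators", analyse_behaviour),
    ("Cross-Modal Interpretation", "cross_modal_evidence", interpret_cross_modal),
    ("Entity State Construction", "user_entity_state", construct_state),
]


def run_usd(inputs: Board) -> Tuple[Dict[str, Any], List[Dict[str, str]]]:
    """Run the stages in order and record one status entry per stage."""
    board: Board = dict(inputs)
    status: List[Dict[str, str]] = []
    for name, output_key, stage in STAGES:
        board[output_key] = stage(board)
        status.append({"stage": name, "outcome": "completed"})
    return board["user_entity_state"], status


state, status = run_usd({
    "enhanced_asd": {}, "enhanced_vsd": {}, "av_scene_geometry": {},
    "directive": {}, "domain_responses": [],
})
```

Each stage writes exactly one new entry to the shared dictionary, so the per-stage status list doubles as a record of which evidence was available when the User Entity State was constructed.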
4.3 Functions of AI Modules
Table 2 – Functions of User State Description (PGM-USD) AIM’s SubAIMs
| SubAIM Specification | Purpose |
|---|---|
| Multimodal Input Harmonisation | Aligns multimodal descriptors into a coherent temporal and structural context. |
| Linguistic–Paralinguistic Analysis | Extracts linguistic meaning and paralinguistic cues from speech and text. |
| Behavioural and Expressive Analysis | Derives behavioural patterns and expressive indicators from multimodal evidence. |
| Cross-Modal Interpretation | Integrates multimodal evidence into a unified interpretation. |
| Entity State Construction | Produces the final structured User Entity State and associated status information. |
4.4 I/O Data of AI Modules
Table 3 – I/O Data of User State Description (PGM-USD) AIM’s SubAIMs
| SubAIM | Input Data | Output Data |
|---|---|---|
| Multimodal Input Harmonisation | Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, AV Scene Geometry, User-CXE Directive | Harmonised Multimodal Context |
| Linguistic–Paralinguistic Analysis | Harmonised Multimodal Context, Speech Descriptors, User-CXE Directive, Domain Response | Linguistic–PL Evidence, Domain Request |
| Behavioural and Expressive Analysis | Linguistic–PL Evidence, Harmonised Multimodal Context, Domain Response, User-CXE Directive | Behavioural & Expressive Indicators, Domain Request |
| Cross-Modal Interpretation | Linguistic–PL Evidence, Behavioural & Expressive Indicators, Harmonised Multimodal Context, Domain Response | Cross-Modal Interpretation Evidence |
| Entity State Construction | Cross-Modal Interpretation Evidence, Linguistic–PL Evidence, Behavioural & Expressive Indicators, Harmonised Multimodal Context, Domain Response, User-CXE Directive | User Entity State, User-CXE Status |
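The wiring in Table 3 can be checked mechanically: every input of a SubAIM should either arrive from outside the Composite AIM or be produced by an earlier SubAIM. The sketch below encodes the table and performs that check. It is illustrative only. Data names are normalised to one spelling per item (for example, "Harmonised Context" is treated as "Harmonised Multimodal Context"), the function `check_wiring` is hypothetical, and treating Speech Descriptors as an external input is an assumption, since the table does not say where they originate.

```python
from typing import Dict, List, Set, Tuple

# Data assumed to arrive from outside the Composite AIM.
EXTERNAL: Set[str] = {
    "Enhanced Audio Scene Descriptors",
    "Enhanced Visual Scene Descriptors",
    "AV Scene Geometry",
    "User-CXE Directive",
    "Domain Response",
    "Speech Descriptors",  # assumption: origin not stated in Table 3
}

# (SubAIM, inputs, outputs), in the order given in Table 3.
SUBAIMS: List[Tuple[str, Set[str], Set[str]]] = [
    ("Multimodal Input Harmonisation",
     {"Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors",
      "AV Scene Geometry", "User-CXE Directive"},
     {"Harmonised Multimodal Context"}),
    ("Linguistic–Paralinguistic Analysis",
     {"Harmonised Multimodal Context", "Speech Descriptors",
      "User-CXE Directive", "Domain Response"},
     {"Linguistic–PL Evidence", "Domain Request"}),
    ("Behavioural and Expressive Analysis",
     {"Linguistic–PL Evidence", "Harmonised Multimodal Context",
      "Domain Response", "User-CXE Directive"},
     {"Behavioural & Expressive Indicators", "Domain Request"}),
    ("Cross-Modal Interpretation",
     {"Linguistic–PL Evidence", "Behavioural & Expressive Indicators",
      "Harmonised Multimodal Context", "Domain Response"},
     {"Cross-Modal Interpretation Evidence"}),
    ("Entity State Construction",
     {"Cross-Modal Interpretation Evidence", "Linguistic–PL Evidence",
      "Behavioural & Expressive Indicators", "Harmonised Multimodal Context",
      "Domain Response", "User-CXE Directive"},
     {"User Entity State", "User-CXE Status"}),
]


def check_wiring(subaims: List[Tuple[str, Set[str], Set[str]]],
                 external: Set[str]) -> Dict[str, Set[str]]:
    """Map each SubAIM to the inputs that nothing upstream provides."""
    available = set(external)
    unmet: Dict[str, Set[str]] = {}
    for name, inputs, outputs in subaims:
        missing = inputs - available
        if missing:
            unmet[name] = missing
        available |= outputs  # later SubAIMs may consume these
    return unmet


problems = check_wiring(SUBAIMS, EXTERNAL)
```

With the assumptions above, `problems` is empty, meaning the order of SubAIMs in Table 3 is a valid execution order. Removing an item from `EXTERNAL` shows which SubAIM would be left without one of its inputs.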
4.5 AIMs and JSON Metadata
Table 4 – AIMs and JSON Metadata
| AIM | SubAIMs | Name | JSON |
|---|---|---|---|
| PGM-USD | | User State Description | Link |
| | PGM-MIH | Multimodal Input Harmonisation | Link |
| | PGM-LPA | Linguistic–Paralinguistic Analysis | Link |
| | PGM-BEA | Behavioural and Expressive Analysis | Link |
| | PGM-CMI | Cross-Modal Interpretation | Link |
| | PGM-ESC | Entity State Construction | Link |
5. JSON Metadata
https://schemas.mpai.community/PGM1/V1.0/AIMs/UserStateDescription.json
6. Profiles
No Profiles