PGM-AUA V1.0 AIMs - User State Description

1. Function

The User State Description (PGM‑USD) AIM derives a structured representation of the User State from multimodal perceptual and enhanced contextual inputs.

User State Description:

Operates on the Enhanced Audio Scene Descriptors produced by Audio Scene Enhancement, the Enhanced Visual Scene Descriptors produced by Visual Scene Enhancement, the User CXT Directive received from A‑User Control, the User Domain Response resulting from queries made to Domain Access, and the User IH Response resulting from queries made to Interaction History.
Performs a multi‑stage interpretative process including multimodal harmonisation, linguistic and paralinguistic analysis, behavioural and expressive inference, and cross‑modal interpretation, without performing deliberative reasoning or goal‑directed decision‑making.
Produces the User State sent to Context Description Multiplexing, the User CXT Status, the User Domain Request when querying Domain Access, and the User IH Request when querying Interaction History from A-User Storage.

The User State carries a structured representation of the cognitive, behavioural, expressive, and interactional aspects of the observed User, under User CXT Directive control.

2. Reference Model

Figure 1 depicts the Reference Model of the User State Description (PGM‑USD) AIM.

Figure 1 – Reference Model of the User State Description (PGM‑USD) AIM

3. Input/Output Data

Table 1 lists the Input and Output Data of the User State Description (PGM‑USD) AIM.

Table 1 – Input/Output Data of the User State Description (PGM‑USD) AIM

Input	Description
Enhanced Audio Scene Descriptors	Audio descriptors enriched by Audio Scene Enhancement representing speech, paralinguistic cues, and environmental audio context.
Enhanced Visual Scene Descriptors	Visual descriptors enriched by Visual Scene Enhancement representing user posture, gestures, and visual interaction context.
User CXT Directive	Contextual directive specifying scope, depth, and policy constraints for user state derivation.
User Domain Response	Domain‑specific knowledge supporting interpretation and inference stages.
User IH Response	Interaction History response providing temporal context for user state derivation.
Output	Description
User State	Structured representation of the cognitive, behavioural, expressive, and interactional state of the observed User.
User CXT Status	Multiplexed status signals from the internal SubAIMs reporting their processing outcomes.
User Domain Request	Multiplexed domain requests from the internal SubAIMs.
User IH Request	Multiplexed Interaction History requests from the internal SubAIMs.
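
As a rough illustration of Table 1, the sketch below models the AIM boundary as two Python dataclasses. It is not part of the specification: the class names `USDInputs` and `USDOutputs`, the snake_case field names, the sample values, and the use of plain dictionaries as payloads are all assumptions made for this example, and the normative Data Types remain those referenced by the JSON Metadata. Treating the two responses as optional is also an assumption, reflecting that they result from queries the AIM itself makes.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional

Payload = Dict[str, Any]  # placeholder for the normative Data Types


@dataclass
class USDInputs:
    """Hypothetical container for the Input Data of Table 1."""
    enhanced_audio_scene_descriptors: Payload
    enhanced_visual_scene_descriptors: Payload
    user_cxt_directive: Payload
    # Assumption: a response exists only after the matching request was issued.
    user_domain_response: Optional[Payload] = None
    user_ih_response: Optional[Payload] = None


@dataclass
class USDOutputs:
    """Hypothetical container for the Output Data of Table 1."""
    user_state: Payload
    # The three signals are multiplexed from the SubAIMs, hence lists.
    user_cxt_status: List[Payload] = field(default_factory=list)
    user_domain_request: List[Payload] = field(default_factory=list)
    user_ih_request: List[Payload] = field(default_factory=list)


# Illustrative values only; the keys and contents are invented for the example.
inputs = USDInputs(
    enhanced_audio_scene_descriptors={"speech": "..."},
    enhanced_visual_scene_descriptors={"gesture": "..."},
    user_cxt_directive={"scope": "full"},
)
input_names = [f.name for f in fields(inputs)]
```

Keeping inputs and outputs in separate containers mirrors the Input/Output split of Table 1 and makes it easy to check that every listed datum has a counterpart in an implementation.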

4. SubAIMs (Informative)

This section is informative. The decomposition into SubAIMs described below illustrates one conformant architecture for producing the normative outputs of PGM‑USD. Implementations may adopt alternative internal structures provided they satisfy the conformance requirements of Section 8.

4.1 Reference Model

Figure 2 depicts the Reference Model of the User State Description (PGM‑USD) Composite AIM.

Figure 2 – Reference Model of the User State Description (PGM‑USD) Composite AIM

4.2 Operation

The User State Description operation receives Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, and the control signals, and progressively transforms them into a structured User Entity State through a sequence of specialised SubAIMs.

The Entity State Demultiplexing SubAIM receives the external inputs and routes the control signals to the appropriate SubAIMs.
The Multimodal Input Harmonisation SubAIM aligns temporal, structural, and referential aspects of the multimodal inputs, producing a Harmonised Multimodal Context suitable for subsequent analysis.
The Linguistic‑Paralinguistic Analysis SubAIM processes text and speech descriptors to derive linguistic content and paralinguistic features such as prosody, emphasis, and rhythm, producing Linguistic‑PL Evidence.
The Behavioural and Expressive Analysis SubAIM interprets linguistic evidence and multimodal context to derive behavioural and expressive indicators reflecting interaction patterns, communicative intent, and expressive signals.
The Cross‑Modal Interpretation SubAIM integrates linguistic, behavioural, and multimodal evidence to produce a coherent cross‑modal interpretation of user behaviour and interaction context.
The Entity State Multiplexing SubAIM aggregates all derived evidence into a consistent User Entity State and assembles the multiplexed status and request signals.

Each of Linguistic‑Paralinguistic Analysis, Behavioural and Expressive Analysis, and Cross‑Modal Interpretation independently generates a User CXT Status, a User Domain Request, and a User IH Request, which are multiplexed by Entity State Multiplexing into the corresponding external output signals.
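
The multiplexing step described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each emitting SubAIM hands its signals over as a dictionary and that multiplexing amounts to collecting them into a list tagged with the originating SubAIM. The function name `multiplex`, the `source`/`payload` keys, and the sample payloads are hypothetical; the specification does not prescribe how the multiplexed signals are encoded.

```python
from typing import Any, Dict, List

# The three SubAIMs that each emit a status and two requests
# (acronyms written with ASCII hyphens for convenience).
EMITTING_SUBAIMS = ("PGM-LPA", "PGM-BEA", "PGM-CMI")
SIGNALS = ("user_cxt_status", "user_domain_request", "user_ih_request")


def multiplex(per_subaim: Dict[str, Dict[str, Any]], signal: str) -> List[Dict[str, Any]]:
    """Collect one kind of signal from every emitting SubAIM, tagged by source."""
    collected = []
    for name in EMITTING_SUBAIMS:
        payload = per_subaim.get(name, {}).get(signal)
        if payload is not None:  # a SubAIM may have nothing to request
            collected.append({"source": name, "payload": payload})
    return collected


# Invented sample signals: every SubAIM reports a status, only some issue requests.
signals = {
    "PGM-LPA": {"user_cxt_status": "ok", "user_ih_request": {"query": "previous turn"}},
    "PGM-BEA": {"user_cxt_status": "ok"},
    "PGM-CMI": {"user_cxt_status": "degraded",
                "user_domain_request": {"query": "gesture lexicon"}},
}
external = {s: multiplex(signals, s) for s in SIGNALS}
```

Tagging each entry with its source keeps the three independent streams distinguishable after they are merged into a single external output.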

4.3 Functions of SubAIMs

Table 2 specifies the functions of the SubAIMs of the User State Description (PGM‑USD) Composite AIM.

Table 2 – Functions of the SubAIMs of the User State Description (PGM‑USD) Composite AIM

Name	Function
Entity State Demultiplexing	Receives the external inputs and routes the control signals to the appropriate SubAIMs.
Multimodal Input Harmonisation	Aligns temporal, structural, and referential aspects of multimodal inputs into a Harmonised Multimodal Context suitable for subsequent analysis.
Linguistic‑Paralinguistic Analysis	Extracts linguistic meaning and paralinguistic cues such as prosody, emphasis, and rhythm from speech and text, producing Linguistic‑PL Evidence.
Behavioural and Expressive Analysis	Derives behavioural patterns and expressive indicators reflecting interaction patterns, communicative intent, and expressive signals from multimodal evidence.
Cross‑Modal Interpretation	Integrates linguistic, behavioural, and multimodal evidence into a coherent cross‑modal interpretation of user behaviour and interaction context.
Entity State Multiplexing	Aggregates all derived evidence into a consistent User Entity State and assembles the multiplexed User CXT Status, User Domain Request, and User IH Request.

4.4 Input/Output Data of SubAIMs

Table 3 lists the Input and Output Data of the SubAIMs of the User State Description (PGM‑USD) Composite AIM.

Table 3 – Input/Output Data of the SubAIMs of the User State Description (PGM‑USD) Composite AIM

Name	Input Data	Output Data
Entity State Demultiplexing	Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, User CXT Directive, User Domain Response, User IH Response	Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, User CXT Directive, User Domain Response, User IH Response
Multimodal Input Harmonisation	Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, User CXT Directive, User Domain Response, User IH Response	Harmonised Multimodal Context, Speech Descriptors
Linguistic‑Paralinguistic Analysis	Harmonised Multimodal Context, Speech Descriptors, User CXT Directive, User Domain Response, User IH Response	Linguistic‑PL Evidence, User CXT Status, User Domain Request, User IH Request
Behavioural and Expressive Analysis	Harmonised Multimodal Context, Linguistic‑PL Evidence, User CXT Directive, User Domain Response, User IH Response	Behavioural & Expressive Indicators, User CXT Status, User Domain Request, User IH Request
Cross‑Modal Interpretation	Harmonised Multimodal Context, Linguistic‑PL Evidence, Behavioural & Expressive Indicators, User CXT Directive, User Domain Response, User IH Response	Cross‑Modal Interpretation Evidence, User CXT Status, User Domain Request, User IH Request
Entity State Multiplexing	All SubAIM outputs	User State, User CXT Status, User Domain Request, User IH Request
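
The data dependencies in Table 3 imply an execution order for the SubAIMs. The sketch below transcribes the table into a Python dictionary and derives that order with the standard-library `graphlib.TopologicalSorter`. The encoding is an assumption made for this example: data names use ASCII hyphens, "All SubAIM outputs" is read literally as the union of every other SubAIM's outputs, and the helper `dependencies` is hypothetical.

```python
from graphlib import TopologicalSorter

EXTERNAL = ["Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors",
            "User CXT Directive", "User Domain Response", "User IH Response"]
CONTROL = EXTERNAL[2:]  # directive and the two responses
SIGNALS = ["User CXT Status", "User Domain Request", "User IH Request"]
HMC, SD = "Harmonised Multimodal Context", "Speech Descriptors"
LPE = "Linguistic-PL Evidence"
BEI = "Behavioural & Expressive Indicators"
CMIE = "Cross-Modal Interpretation Evidence"

# SubAIM name -> (input data, output data), transcribed from Table 3.
TABLE_3 = {
    "Entity State Demultiplexing": (EXTERNAL, EXTERNAL),
    "Multimodal Input Harmonisation": (EXTERNAL, [HMC, SD]),
    "Linguistic-Paralinguistic Analysis": ([HMC, SD] + CONTROL, [LPE] + SIGNALS),
    "Behavioural and Expressive Analysis": ([HMC, LPE] + CONTROL, [BEI] + SIGNALS),
    "Cross-Modal Interpretation": ([HMC, LPE, BEI] + CONTROL, [CMIE] + SIGNALS),
}
# "All SubAIM outputs": the multiplexer consumes everything produced upstream.
all_outputs = sorted({d for _, outs in TABLE_3.values() for d in outs})
TABLE_3["Entity State Multiplexing"] = (all_outputs, ["User State"] + SIGNALS)


def dependencies(table):
    """Map each SubAIM to the other SubAIMs that produce one of its inputs."""
    producers = {}
    for name, (_, outs) in table.items():
        for datum in outs:
            producers.setdefault(datum, set()).add(name)
    return {name: {p for d in ins for p in producers.get(d, ()) if p != name}
            for name, (ins, _) in table.items()}


deps = dependencies(TABLE_3)
order = list(TopologicalSorter(deps).static_order())
```

Because every SubAIM in the table depends on at least the one listed before it, the dependency graph admits a single ordering, which coincides with the sequence described in 4.2.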

4.5 AIMs and JSON Metadata

Table 4 gives the User State Description (AIM1) and its SubAIMs (AIM2).

Table 4 – User State Description (AIM1) and its SubAIMs (AIM2)

AIM1	AIM2	Name	JSON
PGM‑USD		User State Description	X
	PGM‑ESX	Entity State Demultiplexing	X
	PGM‑MIH	Multimodal Input Harmonisation	X
	PGM‑LPA	Linguistic‑Paralinguistic Analysis	X
	PGM‑BEA	Behavioural and Expressive Analysis	X
	PGM‑CMI	Cross‑Modal Interpretation	X
	PGM‑ESM	Entity State Multiplexing	X

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/UserStateDescription.json

6. Profiles

No Profiles.

7. Reference Software

Not part of this specification.

8. Conformance Testing

A USD implementation conforms with User State Description (PGM‑USD) if:

The implementation includes all SubAIMs listed in Table 2.
All I/O Data listed in Table 1 are present and conform with their respective Data Types.
All SubAIM I/O Data listed in Table 3 conform with their respective Data Types.
The implementation produces a User State that integrates the outputs of Linguistic‑Paralinguistic Analysis, Behavioural and Expressive Analysis, and Cross‑Modal Interpretation.
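
The presence checks among the conditions above lend themselves to a simple automated test. The following sketch, offered only as an illustration, compares the names an implementation declares against those of Tables 1 and 2. The declaration format and the helpers `missing_items` and `is_structurally_complete` are hypothetical, and the sketch does not verify conformance with the Data Types or the integration requirement, which need the actual schemas and test data.

```python
from typing import Dict, Iterable, List

# Names transcribed from Tables 1 and 2 (ASCII hyphens used for convenience).
REQUIRED = {
    "inputs": {
        "Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors",
        "User CXT Directive", "User Domain Response", "User IH Response",
    },
    "outputs": {
        "User State", "User CXT Status", "User Domain Request", "User IH Request",
    },
    "subaims": {
        "Entity State Demultiplexing", "Multimodal Input Harmonisation",
        "Linguistic-Paralinguistic Analysis", "Behavioural and Expressive Analysis",
        "Cross-Modal Interpretation", "Entity State Multiplexing",
    },
}


def missing_items(declared: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return, per category, the required names absent from the declaration."""
    return {kind: sorted(names - set(declared.get(kind, ())))
            for kind, names in REQUIRED.items()}


def is_structurally_complete(declared: Dict[str, Iterable[str]]) -> bool:
    """True when no required input, output, or SubAIM is missing."""
    return not any(missing_items(declared).values())


# A complete declaration, and one that omits a single output.
complete = {kind: set(names) for kind, names in REQUIRED.items()}
partial = dict(complete, outputs=REQUIRED["outputs"] - {"User IH Request"})
report = missing_items(partial)
```

Reporting the missing names per category, rather than a bare pass/fail flag, makes it clear which condition an implementation falls short of.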

9. Performance Assessment