
1      Functions

The User State Description (PGM-USD) AIM derives a description of the User’s observable state by interpreting enriched audio‑visual scene information, speech descriptors, and cross‑modal correspondence evidence.

USD operates on enhanced scene descriptors, Speech Descriptors, and alignment evidence produced by Context Enhancement and applies evidence‑based reasoning under directive control to construct a User Entity State suitable for downstream reasoning, control, and personalisation.

USD receives the User‑USD Directive from A‑User Control, Enhanced Audio Scene Descriptors and Enhanced Visual Scene Descriptors from Context Enhancement, Speech Descriptors from Automatic Speech Recognition, and Domain RS in response to its queries to Domain Access. It produces User‑USD Status, sent to A‑User Control; User Entity State, sent to Prompt Creation; and Domain RQ, sent to Domain Access when it issues a query.

2      Reference Model

Figure 1 gives the Reference Model of the User State Description (PGM-USD) AIM.

[Image: User State Description (PGM-USD) V1.0 Reference Model]

Figure 1 – Reference Model of the User State Description (PGM-USD) AIM

3      Input/Output Data

Table 1 gives the Input/Output Data of the User State Description (PGM-USD) AIM.

Table 1 – Input/Output Data of the User State Description (PGM-USD) AIM

| Input | Description |
|---|---|
| Enhanced Audio Scene Descriptors | Audio Scene Descriptors augmented with derived and semantic properties by Audio Scene Enhancement. |
| Enhanced Visual Scene Descriptors | Visual Scene Descriptors augmented with derived and semantic properties by Visual Scene Enhancement. |
| Speech Descriptors | Paralinguistic features of the speech segment produced by Automatic Speech Recognition. |
| User‑USD Directive | Control directives specifying scope, depth, or policy constraints for user state interpretation, broadcast to all SubAIMs. |
| Domain RS | Domain‑specific knowledge supporting user state interpretation and constraint enforcement. |

| Output | Description |
|---|---|
| User Entity State | Structured description of the User’s observable state derived from multimodal evidence. |
| User‑USD Status | Aggregated status information describing the execution and outcome of User State Description processing. |
| Domain RQ | Request for domain‑specific knowledge sent to Domain Access. |
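
As an illustration only, the inputs and outputs of Table 1 can be grouped into two containers. The sketch below is not part of the specification: the class names, field names, and the helper `missing_inputs` are hypothetical, payloads are left untyped because their formats are defined elsewhere, and treating Domain RS and Domain RQ as optional is an assumption based on their being tied to Domain Access queries.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class USDInputs:
    """Inputs of the PGM-USD AIM as listed in Table 1 (hypothetical field names)."""
    enhanced_audio_scene_descriptors: Any
    enhanced_visual_scene_descriptors: Any
    speech_descriptors: Any
    user_usd_directive: Any
    # Assumption: Domain RS exists only after a Domain Access query was answered.
    domain_rs: Optional[Any] = None


@dataclass
class USDOutputs:
    """Outputs of the PGM-USD AIM as listed in Table 1 (hypothetical field names)."""
    user_entity_state: Any
    user_usd_status: Any
    # Assumption: Domain RQ is set only when Domain Access is queried.
    domain_rq: Optional[Any] = None


def missing_inputs(inputs: USDInputs) -> List[str]:
    """Return the names of the non-optional inputs that are absent (None)."""
    required = [
        "enhanced_audio_scene_descriptors",
        "enhanced_visual_scene_descriptors",
        "speech_descriptors",
        "user_usd_directive",
    ]
    return [name for name in required if getattr(inputs, name) is None]
```

A caller could run `missing_inputs` before starting USD processing and report any absent input through User‑USD Status rather than failing silently.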

4      SubAIMs (informative)

4.1 Reference Model

An implementation of User State Description (PGM-USD) AIM may adopt the architecture of Figure 2.

[Image: User State Description (PGM-USD) V1.0 Composite Reference Model]

Figure 2 – Reference Model of the User State Description (PGM-USD) Composite AIM

4.2 Operation

The USD operation includes the following SubAIMs:

  1. Automatic Speech Recognition – Converts the speech component of Enhanced Audio Scene Descriptors into text and Speech Descriptors carrying paralinguistic evidence.
  2. Multimodal Input Harmonisation – Aligns Enhanced Audio Scene Descriptors, Enhanced Visual Scene Descriptors, recognised text, and Speech Descriptors into a Harmonised Multimodal Context without semantic interpretation. Exchanges Domain RQ/RS with Domain Access.
  3. Linguistic‑Paralinguistic Analysis – Extracts linguistic and paralinguistic evidence from the Harmonised Multimodal Context. Issues Domain RQ and emits an independent User‑USD Status.
  4. Behavioural and Expressive Analysis – Derives behavioural and expressive indicators from the Harmonised Multimodal Context and linguistic evidence. Issues Domain RQ and emits an independent User‑USD Status.
  5. Cross‑Modal Interpretation – Integrates linguistic, behavioural, and expressive evidence into Cross‑Modal Interpretative Evidence. Issues Domain RQ and emits an independent User‑USD Status.
  6. Entity State Construction – Constructs the User Entity State from all evidence streams and the three independent User‑USD Status reports, producing the final User Entity State and aggregated User‑USD Status.

The User‑USD Directive from A‑User Control is broadcast to all SubAIMs. The three independent User‑USD Status reports from Linguistic‑Paralinguistic Analysis, Behavioural and Expressive Analysis, and Cross‑Modal Interpretation are fed into Entity State Construction for aggregation.
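
The directive broadcast and the status aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the specified behaviour: the dictionary layout of a directive and of a status report, the `state` key, and the rule that the aggregate is "ok" only when all three reports are "ok" are all hypothetical. The SubAIM identifiers are those listed in Table 4.

```python
from typing import Any, Dict, List

# SubAIMs that, per the text above, each emit an independent User-USD Status.
STATUS_SOURCES = ["PGM-LPA", "PGM-BEA", "PGM-CMI"]


def broadcast(directive: Dict[str, Any], subaims: List[str]) -> Dict[str, Dict[str, Any]]:
    """Give every listed SubAIM its own copy of the same User-USD Directive."""
    return {name: dict(directive) for name in subaims}


def aggregate_status(reports: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Combine the three independent reports into one aggregated status.

    Assumed rule (not from the specification): the aggregate is "ok" only
    if every report has state "ok"; otherwise it is "degraded".
    """
    missing = [s for s in STATUS_SOURCES if s not in reports]
    if missing:
        raise ValueError(f"missing status reports: {missing}")
    failed = [s for s in STATUS_SOURCES if reports[s].get("state") != "ok"]
    return {
        "state": "ok" if not failed else "degraded",
        "failed": failed,
        "reports": {s: reports[s] for s in STATUS_SOURCES},
    }
```

Keeping the individual reports inside the aggregate lets A‑User Control see which analysis stage caused a degraded result; whether the actual User‑USD Status carries that detail is not stated here.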

4.3 Functions of AI Modules

Table 2 specifies the Functions performed by PGM-USD AIM’s SubAIMs.

Table 2 – Functions performed by User State Description (PGM-USD) AIM SubAIMs

| SubAIM | Description |
|---|---|
| Automatic Speech Recognition | Converts speech input into recognised text and Speech Descriptors carrying paralinguistic evidence. |
| Multimodal Input Harmonisation | Aligns audio, visual, text, and speech descriptor inputs into a Harmonised Multimodal Context without semantic interpretation. |
| Linguistic‑Paralinguistic Analysis | Extracts linguistic and paralinguistic evidence from the Harmonised Multimodal Context and Speech Descriptors. |
| Behavioural and Expressive Analysis | Derives behavioural and expressive indicators of the User from multimodal evidence. |
| Cross‑Modal Interpretation | Integrates linguistic and behavioural evidence into cross‑modal interpretative evidence under directives and domain constraints. |
| Entity State Construction | Constructs the User Entity State from all evidence streams and aggregates the independent User‑USD Status reports from preceding SubAIMs. |

4.4 I/O Data of AI Modules

Table 3 specifies the Input/Output Data of the PGM-USD AIM’s SubAIMs.

Table 3 – Input/Output Data of User State Description (PGM-USD) AIM SubAIMs

| SubAIM | Input Data | Output Data |
|---|---|---|
| Automatic Speech Recognition | Speech (from Enhanced Audio Scene Descriptors) | Text; Speech Descriptors |
| Multimodal Input Harmonisation | Enhanced Audio Scene Descriptors; Enhanced Visual Scene Descriptors; Text; Speech Descriptors; User‑USD Directive; Domain RS | Harmonised Multimodal Context; Domain RQ |
| Linguistic‑Paralinguistic Analysis | Harmonised Multimodal Context; Speech Descriptors; User‑USD Directive; Domain RS | Linguistic‑PL Evidence; Domain RQ; User‑USD Status |
| Behavioural and Expressive Analysis | Harmonised Multimodal Context; Linguistic‑PL Evidence; User‑USD Directive; Domain RS | Behavioural & Expressive Indicators; Domain RQ; User‑USD Status |
| Cross‑Modal Interpretation | Harmonised Multimodal Context; Linguistic‑PL Evidence; Behavioural & Expressive Indicators; User‑USD Directive; Domain RS | Cross‑Modal Interpretative Evidence; Domain RQ; User‑USD Status |
| Entity State Construction | Harmonised Multimodal Context; Linguistic‑PL Evidence; Behavioural & Expressive Indicators; Cross‑Modal Interpretative Evidence; User‑USD Directive; Domain RS; User‑USD Status (×3) | User Entity State; User‑USD Status (aggregated) |
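
One way to sanity-check the data flow in Table 3 is to confirm that each SubAIM consumes only data that arrives from outside the Composite AIM or that an earlier SubAIM has already produced. The sketch below encodes the table rows by hand; the names `EXTERNAL`, `TABLE_3`, and `unresolved_inputs` are hypothetical, and the Speech input of Automatic Speech Recognition is modelled as the Enhanced Audio Scene Descriptors it is taken from.

```python
from typing import List, Set, Tuple

Row = Tuple[str, Set[str], Set[str]]  # (SubAIM, inputs, outputs)

# Data assumed to arrive from outside the Composite AIM.
EXTERNAL: Set[str] = {
    "Enhanced Audio Scene Descriptors",
    "Enhanced Visual Scene Descriptors",
    "User-USD Directive",
    "Domain RS",
}

HMC = "Harmonised Multimodal Context"
LPL = "Linguistic-PL Evidence"
BEI = "Behavioural & Expressive Indicators"
CMIE = "Cross-Modal Interpretative Evidence"

# Rows transcribed from Table 3, in the order of the operation list.
TABLE_3: List[Row] = [
    ("Automatic Speech Recognition",
     {"Enhanced Audio Scene Descriptors"},
     {"Text", "Speech Descriptors"}),
    ("Multimodal Input Harmonisation",
     {"Enhanced Audio Scene Descriptors", "Enhanced Visual Scene Descriptors",
      "Text", "Speech Descriptors", "User-USD Directive", "Domain RS"},
     {HMC, "Domain RQ"}),
    ("Linguistic-Paralinguistic Analysis",
     {HMC, "Speech Descriptors", "User-USD Directive", "Domain RS"},
     {LPL, "Domain RQ", "User-USD Status"}),
    ("Behavioural and Expressive Analysis",
     {HMC, LPL, "User-USD Directive", "Domain RS"},
     {BEI, "Domain RQ", "User-USD Status"}),
    ("Cross-Modal Interpretation",
     {HMC, LPL, BEI, "User-USD Directive", "Domain RS"},
     {CMIE, "Domain RQ", "User-USD Status"}),
    ("Entity State Construction",
     {HMC, LPL, BEI, CMIE, "User-USD Directive", "Domain RS", "User-USD Status"},
     {"User Entity State", "User-USD Status"}),
]


def unresolved_inputs(rows: List[Row], external: Set[str]) -> List[Tuple[str, str]]:
    """Walk the rows in order and list every (SubAIM, input) not yet available."""
    available = set(external)
    problems: List[Tuple[str, str]] = []
    for name, inputs, outputs in rows:
        for item in sorted(inputs - available):
            problems.append((name, item))
        available |= outputs
    return problems


print(unresolved_inputs(TABLE_3, EXTERNAL))
```

An empty result means the listed order is a valid execution order for the table as transcribed; the check does not model timing, the Domain RQ/RS round trip, or the fact that Entity State Construction needs three separate status reports.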

4.5 AIMs and JSON Metadata

Table 4 provides the links to the AIM specifications and to the JSON syntaxes. AIM1 indicates the Composite AIM and AIM2 its SubAIMs.

Table 4 – AIMs and JSON Metadata

| AIM1 | AIM2 | Name | JSON |
|---|---|---|---|
| PGM-USD | | User State Description | X |
| | MMC-ASR | Automatic Speech Recognition | |
| | PGM-MIH | Multimodal Input Harmonisation | |
| | PGM-LPA | Linguistic‑Paralinguistic Analysis | |
| | PGM-BEA | Behavioural and Expressive Analysis | |
| | PGM-CMI | Cross‑Modal Interpretation | |
| | PGM-ESC | Entity State Construction | |

5      JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/UserStateDescription.json

6      Profiles

7      Reference Software

8      Conformance Testing

9      Performance Assessment