1. Function 2. Reference Model 3. Input/Output Data
4. SubAIMs 5. JSON Metadata 6. Profiles
7. Reference Software 8. Conformance Testing 9. Performance Assessment

1. Function

The User State Description (PGM-USD) AIM derives a description of the User’s observable state by interpreting enriched audio‑visual scene information and cross‑modal correspondence evidence.

USD operates on enhanced scene descriptors and alignment evidence produced by Space and User Description and applies evidence‑based reasoning under directive control to construct a User Entity State suitable for downstream reasoning, control, and personalization.

2. Reference Model

Figure 1 gives the Reference Model of the User State Description (PGM-USD) AIM.

Figure 1 – Reference Model of User State Description (PGM-USD) AIM.

The User State Description reference model consists of an evidence‑driven interpretative pipeline operating on enriched perceptual inputs.

At a conceptual level, USD performs the following functions:

  1. Multimodal Evidence Integration: Integration of enhanced audio descriptors, enhanced visual descriptors, and audio‑visual scene geometry to establish user‑centric evidence.
  2. Linguistic and Paralinguistic Analysis: Interpretation of textual input (e.g. ASR output) and associated audio evidence to extract communicative cues relevant to user state.
  3. Behavioural and Expressive Analysis: Interpretation of visual and audio evidence to derive observable behavioural and expressive indicators.
  4. Entity State Construction: Evidence‑based construction of a User Entity State under the constraints imposed by directives and policies.
  5. Output Packaging and Provenance: Packaging of the constructed User Entity State together with status and provenance metadata.

The reference model explicitly separates evidence extraction, interpretation, and state construction, ensuring traceability, auditability, and modularity.
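The five conceptual functions above can be read as successive stages of a pipeline. The following is a minimal, hypothetical sketch in Python; all type names, field names, and placeholder rules are illustrative assumptions, not taken from the MPAI specification:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class UserEntityState:
    """Illustrative container for a constructed User Entity State."""
    indicators: dict[str, Any] = field(default_factory=dict)
    provenance: list[str] = field(default_factory=list)

def describe_user_state(audio_desc: dict, visual_desc: dict,
                        text: str, directive: dict) -> dict:
    """Hypothetical pipeline mirroring the five conceptual USD functions."""
    # 1. Multimodal Evidence Integration: gather user-centric evidence.
    evidence = {"audio": audio_desc, "visual": visual_desc, "text": text}
    state = UserEntityState()
    # 2. Linguistic and Paralinguistic Analysis (placeholder rule).
    if text:
        state.indicators["utterance_length"] = len(text.split())
        state.provenance.append("linguistic-analysis")
    # 3. Behavioural and Expressive Analysis (placeholder rule).
    if visual_desc.get("face_detected"):
        state.indicators["face_visible"] = True
        state.provenance.append("behavioural-analysis")
    # 4. Entity State Construction under directive constraints.
    if directive.get("scope") == "minimal":
        state.indicators = {k: v for k, v in state.indicators.items()
                            if k in directive.get("allowed", [])}
    # 5. Output Packaging and Provenance.
    return {"state": state, "status": "ok", "evidence_keys": sorted(evidence)}
```

The separation of the five steps into distinct code regions mirrors the model’s stated separation of evidence extraction, interpretation, and state construction.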

3. Input/Output Data

Table 1 – Input and Output Data of User State Description (PGM-USD) AIM

Input Data
Enhanced Audio Scene Descriptors: Audio Scene Descriptors augmented with derived and semantic properties by Audio Scene Enhancement.
Enhanced Visual Scene Descriptors: Visual Scene Descriptors augmented with derived and semantic properties by Visual Scene Enhancement.
Speech: Speech component of Enhanced Audio Scene Descriptors.
User SUD Directive: Control directives specifying scope, depth, or policy constraints for user state interpretation.
User Domain Request: Domain‑specific knowledge supporting user state interpretation and constraint enforcement.

Output Data
User Entity State: Structured description of the User’s observable state derived from multimodal evidence.
User SUD Status: Status information describing the execution and outcome of User State Description processing.
User Domain Response: Response to domain‑specific knowledge request.

4. SubAIMs (Informative)

An implementation of the User State Description (PGM-USD) AIM may adopt the architecture of Figure 2.

Figure 2 – Reference Model of User State Description (PGM-USD) Composite AIM

A User State Description (PGM-USD) AIM adopting the architecture of Figure 2 uses the Input and Output Data of Table 2.

Table 2 – Input and Output Data of the User State Description (PGM-USD) Composite AIM’s SubAIMs.

Automatic Speech Recognition
  Purpose: Converts speech input into textual representation suitable for downstream multimodal processing.
  Input Data: Speech
  Output Data: TXT

Multimodal Input Harmonisation
  Purpose: Aligns audio, visual, and textual inputs into a harmonised multimodal context without semantic interpretation.
  Input Data: ASD, VSD, TXT, User‑SUD Directive
  Output Data: Harmonised Multimodal Context

Linguistic–Paralinguistic Analysis
  Purpose: Extracts linguistic and paralinguistic evidence from harmonised multimodal context.
  Input Data: Harmonised Multimodal Context, User‑SUD Directive, Domain RS
  Output Data: Linguistic‑Paralinguistic Evidence

Behavioural and Expressive Analysis
  Purpose: Derives behavioural and expressive indicators of the User from multimodal evidence.
  Input Data: Harmonised Multimodal Context, User‑SUD Directive, Domain RS
  Output Data: Behavioural & Expressive Indicators

Cross‑Modal Interpretation
  Purpose: Integrates linguistic and behavioural evidence into cross‑modal interpretative evidence under directives and domain constraints.
  Input Data: Linguistic‑Paralinguistic Evidence, Behavioural & Expressive Indicators, User‑SUD Directive, Domain RS
  Output Data: Cross‑Modal Interpretative Evidence

Entity State Construction
  Purpose: Constructs the User Entity State from cross‑modal interpretative evidence.
  Input Data: Cross‑Modal Interpretative Evidence, User‑SUD Directive
  Output Data: User Entity State, User‑SUD Status
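The SubAIM topology of Table 2 can be expressed as a data‑flow sketch. In the hypothetical Python sketch below, each SubAIM is reduced to a placeholder transformation; the function name and all intermediate structures are illustrative assumptions, not normative interfaces:

```python
def composite_usd(speech, asd, vsd, directive, domain_rs):
    """Hypothetical wiring of the Composite AIM's SubAIMs per Table 2."""
    # Automatic Speech Recognition: Speech -> TXT (placeholder transcript).
    txt = f"transcript({speech})"
    # Multimodal Input Harmonisation: ASD, VSD, TXT, Directive -> Context.
    context = {"asd": asd, "vsd": vsd, "txt": txt, "directive": directive}
    # Linguistic–Paralinguistic Analysis: Context, Directive, Domain RS -> Evidence.
    ling = {"source": "linguistic", "context": context, "domain": domain_rs}
    # Behavioural and Expressive Analysis: Context, Directive, Domain RS -> Indicators.
    behav = {"source": "behavioural", "context": context, "domain": domain_rs}
    # Cross-Modal Interpretation: both evidence streams -> interpretative evidence.
    cross = {"linguistic": ling, "behavioural": behav}
    # Entity State Construction: evidence + Directive -> State and Status.
    return {"user_entity_state": cross, "user_sud_status": "completed"}
```

The point of the sketch is only the wiring: the two analysis SubAIMs consume the same harmonised context in parallel, and their outputs are joined before state construction.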

5. JSON Metadata

https://schemas.mpai.community/PGM1/V1.0/AIMs/UserStateDescription.json
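An implementation would typically validate its emitted metadata against the schema published at the URL above. The stdlib‑only sketch below only parses an instance and checks for keys; the example instance and the chosen required keys are made‑up stand‑ins, not the actual PGM‑USD schema content:

```python
import json

# Illustrative (NOT normative) instance of User State Description metadata;
# the real structure is defined by the schema at the URL above.
EXAMPLE_METADATA = """
{
  "AIM": "UserStateDescription",
  "Implementation": "PGM-USD",
  "Version": "V1.0"
}
"""

def check_required_keys(text, required=("AIM", "Version")):
    """Parse the JSON text and report which illustrative keys are missing."""
    doc = json.loads(text)
    missing = [k for k in required if k not in doc]
    return doc, missing

doc, missing = check_required_keys(EXAMPLE_METADATA)
```

A production implementation would instead perform full JSON Schema validation against the published schema rather than an ad‑hoc key check.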

6. Profiles

7. Reference Software

8. Conformance Testing

9. Performance Assessment