1       Introduction

2       The A‑User Architecture Reference Model

3       Types of AI Modules

4       Interaction Paradigms between AIMs

5       A‑User Control (PGM-AUC)

6       Context Capture (CXC)

7       Space and User Description (SUD)

8       Prompt Creation (PRC)

9       Basic Knowledge (BKN)

10     Domain Access (DAC)

11     User State Refinement (USR)

12     Personality Alignment (PAL)

13     A-User Formation (AUF)

14     A-User Storage (AUS)

1        Introduction

Technical Specification: Pursuing Goals in the Metaverse (MPAI‑PGM) – Autonomous User Architecture (PGM‑AUA) specifies the architecture, the functions, and the interfaces by which an Autonomous User (A‑User) operating in a metaverse instance (M‑Instance) interacts with another User in the same or in another M‑Instance.

An M‑Instance is a virtual space populated by Processes, such as Users, as specified by Technical Specification: MPAI Metaverse Model (MPAI‑MMM) – Technologies (MMM‑TEC).
Processes operate under the responsibility of a human. Users act as operational mediators between responsible humans and the M‑Instance. Users may exercise varying degrees of autonomy, including considerable or complete autonomy.

The User with which an A‑User interacts may be:

  • Another A‑User, or
  • A User under the direct control of a human, referred to as a Human‑User (H‑User).

An A‑User performs the following typical functions:

  • Captures textual, audio, and visual information originated by, or surrounding, the User with which it interacts.
  • Derives the User Entity State, representing a structured description of the User’s observable cognitive, emotional, and interactional condition.
  • Produces an appropriate multimodal response rendered through a Speaking Avatar, expressing a stance vis‑à‑vis the Space and the User.
  • Performs Actions and Process Actions in the M‑Instance, as specified by the MMM‑TEC standard.

The degree of autonomy exhibited by an A‑User is determined by the level of human intervention governing its operation.

Internally, an A‑User:

  • Perceives multimodal signals produced by:
    • the responsible human outside the M‑Instance, and
    • a User within the M‑Instance, which may be an A‑User or an H‑User.
  • Constructs an explicit description of the Space and derives the User Entity State from perceptual and contextual evidence.
  • Constructs a structured interaction context and submits it for deliberation.
  • Determines, through deliberative processing, the appropriate communicative behaviour of the A‑User vis‑à‑vis the Space and the User.
  • Produces spoken content and a corresponding A‑User Entity State that is congruent with the derived User Entity State.
  • Executes Actions and Process Actions through its Speaking Avatar as specified by MMM‑TEC.
  • Monitors logical, contextual, or governance constraints and escalates to the responsible human when autonomous resolution is not possible. 

2        The A‑User Architecture Reference Model

The A-User:

  1. Operates as an instruction-driven system composed of several interacting sub-processes, orchestrated by a central controller.
  2. Has its sub-processes implemented as AI Modules (AIMs) organised in an AI Workflow (AIW) executed in an AI Framework (AIF) according to Technical Specification: AI Framework (MPAI-AIF) V3.0.

Figure 1 provides the Reference Model of the A-User Architecture.

Figure 1 – Autonomous User Architecture

Context Capture is the A-User front-end providing an initial perceptual representation of the Space around the A-User in the form of Audio Scene Descriptors and Visual Scene Descriptors.

Space and User Description produces a Refined Context by improving the understanding of the Space and by deriving the User Entity State from perceptual and contextual evidence. Domain information requested from Domain Access enables the A-User to perform semantic and contextual interpretation where required.

Prompt Creation uses the Refined Context, including the User Entity State and any textual information, to construct and maintain a structured interaction representation capturing the current situation of the A-User vis-à-vis the Space and the User. This structured representation is submitted to Basic Knowledge to initiate deliberative processing.

Basic Knowledge receives the structured interaction representation from Prompt Creation and determines the appropriate communicative behaviour of the A-User. To this end, Basic Knowledge may query Prompt Creation for reformulation, Domain Access for applicable rules and constraints, User State Refinement for improved consistency of the User Entity State, and Personality Alignment to ensure coherence with the A-User’s long-term behavioural profile.

User State Refinement improves the stability and temporal coherence of the User Entity State, while Personality Alignment determines the A-User Entity State that the A-User should assume to respond appropriately to the User.

Basic Knowledge produces the Final Response to be uttered by the A-User together with the corresponding A-User Entity State, ensuring congruence with the User Entity State and the context of interaction.

A-User Formation synthesises a speaking avatar that utters the Final Response produced by Basic Knowledge and expresses the A-User Entity State through speech, facial expression, and gesture.

A-User Control coordinates the activities of the A-User AIMs by issuing Directive messages and receiving Status messages.
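
For illustration only, the following Python sketch shows one possible traversal of this workflow; the function names, data shapes, and the direct call chain are hypothetical simplifications and not normative MPAI-AIF interfaces.

  # Illustrative sketch: names and data shapes are hypothetical, not normative.

  def context_capture():
      """CXC: initial Audio and Visual Scene Descriptors (ASD0, VSD0)."""
      return {"asd0": {}, "vsd0": {}}

  def space_and_user_description(snapshot):
      """SUD: Refined Context (ASD1, VSD1, alignment) and User Entity State."""
      return {"asd1": snapshot["asd0"], "vsd1": snapshot["vsd0"],
              "alignment": {}, "user_entity_state": {"attention": "engaged"}}

  def prompt_creation(refined_context, history):
      """PRC: structured interaction representation submitted to BKN."""
      return {"context": refined_context, "history": history}

  def basic_knowledge(prompt):
      """BKN: Final Response and A-User Entity State (deliberation omitted)."""
      return "Hello, how can I help?", {"stance": "attentive"}

  def a_user_formation(final_response, a_user_state):
      """AUF: renders the speaking avatar (placeholder output)."""
      print(final_response, a_user_state)

  # A-User Control would sequence these steps via Directives and Statuses;
  # here they are chained directly for brevity.
  a_user_formation(*basic_knowledge(
      prompt_creation(space_and_user_description(context_capture()), history=[])))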

3        Types of AI Modules

The A-User’s AIMs belong to the following classes:

(1) Control AIMs

  • A-User Control (AUC)
    • Receives Human Commands from the responsible Human
    • Governs the runtime operation of the A‑User.
    • Issues Directive messages to AIMs.
    • Receives Status Messages from AIMs.
    • Acts in the M-Instance by rendering and animating its speaking avatar.

(2) Perception AIMs

  • Context Capture (CXC)
    • Captures media data (Text, Audio, Visual, and 3D Model) from an M-Location.
    • Produces initial snapshots of the Audio and Visual Scene Descriptors (ASD0 and VSD0).
    • Sends the initial Audio and Visual Scene Descriptors to Space and User Description.
  • Space and User Description (SUD)
    • Produces enhanced Audio Scene Descriptors (ASD1) by acting on the initial Audio Scene Descriptors.
    • Produces enhanced Visual Scene Descriptors (VSD1) by acting on the initial Visual Scene Descriptors.
    • Produces User Entity State (UES).
    • Gets Domain information from Domain Access.
  • Domain Access (DAC)
    • Provides controlled access to domain knowledge required by the Space and User Description AIM.
    • Exposes domain constraints, object classes, and explicit relations governing the structure and rules of the domain.
    • Provides object affordances (the possible actions that an object offers, based on its physical properties), including allowed actions and their contextual conditions.

(3) Interpretation AIMs

  • Prompt Creation (PRC)
    • Using the Enhanced Context, provides the PRC Prompt as an interpreted input to Basic Knowledge (BKN).
  • Basic Knowledge (BKN)
    • Performs semantic interpretation and reasoning by engaging PRC, Domain Access, User State Refinement (USR), and Personality Alignment (PAL).
    • Uses MCP‑based semantic interactions under A-User Control.
    • Provides an interpreted, execution‑ready outcome (Final Response) to A‑User Formation (AUF).

(4) Semantic Provider AIMs

  • Prompt Creation (PRC)
    • Initiates and participates in MCP‑based semantic interactions when instructed by A-User Control.
    • Provides assembled and aligned semantic artefacts (e.g., PRC Prompt components) to the Basic Knowledge LLM to support reasoning.
  • Domain Access (DAC)
    • Provides the domain‑specific meaning of entities, relations, actions, and constraints so that the Basic Knowledge LLM can represent information using concepts consistent with the domain model.
    • Defines and exposes the canonical semantics (classes, roles, relations, and affordances) that Basic Knowledge uses to ensure consistency across reasoning and generation AIMs.
  • User State Refinement (USR)
    • Constructs an authoritative but ephemeral User Entity State by integrating evidence from A-User Storage (AUS) and M-Instance Services (e.g., Authenticate).
    • Supplies the User Entity State to Basic Knowledge to support reasoning and to Personality Alignment to support alignment of the A-User’s personality with that of the User.
  • Personality Alignment (PAL)
    • Constructs an ephemeral A‑User Entity State by aligning A-User’s Personality with the User Entity State and the current context.
    • Provides the A‑User Entity State to Basic Knowledge and to A‑User Formation to add behaviour and expression to the avatar model.

(5) Execution AIMs

  • A‑User Formation (AUF)
    • Implements communicative intent and the multimodal output of the User Persona in accordance with the A‑User Entity State.
    • Produces speech, gesture, facial expression, gaze, and body motion as required by the Final Response.
    • Executes behaviour under A-User Control governance without performing semantic interpretation or reasoning.

4        Interaction Paradigms between AIMs

The A‑User Architecture AIMs exchange information using two interaction paradigms that differ in terms of determinism, statefulness, and semantic dependency.

Operational Interfaces support stateless, deterministic runtime information exchange.
Semantic Interfaces, implemented using the Multi Call Protocol (MCP), support session‑level semantic grounding, clarification, and alignment.

For example, queries issued by Space and User Description to Domain Access are operational in nature and do not depend on prior exchanges.

4.1       Operational Interfaces

Operational Interfaces define a lightweight, deterministic interaction paradigm used by AIMs to exchange specific information required for runtime operation, coordination, and validation within the A‑User Architecture.

An operational interaction consists of a structured, typed request issued by an AIM and a corresponding structured, typed response. Each interaction is self‑contained and stateless.

Operational interfaces:

  • Do not preserve conversational or semantic context.
  • Do not rely on memory of prior exchanges.
  • Do not support clarification loops or iterative refinement.

Operational interfaces are used when:

  • An AIM requires explicit information, constraints, or validation to perform its function.
  • The scope of ambiguity is local and does not require contextual interpretation.
  • The requested information can be provided as facts, constraints, affordances, or status.

Operational interfaces are used uniformly across:

  • Context Capture – Space and User Description
  • Space and User Description – Domain Access
  • A‑User Control – AIMs

Typical examples include:

  • Space and User Description querying Domain Access for audio object classification constraints or processing parameters.
  • Space and User Description querying Domain Access for visual object category validation or affordance constraints.
  • A‑User Control issuing Directives and receiving Status messages from AIMs.

Operational interfaces support runtime operation and coordination, but SHALL NOT perform semantic interpretation, deliberation, or communicative behaviour determination.
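
As a non-normative illustration, the following Python sketch shows the shape of a stateless operational exchange; the OperationalRequest and OperationalResponse types and the query_domain_access function are hypothetical.

  # Hypothetical sketch of a stateless Operational Interface exchange.
  from dataclasses import dataclass

  @dataclass(frozen=True)
  class OperationalRequest:
      source_aim: str        # e.g. "SUD"
      target_aim: str        # e.g. "DAC"
      query_type: str        # e.g. "object-affordances"
      payload: dict          # typed query parameters

  @dataclass(frozen=True)
  class OperationalResponse:
      status: str            # "ok" or an error code
      payload: dict          # facts, constraints, affordances, or status

  def query_domain_access(request: OperationalRequest) -> OperationalResponse:
      """Single-shot, self-contained exchange: no session, no memory of prior calls."""
      if request.query_type == "object-affordances":
          return OperationalResponse("ok", {"chair": ["sit-on", "move"]})
      return OperationalResponse("unsupported-query", {})

  response = query_domain_access(
      OperationalRequest("SUD", "DAC", "object-affordances", {"object_class": "chair"}))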

4.2       Semantic Interfaces (MCP)

Semantic Interfaces, implemented using the Multi Call Protocol (MCP), support interactions that require semantic interpretation, contextual alignment, or controlled refinement of meaning across multiple exchanges.

Semantic interfaces are session‑oriented. Intermediate assumptions, bindings, and semantic commitments MAY be preserved across successive exchanges within the same session.

Semantic interfaces are used when:

  • Semantic meaning must be established or refined rather than retrieved.
  • Interpretation depends on context evolution across multiple interactions.
  • Multiple semantic sources must be consulted in a coordinated manner.

Typical examples include:

  • Prompt Creation initiating semantic interaction with Basic Knowledge.
  • Basic Knowledge querying Prompt Creation to refine contextual structuring.
  • Basic Knowledge querying Domain Access for domain rules, constraints, and normative semantics.
  • Basic Knowledge querying User State Refinement and Personality Alignment for User and A‑User Entity State semantics.

Semantic interfaces provide meaning, structure, and contextual grounding. They SHALL NOT be used for operational coordination, perceptual exchange, or task execution.
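
A non-normative Python sketch of a session-oriented semantic exchange is shown below; the SemanticSession class and its exchange method are hypothetical and do not represent the MCP message format.

  # Hypothetical sketch of a session-oriented Semantic Interface exchange.
  import uuid

  class SemanticSession:
      """Preserves assumptions and semantic commitments across exchanges."""

      def __init__(self, initiator: str, provider: str):
          self.session_id = str(uuid.uuid4())
          self.initiator, self.provider = initiator, provider
          self.commitments = []   # bindings agreed in earlier turns

      def exchange(self, utterance: dict) -> dict:
          # A real provider would interpret the utterance in the light of
          # earlier commitments; here the turn is only recorded.
          self.commitments.append(utterance)
          return {"session": self.session_id, "accepted": True,
                  "grounded_on": len(self.commitments)}

  session = SemanticSession(initiator="BKN", provider="DAC")
  session.exchange({"ask": "constraints", "entity": "kitchen"})
  session.exchange({"refine": "constraints", "scope": "open-flame"})  # builds on turn 1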

4.3       Reactive vs Deliberative Execution

To ensure high responsiveness while preserving semantic correctness, the A‑User Architecture distinguishes between reactive execution and deliberative execution.

Reactive execution supports low‑latency, time‑critical A‑User behaviour. It enables immediate responses based exclusively on previously produced results, including:

  • Initial audio and visual space perceptual descriptors (ASD0, VSD0).
  • Enhanced perceptual descriptors (ASD1, VSD1).
  • User Entity State.
  • A‑User Entity State.

Reactive execution:

  • Operates without initiating Multi Call Protocol (MCP) interactions.
  • Relies on cached or last‑known semantic results.
  • Supports rapid conversational turn‑taking and micro‑behaviours (e.g., gaze, posture, or acknowledgements).
  • Is governed by A‑User Control and executed by A‑User Formation.

Reactive execution is the default mode for time‑critical behaviour.

Deliberative execution supports behaviour selection that requires semantic interpretation:

  • Uses MCP‑based semantic interactions.
  • Is centred on Basic Knowledge, interacting with Prompt Creation, Domain Access, User State Refinement, and Personality Alignment.
  • May incur additional latency due to multi‑step semantic processing.
  • Produces refined communicative behaviour, updated A‑User Entity State, and final spoken output.

Much as the brain receives what the eyes send without altering it, perceptual descriptors (ASD1, VSD1) are not modified during deliberative execution.

Deliberative execution is initiated by A‑User Control and proceeds independently of reactive execution, thus continuing to support real‑time behaviour.
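
The following Python sketch illustrates, in hypothetical terms, how an implementation might select between the two execution paths; the cache, event names, and selection criterion are assumptions rather than normative behaviour.

  # Hypothetical sketch of the reactive/deliberative split.
  LAST_KNOWN = {"UserEntityState": {"attention": "engaged"},
                "AUserEntityState": {"stance": "attentive"}}

  def reactive_behaviour(event: str) -> dict:
      """Immediate micro-behaviour from last-known results; no MCP interaction."""
      return {"behaviour": "acknowledge", "gaze": "user",
              "based_on": LAST_KNOWN["UserEntityState"]}

  def deliberative_behaviour(event: str) -> dict:
      """MCP-based deliberation centred on Basic Knowledge (placeholder)."""
      return {"behaviour": "respond", "final_response": "Let me check that for you."}

  def handle(event: str, time_critical: bool) -> dict:
      # Reactive execution is the default for time-critical behaviour;
      # deliberation proceeds independently and does not block it.
      return reactive_behaviour(event) if time_critical else deliberative_behaviour(event)

  handle("user-paused", time_critical=True)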

5        A‑User Control (PGM‑AUC)

The function of A‑User Control (AUC) is to govern the life-cycle and runtime coordination of the A‑User by issuing Instructions to AIMs, receiving their Status messages, and ensuring that autonomous operation remains consistent with human authorisation, M‑Instance Rules, and the A‑User’s Rights.

A‑User Control does not perform semantic interpretation, deliberation, or behaviour determination. Its role is rather to orchestrate and supervise the execution of AIMs that collectively produce a speaking avatar exhibiting an A‑User Entity State and uttering speech appropriate for the interaction context.

The A‑User semantically interprets its interactions with the User and the responsible human through dedicated interpretation and deliberation AIMs. A‑User Control mediates the resulting outcomes related to execution, oversight, and governance, independently of the specific human interface used.

Human–A‑User Control interactions may include:

  • Initiation or termination of A‑User activity.
  • Authorisation or denial of specific actions.
  • Escalation handling when conflicts cannot be resolved autonomously.
  • Adjustment of autonomy level or execution constraints. For example, Context Capture may be stopped from scanning the environment because a different priority has been set, such as analysing the last captured scene in detail.

A‑User Control governs the A‑User life-cycle by issuing one or more of the following Instruction Types. Each Instruction targets a specific set of AIMs and defines the operational scope under which they execute. The operation of the A-User described below evokes that of the human brain.

  1. Perception and Environment Capture (PEC): Configure perceptual subsystems to sense the responsible human in the Universe, the User in the M‑Instance, or the contents of the relevant M‑Location.
  2. Goal and Language Acquisition (GLA): Capture and segment multi-modal expressions of the responsible human and/or the User. This corresponds to giving identity and spatial relationship to the objects in the scene and to understanding the User.
  3. Prompting and Knowledge Query (PKQ): Enable structured contextual representation and semantic grounding of perceptual and state information. The A-User is now in a position to make sense of what it perceives and interprets.
  4. Goal and Intent Interpretation (GII): Based upon the results of the previous Instruction, trigger deliberative processing to determine the appropriate communicative behaviour of the A‑User. The A-User Control is now able to decide the stance it should take.
  5. Policy, Rights, and Feasibility (PRF): Validate intended behaviour with respect to governance rules, User Entity State constraints, human commands, and domain feasibility conditions. The A-User Control knows that the A-User's actions must comply with a variety of constraints.
  6. Plan Construction and Execution (PCE): Orchestrate execution of the behaviour based on Statuses reported by the AIMs, including speech and actions. The A-User Control is now able to execute the actions after taking constraints into account.
  7. Conflict Management and Escalation (CME): Detect unresolved inconsistencies or conflicts and escalate to the responsible human when required. Various impediments may be encountered, and modifications to the decided action may be implemented.
  8. Avatar Formation and Rendering (AFR): Enable synthesis and rendering of the speaking avatar. The A-User’s Avatar is formed and can be rendered.

Table 1 identifies the AIMs that receive a Directive when A‑User Control issues a specific Instruction. An X indicates that the AIM (row) receives the Instruction (column).

Table 1 – Instructions and affected AIMs

PEC GLA PKQ GII PRF PCE CME AFR
CXC (Context Capture) X
SUD (Space and User Description) X X X
PRC (Prompt Creation) X
BKN (Basic Knowledge – LLM) X X X X X
DAC (Domain Access) X X X X
USR (User State Refinement) X X X X X
PAL (Personality Alignment) X X X X X
AUF (A-User Formation) X X X

Note 1: Participation of SUD in PRF and PCE reflects spatial and environmental feasibility assessment and monitoring, not semantic reasoning.

Note 2: Directive-Status interactions between AUC and AIMs do not use MCP because this is reserved exclusively for semantic interactions.
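
The following Python sketch suggests possible shapes for Directive and Status messages; the field names and the PEC example are hypothetical and do not define a normative message syntax.

  # Hypothetical sketch of Directive and Status messages (no MCP involved).
  import time
  from dataclasses import dataclass, field

  @dataclass
  class Directive:
      instruction: str            # e.g. "PEC", "GII", "AFR"
      target_aim: str             # e.g. "CXC"
      parameters: dict = field(default_factory=dict)
      issued_at: float = field(default_factory=time.time)

  @dataclass
  class Status:
      source_aim: str
      instruction: str
      state: str                  # e.g. "accepted", "running", "completed", "error"
      detail: dict = field(default_factory=dict)

  # A-User Control starts perception: a PEC Directive to Context Capture,
  # answered by a Status message when the initial descriptors are ready.
  directive = Directive("PEC", "CXC", {"m_location": "current"})
  status = Status("CXC", "PEC", "completed", {"asd0": "ready", "vsd0": "ready"})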

6        Context Capture (CXC)

The function of Context Capture (CXC) is to capture and extract perceptual information from the environment of the User.

Context Capture performs the following operations:

  • Captures media data (Text, Audio, Visual, and 3D Model) produced by the responsible human in the universe or by the User in the M‑Instance.
  • Produces initial Audio Scene Descriptors (ASD0) and Visual Scene Descriptors (VSD0) representing the captured M‑Location, prior to any semantic reasoning or domain‑specific enhancement.
  • Ensures temporal synchronisation and modality coherence of the captured perceptual data.
  • Provides the initial Audio Scene Descriptors and Visual Scene Descriptors to Space and User Description for further processing.
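
As a non-normative illustration, the following Python sketch suggests one possible shape for the initial perceptual snapshot; the descriptor fields are hypothetical and do not reproduce the normative Audio and Visual Scene Descriptor syntax.

  # Hypothetical sketch of the initial snapshot produced by Context Capture.
  from dataclasses import dataclass, field

  @dataclass
  class InitialSceneSnapshot:
      timestamp: float                           # shared clock for modality coherence
      asd0: dict = field(default_factory=dict)   # initial Audio Scene Descriptors
      vsd0: dict = field(default_factory=dict)   # initial Visual Scene Descriptors
      text: str = ""                             # captured textual input, if any
      models_3d: list = field(default_factory=list)

  snapshot = InitialSceneSnapshot(
      timestamp=12.5,
      asd0={"audio_objects": [{"id": "a1", "type": "speech", "direction_deg": 30}]},
      vsd0={"visual_objects": [{"id": "v1", "type": "avatar", "position": [1.0, 0.0, 2.0]}]},
      text="Hello there")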

7        Space and User Description (SUD)

The function of Space and User Description (SUD) is to enhance and align perceptual descriptions of the Space and to prepare multimodal evidence for subsequent derivation of the User Entity State.

Space and User Description performs the following operations:

  1. Receive
    1. Initial Audio Scene Descriptors (ASD0) and Visual Scene Descriptors (VSD0) from Context Capture.
    2. Space and User Directives from A‑User Control.
  2. Enhance
    1. Audio Scene Descriptors by deriving enhanced audio information (ASD1) through modality‑specific processing, including spatial, environmental, and salience‑related enrichment.
    2. Visual Scene Descriptors by deriving enhanced visual information (VSD1) through modality‑specific processing, including depth, occlusion, affordance, and salience‑related enrichment.
  3. Align
    1. Perform explicit audio‑visual alignment between ASD1 and VSD1 to establish coherent multimodal correspondence within the Space.
  4. Request domain information
    1. Query Domain Access to obtain domain semantics relevant to:
      1. Spatial environments (e.g. office, workshop, kitchen).
      2. Audio‑ and visual object interpretation and affordances.
      3. Contextual constraints governing the interpretation of perceptual evidence.
  5. Produce outputs
    1. A Refined Context consisting of:
      1. ASD1: enhanced Audio Scene Descriptors.
      2. VSD1: enhanced Visual Scene Descriptors.
      3. Explicit audio‑visual alignment information.
    2. Space and User Status, reporting the scope and outcome of SUD processing to A‑User Control.

Space and User Description does not derive, refine, or own a User Entity State.
The User Entity State is produced exclusively by User State Description.
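
A non-normative Python sketch of a possible Refined Context shape is given below; the alignment record and its confidence field are hypothetical.

  # Hypothetical sketch of the Refined Context produced by Space and User Description.
  from dataclasses import dataclass, field

  @dataclass
  class AlignmentLink:
      audio_object_id: str     # object in ASD1
      visual_object_id: str    # corresponding object in VSD1
      confidence: float        # strength of the audio-visual correspondence

  @dataclass
  class RefinedContext:
      asd1: dict                                     # enhanced Audio Scene Descriptors
      vsd1: dict                                     # enhanced Visual Scene Descriptors
      alignment: list = field(default_factory=list)  # explicit audio-visual alignment

  refined = RefinedContext(
      asd1={"audio_objects": [{"id": "a1", "type": "speech", "salience": 0.9}]},
      vsd1={"visual_objects": [{"id": "v1", "type": "avatar", "affordances": ["address"]}]},
      alignment=[AlignmentLink("a1", "v1", confidence=0.85)])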

8        Prompt Creation (PRC)

The function of Prompt Creation is to assemble, organise, and expose a structured representation of the current interaction situation of the A‑User vis‑à‑vis the Space and the User, and to submit this representation to Basic Knowledge for deliberation.

Prompt Creation does not perform perception, semantic interpretation, reasoning, or behaviour determination. Its role is to bind together perceptual, state, contextual, and historical information into a coherent representation suitable for deliberative processing by Basic Knowledge.

Prompt Creation consumes the following inputs:

  • The Enhanced Context produced by Space and User Description, including:
    • Enhanced Audio Scene Descriptors (ASD1),
    • Enhanced Visual Scene Descriptors (VSD1),
    • Explicit audio‑visual alignment information,
    • Perceptually derived and domain‑grounded semantics, including any inferred goals or provisional user‑related interpretations.
  • The authoritative User Entity State produced by User State Description.
  • Relevant references to Space‑ and User‑related information stored in A‑User Storage, including persistent attributes, long‑term context, and previously established facts.
  • The Interaction History associated with the current session.

Prompt Creation performs the following operations:

  • Assembles the Enhanced Context, User Entity State, stored references, and interaction history into a structured interaction representation that captures the current situation of the A‑User.
  • Ensures that spatial, referential, and temporal relationships across modalities are coherently bound within the interaction representation.
  • Submits the structured interaction representation to Basic Knowledge to initiate deliberative processing.
  • Responds to requests from Basic Knowledge by re‑structuring, refining, or clarifying elements of the interaction representation, including resolving references to Space or User attributes.
  • Initiates and mediates Semantic (MCP‑based) interactions as required by Basic Knowledge, ensuring session continuity for multi‑stage clarification.

Prompt Creation MAY be activated by:

  • An instruction from A‑User Control to construct or update the interaction representation.
  • A request from Basic Knowledge to refine, restructure, or clarify elements of an existing interaction representation.
  • Changes in Enhanced Context, User Entity State, or Interaction History that require the interaction representation to be updated.

Prompt Creation acts as a Context Representation Provider. It is the sole AIM responsible for assembling Enhanced Context, User Entity State, stored references, and Interaction History into a coherent situation representation for deliberation.
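
The following Python sketch shows, with hypothetical keys and an assumed history window, how Prompt Creation might bind its inputs into a structured interaction representation without interpreting them.

  # Hypothetical sketch of the binding performed by Prompt Creation.
  def build_interaction_representation(refined_context, user_entity_state,
                                        stored_references, interaction_history):
      """Binds perceptual, state, stored, and historical information; no reasoning."""
      return {
          "space": {
              "asd1": refined_context["asd1"],
              "vsd1": refined_context["vsd1"],
              "alignment": refined_context["alignment"],
          },
          "user_entity_state": user_entity_state,            # received, not derived here
          "stored_references": stored_references,            # pointers into A-User Storage
          "interaction_history": interaction_history[-10:],  # window size is an assumption
      }

  representation = build_interaction_representation(
      {"asd1": {}, "vsd1": {}, "alignment": []},
      {"attention": "engaged"},
      ["AUS:user-profile#42"],
      [{"turn": 1, "speaker": "User", "text": "Hello"}])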

9    Basic Knowledge (BKN)

During a Goal and Intent Interpretation Instruction, Basic Knowledge executes an iterative deliberative processing pipeline to determine the appropriate communicative behaviour of the A‑User vis‑à‑vis the Space and the User.

  • Basic Knowledge receives a structured interaction representation from Prompt Creation. This representation includes:
    • The Enhanced Context produced by Space and User Description,
    • The authoritative User Entity State produced by User State Description,
    • Relevant references to Space‑ and User‑related information stored in A‑User Storage,
    • The Interaction History associated with the current session.
  • Basic Knowledge reasons over this representation to determine:
    • The communicative stance the A‑User should assume, and
    • The spoken or acted response appropriate to the current situation.
  • Throughout this process, Basic Knowledge integrates semantics from multiple supporting AIMs through the following interaction loops.

Loop 1 – Context Structuring (PRC)

Basic Knowledge MAY query Prompt Creation via MCP to refine, restructure, or clarify elements of the structured interaction representation, including spatial, referential, or temporal relationships. Prompt Creation does not introduce new semantics, but re‑binds existing information. A‑User Storage is accessed solely as a repository of stored information and historical records.

Loop 2 – Domain Semantics (DAC)

Basic Knowledge queries Domain Access via MCP to obtain domain rules, constraints, affordances, and normative semantics required to interpret entities, relations, and feasible actions within the Space.

Loop 3 – User Semantics (USR)

Basic Knowledge queries User State Refinement via MCP to obtain refined or temporally stabilised User Entity State information. When authorised by A‑User Control, User State Refinement integrates information from A‑User Storage and M‑Instance services (e.g. Rights, preferences) to provide the best available User Entity State.

Loop 4 – A‑User State and Personality Alignment

Basic Knowledge queries Personality Alignment via MCP to derive the A‑User Entity State that aligns the A‑User’s persona with the User Entity State, interaction context, and long‑term behavioural constraints. Personality Alignment may consult User State Refinement as required. Persistent updates to User‑ or A‑User‑related state are stored in A‑User Storage under A‑User Control governance.

Upon convergence of deliberative processing, Basic Knowledge produces the Final Response to be uttered by the A‑User together with the corresponding A‑User Entity State, and provides them for execution by A‑User Formation.
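
A non-normative Python sketch of this deliberative pipeline is given below; the loop functions stand in for MCP sessions with Prompt Creation, Domain Access, User State Refinement, and Personality Alignment, and the convergence criterion is an assumption.

  # Hypothetical sketch of the Basic Knowledge deliberation loops.
  def deliberate(representation, prc, dac, usr, pal, max_rounds=3):
      """Iterates the four interaction loops until the behaviour converges."""
      a_user_state = {}
      for _ in range(max_rounds):
          representation = prc(representation)             # Loop 1: context structuring
          domain = dac(representation)                     # Loop 2: domain semantics
          user_state = usr(representation)                 # Loop 3: refined User Entity State
          a_user_state = pal(user_state, representation)   # Loop 4: A-User Entity State
          if domain.get("consistent", True):               # assumed convergence test
              break
      final_response = "Acknowledged."                     # generation step omitted
      return final_response, a_user_state

  final_response, a_user_state = deliberate(
      {"space": {}, "user_entity_state": {}},
      prc=lambda r: r,
      dac=lambda r: {"consistent": True},
      usr=lambda r: {"attention": "engaged"},
      pal=lambda u, r: {"stance": "supportive"})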

10    Domain Access (DAC)

Domain Access provides authoritative domain‑level semantics to both capture‑level semantic grounding and deliberative reasoning. These two uses are logically separated: capture‑level domain access is situational and non‑goal‑directed, while reasoning‑level domain access is goal‑directed and deliberative.

Domain Access supplies object classes, relations, affordances, constraints, and behavioural expectations derived from domain knowledge, without performing perception or reasoning itself.

  • Provides Domain Semantics to Space and User Description, Prompt Creation, User State Refinement, Personality Alignment, and Basic Knowledge, including object and event classes, relations, affordances, scene constraints, and expected behaviours.
  • Supports domain grounding of perceptual and situational elements, enabling consistent interpretation of entities, actions, and relations detected or inferred by other AI Modules.
  • Exposes two interaction interfaces, corresponding to the two interaction paradigms:
    • A Directed Semantic Query Interface, used by spatial AI Modules (e.g. Space and User Description) for single‑shot semantic grounding and validation without session continuity.
    • A Dialogic Semantic Interface based on the Multi‑Call Protocol (MCP), used by reasoning and interpretation AI Modules (e.g. Basic Knowledge, Prompt Creation, User State Refinement, Personality Alignment) for multi‑turn semantic clarification and integration.
  • Returns structured, typed domain knowledge, including confidence and constraint information, suitable for direct consumption by perception pipelines or for semantic reasoning via MCP.
  • Does not maintain perception state, user state, or session state beyond that required for MCP interactions, and does not modify perceptual or user descriptors produced by other AI Modules.
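
The following Python sketch illustrates the separation between the two interfaces; the DomainAccess class, its methods, and the example domain model are hypothetical.

  # Hypothetical sketch of the two Domain Access interfaces.
  class DomainAccess:
      def __init__(self, domain_model: dict):
          self.domain_model = domain_model
          self.sessions = {}   # only MCP sessions keep state

      def directed_query(self, query: dict) -> dict:
          """Directed Semantic Query Interface: single-shot, no session continuity."""
          entry = self.domain_model.get(query["object_class"], {})
          return {"affordances": entry.get("affordances", []), "confidence": 0.9}

      def dialogic_query(self, session_id: str, query: dict) -> dict:
          """Dialogic Semantic Interface (MCP): multi-turn clarification per session."""
          turns = self.sessions.setdefault(session_id, [])
          turns.append(query)
          return {"turn": len(turns), "constraints": self.domain_model.get("constraints", [])}

  dac = DomainAccess({"chair": {"affordances": ["sit-on", "move"]},
                      "constraints": ["no-open-flame"]})
  dac.directed_query({"object_class": "chair"})           # e.g. from Space and User Description
  dac.dialogic_query("bkn-session-1", {"ask": "rules"})   # e.g. from Basic Knowledge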

11    User State Refinement (USR)

User State Refinement acts as the semantic authority for user modelling during deliberative processing. It transforms distributed internal and external user‑related evidence into a coherent, queryable User Entity State suitable for reasoning and Personality alignment.

User State Refinement derives and provides authoritative, session‑level user semantics by consolidating user‑related evidence originating from other AI Modules and from M‑Instance services, under the governance of A‑User Control:

  • Consumes user‑related evidence from A‑User Storage, including:
    • User Entity State snapshots produced by User State Description.
    • User‑related semantic interpretations generated by perception‑ and interpretation‑level AI Modules, including Space and User Description and Prompt Creation.
    • Interaction history and previously authorised user‑related records.
  • Draws additional user‑related information from authorised M‑Instance services, when available and permitted by A‑User Control, including:
    • Identity and role information obtained from authentication or identity services.
    • Persistent or ephemeral user profile attributes managed by the M‑Instance.
  • Refines the User Entity State by integrating behavioural cues, interaction patterns, session‑level preferences, historical fragments, and authorised M‑Instance information, producing an authoritative but ephemeral User Entity State valid for the current deliberative context.
  • Integrates profile fragments from A‑User Storage and, when authorised by A‑User Control, from M‑Instance services, without persisting inferred User Entity State unless explicitly permitted.
  • Provides refined User Entity State information to Basic Knowledge to support determination of the A‑User’s communicative behaviour in the current situation.
  • Provides refined User Entity State to Personality Alignment upon request, enabling alignment and derivation of the A‑User Entity State.
  • Does not modify User Entity State snapshots produced by User State Description and does not autonomously persist user semantic interpretations. Persistence of user‑related data is governed exclusively by A‑User Control and managed through A‑User Storage.
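
As a non-normative illustration, the following Python sketch shows one way this consolidation might be expressed; the evidence sources, field names, and merge order are assumptions.

  # Hypothetical sketch of User Entity State refinement (result is not persisted).
  def refine_user_entity_state(snapshots, storage_fragments, m_instance_info, authorised):
      """Merges evidence into an ephemeral User Entity State for the current deliberation."""
      state = {}
      for snapshot in snapshots:                 # User Entity State snapshots (latest wins)
          state.update(snapshot)
      for fragment in storage_fragments:         # authorised records from A-User Storage
          state.setdefault(fragment["key"], fragment["value"])
      if authorised:                             # M-Instance services only when permitted
          state.update(m_instance_info)
      state["ephemeral"] = True
      return state

  ues = refine_user_entity_state(
      snapshots=[{"attention": "engaged"}, {"emotion": "curious"}],
      storage_fragments=[{"key": "preferred_language", "value": "en"}],
      m_instance_info={"role": "visitor"},
      authorised=True)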

12    Personality Alignment (PAL)

Personality Alignment acts as the semantic authority for Personality alignment, ensuring that the A‑User’s behaviour and responses are consistent with the active Personality while remaining grounded in authoritative User semantics.

Personality Alignment derives and provides the A‑User Entity State by aligning the User Entity State with the A‑User’s Personality, together with communicative, expressive, and behavioural constraints. The resulting A‑User Entity State is authoritative but ephemeral and is constructed without performing perception or direct user modelling.

  • Consumes User semantics from User State Refinement, including the authoritative, session‑level User Entity State required to perform Personality alignment.
  • Draws Personality‑related evidence from A‑User Storage, including previously authorised A‑User Entity State fragments, Personality preferences, and interaction‑derived Personality indicators.
  • May draw additional Personality‑related information from M‑Instance services, when permitted by A‑User Control, including externally defined Personality profiles, role‑ or context‑specific Personality constraints, and policy‑driven expressive or behavioural guidelines.
  • Constructs and refines the A‑User Entity State by integrating Personality traits, communicative modulation rules, expressive and behavioural constraints, and User‑specific adaptations provided by User State Refinement.

The resulting A‑User Entity State is session‑scoped and context‑dependent.
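
A non-normative Python sketch of the alignment step is given below; the trait, modulation, and constraint fields are hypothetical.

  # Hypothetical sketch of Personality Alignment deriving the A-User Entity State.
  def align_personality(user_entity_state, personality_profile, constraints):
      """Aligns the active Personality with the User Entity State and context."""
      return {
          "stance": "supportive" if user_entity_state.get("emotion") == "curious" else "neutral",
          "formality": personality_profile.get("formality", "moderate"),
          "expressiveness": min(personality_profile.get("expressiveness", 0.5),
                                constraints.get("max_expressiveness", 1.0)),
          "session_scoped": True,        # ephemeral: not persisted by Personality Alignment
      }

  a_user_state = align_personality(
      {"emotion": "curious"},
      {"formality": "informal", "expressiveness": 0.8},
      {"max_expressiveness": 0.6})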

13    A-User Formation (AUF)

A‑User Formation realises the externally perceptible behaviour of the A‑User within the M‑Instance. It transforms deliberative outcomes produced by Basic Knowledge and Personality-aligned constraints into synchronised multimodal avatar behaviour, without performing semantic reasoning or user modelling.

  • Consumes execution plans, communicative intent, and expressive directives produced by Basic Knowledge, aligned with the current A‑User Entity State.
  • Transforms communicative intent into natural language utterances consistent with the A‑User Entity State, communicative constraints, and situational context.
  • Synthesises speech output, including prosody, rhythm, and timing, aligned with linguistic content and expressive intent.
  • Renders facial expressions, eye gaze, and head pose consistent with communicative, emotional, and attentional modulation.
  • Produces gestures and full‑body avatar motion, including posture and movement, aligned with speech and interaction context.
  • Synchronises multimodal outputs across text, speech, facial animation, gaze, and body motion to ensure coherent and natural avatar behaviour.
  • Executes the final plan steps as observable Persona Actions within the M‑Instance.
  • Emits execution status information to A‑User Control, reporting progress, completion, or execution anomalies related to avatar behaviour.
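
The following Python sketch suggests, in hypothetical terms, a synchronised multimodal plan that A-User Formation might execute; track names and timing fields are illustrative.

  # Hypothetical sketch of a synchronised multimodal plan.
  from dataclasses import dataclass, field

  @dataclass
  class TrackEvent:
      start_s: float
      duration_s: float
      content: str          # e.g. an utterance, a gesture label, a gaze target

  @dataclass
  class MultimodalPlan:
      speech: list = field(default_factory=list)
      face: list = field(default_factory=list)
      gaze: list = field(default_factory=list)
      body: list = field(default_factory=list)

  plan = MultimodalPlan(
      speech=[TrackEvent(0.0, 1.2, "Hello, how can I help?")],
      face=[TrackEvent(0.0, 1.2, "smile")],
      gaze=[TrackEvent(0.0, 1.2, "user")],
      body=[TrackEvent(0.2, 0.8, "open-hand gesture")])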

14    A-User Storage (AUS)

A‑User Storage acts as a governed evidence and information store within the A‑User Architecture. It records and exposes authorised artefacts related to perception, user‑related evidence, Personality adaptations, and Interaction History, under the control of A‑User Control.

A‑User Storage is not an AI Module. It is a functionality of the AI Framework supporting controlled persistence, retrieval, and traceability of authorised information.

14.1  Governance

A‑User Control is the sole authority governing access to A‑User Storage. It authorises:

  • Which AI Modules may write to A‑User Storage,
  • Which categories of information may be recorded,
  • Which persistence scope (temporary, session‑bound, or longer‑term) applies,
  • Which AI Modules may read specific stored artefacts.

A‑User Control enforces these policies throughout the A‑User lifecycle.
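
As a non-normative illustration, the following Python sketch shows one possible encoding of such an access policy and its enforcement check; the AIM identifiers, categories, and scopes are hypothetical.

  # Hypothetical sketch of an A-User Storage access policy enforced by A-User Control.
  POLICY = {
      "write": {"CXC": ["perceptual-snapshot"],
                "SUD": ["scene-descriptors"],
                "USR": ["user-evidence-fragment"]},
      "read":  {"PRC": ["perceptual-snapshot", "scene-descriptors", "interaction-history"],
                "BKN": ["*"],
                "AUC": ["*"]},
      "persistence": {"perceptual-snapshot": "session-bound",
                      "interaction-history": "longer-term"},
  }

  def is_authorised(operation: str, aim: str, category: str) -> bool:
      allowed = POLICY.get(operation, {}).get(aim, [])
      return "*" in allowed or category in allowed

  assert is_authorised("write", "CXC", "perceptual-snapshot")
  assert not is_authorised("write", "PAL", "personality-fragment")   # not granted above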

14.2  Information Stored in A‑User Storage

Subject to authorisation by A‑User Control, A‑User Storage may store:

  • Perceptual context snapshots produced by Context Capture.
  • Spatial and scene descriptors produced by Space and User Description.
  • User‑related evidence and authorised fragments derived from interactions.
  • Personality‑related fragments and authorised adaptations.
  • Interaction History and associated session metadata.
  • Trace metadata associated with stored artefacts.

14.3  AI Module Write Access

The following AI Modules may write to A‑User Storage under explicit authorisation by A‑User Control:

  • Context Capture writes perceptual context snapshots, including initial Audio and Visual Scene Descriptors.
  • Space and User Description writes enhanced scene descriptors and associated metadata.
  • Prompt Creation may record interaction‑representation artefacts, references, and session metadata when required.
  • User State Refinement may record authorised user‑related evidence fragments or updates, but does not store the authoritative User Entity State unless explicitly authorised by A‑User Control.
  • Personality Alignment may record authorised A‑User Entity State fragments, but does not autonomously persist Personality‑derived semantics.
  • Basic Knowledge may record authorised reasoning outcomes, annotations, or decisions as structured artefacts when required for orchestration, traceability, or audit.

14.4  AI Module Read Access

The following AI Modules may read from A‑User Storage, subject to A‑User Control policy:

  • Prompt Creation reads scene descriptors, context snapshots, user‑related evidence, and Interaction History to assemble the structured interaction representation.
  • User State Refinement reads user‑related evidence, interaction history, and authorised profile fragments to derive the authoritative but ephemeral User Entity State.
  • Personality Alignment reads authorised user semantics and Personality‑related fragments to construct the A‑User Entity State.
  • Basic Knowledge reads authorised artefacts to support deliberation, semantic integration, and traceability.
  • A‑User Control reads all stored artefacts for orchestration, governance, auditing, and lifecycle management.