
(Tentative)

1 Function
2 Reference Model
3 Input/Output Data
4 Functions of AI Modules
5 Input/output Data of AI Modules
6 AIW, AIMs, and JSON Metadata

1       Function

The A-User Architecture is represented by an AI Workflow that

  • May receive a command from a human.
  • Captures the Text Objects, Audio Objects, 3D Model Objects, and Visual Objects of an Audio-Visual Scene in an M-Instance that includes one User, either Autonomous (A-User) or Human (H-User), as a result of an Action performed by the A-User or requested by a Human Command.
  • Produces an Action or a Process Action Request that may reference the A-User’s Persona, i.e., the speaking Avatar generated by the A-User in response to the input data.
  • Receives Process Action Responses to Process Action Requests made by the A-User.

2       Reference Model

Figure 1 gives the Reference Model of the AI Workflow implementing the Autonomous User.

Figure 1 – The Reference Model of the Autonomous User Architecture (PGM-AUA) AIW

The operation of the Autonomous User unfolds in a sequence of activities carried out by the different AIMs, each generating Data sent to other AIMs.

The A-User Control AIM:

  • Governs the operational lifecycle of the A-User and orchestrates its interaction with both the M-Instance and the human User.
  • Drives A-User operation by controlling the AIMs and the interaction of the A-User with the M-Location. In Figure 1, these control and feedback flows are omitted for graphical clarity but are shown in the figures below.
  • Performs Actions and Process Actions in the M-Instance based on the Rights it holds and on the M-Instance Rules.

The Context Capture AIM produces Context to enable the A-User to perceive its environment – defined as an M-Location within an M-Instance. Context includes (an illustrative instance is sketched after the list):

  • Audio-Visual Scene Descriptors [13].
  • Text.
  • User State, comprising Personal Status [15] and other User descriptors.
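For concreteness, the following is a minimal sketch of a Context instance. All field names and values are illustrative assumptions, not the normative Context JSON Schema.

  # A minimal, illustrative Context instance (field names are assumptions,
  # not the normative Context JSON Schema).
  context = {
      "avSceneDescriptors": {"audio": [], "visual": []},  # Audio-Visual Scene Descriptors [13]
      "text": "Please hand me the red cube on the table.",  # the User's Text
      "userState": {
          "personalStatus": {"emotion": "neutral"},  # Personal Status [15]
          "otherDescriptors": {},
      },
  }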

The Spatial Reasoning AIM:

  • Interprets the Audio, Speech, and Visual components of an M-Location.
  • Generates:
    • Spatial Output that includes:
      • Audio Spatial Output with spatial and temporal interpretations of audio objects.
      • Visual Spatial Output with spatial relationships, referent resolutions, and interaction constraints.
    • Spatial Guide that includes:
      • Audio Spatial Guide, an A-User-centric representation of the spatial audio context that enriches the User’s text with audio spatial cues, such as sound source relevance, directionality, and proximity.
      • Visual Spatial Guide, an A-User-centric representation of the visual spatial context that enriches the User’s text with visual spatial cues, such as object relevance, orientation, proximity, and affordance.
  • Sends the Audio and Visual Spatial Guides to the Prompt Creation AIM (an illustrative Guide is sketched below).
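A hedged sketch of what Audio and Visual Spatial Guide entries might contain follows; the field names and values are assumptions, pending the Responses to the Call for Technologies.

  # Illustrative Audio and Visual Spatial Guide entries (assumed fields).
  spatial_guide = {
      "audio": [  # A-User-centric audio spatial cues
          {"source": "speech:User-1", "direction": "left",
           "proximity": "near", "relevance": 0.9},
      ],
      "visual": [  # A-User-centric visual spatial cues
          {"object": "red cube", "orientation": "facing the A-User",
           "proximity_m": 1.2, "affordance": "graspable", "relevance": 0.8},
      ],
  }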

The Prompt Creation AIM:

  • Augments the User’s Text with:
    • the Audio and Visual Spatial Guides;
    • the User description component of Context.
  • Produces the PC-Prompt – a natural language expression structured according to the PC-Prompt Plan JSON Schema (the Plan-to-Prompt step is sketched below).
  • Sends the PC-Prompt to the Basic Knowledge AIM, a generic Large Language Model (LLM) with foundational language and reasoning capabilities.
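A minimal sketch of this Plan-to-Prompt step follows; the keys of pc_prompt_plan and the flattening function are hypothetical, since the normative structure is given by the PC-Prompt Plan JSON Schema.

  # Hypothetical PC-Prompt Plan instance and its rendering into the
  # quasi-natural-language PC-Prompt (keys and wording are assumptions).
  pc_prompt_plan = {
      "userText": "Please hand me the red cube on the table.",
      "audioSpatialGuide": "A voice calls from the left, nearby.",
      "visualSpatialGuide": "A red cube lies 1.2 m ahead and is graspable.",
      "userDescription": "The User appears calm and attentive.",
  }

  def plan_to_prompt(plan: dict) -> str:
      """Flatten a Prompt Plan into a quasi-natural-language prompt."""
      return " ".join(plan.values())

  pc_prompt = plan_to_prompt(pc_prompt_plan)

The same Plan-to-Prompt pattern recurs below in Domain Access, User State Refinement, and Personality Alignment.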

Basic Knowledge sends its Initial Response to the Domain Access AIM.

Spatial Reasoning sends the Audio and Visual Spatial Outputs to the Domain Access AIM.

Domain Access:

  • Accesses domain-specific knowledge (e.g., inside the Domain Access AIM or from an M-Instance Service).
  • Produces and sends a Spatial Directive back to Spatial Reasoning to improve its understanding of the Audio and Visual Scene.

Domain Access then:

  • Integrates the Initial Response with the Audio and Visual Spatial Outputs.
  • Produces a JSON object based on the DA-Prompt Plan JSON Schema.
  • Converts the DA-Prompt Plan into a quasi-natural language prompt (DA-Prompt).
  • Sends the DA-Prompt to Basic Knowledge.
  • Sends its improved Context understanding to User State Refinement and Personality Alignment (a sketch of this step follows).
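The following sketch illustrates the two Domain Access functions under the same assumptions; all names are placeholders, the normative structure being the DA-Prompt Plan JSON Schema.

  # Illustrative Domain Access step: emit a Spatial Directive back to
  # Spatial Reasoning and build the DA-Prompt (all names are assumptions).
  def domain_access(initial_response: str, audio_output: dict, visual_output: dict):
      # Feedback helping Spatial Reasoning refine its Scene understanding.
      spatial_directive = {"disambiguate": visual_output.get("referents", [])}
      # Integrate the Initial Response with the Spatial Outputs into a
      # DA-Prompt Plan, then flatten it into the quasi-natural DA-Prompt.
      da_prompt_plan = {
          "initialResponse": initial_response,
          "domainKnowledge": "in this domain, red cubes are grasp targets",
          "audioSpatialOutput": str(audio_output),
          "visualSpatialOutput": str(visual_output),
      }
      da_prompt = " ".join(da_prompt_plan.values())
      return spatial_directive, da_prompt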

Basic Knowledge:

  • Produces the Enhanced Response that utilises updated information about the User State.
  • Sends the Enhanced Response to the User State Refinement AIM.

User State Refinement:

  • Converts the Enhanced Response into a structured format using the UR-Input JSON Schema.
  • Refines its understanding of the User State.
  • Generates a JSON object based on the UR-Prompt Plan JSON Schema.
  • Converts the UR-Prompt Plan into the quasi-natural language UR-Prompt.
  • Sends the UR-Prompt to Basic Knowledge.
  • Sends an Expressive State Guide to the Personality Alignment AIM (a sketch of this step follows).
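A sketch of the User State Refinement step under the same assumptions follows; the UR-Input and UR-Prompt Plan JSON Schemas are normative, and the names below are placeholders.

  # Illustrative User State Refinement step (assumed names and values).
  def user_state_refinement(enhanced_response: str, user_state: dict):
      ur_input = {"enhancedResponse": enhanced_response}      # per the UR-Input Schema
      refined_state = {**user_state, "attention": "engaged"}  # refined User State
      ur_prompt_plan = {
          "response": ur_input["enhancedResponse"],
          "userState": str(refined_state),
      }
      ur_prompt = " ".join(ur_prompt_plan.values())           # quasi-natural UR-Prompt
      expressive_state_guide = {"tone": "reassuring", "tempo": "calm"}
      return ur_prompt, expressive_state_guide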

Basic Knowledge:

  • Produces the Refined Response.
  • Sends the Refined Response to Personality Alignment.

Personality Alignment:

  • Converts the Refined Response into a JSON object using the PA-Input JSON Schema.
  • Selects the appropriate Personality Profile.
  • Generates a JSON object based on the PA-Prompt Plan JSON Schema.
  • Converts the PA-Prompt Plan into the quasi-natural language PA-Prompt.
  • Sends the PA-Prompt to Basic Knowledge.
  • Formulates an A-User Personal Status.
  • Sends the A-User Personal Status to A-User Rendering (an illustrative Personal Status follows).
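The A-User Personal Status formulated here might look as follows; the three factors follow Personal Status [15], while the Personality Profile and all values are illustrative assumptions.

  # Illustrative Personality Profile and the A-User Personal Status that
  # Personality Alignment formulates (all values are assumptions).
  personality_profile = {"name": "helpful-guide", "register": "informal"}

  a_user_personal_status = {
      "emotion": "cheerful",          # factors as in Personal Status [15]
      "cognitiveState": "confident",
      "socialAttitude": "cooperative",
  }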

A-User Rendering produces a Speaking Avatar using:

  • Text, Speech, and Visual;
  • a Command from A-User Control that shapes the Speaking Avatar.

Information propagates through the chain of Table 2 in rounds: each AIM in the left column converts the Response received from Basic Knowledge into a structured Input, builds a Prompt Plan, and sends the corresponding Prompt back to Basic Knowledge, e.g., Prompt Creation sends the PC-Prompt to Basic Knowledge, which sends the Initial Response to Domain Access (a code sketch follows the table):

Table 2 – The flow of Prompts and Responses through the A-User’s Basic Knowledge

AIM | Response from Basic Knowledge | Converted Input | Prompt Plan | Prompt to Basic Knowledge
Prompt Creation | – | – | PC-Prompt Plan | PC-Prompt
Domain Access | Initial Response | DA Input | DA-Prompt Plan | DA-Prompt
User State Refinement | Enhanced Response | UR Input | UR-Prompt Plan | UR-Prompt
Personality Alignment | Refined Response | PA Input | PA-Prompt Plan | PA-Prompt
A-User Rendering | Final Response | A-User Output | – | –
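The four rounds of Table 2 can be summarised by the following sketch, in which basic_knowledge stands in for the LLM and the Prompt-building steps of the intermediate AIMs are elided; all names are illustrative.

  # Illustrative four-round exchange with Basic Knowledge (see Table 2).
  def basic_knowledge(prompt: str) -> str:
      """Stand-in for the LLM behind the Basic Knowledge AIM."""
      return f"response to: {prompt}"

  def run_chain(pc_prompt: str) -> str:
      initial = basic_knowledge(pc_prompt)                         # Prompt Creation
      enhanced = basic_knowledge("DA-Prompt built on " + initial)  # Domain Access
      refined = basic_knowledge("UR-Prompt built on " + enhanced)  # User State Refinement
      final = basic_knowledge("PA-Prompt built on " + refined)     # Personality Alignment
      return final  # Final Response, sent to A-User Rendering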

3       Input/Output Data

Table 3 gives the Input/Output Data of the Autonomous User AIW.

Table 3 – Input/output data of the Autonomous User

Input | Description
Human Command | A command from the responsible human taking over or complementing the control of the A-User.
Process Action Response | Generated by the M-Instance Process in response to the A-User’s Process Action Request.
Text Object | User input as text.
Audio Object | The Audio component of the Scene in which the User is embedded.
3D Model Object | The 3D Model component of the Scene in which the User is embedded.
Visual Object | The Visual component of the Scene in which the User is embedded.

Output | Description
Human Command Status | The status of the A-User’s execution of a Human Command.
Action | Action performed by the A-User.
Process Action Request | The A-User’s Process Action Request.

4       Functions of AI Modules

Table 4 gives the functions performed by PGM-AUA AIMs.

Table 4 – Functions of PGM-AUA AIMs

Acronym | Name | Definition
PGM-AUC | A-User Control | Governs the operational lifecycle of the A-User through its AIMs and orchestrates its interaction with both the M-Instance and the human User; performs Actions and Process Action Requests, such as uttering a speech or moving its Persona (Avatar), consequent to its interactions with the User.
PGM-CXT | Context Capture | Captures at least one of Text, Audio, 3D Model, and Visual, and produces Context, a representation of the User and of the environment where the User is located.
PGM-ASR | Audio Spatial Reasoning | Transforms raw Audio Scene Descriptors and Audio cues into semantic outputs that Prompt Creation (PRC) uses to enhance the User’s Text and that are sent to Domain Access (DAC) to seek additional information.
PGM-VSR | Visual Spatial Reasoning | Transforms raw Visual Scene Descriptors, gesture vectors, and gaze cues into semantic outputs that Prompt Creation (PRC) uses to enhance the User’s Text and that are sent to Domain Access (DAC) to seek additional information.
PGM-PRC | Prompt Creation | Transforms the semantic inputs received from Context Capture (CXT), from Audio and Visual Spatial Reasoning (SPR), and, indirectly, from Domain Access (DAC) as responses provided to SPR, into natural language prompts (PC-Prompts) for Basic Knowledge.
PGM-BKN | Basic Knowledge | A language model – not necessarily general-purpose – that receives the enriched texts from Prompt Creation (PRC), Domain Access (DAC), User State Refinement (USR), and Personality Alignment (PAL) and converts them into responses that the various AIMs use to gradually produce the Final Response.
PGM-DAC | Domain Access | Performs the following main functions:
– Interprets the Spatial Outputs from SPR and any User-related semantic inputs.
– Selects and activates domain-specific behaviours to deal with the specific input from SPR and BKN.
– Produces semantically enhanced outputs to SPR and BKN.
PGM-USR | User State Refinement | Modulates the Enhanced Response from BKN into a User State- and Context-aware UR-Prompt, which is then sent to BKN.
PGM-PAL | Personality Alignment | Modulates the Refined Response into an A-User Personality Profile-aware PA-Prompt, which is then sent to BKN.
PGM-AUR | A-User Rendering | Receives the Final Response from BKN, the A-User Personal Status from Personality Alignment (PAL), and a Command from A-User Control, and renders the A-User as a speaking Avatar.

5       Input/output Data of AI Modules

Table 5 provides the acronyms, names, and links to the specification of the AI Modules composing the PGM-AUA AIW and their input/output data. The current specification is tentative and is expected to evolve based on the Responses to the Call for Technologies.

Table 5 – Input/output Data of AI Modules

Acronym | AI Module | Receives | Produces
PGM-AUC | A-User Control | Human Command | Human Command Status
 | | Process Action Response | Process Action Request
 | | Context Capture Status | Context Capture Directive
 | | Audio Action Status | Audio Action Directive
 | | Visual Action Status | Visual Action Directive
 | | Prompt Plan Status | Prompt Creation Directive
 | | BK Response Trace | BK Query Directive
 | | DA Action Status | DA Action Directive
 | | User State Status | User State Directive
 | | Personality Alignment Status | Personality Alignment Directive
 | | Rendering Status | Rendering Directive
 | | – | Action
PGM-CXT | Context Capture | Text Object | Context
 | | Context Capture Directive | Context Capture Status
 | | Audio Object | –
 | | 3D Model Object | –
 | | Visual Object | –
PGM-ASR | Audio Spatial Reasoning | Context | Audio Spatial Output
 | | Audio Action Directive | Audio Action Status
 | | Audio Spatial Directive | Audio Spatial Guide
PGM-VSR | Visual Spatial Reasoning | Context | Visual Spatial Output
 | | Visual Action Directive | Visual Action Status
 | | Visual Spatial Directive | Visual Spatial Guide
PGM-PRC | Prompt Creation | Audio Spatial Guide | PC-Prompt
 | | Prompt Creation Directive | Prompt Plan Status
 | | Visual Spatial Guide | –
 | | Context | –
PGM-DAC | Domain Access | Audio Spatial Output | Audio Spatial Directive
 | | Visual Spatial Output | Visual Spatial Directive
 | | Initial Response | Personality Context Guide
 | | DA Action Directive | DA Action Status
 | | – | Refined Context Guide
 | | – | DA-Prompt
PGM-BKN | Basic Knowledge | PC-Prompt | Initial Response
 | | DA-Prompt | Enhanced Response
 | | UR-Prompt | Refined Response
 | | PA-Prompt | Final Response
 | | BK Query Directive | BK Response Trace
PGM-USR | User State Refinement | Refined Context Guide | Expressive State Guide
 | | Enhanced Response | UR-Prompt
 | | User State Directive | User State Status
PGM-PAL | Personality Alignment | Personality Context Guide | A-User Personal Status
 | | Refined Response | PA-Prompt
 | | Personality Alignment Directive | Personality Alignment Status
 | | Expressive State Guide | –
PGM-AUR | A-User Rendering | A-User Personal Status | Avatar
 | | Rendering Directive | Rendering Status
 | | Final Response | –
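Table 5 shows a recurring convention: A-User Control sends each AIM a Directive and receives a Status in return. A minimal sketch of that control loop, with assumed names, is:

  # Illustrative Directive/Status control loop between A-User Control and
  # any other AIM (names are assumptions, not normative interfaces).
  class AIM:
      def apply(self, directive: dict) -> dict:
          """Apply a Directive from A-User Control and report a Status."""
          return {"directive": directive, "status": "executed"}

  context_capture = AIM()
  status = context_capture.apply({"capture": ["Text", "Audio", "Visual"]})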

6       AIW, AIMs, and JSON Metadata

Table 6 provides the links to the AIW and AIM specifications and to the JSON syntaxes.

Table 6 – AIW, AIMs, and JSON Metadata

AIW | AIMs | Name | JSON
PGM-AUA | | Autonomous User | X
 | PGM-AUC | A-User Control | X
 | PGM-CXT | Context Capture | X
 | PGM-ASR | Audio Spatial Reasoning | X
 | PGM-VSR | Visual Spatial Reasoning | X
 | PGM-PRC | Prompt Creation | X
 | PGM-BKN | Basic Knowledge | X
 | PGM-DAC | Domain Access | X
 | PGM-USR | User State Refinement | X
 | PGM-PAL | Personality Alignment | X
 | PGM-AUR | A-User Rendering | X
