Highlights
- Presentation of Autonomous Users in metaverse
- New project for Autonomous Users in an MMM metaverse instance
- Meetings in the November 2025 cycle.
Presentation of Autonomous Users in metaverse
The Call for Technologies: Pursuing Goals in metaverse (MPAI-PGM) – Autonomous User Architecture (AUA) V1.0 addresses a new project that leverages MPAI competence and standards, integrated with the new technologies targeted by the Call. It will be presented online on 2025/11/17:
Register
New project for Autonomous Users in an MMM metaverse instance
Agentic AI, which leverages generative AI to autonomously perform tasks, make decisions, and interact across digital environments, has created a great uproar. MPAI does not follow trends and does not develop a project just because everybody talks about something. MPAI can separate the wheat from the chaff: it realised that Agentic AI could well not be chaff and started an early investigation of the matter in 2024. The notion of Perceptive and Agentive AI (PAAI) introduced in Technical Report: AIW and AIM Implementation Guidelines (MPAI-WMG) V1.0 was a precursor of the major step forward made by MPAI at its 61st General Assembly (MPAI-61) with the publication of Call for Technologies: Pursuing Goals in metaverse (MPAI-PGM) – Autonomous User Architecture (AUA) V1.0.
This title is quite a mouthful and requires a few explanations.
Let’s start from Technical Specification: MPAI Metaverse Model – Technologies (MMM-TEC) V2.1. MMM-TEC assumes that a metaverse instance (M-Instance) is populated by Processes acting on Items (any “thing” in the metaverse) and/or on other Processes. Some Processes – called Users – represent humans: Human Users (H-Users) are directly operated by humans, while Autonomous Users (A-Users) have a high degree of operational autonomy. Both types of Users may be rendered as avatars called Personae. MMM-TEC specifies technologies enabling Users to perform a variety of Actions on Items – such as sensing data from the real world or moving Items within the M-Instance – possibly in combination with other Processes.
However, MMM-TEC was silent on how an A-User reaches the decision of performing an Action.
With its PGM-AUA Call for Technologies, MPAI intends to develop an A-User architecture that covers some of the technologies an A-User uses before it decides to perform an Action.
MPAI has a considerable repertory of technologies relevant to the PGM-AUA standard project but needs more technologies to achieve the project goal. The Call requests interested parties – irrespective of their membership in MPAI – to submit Responses that may enable MPAI to reach the goal of developing an A-User Architecture standard.
What precisely is the scope of this planned standard? An initial formulation is the following: PGM-AUA will specify the functions and interfaces by which an A-User – having considerable or complete autonomy in an M-Instance – interacts with another User, whether A-User or H-User. The term “User” therefore means “conversational partner in the metaverse, whether autonomous or human-driven”. The A-User can capture text and audio-visual information originated by, or surrounding, the User; extract the User State, i.e., a snapshot of the User’s cognitive, emotional, and interactional state; produce an appropriate multimodal response, rendered as a Speaking Avatar; and move in the relevant virtual space.
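As an informal illustration of these concepts, the sketch below models a User State snapshot and a multimodal response in Python; every class and field name is an assumption of this sketch, not a definition from the Call.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserState:
    """Hypothetical snapshot of a conversational partner's state.
    Field names are illustrative, not PGM-AUA definitions."""
    timestamp: float                  # capture time (seconds)
    cognitive: str = "attentive"      # e.g. "attentive", "confused"
    emotional: str = "neutral"        # e.g. "pleased", "irritated"
    interactional: str = "listening"  # e.g. "speaking", "gesturing"
    utterance: Optional[str] = None   # transcript of the User's speech, if any

@dataclass
class MultimodalResponse:
    """Hypothetical multimodal response to be rendered as a Speaking Avatar."""
    text: str                         # what the A-User says
    speech_style: str = "neutral"     # how it is said
    gesture: str = "idle"             # accompanying avatar gesture
```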
An A-User requires various interacting technologies integrated into a single system. The basic idea is to use a relatively simple Large Language Model with language and reasoning capabilities and to supply the missing information through different types of technologies concerning the User and its spatial context.
MPAI assumes that an A-User is implemented as an AI Workflow (AIW) composed of AI Modules (AIMs) as specified by Technical Specification: AI Framework (MPAI-AIF) V2.2. The AIF already provides the infrastructure underpinning the A-User, on which the required technologies can be added.
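The sketch below only illustrates the idea of an AIW orchestrating AIMs over shared data; MPAI-AIF specifies the actual, normative framework and APIs, which are not reproduced here.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class AIM(ABC):
    """Illustrative stand-in for an AI Module: reads and writes named data channels."""
    @abstractmethod
    def process(self, channels: Dict[str, Any]) -> Dict[str, Any]:
        """Return the channels this AIM produces, given the channels it consumes."""

class AIW:
    """Illustrative AI Workflow: runs its AIMs in order over a shared set of channels."""
    def __init__(self, aims: List[AIM]):
        self.aims = aims

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        channels = dict(inputs)
        for aim in self.aims:
            channels.update(aim.process(channels))  # each AIM adds its outputs
        return channels
```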
The figure represents an initial diagram of the A-User architecture.
As a start, the A-User Control AI Module (AIM) activates the Context Capture AIM to perceive the metaverse environment, assumed to be an audio-visual scene that includes a Persona, i.e., an avatar rendering the User. The result of the capture is called Context, a structured and time-stamped snapshot representing the A-User’s initial understanding of the environment (using MPAI Audio-Visual Scene Descriptors) and of the User State, a semantic representation of the User’s cognitive, emotional, and attentional posture within the environment.
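A purely illustrative shape for such a Context snapshot, reusing the UserState sketch above; all names are assumptions rather than PGM-AUA definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class SceneObject:
    """Illustrative audio or visual object taken from the Audio-Visual Scene Descriptors."""
    object_id: str
    modality: str                          # "audio" or "visual"
    position: Tuple[float, float, float]   # position in the M-Instance (assumed convention)
    label: str = ""                        # e.g. "Persona", "table", "speech source"

@dataclass
class Context:
    """Illustrative structured, time-stamped snapshot produced by Context Capture."""
    timestamp: float
    scene_objects: List[SceneObject] = field(default_factory=list)
    user_state: Optional["UserState"] = None  # UserState as sketched earlier
```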
The Spatial Reasoning AIM analyses the Context and transforms the Audio-Visual Scene Descriptors, including Audio and Visual Objects and their gesture vectors and gaze cues, into an Audio and Visual Output that is sent to the Domain Access AIM to request additional information.
Spatial Reasoning produces Audio and Visual Spatial Guides, User-focused representations of the audio and visual spatial context, such as sound source relevance, directionality, and proximity (Audio) and object relevance, orientation, proximity, and affordance (Visual). Audio and Visual Spatial Guides are used by the Prompt Creation AIM to enrich the User’s spoken or written input when creating the Prompt sent to the Basic Knowledge, a limited-complexity language model.
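A toy sketch of how the Prompt Creation AIM might fold the Spatial Guides into the prompt sent to Basic Knowledge; the guide keys and prompt wording are invented for illustration.

```python
def create_prompt(user_input: str, audio_guide: dict, visual_guide: dict) -> str:
    """Toy Prompt Creation: enrich the User's input with audio/visual spatial context."""
    audio_hints = ", ".join(f"{k}: {v}" for k, v in audio_guide.items())
    visual_hints = ", ".join(f"{k}: {v}" for k, v in visual_guide.items())
    return (
        f"User input: {user_input}\n"
        f"Audio spatial context ({audio_hints})\n"
        f"Visual spatial context ({visual_hints})\n"
        "Answer as the A-User, taking the spatial context into account."
    )

# Example with made-up Spatial Guide values:
prompt = create_prompt(
    "Can you hand me the red cube?",
    {"dominant source": "User", "direction": "front-left", "proximity": "near"},
    {"focus object": "red cube", "orientation": "facing the User", "affordance": "graspable"},
)
```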

Basic Knowledge provides an Initial Response that is sent to Domain Access.
Domain Access interprets the Spatial Outputs, together with any User-related inputs received from Spatial Reasoning, by activating behaviours appropriate to the domain, and then produces and sends to:
- Spatial Reasoning: Spatial Directives that help it interpret spatial scenes, resolve ambiguous entities, evaluate spatial feasibility, and generate primitives aligned with the User’s perceived goals and interaction goals.
- Basic Knowledge: a semantically enhanced DA-Prompt.
- User State Refinement: an Adaptive Context Guide conveying its improved understanding of the Context.
- Personality Alignment: a Personality Context Guide conveying the same improved understanding (see the sketch after this list).

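The sketch below bundles these outputs into one illustrative structure; the names and the toy logic are assumptions, not the behaviour of a real Domain Access AIM.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class DomainAccessOutputs:
    """Illustrative bundle of the Domain Access outputs listed above."""
    spatial_directives: Dict[str, str]         # to Spatial Reasoning
    da_prompt: str                             # to Basic Knowledge
    adaptive_context_guide: Dict[str, str]     # to User State Refinement
    personality_context_guide: Dict[str, str]  # to Personality Alignment

def domain_access(spatial_output: dict, initial_response: str) -> DomainAccessOutputs:
    """Toy behaviour: a real Domain Access AIM would consult domain knowledge."""
    return DomainAccessOutputs(
        spatial_directives={"resolve_entity": spatial_output.get("ambiguous_entity", "none")},
        da_prompt=initial_response + "\n[Domain note: stay consistent with the current scene.]",
        adaptive_context_guide={"domain": "generic", "formality": "medium"},
        personality_context_guide={"register": "polite"},
    )
```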
Basic Knowledge produces an Enhanced Response that utilises the updated information about the User State and sends it to the User State Refinement AIM.
User State Refinement refines its understanding of the User State and produces and sends to:
- Basic Knowledge: a UR-Prompt.
- Personality Alignment AIM: an Expressive State Guide.

Basic Knowledge then produces and sends to Personality Alignment a Refined Response; this exchange is sketched below.

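A toy rendering of this refinement exchange, with invented field names, just to show the direction of the data flow:

```python
def user_state_refinement(enhanced_response: str, adaptive_context_guide: dict):
    """Toy refinement step: returns a UR-Prompt for Basic Knowledge and an
    Expressive State Guide for Personality Alignment (names are assumptions)."""
    formality = adaptive_context_guide.get("formality", "medium")
    ur_prompt = (
        f"Draft answer: {enhanced_response}\n"
        f"Rewrite it for a User who expects a {formality} level of formality."
    )
    expressive_state_guide = {"tone": formality}
    return ur_prompt, expressive_state_guide
```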
Personality Alignment formulates an A-User Personal Status (using MPAI-Personal Status) appropriate to the User State and produces and sends to:
- Basic Knowledge: a PA-Prompt.
- A-User Rendering AIM: its A-User Personal Status (see the sketch below).

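The sketch below illustrates this step; MPAI’s Personal Status combines internal-state factors such as Emotion, Cognitive State, and Social Attitude, but the fields and the mapping shown are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class PersonalStatus:
    """Illustrative A-User Personal Status (fields are assumptions of this sketch)."""
    emotion: str = "neutral"
    cognitive_state: str = "attentive"
    social_attitude: str = "cooperative"

def personality_alignment(expressive_state_guide: dict):
    """Toy mapping from the Expressive State Guide to (Personal Status, PA-Prompt)."""
    tone = expressive_state_guide.get("tone", "neutral")
    status = PersonalStatus(emotion=tone)
    pa_prompt = f"Phrase the final answer in a {tone} tone."
    return status, pa_prompt
```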
A-User Rendering produces a Speaking Avatar using:
- the Text, Speech, and Visual information it receives.
- a Command from A-User Control that shapes the Speaking Avatar.

The A-User Control AIM:
- Drives A-User operation by controlling how the A-User interacts with the environment.
- Performs Actions and Process Actions based on the Rights it holds and the M-Instance Rules (as defined by MMM-TEC), as sketched below.

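The sketch below illustrates the Rights/Rules gate of the second bullet; the Action names and the rule encoding are invented, as MMM-TEC defines the normative semantics.

```python
def may_perform(action: str, rights: set, rules: set) -> bool:
    """Toy check: an A-User performs an Action only if it holds the Right
    and the M-Instance Rules do not forbid it (encodings are invented)."""
    return action in rights and ("forbid:" + action) not in rules

# Hypothetical example: the A-User may move its Persona but not modify another Item.
print(may_perform("MovePersona", rights={"MovePersona"}, rules=set()))                  # True
print(may_perform("ModifyItem", rights={"MovePersona"}, rules={"forbid:ModifyItem"}))   # False
```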
The complexity of the model has prompted MPAI to extend its Call for Technologies practice. In addition to the usual Framework Licence and Template for Responses, the Call references a Tentative Technical Specification, drafted as if it were an actual Technical Specification. Respondents to the Call are free to comment on, change, or extend this document, or to propose anything else that is relevant to the Call but unrelated to this document.
Responses to this Call shall reach the MPAI Secretariat by 2026/01/21T23:59.
Register to attend the online presentation on 2025/11/17:
Meetings in the November 2025 cycle.
