Prompt Creation: Where Words Meet Context

Leonardo Chiariglione
2025-11-26

The Prompt Creation module is the storyteller and translator in the Autonomous User’s “brain”. It takes two raw inputs – the audio and visual spatial data of the Context (objects in a scene with their position, orientation and velocity) and the User State (a rich description of the A‑User’s understanding of the “internal state” of the User) – and turns them into a well‑formed prompt that Basic Knowledge can actually understand and respond to.

We have already presented the system diagram of the Autonomous User (A-User), an autonomous agent able to move and interact (walk, converse, do things, etc.) with another User in a metaverse. The latter User may be another A-User or may be under the direct control of a human, in which case it is called a Human-User (H-User). The A-User acts as a “conversation partner in a metaverse interaction” with the User.

This is the fifth in a sequence of posts that illustrate the architecture of an A-User in more depth and provide an easy entry point for those who wish to respond to the MPAI Call for Technology on Autonomous User Architecture. The first four dealt with 1) the control exercised by the A-User Control AI Module over the other components of the A-User; 2) how the A-User captures the external metaverse environment using the Context Capture AI Module; 3) how it listens to, localises, and interprets sound not just as data, but as data with a spatially anchored meaning; and 4) how it makes sense of what it sees by understanding objects’ geometry, relationships, and salience.

The audio and visual components of Spatial Reasoning provide the information on things around the User such as “who’s in the room,” “what’s being said,” “what objects are present,” and “what’s the User doing”. Context Capture provides User State as a rich description of the A‑User’s understanding of the “internal state” of the User – which may be biologically real, if the User is a representation of a human, or simulated when the User is the representation of an agent. The task of Prompt Creation is to synthesise these sources of information into a PC‑Prompt Plan. This plan starts from what the User said, adds intent (e.g., “User wants help” or “User is asking a question”), includes the context around the User (e.g., “User is in a virtual kitchen”), and embeds User State (e.g., “User seems confused”).

This information is conveniently represented as a JSON object, converted into natural language, and passed to Basic Knowledge. Basic Knowledge then produces a natural language response called the Initial Response – initial because there are more processing elements in the A‑User pipeline that will refine and improve the answer before it is rendered in the metaverse.
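The flow described above (plan as a JSON object, then natural language) can be sketched as follows. This is a minimal illustration, not the MPAI specification: the `PromptPlan` class, its field names, and the wording of the rendered prompt are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PromptPlan:
    """Hypothetical PC-Prompt Plan; field names are illustrative only."""
    utterance: str   # what the User said
    intent: str      # e.g. "User wants help"
    context: str     # e.g. "User is in a virtual kitchen"
    user_state: str  # e.g. "User seems confused"


def plan_to_json(plan: PromptPlan) -> str:
    """Serialise the plan as a JSON object."""
    return json.dumps(asdict(plan), indent=2)


def plan_to_prompt(plan: PromptPlan) -> str:
    """Render the plan as a natural-language prompt for Basic Knowledge."""
    return (
        f'The User said: "{plan.utterance}". '
        f"Intent: {plan.intent}. "
        f"Context: {plan.context}. "
        f"User State: {plan.user_state}."
    )


plan = PromptPlan(
    utterance="Can you help me cook?",
    intent="User wants help",
    context="User is in a virtual kitchen",
    user_state="User seems confused",
)
print(plan_to_json(plan))    # the structured form
print(plan_to_prompt(plan))  # the text that would be sent to Basic Knowledge
```

Keeping the plan structured until the last step means each element (utterance, intent, context, User State) can be inspected or replaced before the text is generated.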

Prompt Creation gives the AI a sense of narrative, so the A-User can:

Ask the right clarifying question.
Respond with relevance to the situation.
Adapt to the environment and User mood.
Maintain continuity across interactions.

If the User says: “Can you help me cook?”

Spatial Reasoning notes the User is in a virtual kitchen with utensils and ingredients.
User State suggests the User looks uncertain.
Prompt Creation combines these into: “User is asking for cooking help, is in a kitchen, seems unsure.”

Basic Knowledge answers this prompt with an Initial Response, which is then passed to Domain Access. Domain Access may elaborate a new prompt enriched with domain-specific information (here “cooking”, which is useful when Basic Knowledge is not well informed about cooking).
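As a rough sketch of this hand-off, the snippet below enriches an Initial Response only when domain information is available. The `domain_access` function, the `DOMAIN_NOTES` table, and the format of the enriched prompt are hypothetical. The post does not describe how Domain Access works internally.

```python
from typing import Optional

# Hypothetical domain notes; a real Domain Access module would draw on
# domain-specific knowledge sources that this post does not describe.
DOMAIN_NOTES = {
    "cooking": "Offer step-by-step recipes and name the utensils involved.",
}


def domain_access(initial_response: str, domain: Optional[str]) -> str:
    """Return a new prompt enriched with domain information, if any is known.

    When no notes exist for the domain, the Initial Response passes through
    unchanged.
    """
    notes = DOMAIN_NOTES.get(domain) if domain else None
    if notes is None:
        return initial_response
    return f"{initial_response}\nDomain ({domain}): {notes}"


print(domain_access("Sure, let's cook.", "cooking"))
```

The pass-through branch mirrors the word “may” in the text: enrichment happens only when the domain adds something that Basic Knowledge lacks.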

Prompt Creation turns raw multimodal input and spatial information into meaningful prompts so the AI can think, speak, and act with purpose. It is the scriptwriter that ensures the A‑User’s dialogue is not only coherent but also contextually aware, emotionally attuned, and situationally precise.

What you can take away about Prompt Creation

Translates User speech into prompts that a Language Model can understand
Synthesises spatial data and User State
Detects User intent (e.g., help request, question)
Embeds environmental context (e.g., virtual kitchen)
Captures emotional cues (e.g., confusion, excitement)
Builds a structured PC-Prompt Plan as a JSON object to facilitate prompt creation
Converts PC-Prompt Plan into a natural language prompt
Passes the prompt to Basic Knowledge for response generation
Bridges perception and cognition for purposeful Language Model action
