July 2024 - MPAI community

Leonardo Chiariglione
2026-04-06

The novelties in the MPAI Metaverse Model – Technologies V2.2 standard

MPAI started addressing “metaverse” as a subject for standardisation in the early days of 2023 as an objective per se but also as an opportunity to integrate a plurality of technologies for which it had developed or was developing standards. After developing two Technical Reports exploring the issues, MPAI developed the MPAI Metaverse Model – Architecture (MMM-ARC) standard. This was followed by the MPAI Metaverse Model – Technologies (MMM-TEC) standards of which MPAI-66 has recently published Version 2.2.

In three years MPAI has invested significant resources in this project achieving important results. The initial MMM notions included things (Items) and beings (Processes) operating in a metaverse instance (M-Instance) under the responsibility of a human in conformity with the Rules of the M-Instance. Processes hold Rights on Items and Processes and can thus perform Actions on them. A Process may request another Process (e.g., a Service) to perform Actions on its behalf by issuing a Process Action. Twenty-eight Actions have been specified in terms of semantics, and more than 100 Items have been specified in terms of syntax (JSON Schema) and semantics. More than 50% of Items currently identified by MMM-TEC are defined by other MPAI Standards. The notion of Qualifier – metadata giving information such as format of an Item – has been fully adopted and integrated with the notion of a “Convert Service” to facilitate M-Instance interoperability as MMM-TEC does not impose any particular technology, such as for media.

MMM-TEC V2.2 inherits all the enabling technologies from preceding versions and enhances them with new technologies that provide full support for the establishment of virtual economies in an M-Instance.

This is the full list of finance-related Data Types specified by V2.2:

Acronym	Name	JSON	Acronym	Name	JSON
MMM-ASS	Asset	X	MMM-MKC	Market Classes	X
MMM-CTO	Contract Object	X	MMM-PRV	Provenance	X
MMM-CUO	Currency Object	X	MMM-SPM	Service Pricing Model	X
MMM-FBR	Fault Behaviour Report	X	MMM-SCT	Simple Contract	X
MMM-FDR	Fault Detection Report	X	MMM-TRA	Transaction	X
MMM-FER	Financial Error Report	X	MMM-VAL	Value	X
MMM-LIC	Licence	X	MMM-VMI	Value Metadata IDs	X
MMM-MPI	Marketplace Policy IDs	X	MMM-WAL	Wallet	X

All MPAI specification entities – Data, AI Modules, and AI Workflows – are identified by six characters. The first three identify the standard (MMM in the case of Metaverse) and the remaining three the specific entity within that standard. Each entity (Data Types in this case) is specified in natural language on a specific web page (the table provides the links) and in a JSON Schema that may reference a Qualifier.
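As a minimal sketch of the naming rule above, the following Python splits an identifier such as MMM-ASS into its standard and entity parts. The names `MpaiId` and `split_mpai_id` are hypothetical, and treating the hyphen as a separator that does not count towards the six characters is an assumption of this sketch, not something the specification text here confirms.

```python
from typing import NamedTuple


class MpaiId(NamedTuple):
    standard: str  # first three characters, e.g. "MMM"
    entity: str    # remaining three characters, e.g. "ASS"


def split_mpai_id(identifier: str) -> MpaiId:
    """Split an identifier like 'MMM-ASS' into its standard and entity parts."""
    # Assumption: the hyphen is only a visual separator.
    compact = identifier.replace("-", "")
    if len(compact) != 6 or not compact.isalnum():
        raise ValueError(f"expected six alphanumeric characters, got {identifier!r}")
    return MpaiId(standard=compact[:3], entity=compact[3:])


print(split_mpai_id("MMM-SPM"))  # MpaiId(standard='MMM', entity='SPM')
```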

MPAI defines a “Simple Contract” that provides basic functionalities but also defines a Contract Object. The term “Object” indicates that the entity is composed of Data and a Qualifier. In the case of Contract, the Data are the contract, but the Qualifier specifies how to interpret the Data. MPAI acts as a Registration Authority that receives requests for new Contract Types not currently included in the Contract Qualifier.

The Asset Data Type is the key element of MMM-TEC finance. As for all MPAI Data Types, the natural language specification is subdivided into Definition (what the Data Type is for), Functional Requirements (the functionalities offered by the Data Type), Syntax (the JSON Schema), Semantics (the meaning of the data carried by an Asset Data Type instance), and Conformance Testing (how to test that an Asset Data Type instance is a correct implementation of the specification).

Let’s analyse what is inside the Asset Data Type, i.e., its semantics.

Label	Description
Header	Asset Header – Standard “MMM-ASS.Vx.y”
M-InstanceID	Identifier of M-Instance.
M-Environment	Identifier of a relevant Environment.
AssetID	Identifier of the Asset.
SourceItemID	Identifier of the Source Item that spawned the Asset.
AssetDate	Timestamp of Asset creation.
Capabilities	Declared process Capabilities and Rights (the Asset carries information on who holds which Rights to the Asset).
Provenance	Information about the Asset provenance through a value chain.
MarketClass	One of a set of categories of assets, rights, services, and experiences exchanged.
ValueMetadata	Identifiers used to tag values associated with transactional or functional attributes.
CurrencyID	Allowed currency type for pricing.
ServicePricingModel	Rules, parameters, and conditions under which the Asset is offered, billed, chosen, settled, and accessed.
MarketplacePolicyID	Identifier of the policy applied to listing, transaction, or operation of an Asset (Item or Service).
DataExchangeMetadata	Metadata ensuring correct transfer of information from Source to Destination.
Trace	Identity of the Process producing the Asset and its Time of production.
DescrMetadata	Free-text metadata.

Of the elements in the first column, we will now concentrate on Service Pricing Model (SPM), a Data Type of high importance for establishing a virtual economy in an M-Instance. The following semantic table is much more structured, and so the specification of the different service pricing models has been simplified. The Asset Posting, Chosen, and Settlement parts retain the full details. The SPM Status label is used to signal the intermediate or final status of the Service Pricing Model.

Label	Description
Header	Service Pricing Model Header – Standard “MMM‑SPM‑Vx.y”.
MInstanceID	Identifier of the M‑Instance.
MEnvironmentID	Identifier of the M‑Environment.
SPMID	Identifier of this Service Pricing Model instance.
SPMContext	Whether this SPM describes a Service or an AssetPosting.
SPMTime	Reference time for this SPM (OSD/V1.5 Time).
ModelType	Primary pricing model: OneTime, Subscription, PayPerUse, PayPerTime, Freemium, Tiered, AdSupported, Hybrid (when combining multiple models).
CurrencyObject	The currency for prices and values used in this SPM.
Models
├─ OneTime	One‑time purchase model.
├─ Subscription	Subscription pricing data.
├─ PayPerUse	Usage‑based (metered) pricing.
├─ PayPerTime	Time‑window pricing.
├─ Freemium	Freemium model.
├─ Tiered	Tiered plan.
├─ AdSupported	Ad‑supported access.
├─ Discounts[]	Discount definitions.
└─ Hybrid[]	Hybrid composition (combining sections).
AssetPosting	Data for posting an Asset under this SPM.
├─ AssetID	ID of the asset being posted.
├─ LicenceTerms	Terms under which the asset is licensed.
├─ SenderPreValue	Value before transaction.
├─ SenderPostValue	Value after transaction.
├─ ReceiverPostValue	Value after transaction for receiver.
├─ ValueToSender	Final value to sender.
├─ SenderWalletID	Wallet of sender.
├─ ServiceProviderWalletID	Wallet of service provider (marketplace).
├─ ServiceProviderLicence	Licence of service provider.
├─ ReceiverLicence	Licence of receiver.
└─ Transaction	Transaction.
Chosen	Frozen snapshot of the chosen model and parameters.
├─ ModelType	Chosen model type.
└─ Parameters	Set of parameters at selection time.
   ├─ EffectivePeriod	Validity window.
   │ ├─ Start	Start time.
   │ └─ End	End time.
   ├─ Allowances[]	Frozen quota allowances.
   │ ├─ Meter	Meter type.
   │ ├─ Unit	Unit of measure.
   │ ├─ Quantity	Quantity allowed.
   │ ├─ Window	Applicable window label.
   │ └─ Rollover	Whether unused quota rolls over.
   ├─ Overage	Overage pricing.
   │ ├─ Rate	Overage rate.
   │ └─ Unit	Overage unit.
   ├─ RateLimit	Rate limiting constraints.
   │ ├─ MaxPerWindow	Maximum allowed per window.
   │ ├─ Window	Window size.
   │ └─ Burst	Burst allowance.
   └─ PriceBreakdown	Optional breakdown of base price, discounts, and final charges.
Settlement	Payment proof for the SPM‑governed transaction.
├─ SettlementTime	Time of settlement.
├─ TargetID	Identifier of relevant Service/Asset/Licence.
├─ Transaction	Transaction reference.
└─ Evidence[]	Optional receipts or provider references (Type, ID).
SPMStatus	Either “Model” or “Final”.
DataExchangeMetadata	Regulated/controlled exchange metadata.
Trace	Information about the Process producing the SPM and the time of production.
DescrMetadata	Free‑text descriptive metadata (≤ 2048 chars).
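To make the role of SPMStatus more concrete, here is a small Python sketch of an SPM that starts as "Model" and is turned into a "Final" copy once a model has been chosen and settled. Only the ModelType and SPMStatus values come from the table; the class, its field names and types, and the rule that a Final SPM carries Chosen and Settlement data are assumptions made for illustration. The normative structure is the JSON Schema of the standard.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the ModelType and SPMStatus rows of the table.
MODEL_TYPES = {"OneTime", "Subscription", "PayPerUse", "PayPerTime",
               "Freemium", "Tiered", "AdSupported", "Hybrid"}
SPM_STATUSES = {"Model", "Final"}


@dataclass
class ServicePricingModel:
    """Hypothetical in-memory view of a few SPM labels."""
    spm_id: str
    model_type: str
    status: str = "Model"
    chosen: Optional[dict] = None       # frozen snapshot (Chosen)
    settlement: Optional[dict] = None   # payment proof (Settlement)

    def __post_init__(self) -> None:
        if self.model_type not in MODEL_TYPES:
            raise ValueError(f"unknown ModelType: {self.model_type}")
        if self.status not in SPM_STATUSES:
            raise ValueError(f"unknown SPMStatus: {self.status}")

    def finalise(self, parameters: dict, transaction_ref: str) -> "ServicePricingModel":
        """Return a copy with Chosen and Settlement filled in and status 'Final'."""
        return ServicePricingModel(
            spm_id=self.spm_id,
            model_type=self.model_type,
            status="Final",
            chosen={"ModelType": self.model_type, "Parameters": dict(parameters)},
            settlement={"Transaction": transaction_ref},
        )


offer = ServicePricingModel(spm_id="spm-001", model_type="PayPerUse")
final = offer.finalise({"Overage": {"Rate": 0.01, "Unit": "call"}}, "tx-42")
print(offer.status, final.status)  # Model Final
```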

Version 2.2 specifies the protocols used by a Process when it requests another Process to perform a Process Action.

Here we describe the MM-Add protocol used by a Process to request a Locate Service to place an Item at a Location.

User sends MM-Add Process Action (PA) Request including Item, Point of View, Location, and Rights (Status=Model).
1. If MM-Add is a free service, goto MM-Add.
2. If MM-Add is a pay service:
  1. User sends MM-Add PA Request with Service Pricing Model (Status=Model) to the Locate Service of which MM-Add is an element.
  2. Locate sends MM-Add PA Response:
    1. If MM-Add PA Response includes Status=Err, goto End.
    2. If MM-Add PA Response includes Status=Ack and Service Pricing Model including Transaction (both Status=Model), User:
      1. Transacts Value contained in Transaction.
      2. Sends MM-Add PA Request with Service Pricing Model (Status=Model) including Transaction (Status=Final) to MM-Add.
3. MM-Add: Locate
  1. MM-Adds Item.
  2. Sends MM-Add PA Response (to requesting Process) including:
    1. Rights (Status=Final) for MM-Added Item.
    2. Service Pricing Model (Status=Final), if MM-Add is a pay service.
4. End.

As you see, the Service Pricing Model is used as the vehicle to finalise the relationship between the Requesting Process and the Locate Service.
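The exchange can be sketched as a toy simulation. This is not the normative protocol: the classes `PAResponse` and `LocateService`, the function `request_mm_add`, the numeric price, and the `wallet` argument are all hypothetical names introduced here, and the messages are reduced to the status fields mentioned in the steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PAResponse:
    status: str                               # "Ack" or "Err"
    rights_status: Optional[str] = None       # "Final" once the Item is MM-Added
    spm_status: Optional[str] = None          # "Model" in an offer, "Final" after payment
    transaction_value: Optional[float] = None


class LocateService:
    """Toy stand-in for the Locate Service that offers MM-Add."""

    def __init__(self, price: float = 0.0) -> None:
        self.price = price      # 0 means MM-Add is a free service
        self.placed = []        # (item, location) pairs already MM-Added

    def quote(self) -> PAResponse:
        # Answer a priced request with an SPM and Transaction, both still "Model".
        return PAResponse("Ack", spm_status="Model", transaction_value=self.price)

    def mm_add(self, item: str, location: str, paid: float = 0.0) -> PAResponse:
        # Perform MM-Add and return Rights (and the SPM, if paid) as "Final".
        if paid < self.price:
            return PAResponse("Err")
        self.placed.append((item, location))
        return PAResponse("Ack", rights_status="Final",
                          spm_status="Final" if self.price > 0 else None)


def request_mm_add(service: LocateService, item: str, location: str,
                   wallet: float) -> PAResponse:
    """User side of the exchange; wallet is the value the user is able to transact."""
    if service.price == 0:
        return service.mm_add(item, location)          # free service: go straight to MM-Add
    offer = service.quote()                            # pay service: obtain SPM + Transaction
    if offer.status == "Err" or wallet < offer.transaction_value:
        return PAResponse("Err")                       # goto End
    return service.mm_add(item, location, paid=offer.transaction_value)


paid_service = LocateService(price=5.0)
print(request_mm_add(paid_service, "lamp", "room-2", wallet=10.0).spm_status)  # Final
```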

MMM-TEC V2.2 will be presented at the online event held on 9 April 2026 at 16 UTC. Register here to attend.


Leonardo Chiariglione
2026-03-21

MPAI as a Service (MaaS) for a new generation of intelligent services

The 66th MPAI General Assembly (MPAI-66) has approved the publication of the “MPAI as a Service” Call for Technologies. To get a proper understanding of the positioning of this new standard in the MPAI Ecosystem, we should recall the basic elements of the AI Framework (MPAI-AIF) and the Governance of the MPAI Ecosystem (MPAI-GME) standards. The former specifies an environment where it is possible to initialise, dynamically configure, and control AI applications called AI Workflows (AIW) composed of connected processing elements called AI Modules (AIM). MPAI-AIF specifies two profiles – a Basic and a Security Profile.

Figure 1 depicts the MPAI-AIF Basic Profile Reference Model. You can see the Controller – the brain of the system – and the MPAI-AIF APIs enabling the Controller to obtain AIWs/AIMs from the MPAI Store, the place where implementers can submit their implementations for distribution after they have been tested for conformance with the standard and verified for security. Once the AI Framework is equipped with the desired domain-specific processing capabilities, the User Agent can activate the Controller, and the AIMs can call it via the appropriate APIs.

Figure 1 – Reference Model of MPAI-AIF Basic Profile

Let’s explore how things can unfold in this new scenario.
1 Creation of infrastructure

Creation of infrastructure is the responsibility of the deployment/control plane to avoid access to the application data plane by the control plane and vice-versa. The REST API protocol is used to specify the steps.

1.1 Connection to the SCI

SCI specifies the required security protocols that the Remote Client Application (RCA) must employ for authentication and authorisation purposes. AIF should include an exemplary list of security protocols (basic, digest, bearer). The connection is required by all subsequent points and must be secured using one of the proposed security schemes described in End Point Open API.

1.2 Creation of an SCI

RCA asks the AIF end point for the creation of one or more SCIs to which all subsequent AIF API requests will be issued. The objective of SCI creation is the acquisition of an SCI identity for use in subsequent API requests to identify the intended SCIs among the many to which the message will be directed.

1.3 Workflow discovery

RCA submits a request to the Server API for AIW matching and discovery. The resulting collection of Workflow Descriptions is returned to the RCA for ultimate selection.

1.4 Launch of the desired AI Workflow

RCA submits a request to the SCI through the AIF end point for the launch of the desired AIW(s). The objective of Workflow launch is the acquisition of a Remote Workflow Instance (RWI) identity for use in subsequent API requests for identification of the intended AIW among the many with which input/output messages will be exchanged.

2 Message Exchange

Application data exchange is the responsibility of the application data plane thus ensuring non-exposure of application data to the control plane. The REST API protocol is used to specify the steps.

2.1 Delivery of messages to the input ports of the AI Workflow

RCA submits requests to the above-identified SCI, through the AIF end point for the delivery of AIF Messages containing application data to the desired input port(s) of the RWI(s).

2.2 Reception of messages from the output ports of the AI Workflow

RCA may submit requests to the above-identified SCI through the AIF end point for the reception of AIF Messages from the desired output port(s) of the launched RWI(s). The RCA makes provision for asynchronous delivery of the response when required.

3 Termination of infrastructure

Termination of infrastructure is the responsibility of the deployment/control plane, to avoid access to the application data plane by the control plane and vice-versa. The REST API protocol is used to specify the steps.

3.1 Termination of the AI Workflow

RCA submits requests to the SCI through the AIF end point for the termination of the RWI(s).

3.2 Release of the AIF Controller

RCA submits requests to the AIF end point for the termination of the above-identified SCI(s).

Figure 2 depicts an initial Reference Model of MPAI as a Service.

Figure 2 – MPAI as a Service Reference Model

An overview of the complete workflow is given by:

The RCA issues a request through the API Client to the API Server for the creation of an SCI.
The API Server acts as a local User Agent of a Controller.
The API Server returns the ID (created by the API server) of the newly created SCI to RCA.
The RCA issues a request via the API Client through the API Server to the indicated SCI for the instantiation of a named AIW (RWI).
The SCI retrieves the named AIW metadata (describing the AIW) from the MPAI Store and then parses, retrieves, and installs the referenced packages as required for the instantiation of the AIW.
The MPAI Store receives requests from the SCI for delivery of AIW metadata and the subordinate packages that collectively describe the complete AIW.
The MPAI Store returns the requested elements if it possesses them, otherwise it issues requests to the appropriate remote repositories so as to retrieve the missing elements. The MPAI Store could be:
1. As simple as a stand-alone web server responding to HTTP Get requests.
2. Based on a distributed file system management service, such as HDFS and other variations.
3. Based on a standard cloud object management and delivery service, such as Amazon S3 or OpenStack Swift.
4. Fronted by an object authenticity management framework, such as The Update Framework.
5. Any combination or variation of the above.
The API Server returns to the RCA the AIW ID provided by the SCI.
The RCA issues a request via the API Client through the API Server to the indicated SCI for delivery of the accompanying input data message to the specified Port of an AIM of the indicated AIW.
The RCA issues a request via the API Client through the API Server to the indicated SCI for reception of an output data message from the specified Port of an AIM of the indicated AIW.
The API Server returns to the RCA the output data message received from the specified Port of an AIM, which was provided by the indicated SCI.
The RCA issues a request via the API Client through the API Server to the indicated SCI for the termination of the RWI.
The RCA issues a request via the API Client to the API Server for the termination of the indicated SCI.
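The order of calls in this workflow can be sketched with an in-memory stand-in for the API Server. Nothing here is the MPAI-AIF API: the class `FakeAifEndpoint`, its method names, the port names "in" and "out", and the toy workflow that upper-cases its input are all hypothetical. A real deployment would issue these as REST requests, which the sketch leaves out so that it stays self-contained.

```python
import itertools
from collections import defaultdict, deque


class FakeAifEndpoint:
    """In-memory stand-in for the API Server and the SCIs behind it."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._scis = {}  # sci_id -> {rwi_id: {"aiw": name, "out": {port: deque}}}

    def create_sci(self) -> str:
        sci_id = f"sci-{next(self._ids)}"
        self._scis[sci_id] = {}
        return sci_id

    def launch_aiw(self, sci_id: str, aiw_name: str) -> str:
        # A real SCI would fetch the AIW metadata and packages from the MPAI Store here.
        rwi_id = f"rwi-{next(self._ids)}"
        self._scis[sci_id][rwi_id] = {"aiw": aiw_name, "out": defaultdict(deque)}
        return rwi_id

    def send(self, sci_id: str, rwi_id: str, port: str, message: str) -> None:
        if port != "in":
            raise KeyError(f"unknown input port: {port}")
        # Toy workflow: every input is upper-cased and queued on the "out" port.
        self._scis[sci_id][rwi_id]["out"]["out"].append(message.upper())

    def receive(self, sci_id: str, rwi_id: str, port: str):
        queue = self._scis[sci_id][rwi_id]["out"][port]
        return queue.popleft() if queue else None  # None: nothing available yet

    def terminate_rwi(self, sci_id: str, rwi_id: str) -> None:
        del self._scis[sci_id][rwi_id]

    def terminate_sci(self, sci_id: str) -> None:
        del self._scis[sci_id]


server = FakeAifEndpoint()
sci = server.create_sci()                     # creation of an SCI
rwi = server.launch_aiw(sci, "example-aiw")   # launch of the desired AIW
server.send(sci, rwi, "in", "hello")          # delivery to an input port
print(server.receive(sci, rwi, "out"))        # HELLO
server.terminate_rwi(sci, rwi)                # termination of the AIW
server.terminate_sci(sci)                     # release of the SCI
```

Returning `None` from `receive` when no message is queued is one simple way to leave room for the asynchronous delivery the text mentions; a polling or callback scheme would be equally plausible.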

To leverage the availability of AIMs and AIWs from various sources, MaaS requires that:

The access to the MPAI Store be ubiquitous to support envisaged application scenarios.
The highest level of authorisation be guaranteed by the SCI to an RCA when accessing AI Workflows and their constituent components.
The highest level of authenticity control be exercised by the SCI on AI Workflows and their constituent packaged components.

MPAI-66 has issued a Call for Technologies requesting interested parties to propose:

An architecture for the management of the MPAI Store and the subordinate distributed repositories.
Protocol(s) that are considered suitable for supporting the above requirements.
Alternatively, a single interface enabling SCIs to access a plurality of repositories each supporting different protocols.
If needed, proposals for revision of the MPAI-AIF Basic API, to accommodate requirements of the proposed technologies.

Solutions proposed may be original, or rely on existing technologies, or be any integration thereof.

The MaaS Call will be presented at two online events held on 2026/03/30 at 8:00 UTC (register here to attend) and 15:00 UTC (register here to attend).


Leonardo Chiariglione
2026-03-18

MPAI publishes the “MPAI as a Service” Call for Technologies and the MPAI Metaverse Model V2.2 standard for Community Comments

Geneva, Switzerland – 18th March 2026. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 66th General Assembly (MPAI-66) publishing a Call for Technologies regarding the planned “MPAI as a Service” (MaaS) standard and MPAI Metaverse Model – Technologies (MMM-TEC) V2.2.

MPAI as a Service is the code name of the target service that would make it possible for a client application to access sophisticated AI processing services by launching an AI Workflow downloaded from the MPAI Store by a remote MPAI AI Framework (MPAI-AIF) processing environment. The Call requests technologies that leverage the MPAI-AIF Application Programming Interface for the reduction of the MPAI Store traffic when downloading AI Workflow components, such as bulky neural network models, from the required certified distributed repository servers.

The next steps are:

Register to attend the 30 March online presentations of the Call at 8:00 and 15:00 UTC.
Submit a response to the Call to the MPAI Secretariat by 13 April at 16 UTC.
Start the MPAI as a Service standard development on 15 April (MPAI-67).

MPAI-66 has also reached another important milestone with the publication of Version 2.2 of the now well-established MPAI Metaverse Model – Technologies (MMM-TEC) standard. This new version adds a major set of technologies enabling the deployment of a sophisticated virtual economy and a new open-source implementation of MMM-TEC based on OpenSimulator. The standard is published with a Request for Community Comments until 11 May. Learn about the opportunities offered by the new MMM-TEC version by registering to attend the public online presentation on 9 April at 15:00 UTC.

MPAI is continuing the development of its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing a Call for Technologies to extend the MPAI-AIF standard to enable a Remote Client Application to access a remote MPAI-AIF Controller, download and execute an AI Workflow, and access the result of the AIW processing.
AI for Health (AIH-HSP): reviewing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-USC): developing the Audio Six Degrees of Freedom (CAE-6DF) and the Audio Object Rendering (CAE-AOR) standards.
Connected Autonomous Vehicle (CAV-TEC): developing a new version of the flagship specification CAV-TEC with security support.
Compression and Understanding of Industrial Data (CUI-CPP): developing a reference software implementation of CUI-CPP V2.0.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding in compressing video sequences.
AI-Enhanced Video Coding (MPAI-EVC): exploring new standards that benefit from the use of Super Resolution filters.
Governance of the MPAI Ecosystem (MPAI-GME): operating the MPAI Ecosystem per the MPAI-GME Specification.
Human and Machine Communication (MPAI-HMC): exploring the use of AI in human-to-machine and machine-to-machine communication.
Multimodal Conversation (MPAI-MMC): developing specifications of new data types especially in the context of the PGM-AUA standard.
MPAI Metaverse Model (MMM-TEC): developing V2.2 of MMM-TEC with capabilities enabling virtual metaverse economies.
Neural Network Watermarking (NNW-TEC): refining the new Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC) standard published for Community Comments.
Object and Scene Description (MPAI-OSD): developing specifications of new data types especially in the context of the PGM-AUA standard.
Portable Avatar Format (MPAI-PAF): discussing the impact of MPAI standards planned or under development on MPAI-PAF V1.5.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards that are planned or under development.
XR Venues (XRV-LTP): developing the standard for improved execution of Live Theatrical Performances using AI.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members. New members joining before 31st December 2025 have their membership extended until 31st December 2026.

Please visit the MPAI website, contact the MPAI Secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.


Leonardo Chiariglione
2026-02-19

MPAI requests Community Comments on its Neural Network Watermarking Technologies V1.0 standard

Geneva, Switzerland – 18th February 2026. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 65th General Assembly (MPAI-65) publishing the Neural Network Watermarking – Technologies (NNW-TEC) V1.0 standard with a request for Community Comments.

Technical Specification: Neural Network Watermarking – Technologies (NNW-TEC) V1.0 assesses specific NN Traceability technologies with respect to Imperceptibility, Robustness, and Computational Cost using the methodologies specified by the previously approved Neural Network Watermarking – Traceability (NNW-NNT) V1.1 standard. NNW-TEC offers the industry a path to obtain results of Imperceptibility, Robustness, and Computational Cost evaluations for specific Neural Network Traceability Technologies based on standard evaluation methods.

There are several ways to learn more about the standard:

Register to attend the public online presentation of the standard on 10 March 2026 at 15 UTC.
Read the Neural Network Watermarking – Technologies (NNW-TEC) V1.0 draft standard.
Read a short Introduction to NNW-TEC V1.0.

Comments on NNW-TEC V1.0 shall reach the MPAI Secretariat by 13 April 2026.

MPAI-65 has also decided to publish the Company Performance Prediction (CUI-CPP) V2.0 standard in final form.

MPAI is continuing the development of its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing a Call for Technologies to extend the MPAI-AIF standard to enable a Remote Client Application to access a remote MPAI-AIF Controller, download and execute an AI Workflow, and access the result of the AIW processing.
AI for Health (AIH-HSP): revising the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-USC): developing the Audio Six Degrees of Freedom (CAE-6DF) and the Audio Object Rendering (CAE-AOR) standards.
Connected Autonomous Vehicle (CAV-TEC): developing a new version of the flagship specification CAV-TEC with security support.
Compression and Understanding of Industrial Data (CUI-CPP): developing a reference software implementation of CUI-CPP V2.0.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding in compressing video sequences.
AI-Enhanced Video Coding (MPAI-EVC): exploring new standards that benefit from the use of Super Resolution filters.
Governance of the MPAI Ecosystem (MPAI-GME): operating the MPAI Ecosystem per the MPAI-GME Specification.
Human and Machine Communication (MPAI-HMC): exploring the use of AI in human-to-machine and machine-to-machine communication.
Multimodal Conversation (MPAI-MMC): developing specifications of new data types especially in the context of the PGM-AUA standard.
MPAI Metaverse Model (MMM-TEC): developing V2.2 of MMM-TEC with capabilities enabling virtual metaverse economies.
Neural Network Watermarking (NNW-TEC): refining the new Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC) standard published for Community Comments.
Object and Scene Description (MPAI-OSD): developing specifications of new data types especially in the context of the PGM-AUA standard.
Portable Avatar Format (MPAI-PAF): discussing the impact of MPAI standards planned or under development on MPAI-PAF V1.5.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards that are planned or under development.
XR Venues (XRV-LTP): developing the standard for improved execution of Live Theatrical Performances using AI.


Leonardo Chiariglione
2026-02-04

Improved Health Services with AI

The 64th MPAI General Assembly has approved publication of Technical Specification: AI for Health (MPAI-AIH) – Health Secure Platform (AIH-HSP) V1.0 with a request for Community Comments to be received by the MPAI Secretariat by 16 March 2026. This paper gives an overview of the proposed standard to help those who wish to review and comment on AIH-HSP.

The Health Secure Platform specifies the architecture of a platform offering health-related services enabling the following functionalities:

End Users use AIH-HSP Apps running on their Front Ends (personal devices) to acquire Health Data.
Health Data, combined with an associated Model Licence, are called AIH Data.
AIH Data is uniquely identified.
AIH Data is processed by the Front End using an instance of the MPAI-specified AI Framework (MPAI-AIF).
Front End processes AIH Data using AI-for-Health-recommended AI Modules (AIM) downloaded from the MPAI Store.
Neural Networks in AIMs continually learn while making inferences on AIH Data.
Un-processed and Processed AIH Data are uploaded to the AI Back End.
Back End stores the Model Licence as a Smart Contract on a Blockchain associated with the Back End.
A Smart Contract ID is added to the AIH Data.
The Smart Contract governs the use that is made of the AIH Data stored on the Back End.
Depending on the relevant Smart Contract, an instance of AIH Data stored on the Back End may be processed by the Back End itself and Third-Party Users.
The Back End may process End Users’ AIH Data in its local AI Framework based AI Data Processing AIM.
A rich AIH Taxonomy includes:
1. AIH Data Classes (currently: ECG, EEG, Genomics, and Medical Images).
2. AIH Data Users (currently: End User, Non-Profit Entity, Profit Entity, Clinical Entity, Authorised Entity, Caregiver).
3. AIH Data Statuses (currently: Anonymised, Pseudonymised, Identified).
4. AIH Data Usages (currently: Unrestricted, Pseudonymised, Anonymised, Research, Patient use, Health care).
5. AIH Data Processing Types (currently: ECG, EEG, Genomics, Medical Images).
6. Anonymisation/De-Identification Algorithms.
7. Anomaly Types.

Figure 1 depicts the Health Secure Platform specified by AI for Health. At the centre there is the Back End to which Front Ends and Third-Party Users are connected. The MPAI Store enables Back End and Front Ends to access the AI Modules they need for their processing. The Blockchain manages the licensing terms provided to it by the Model Licence.

Figure 1 – General Model of AIH-HSP V1.0

Figure 2 depicts the architecture of the AIH Back End where Back End, End User, Blockchain, and Third-Party Users perform operations.

Figure 2 – Reference Model of the Health Back End (AIH-HBE) AIW

Back End accesses the MPAI Store and downloads the AIMs required for its operation.
User Registration
1. A User wishing to access the Back End sends a Registration Request containing Personal Profile and the list of Services they intend to access.
2. Back End provides the Tokens enabling the requesting User to access the corresponding Services.
Storage of AIH Data
1. End User uploads AIH Data.
2. HBE Data Processing
  1. Extracts Model Licence from AIH Data.
  2. Issues Blockchain Licence Request to Blockchain.
3. Blockchain
  1. Converts Model Licence to a Smart Contract.
  2. Responds with a Blockchain Licence Response.
4. HBE Data Processing
  1. Attaches Blockchain Licence ID to AIH Data.
  2. Stores AIH Data in Secure Storage.
De-Identification/Anonymisation (DIA) of AIH Data
1. End User sends a DIA Request.
2. HBE Data Processing
  1. Retrieves relevant AIH Data from Secure Storage.
  2. (Pseudo-)Anonymises AIH Data.
  3. Stores (Pseudo-)Anonymised AIH Data back to Secure Storage.
  4. Responds with a DIA Response.
AIH Data Processing
1. User sends AIH Process Request.
2. HBE Data Processing sends a Licence Confirm Request to the Blockchain.
3. Blockchain responds with a Licence Confirm Response.
4. HBE Data Processing
  1. Performs the requested Processing, if this is included in the Licence.
  2. Stores the Processed AIH Data as new AIH Data.
  3. Responds with an AI Data Process Response.
Audit
1. End User sends Audit Request.
2. Auditing
  1. Retrieves relevant Confirmation Responses to verify that all Processing was performed according to the Licence terms.
  2. Responds with Audit Response.
Federated Learning
1. Federated Learning sends Federated Learning Request to all Health Front Ends.
2. Health Front Ends provide the NN Models.
3. Federated Learning
  1. Develops and uploads the new NN Model to the MPAI Store.
  2. Sends Federated Learning Response to Health Front Ends.
4. Front Ends download the new NN Model from the MPAI Store.
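The storage, processing, and audit steps above share one idea: every processing request is checked against the licence, and the checks are kept so they can be audited later. The Python below is a minimal sketch of that idea only. The classes `Ledger` and `BackEnd`, the representation of a Model Licence as a set of permitted Processing Types, and the identifier formats are assumptions for illustration; no real blockchain or AIH-HSP interface is involved.

```python
class Ledger:
    """Toy stand-in for the Blockchain: keeps licences and confirmation records."""

    def __init__(self) -> None:
        self._contracts = {}
        self.confirmations = []  # (contract_id, processing, allowed)

    def register(self, model_licence: set) -> str:
        # "Converts Model Licence to a Smart Contract" and returns its ID.
        contract_id = f"sc-{len(self._contracts) + 1}"
        self._contracts[contract_id] = frozenset(model_licence)
        return contract_id

    def confirm(self, contract_id: str, processing: str) -> bool:
        # Licence Confirm Request/Response; every answer is recorded for audits.
        allowed = processing in self._contracts[contract_id]
        self.confirmations.append((contract_id, processing, allowed))
        return allowed


class BackEnd:
    def __init__(self, ledger: Ledger) -> None:
        self.ledger = ledger
        self.storage = {}  # data_id -> {"payload": ..., "contract_id": ...}

    def store(self, data_id: str, payload, model_licence: set) -> None:
        contract_id = self.ledger.register(model_licence)
        self.storage[data_id] = {"payload": payload, "contract_id": contract_id}

    def process(self, data_id: str, processing: str, func):
        record = self.storage[data_id]
        if not self.ledger.confirm(record["contract_id"], processing):
            return None  # processing not included in the licence
        new_id = f"{data_id}/{processing}"
        # Processed data is stored as new data under the same contract.
        self.storage[new_id] = {"payload": func(record["payload"]),
                                "contract_id": record["contract_id"]}
        return new_id

    def audit(self, data_id: str) -> list:
        contract_id = self.storage[data_id]["contract_id"]
        return [c for c in self.ledger.confirmations if c[0] == contract_id]


back_end = BackEnd(Ledger())
back_end.store("ecg-1", [0.1, 0.4, 0.2], {"ECG"})
print(back_end.process("ecg-1", "ECG", max))       # ecg-1/ECG
print(back_end.process("ecg-1", "Genomics", max))  # None
print(back_end.audit("ecg-1"))
```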

Figure 3 depicts the Reference Architecture of the Health Front End (AIH-HFE) where Front End and End User perform operations.

Figure 3 – Reference Model of the Health Front End (AIH-HFE) AIW

End User registers with HFE and HBE.
End User acquires Health Data with a Health Device and provides Model Licence.
Model Licencing AIM attaches Model Licence to Health Data, produces AIH Data and Stores AIH Data.
End User processes AIH Data locally.
End User stores AIH Data to HFE.
End User processes AIH Data remotely on the Back End.
HFE receives Federated Learn request.
HFE sends the NN Model trained since last Federated Learn request to HBE.
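
The text does not say how the Back End combines the NN Models it receives from the Front Ends. One widely used option is federated averaging, where parameters are averaged with weights proportional to the amount of local training data. The sketch below assumes that technique and models each NN Model as a flat list of numbers; `federated_average` is a hypothetical helper, not part of AIH-HSP.

```python
def federated_average(models, sample_counts):
    """Weighted average of parameter vectors (federated-averaging style)."""
    if not models or len(models) != len(sample_counts):
        raise ValueError("need one sample count per model")
    total = sum(sample_counts)
    size = len(models[0])
    merged = [0.0] * size
    for weights, count in zip(models, sample_counts):
        if len(weights) != size:
            raise ValueError("all models must have the same shape")
        for i, w in enumerate(weights):
            # Each front end contributes in proportion to its data volume.
            merged[i] += w * count / total
    return merged


# Two front ends; the second one trained on three times as much data.
new_model = federated_average([[0.0, 4.0], [4.0, 0.0]], [1, 3])
```

Only model parameters travel to the Back End in this scheme; the Health Data used for local training stays on the Front End.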

The AIH-HSP V1.0 standard is available. An online presentation will be made on 2026/02/09 at 15 UTC. Register to attend.

Comments on AIH-HSP V1.0 shall reach the MPAI Secretariat by 2026/03/16.


Leonardo Chiariglione
2026-01-21

MPAI publishes the AI for Health V1.0 standard with a Request for Community Comments

Geneva, Switzerland – 21st January 2026. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 64th General Assembly (MPAI-64) publishing the AI for Health V1.0 standard.

Technical Specification: AI for Health (MPAI-AIH) – Health Secure Platform (AIH-HSP) V1.0 envisages that AIH-HSP subscribers use personal devices running AI Framework implementations (front ends) to locally process and submit health data to the AIH-HSP back end. A licence attached to health data specifies the types of processes that the back end or specific organisations called Third-Party Users may perform and the type of use they can make of the processed data. The back end is connected to a blockchain storing the smart contracts governing the processes that the back end and the Third-Party Users can perform on subscriber data. From time to time, the back end collects neural networks trained by front ends and distributes an updated neural network used by AIH-HSP subscribers with the knowledge acquired by the community using Federated Learning.

There are several ways to know more about the standard:

Register to attend the public online presentation of the standard on 9 February 2026 at 15 UTC.
Read AI for Health (MPAI-AIH) – Health Secure Platform (AIH-HSP) V1.0.
Read a short Introduction to AIH-HSP V1.0.

MPAI is also publishing V2.4 of Context-based Audio Enhancement (MPAI-CAE) – Use Cases (CAE-USC).

MPAI is continuing the development of its work plan that involves the following activities:

AI Framework (MPAI-AIF): extending the MPAI-AIF specification to enable a client to access a remote MPAI-AIF Controller and an AI Module to communicate data to another AIM with associated metadata.
AI for Health (AIH-HSP): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-USC): developing the Audio Six Degrees of Freedom (CAE-6DF) and the Audio Object Rendering (CAE-AOR) specifications.
Connected Autonomous Vehicle (CAV-TEC): developing a new version of the flagship specification CAV-TEC with security support.
Compression and Understanding of Industrial Data (CUI-CPP): expecting comments on the Company Performance Prediction V2.0 specification.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding in compressing video sequences.
AI-Enhanced Video Coding (MPAI-EVC): exploring new standards that benefit from the use of Super Resolution filters.
Governance of the MPAI Ecosystem (MPAI-GME): operating the MPAI Ecosystem per the MPAI-GME Specification.
Human and Machine Communication (MPAI-HMC): exploring the use of AI in human-to-machine and machine-to-machine communication.
Multimodal Conversation (MPAI-MMC): developing specifications of new data types especially in the context of the PGM-AUA standard.
MPAI Metaverse Model (MMM-TEC): developing security-enabling protocols in the MMM-TEC specification.
Neural Network Watermarking (NNW-TEC): developing the new Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC) including assessments of Neural Network Traceability Technologies.
Object and Scene Description (MPAI-OSD): developing specifications of new data types especially in the context of the PGM-AUA standard.
Portable Avatar Format (MPAI-PAF): discussing the impact of MPAI standards planned or under development on MPAI-PAF V1.5.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards that are planned or under development.
XR Venues (XRV-LTP): developing the standard for improved execution of Live Theatrical Performances using AI.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.


Leonardo Chiariglione
2026-01-08

A walk inside the Autonomous User

Table of Contents

A standard for the Autonomous User Architecture
A-User Control: The Autonomous Agent’s Brain
Context Capture: The A-User’s First Glimpse of the World
Audio Spatial Reasoning: The Sound-Aware Interpreter
Visual Spatial Reasoning: The Vision‑Aware Interpreter
Prompt Creation: Where Words Meet Context
Domain Access: The Specialist Brain Plug-in for the Autonomous User
Basic Knowledge: The Generalist Engine Getting Sharper with Every Prompt
User State Refinement: Turning a Snapshot into a Full Profile
Personality Alignment: The Style Engine of A-User
A-User Formation: Building the A-User

A standard for the Autonomous User Architecture

MPAI has developed 15 standards to facilitate componentisation of AI applications. One of them is MPAI Metaverse Model – Technologies (MMM-TEC), currently at Version 2.1. MMM-TEC assumes that a Metaverse Instance (M-Instance) is populated by Processes performing Actions on Items either directly or indirectly by requesting another Process to perform Actions on their behalf. The requested Process performs the Action if the requesting and requested Processes have the Rights to do so. Process Action is the means for a Process to make requests.
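
As a rough illustration of the idea, the sketch below models Rights as simple (action, item) pairs and lets one Process ask another to act on its behalf. It is a toy model, not the MMM-TEC data format: `Process`, `may`, `perform` and `process_action` are hypothetical names, and real Rights are far richer than a set of pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    # Assumption: a Right is reduced to an (action, item) pair.
    rights: set = field(default_factory=set)

    def may(self, action: str, item: str) -> bool:
        return (action, item) in self.rights

    def perform(self, action: str, item: str) -> str:
        if not self.may(action, item):
            raise PermissionError(f"{self.name} lacks the Right to {action} {item}")
        return f"{self.name} performed {action} on {item}"


def process_action(requester: Process, requested: Process, action: str, item: str) -> str:
    """Ask `requested` to act for `requester`; both must hold the Right."""
    if not requester.may(action, item):
        raise PermissionError(
            f"{requester.name} lacks the Right to request {action} on {item}"
        )
    return requested.perform(action, item)


user = Process("User", {("MM-Move", "avatar-1")})
service = Process("Service", {("MM-Move", "avatar-1")})
result = process_action(user, service, "MM-Move", "avatar-1")
```

A request from a Process that holds no Rights on the Item fails before the requested Process is even asked to act.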

A particularly important Process is the User. This may be driven directly by a human (in which case it is called H-User) or may operate autonomously (in which case it is called A-User) performing Actions and requesting Process Actions.

MMM-TEC provides the technical means for an H-User to act in an M-Instance. An A-User can use the same means to act, but MMM-TEC currently does not provide the means to decide which (Process) Actions to perform. Such means are vitally important for an A-User to achieve autonomous agency and thus make M-Instances more attractive places for humans to visit and settle.

After long discussions, MPAI has initiated the Performing Goal in metaverse (MPAI-PGM) project. The first subproject is called Autonomous User Architecture (PGM-AUA). Currently, this includes the following documents: a Call for Technologies (per the MPAI process, a technical standard is developed based on the responses received to a Call) accompanied by Use Cases and Functional Requirements (what the standard is expected to do), a Framework Licence (guidelines for the use of Essential IPR of the Standard), and a recommended Template for Responses.

The complexity of the PGM-AUA project has prompted MPAI to develop a Tentative Technical Specification (TTS). This uses the style of an MPAI Technical Specification but is NOT one. It has been developed as a concrete example of the goal that MPAI intends to eventually achieve with the PGM-AUA project. Respondents to the Call are free to comment on, change, or extend the TTS or to propose anything else relevant to the Call, whether related to the TTS or not.

Let’s have a look at the Tentative Architecture of Autonomous User.

Anybody is entitled to respond to the Call. Responses shall be submitted to the MPAI Secretariat by 2026/01/21T23:59.

In the following, an extensive techno-conversational description of the TTS is developed.

A-User Control: The Autonomous Agent’s Brain

A-User Control is the general commander of the A-User system, making sure the Avatar behaves like a coherent digital entity aware of the rights it can exercise in an instance of the MPAI Metaverse Model – Technologies (MMM-TEC) standard. The command is actuated by various signals exchanged with the AI Modules (AIMs) composing the Autonomous User.

At its core, A-User Control decides what the A-User should do, which AIM should do it, and how it should do it – all while respecting the Rights held by the A-User in the metaverse and the Rules defined by it. A-User Control either executes an Action directly or delegates it to another Process in the metaverse.

A-User Control is not just about triggering actions. A-User Control also manages the operation of its AIMs, for instance A-User Formation, which can turn text produced by the Basic Knowledge (LLM) and the Entity Status selected by Personality Alignment into a speaking and gesturing Avatar. A-User Control sends shaping commands to A-User Formation, ensuring the Avatar’s behaviour aligns with metaverse-generated cues and contextual constraints.

A-User Control is not independent of human influence. The human, i.e., the A-User “owner”, can override, adjust, or steer its behaviour. This makes A-User Control a hybrid system: autonomous by design, but open to human modulation when needed.

The control begins when A-User Control triggers Context Capture to perceive the current M-Location – the spatial zone of the metaverse where the User is active. That snapshot, called Context, includes spatial descriptors and a readout of the human’s cognitive and emotional posture called Entity State. From there, the two Spatial Reasoning components – Audio and Visual – use Context to analyse the scene and send outputs to Domain Access and Prompt Creation, which enrich the User’s input and guide the A-User’s understanding.

As reasoning flows through Basic Knowledge, Domain Access, and User State Refinement, A-User Control ensures that every action, rendering, and modulation is aligned with the A-User’s operational logic.

In summary, the A-User Control is the executive function of the A-User: part orchestrator, part gatekeeper, part interpreter. It’s the reason the Avatar doesn’t just speak – it does so while being aware of the Context – both the spatial and User components – with purpose, permission, and precision.

Context Capture: The A-User’s First Glimpse of the World

Context Capture is the A-User’s sensory front-end – the AIM that opens up perception by scanning the environment and assembling a structured snapshot of what’s out there in the moment. It is the first AI Module (AIM) in the loop providing the data and setting the stage for everything that follows.

When A-User Control decides it’s time to engage, it prompts Context Capture to focus on a specific M-Location – the zone where the User is active, rendering its Avatar.

The product of Context Capture is called Context – a time-stamped, multimodal snapshot that represents the A-User’s initial understanding of the scene. But this isn’t just raw data. Context is composed of two key ingredients: Audio-Visual Scene Descriptors and User State.

The Audio-Visual Scene Descriptors are like a spatial sketch of the environment. They describe what’s visible and audible: objects, surfaces, lighting, motion, sound sources, and spatial layout. They provide the A-User with a sense of “what’s here” and “where things are.” But they’re not perfect. These descriptors are often shallow – they capture geometry and presence but not meaning. A chair might be detected as a rectangular mesh with four legs, but Context Capture doesn’t know if it’s meant to be sat on, moved, or ignored.

That’s where Spatial Reasoning comes in. Spatial Reasoning is the AIM that takes this raw spatial sketch and starts asking the deeper questions:

“Which object is the User referring to?”
“Is that sound coming from a relevant source?”
“Does this object afford interaction, or is it just background?”

It analyses the Context and produces an enhanced Scene Description containing a refined map of spatial relationships, referent resolutions, and interaction constraints and a set of cues that enrich the user’s input – highlighting which objects or sounds are relevant, how close they are, and how they might be used.

These outputs are sent downstream to Domain Access and Prompt Creation. The former refines the spatial understanding of the scene. The latter enriches the A-User’s query when it formulates the prompt to the Basic Knowledge (LLM).

Then there is Entity State – a snapshot of the User’s cognitive, emotional, and attentional posture. Is the User focused, distracted, curious, frustrated? Context Capture reads facial expressions, gaze direction, posture, and vocal tone to infer a baseline state. But again, it’s just a starting point. User behaviour may be nuanced, and initial readings can be incomplete, noisy or ambiguous. That’s why User State Refinement exists – to track changes over time, infer deeper intent, and guide the alignment of the A-User’s expressive behaviour done by Personality Alignment.
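
To make the shape of Context concrete, here is a minimal sketch of a time-stamped snapshot holding scene descriptors and an Entity State. The field names (`label`, `position`, `kind`, `cognitive`, `emotional`, `attentional`) are assumptions chosen for illustration; the actual data types are defined by the relevant MPAI specifications.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class SceneObject:
    label: str       # shallow label only, e.g. "chair"
    position: tuple  # (x, y, z) in the M-Location, units assumed
    kind: str        # "audio" or "visual"


@dataclass
class EntityState:
    cognitive: str
    emotional: str
    attentional: str


@dataclass
class Context:
    m_location: str
    timestamp: float
    scene: list
    entity_state: EntityState


def capture_context(m_location, scene, entity_state, now=None):
    """Assemble one snapshot; `now` can be injected to make tests repeatable."""
    stamp = time.time() if now is None else now
    return Context(m_location, stamp, scene, entity_state)


ctx = capture_context(
    "kitchen",
    [SceneObject("chair", (1.0, 0.0, 2.0), "visual")],
    EntityState("focused", "uncertain", "on-screen"),
    now=0.0,
)
ctx_json = json.dumps(asdict(ctx))  # snapshot serialised for downstream AIMs
```

Note that the chair carries geometry and a label but no affordance: deciding whether it is something to sit on is left to the downstream AIMs.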

In short, Context Capture is the A-User’s first glimpse of the world – a fast, structured perception layer that’s good enough to get started, but not good enough to finish the job. It’s the launchpad for deeper reasoning, richer modulation, and more expressive interaction. Without it, the A-User would be blind. With it, the system becomes situationally aware, emotionally attuned, and ready to reason – but only if the rest of the AIMs do their part.

Audio Spatial Reasoning: The Sound-Aware Interpreter

Audio Spatial Reasoning is the A-User’s acoustic intelligence module – the one that listens, localises, and interprets sound not just as data, but as data having a spatially anchored meaning. Its role is not just about “hearing”; it is also about “understanding” where sound is coming from, how relevant it is, and what it implies in the context of the User’s intent in the environment.

When the A-User system receives a Context snapshot from Context Capture – including audio streams with a position and orientation and a description of the User’s emotional state (called Entity State) – Audio Spatial Reasoning starts an analysis of directionality, proximity, and semantic importance of incoming sounds. The conclusion is something like “That voice is coming from the left, with a tone of urgency, and its orientation is directed at the A-User.”

All this is represented with an extension of the Audio Scene Descriptors describing:

Which audio sources are relevant
Where they are located in 3D space
How close or far they are
Whether they’re foreground (e.g., a question) or background (e.g., ambient chatter)
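
A small sketch shows how such an extension could be computed for one source. The rule used here (nearby speech is foreground, everything else is background) and the 3-unit radius are assumptions made for the example, as are the names `AudioSource` and `extend_descriptor`.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioSource:
    source_id: str
    position: tuple  # (x, y, z) relative to the A-User, units assumed
    is_speech: bool


def extend_descriptor(src: AudioSource, near_radius: float = 3.0) -> dict:
    """Add distance, proximity and a foreground/background label to a source."""
    distance = math.dist((0.0, 0.0, 0.0), src.position)
    near = distance <= near_radius
    return {
        "id": src.source_id,
        "position": src.position,
        "distance": distance,
        "proximity": "near" if near else "far",
        # Assumed rule: nearby speech is foreground, the rest is background.
        "layer": "foreground" if (src.is_speech and near) else "background",
    }


question = extend_descriptor(AudioSource("voice-1", (0.0, 2.0, 0.0), True))
chatter = extend_descriptor(AudioSource("crowd-1", (3.0, 4.0, 0.0), True))
```

Here the voice two units away is labelled foreground, while the speech five units away is treated as ambient chatter.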

These extended descriptors are sent to Prompt Creation and Domain Access. Let’s see what happens with the former. The extended Audio Scene Descriptors are fused with the User’s spoken or written input and the current Entity State. The result is a PC-Prompt – a rich query enriched with text expressing the multimodal information collected so far. This is passed to Basic Knowledge for reasoning.

In Domain Access, the extended Audio Scene Descriptors are further processed and integrated with domain-specific information. The response, called Audio Spatial Directive, includes domain-specific logic, scene priors, and task constraints. For example, if the scene is a medical simulation, Domain Access might tell Audio Spatial Reasoning that “only sounds from authorised personnel should be considered”. This feedback helps Audio Spatial Reasoning refine its interpretation – filtering out irrelevant sounds, boosting priority for critical ones, and aligning its spatial model with the current domain expectations.
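
The medical-simulation example can be sketched as a filter over the audio sources. The directive layout (`allowed_roles`, `priority_roles`) and the `role` field on each source are hypothetical; they only illustrate how a directive might drop irrelevant sounds and move critical ones to the front.

```python
def apply_directive(sources, directive):
    """Keep sources whose role is allowed, then list priority roles first."""
    allowed = directive["allowed_roles"]
    priority = directive.get("priority_roles", set())
    kept = [s for s in sources if s["role"] in allowed]
    # sorted() is stable, so the original order survives within each group.
    return sorted(kept, key=lambda s: s["role"] not in priority)


sources = [
    {"id": "a1", "role": "visitor"},
    {"id": "a2", "role": "nurse"},
    {"id": "a3", "role": "surgeon"},
]
directive = {"allowed_roles": {"nurse", "surgeon"}, "priority_roles": {"surgeon"}}
filtered = apply_directive(sources, directive)
```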

Therefore, we can call Audio Spatial Reasoning the A-User’s auditory guide. It knows where sounds are coming from, what they mean, and how they should influence the A-User’s behaviour. The A-User responds to a sound with spatial awareness, contextual sensitivity, and domain consistency.

Visual Spatial Reasoning: The Vision‑Aware Interpreter

When the A-User acts in a metaverse space, sound doesn’t tell the whole story. The visual scene – objects, zones, gestures, occlusions – is the canvas where situational meaning unfolds. That’s where Visual Spatial Reasoning comes in: it’s the interpreter that makes sense of what the Autonomous User sees, not just what it hears. It can be considered as the visual analyst embedded in the Autonomous User’s “brain” that understands objects’ geometry, relationships, and salience.

Visual Spatial Reasoning doesn’t just list objects; it understands their geometry, relationships, and salience. A chair isn’t just “a chair” – it’s occupied, near a table, partially occluded, or the focus of attention. By enriching raw descriptors into structured semantics, Visual Spatial Reasoning transforms objects made of pixels into actionable targets.

This is what it does:

Scene Structuring: Takes and organises raw visual descriptors into coherent spatial maps.
Semantic Enrichment: Adds meaning – classifying objects, inferring affordances, and ranking salience.
Directed Alignment: Filters and prioritises based on A-User Control’s intent, ensuring relevance.
Traceability: Every refinement step is auditable, making it possible to trace back why “that object in the corner” became “the salient target for interaction.”
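
The four functions above can be sketched as one small pipeline that records what each step did. The salience rule (closer objects score higher) and the descriptor fields `id`, `label` and `distance` are assumptions made for the example; `refine_scene` is a hypothetical name.

```python
def refine_scene(descriptors, intent_labels):
    """Structure, enrich and filter visual descriptors, logging each step."""
    trace = []

    # Scene Structuring: index the raw descriptors by object id.
    scene = {d["id"]: dict(d) for d in descriptors}
    trace.append(("structure", sorted(scene)))

    # Semantic Enrichment: assumed rule, closer objects are more salient.
    for obj in scene.values():
        obj["salience"] = 1.0 / (1.0 + obj["distance"])
    trace.append(("enrich", {k: v["salience"] for k, v in scene.items()}))

    # Directed Alignment: keep only labels relevant to the current intent.
    targets = [o for o in scene.values() if o["label"] in intent_labels]
    targets.sort(key=lambda o: o["salience"], reverse=True)
    trace.append(("align", [o["id"] for o in targets]))

    # Traceability: the trace is returned alongside the result.
    return targets, trace


descriptors = [
    {"id": "o1", "label": "chair", "distance": 1.0},
    {"id": "o2", "label": "door", "distance": 4.0},
    {"id": "o3", "label": "chair", "distance": 3.0},
]
targets, trace = refine_scene(descriptors, {"chair"})
```

Reading `trace` from top to bottom explains why a given object ended up as the first target.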

Without Visual Spatial Reasoning, the metaverse would be a flat stage of unprocessed visuals. With it, visual scenes become interpretable narratives. It’s the difference between “there are three objects in the room” and “the User is focused on the screen, while another entity gestures toward the door.”

Of course, Visual Spatial Reasoning does not replace vision. It bridges the gap between raw descriptors and effective interaction, ensuring that the A‑User can observe, interpret, and act with precision and intent.

If Audio Spatial Reasoning is the metaverse’s “sound‑aware interpreter,” then Visual Spatial Reasoning is its “sight‑aware analyst” that starts by seeing objects and eventually can understand their role, their relevance, and their story in the scene.

Prompt Creation: Where Words Meet Context

The Prompt Creation module is the storyteller and translator in the Autonomous User’s “brain”. It takes raw sensory input – audio and visual spatial data of Context (such as objects in a scene with their position, orientation and velocity) and the Entity State – and turns it into a well‑formed prompt that Basic Knowledge can actually understand and respond to.

The audio and visual components of Spatial Reasoning provide the information on things around the User such as “who’s in the room,” “what’s being said,” “what objects are present,” and “what’s the User doing”. Context Capture provides Entity State as a rich description of the A‑User’s understanding of the “internal state” of the User – which may be a representation of a biologically real User, if it represents a human, or simulated when the User represents an agent. The task of Prompt Creation is to synthesise these sources of information into a PC‑Prompt Plan. This plan starts from what the User said, adds intent (e.g., “User wants help” or “User is asking a question”), includes the context around the User (e.g., “User is in a virtual kitchen”), and embeds User State (e.g., “User seems confused”).

This information – conveniently represented as a JSON object – is converted into natural language and passed to Basic Knowledge that produces a natural language response called the Initial Response – initial because there are more processing elements in the A‑User pipeline that will refine and improve the answer before it is rendered in the metaverse.

Prompt Creation gives the AI a sense of narrative, so the A-User can:

– Ask the right clarifying question.

– Respond with relevance to the situation.

– Adapt to the environment and User mood.

– Maintain continuity across interactions.

If the User says: “Can you help me cook?”

– Spatial Reasoning notes the User is in a virtual kitchen with utensils and ingredients.

– Entity State suggests the User looks uncertain.

– Prompt Creation combines these into: “User is asking for cooking help, is in a kitchen, seems unsure.”

The Initial Response that Basic Knowledge returns for this prompt is then passed to Domain Access, which may elaborate a new prompt enriched with domain-specific information (in this case “cooking”, when Basic Knowledge is not well informed about cooking).
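
The cooking example can be written down as a JSON-style plan that is then rendered into natural language. The key names (`utterance`, `intent`, `scene`, `entity_state`) and the sentence template are assumptions for illustration, not the PC-Prompt Plan syntax.

```python
import json


def build_pc_prompt_plan(utterance, intent, scene, entity_state):
    """Collect the four ingredients of the plan in one JSON-ready dict."""
    return {
        "utterance": utterance,
        "intent": intent,
        "scene": scene,
        "entity_state": entity_state,
    }


def plan_to_prompt(plan: dict) -> str:
    """Render the plan as a natural-language prompt for the language model."""
    return (
        f'User said: "{plan["utterance"]}" '
        f'Intent: {plan["intent"]}. '
        f'Scene: {plan["scene"]}. '
        f'User state: {plan["entity_state"]}.'
    )


plan = build_pc_prompt_plan(
    "Can you help me cook?",
    "asking for cooking help",
    "virtual kitchen with utensils and ingredients",
    "seems unsure",
)
plan_json = json.dumps(plan)
prompt = plan_to_prompt(plan)
```

Keeping the plan as structured data until the last step means other AIMs can inspect or amend it before it becomes text.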

Prompt Creation turns raw multimodal input and spatial information into meaningful prompts so the AI can think, speak, and act with purpose. It is the scriptwriter that ensures the A‑User’s dialogue is not only coherent but also contextually aware, emotionally attuned, and situationally precise.

Domain Access: The Specialist Brain Plug-in for the Autonomous User

The Basic Knowledge module is a generalist language model that “knows a bit of everything.” In contrast, Domain Access is the expert layer that enables the Autonomous User to tap into domain-specific intelligence for deeper understanding of user utterances and their context.

How Domain Access Works

Receives Initial Response: Domain Access starts with the response of Basic Knowledge, the generalist model’s response to the prompt generated by Prompt Creation.
Converts to DA-Input: As the natural language response is not the best way to process the response, it is converted into a JSON object called DA-Input for structured processing.
Gets specialised knowledge by pulling in domain vocabulary such as jargon and technical terms.
Creates the next prompt by using this specialised knowledge:
- Injects rules and constraints (e.g., standards, legal compliance).
- Adds reasoning patterns (e.g., diagnostic flows, contractual logic).

All enrichment happens in the JSON domain, and so does the production of the DA-Prompt Plan – a domain-aware structure ready for conversion into natural language (the DA-Prompt) and resubmission into the knowledge/response pipeline.
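
A minimal sketch of this flow follows, assuming a tiny hand-written cooking domain. `COOKING_DOMAIN`, `to_da_input`, `build_da_prompt_plan` and `to_da_prompt` are hypothetical names, and the keyword match used to pick vocabulary is a deliberate simplification.

```python
import copy

# Hypothetical domain module: vocabulary, rules and reasoning patterns.
COOKING_DOMAIN = {
    "vocabulary": {"saute": "fry quickly in a little fat"},
    "rules": ["Never leave a hot pan unattended."],
    "reasoning": ["ingredients -> preparation -> cooking -> serving"],
}


def to_da_input(initial_response: str, topic: str) -> dict:
    """Wrap the natural-language Initial Response in a structured object."""
    return {"topic": topic, "initial_response": initial_response}


def build_da_prompt_plan(da_input: dict, domain: dict) -> dict:
    """Enrich the input with domain knowledge without modifying the input."""
    plan = copy.deepcopy(da_input)
    text = da_input["initial_response"].lower()
    plan["vocabulary"] = {
        term: meaning
        for term, meaning in domain["vocabulary"].items()
        if term in text
    }
    plan["rules"] = list(domain["rules"])
    plan["reasoning"] = list(domain["reasoning"])
    return plan


def to_da_prompt(plan: dict) -> str:
    """Convert the plan back into natural language for resubmission."""
    parts = [plan["initial_response"]]
    parts += [f"Term: {t} = {m}." for t, m in plan["vocabulary"].items()]
    parts += [f"Rule: {r}" for r in plan["rules"]]
    parts += [f"Reasoning pattern: {p}." for p in plan["reasoning"]]
    return " ".join(parts)


da_input = to_da_input("You could saute the onions first.", "cooking")
da_plan = build_da_prompt_plan(da_input, COOKING_DOMAIN)
da_prompt = to_da_prompt(da_plan)
```

Swapping `COOKING_DOMAIN` for another dictionary of the same shape changes the domain without touching the rest of the code.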

Without Domain Access, the A-User is like a clever intern: knowledgeable but lacking depth and experience. With Domain Access, it becomes an experienced professional that can:

Deliver accurate, context-aware answers.
Avoid hallucinations by grounding responses in domain rules.
Address different application domains by swapping or adding domain modules without rebuilding the entire A-User.

Basic Knowledge: The Generalist Engine Getting Sharper with Every Prompt

Basic Knowledge is the core language model of the Autonomous User – the “knows-a-bit-of-everything” brain. It provides the first response to a prompt, but the Autonomous User doesn’t fire off just one answer: it produces four of them in a progressive refinement loop, providing smarter and more context-aware responses with every refined prompt.

The Journey of a Prompt

Starts Simple: The first prompt from Prompt Creation is a rough draft because the A-User has only a superficial knowledge of the Context and User intent.
Domain Access adds expert seasoning: jargon, compliance rules, reasoning patterns. The prompt becomes richer and sharper.
User State Refinement injects dynamic knowledge about the User – refined emotions, more focused goals, better spatial context – so the prompt feels more attuned to what the User feels and wants.
Personality Alignment tells the A-User how to behave: it ensures that the A-User’s appropriate style and mood drive the next prompt.
Final Prompt Delivery: when Basic Knowledge receives the last prompt (from Personality Alignment) the final touches have been added.

This sequence of prompts eventually provides:

Better responses: Each prompt reduces ambiguity.
Domain grounding: Avoids hallucinations by embedding rules and expert logic.
Personalisation: Adapts A-User’s tone and content to User State.
Scalability: Works across domains without retraining.

Basic Knowledge starts as a generalist, but thanks to refined prompts, it ends up delivering expert-level, context-aware, and User-sensitive responses. It starts from a rough sketch and, by iterating with specialist information sources, it provides a final response that includes all the information extracted or produced in the workflow.
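
The progressive refinement loop can be sketched with a stand-in for the language model and three refining stages. Everything here is a placeholder: `basic_knowledge` only echoes its prompt, and the three lambdas stand for Domain Access, User State Refinement and Personality Alignment.

```python
def basic_knowledge(prompt: str) -> str:
    """Stand-in for the language model: echoes the prompt it was given."""
    return f"response to [{prompt}]"


def refinement_loop(pc_prompt, stages, model=basic_knowledge):
    """Query the model once for the first prompt and once per refining stage."""
    prompt = pc_prompt
    responses = [model(prompt)]
    for refine in stages:
        # Each stage sees the current prompt and the latest response.
        prompt = refine(prompt, responses[-1])
        responses.append(model(prompt))
    return responses


stages = [
    lambda p, r: p + " +domain rules",        # Domain Access
    lambda p, r: p + " +refined user state",  # User State Refinement
    lambda p, r: p + " +personality",         # Personality Alignment
]
responses = refinement_loop("help me cook", stages)
```

With three refining stages the model is queried four times, matching the four responses described above.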

User State Refinement: Turning a Snapshot into a Full Profile

When the A-User begins interacting, it starts with a basic User State captured by Context Capture – location, activity, initial intent, and perhaps a few emotional hints. This initial state is useful, but it’s like a blurry photo: the A-User knows that somebody is there, but not the details that matter for nuanced interaction.

As the session unfolds, the A-User learns much more thanks to Prompt Creation, Spatial Reasoning, and Domain Access. Suddenly, the A-User understands not just what the User said, but what it meant, the context it operates in, and the reasoning patterns relevant to the domain. This new knowledge is integrated with the initial state so that subsequent steps – especially Personality Alignment and Basic Knowledge – are based on an appropriate understanding of the User State.

Why Update the User State?

Personality Alignment is where the A-User adapts tone, style, and interaction strategy. If it only relies on the first guess of the User State, it risks taking an incongruent attitude – formal when casual is needed, directive when supportive is expected. If the User State can be updated, the A-User knows more about:

The environment incorporating jargon, compliance rules, and reasoning patterns.
The internal state and can adjust responses to confusion, urgency, or confidence.

The Refinement Process

Start with Context Snapshot: capture environment, speech, gestures, and basic emotional cues.
Inject Domain Intelligence from Domain Access: technical vocabulary, rules, structured reasoning.
Merge New Observations: emotional shifts, spatial changes, updated intent.
Validate Consistency: ensure module coherence for reliable downstream use.
Feed Forward: pass the refined state to Personality Alignment and sharper prompts to Basic Knowledge.
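
The steps above can be sketched as a merge followed by a consistency check. The state keys (`location`, `intent`, `emotion`) and the rule that newer observations override the snapshot are assumptions made for the example; `refine_user_state` is a hypothetical name.

```python
def refine_user_state(snapshot, domain_info, observations,
                      required=("location", "intent", "emotion")):
    """Merge the snapshot, domain intelligence and new observations."""
    state = dict(snapshot)               # start with the Context snapshot
    state["domain"] = dict(domain_info)  # inject domain intelligence
    state.update(observations)           # newer observations take precedence
    # Validate consistency: every required field must be present.
    missing = [key for key in required if key not in state]
    if missing:
        raise ValueError(f"inconsistent User State, missing: {missing}")
    return state                         # fed forward to downstream AIMs


snapshot = {"location": "virtual kitchen", "intent": "get help", "emotion": "uncertain"}
domain_info = {"vocabulary": ["saute"], "rules": ["kitchen safety"]}
observations = {"emotion": "confused", "intent": "learn to cook pasta"}
refined = refine_user_state(snapshot, domain_info, observations)
```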

Personality Alignment: The Style Engine of A-User

Personality Alignment is where an A-User interacting with a User embedded in a metaverse environment stops being a generic bot and starts acting like a character with intent, tone, and flair. It’s not just a matter of what it utters – it’s about how those words land, how the avatar moves, and how the whole interaction feels.

The figure is an extract from the A-User Architecture Reference model representing Domain Access generating two streams of data related to the User and its environment and two recipient AI Modules: User State Refinement and Personality Alignment.

This is possible because the A-User receives the right inputs driving the Alignment of the A-User Personality with the refined User’s Entity State:

Personality Context Guide: Domain-specific hints from Domain Access (e.g., “medical setting → professional tone”).
Expressive State Guide: Emotional and attentional posture of the User (e.g., stressed → calming personality).
Refined Response: Text from Basic Knowledge in response to the User State Refinement prompt.
Personality Alignment Directive: Commands to tweak or override the personality profile (e.g., “switch to negotiator mode”) from the A-User Control AI Module (AIM).

A smart integration of these inputs enables the A-User to deliver the following outputs:

A-User Entity State: the complete internal state of the A-User’s synthetic personality (tone, gestures, behavioural traits).
PA-Prompt: New prompt formulation including the final A-User personality (so the words sound right).
Personality Alignment Status: A structured report of personality and expressive alignment to the A-User Control AIM.

Here are some examples of personality profiles that Personality Alignment could use or blend:

Mentor Mode: Calm tone, structured answers, moderate gestures, empathy cues.
Entertainer Mode: Upbeat tone, humour, wide gestures, animated expressions.
Negotiator Mode: Firm tone, controlled gestures, strategic phrasing.
Assistant Mode: Neutral tone, minimal gestures, clarity-first responses.
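
Blending profiles can be illustrated with a weighted mix. The tones below come from the list above, but the numeric gesture amplitudes are invented for the example (moderate = 0.5, wide = 1.0, controlled = 0.25, minimal = 0.125), and taking the tone of the dominant profile is just one possible rule.

```python
# Tones follow the list above; amplitude values are illustrative assumptions.
PROFILES = {
    "mentor": {"tone": "calm", "gesture_amplitude": 0.5},
    "entertainer": {"tone": "upbeat", "gesture_amplitude": 1.0},
    "negotiator": {"tone": "firm", "gesture_amplitude": 0.25},
    "assistant": {"tone": "neutral", "gesture_amplitude": 0.125},
}


def blend_profiles(weights: dict) -> dict:
    """Mix profiles: tone from the dominant one, gestures as a weighted mean."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    dominant = max(weights, key=weights.get)
    amplitude = sum(
        PROFILES[name]["gesture_amplitude"] * w / total
        for name, w in weights.items()
    )
    return {"tone": PROFILES[dominant]["tone"], "gesture_amplitude": amplitude}


blended = blend_profiles({"mentor": 0.75, "entertainer": 0.25})
```

A Personality Alignment Directive such as “switch to negotiator mode” would correspond to replacing the weights with `{"negotiator": 1.0}`.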

A-User Formation: Building the A-User

If Personality Alignment gives the A-User its style, the A-User Formation AIM gives the A-User its body and its voice, the avatar and the speech for A-User Control to embed in the metaverse. The A-User stops being an abstract brain controlling various types of processing and becomes a visible, interactive entity. It’s not just about projecting a face on a bot; it’s about creating a coherent representation that matches the personality, the context, and the expressive cues.

Here is how this is achieved.

Inputs Driving A-User Formation:

A-User Entity Status: The personality blueprint from Personality Alignment (tone, gestures, behavioural traits).
Final Response: personality-tuned content from Basic Knowledge – what the avatar will utter.
A-User Control Command: Directives for rendering and positioning in the metaverse (e.g., MM-Add, MM-Move).
Rendering Parameters: Synchronisation cues for speech, facial expressions, and gestures.

What comes out of the box is a multimodal representation of the A-User (Speaking Avatar) that talks, moves, and reacts in sync with the A-User’s intent – the best expression the A-User can give of itself in the circumstances.

What Makes A-User Formation Special?

It’s the last mile of the pipeline – the point where all upstream intelligence (context, reasoning, User’s Entity Status estimation, personality) becomes visible and interactive. A-User Formation ensures:

Expressive Coherence: Speech, gestures, and facial cues match the chosen personality.
Contextual Fit: Avatar appearance and behaviour align with domain norms (e.g., formal in a medical setting, casual in a social lounge).
Technical Precision: Synchronisation across Personal Status modalities for natural and consistent interaction.
Goal: Deliver a coherent, expressive, and context-aware representation that feels natural and engaging in response to how the User was perceived at the beginning and processed during the pipeline.


Leonardo Chiariglione
2025-12-17

MPAI publishes the Company Performance Prediction V2.0 standard with a Request for Community Comments

Geneva, Switzerland – 17th December 2025. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, unaffiliated organisation developing AI-based data coding standards – has concluded its 63rd General Assembly (MPAI-63) publishing Version 2.0 of the Company Performance Prediction standard (CUI-CPP).

Technical Specification: Compression and Understanding of Industrial data (MPAI-CUI) – Company Performance Prediction (CUI-CPP) V2.0 is a significantly beefed-up version of the standard published in 2021. It assumes that the company, whose performance is assessed in a specified prediction horizon, provides data concerning its governance, its finances, and its risks. Governance and finances are described based on internationally recognised standards. Risks are divided into two categories: Primary Risks, i.e., risks for which a neural network model is available and can legally be used in the relevant jurisdiction, and Secondary Risks, provided by company statements pertaining to the perceived first-level Risks of the CUI-CPP Risk Taxonomy.

A first set of CUI-CPP outputs is given by: company organisation score, probability of default due to Primary Risks, probability of business discontinuity due to Primary and Secondary Risks, and business discontinuity probabilities. The second set of outputs is given by: impact that each Governance Descriptor has on the company organisation score; impact that each Governance and Financial Descriptor has on the Primary Discontinuity Probability; and impact that each Primary Risk Descriptor has on the Primary Discontinuity Probability.

Here is an Introduction to CUI-CPP V2.0.

A public online presentation of the CUI-CPP V2.0 standard will be held on 30 January 2026 at 16 UTC. Register here to attend.

MPAI is continuing the development of its work plan that involves the following activities:

AI Framework (MPAI-AIF): extending the MPAI-AIF specification to enable a client to access a remote MPAI-AIF Controller and an AI Module to communicate data to another AIM with associated metadata.
AI for Health (AIH-HSP): developing the specification of a system receiving and processing licensed AI Health Data and enabling clients to improve health processing models via federated learning.
Context-based Audio Enhancement (CAE-USC): developing the Audio Six Degrees of Freedom (CAE-6DF) and the Audio Object Rendering (CAE-AOR) specifications.
Connected Autonomous Vehicle (CAV-TEC): developing a new version of the flagship specification CAV-TEC with security support.
Compression and Understanding of Industrial Data (CUI-CPP): expecting comments on the Company Performance Prediction V2.0 specification.
End-to-End Video Coding (MPAI-EEV): exploring the potential of AI-based End-to-End Video coding in compressing video sequences.
AI-Enhanced Video Coding (MPAI-EVC): exploring use of AI to enhance the video codec performance.
Governance of the MPAI Ecosystem (MPAI-GME): operating the MPAI Ecosystem per the MPAI-GME Specification.
Human and Machine Communication (MPAI-HMC): exploring the use of AI in human-to-machine and machine-to-machine communication.
Multimodal Conversation (MPAI-MMC): exploring the impact of the PGM-AUA Call for Technologies on human-to-machine and machine-to-machine communication.
MPAI Metaverse Model (MMM-TEC): developing security-protected protocols in the MMM-TEC specification.
Neural Network Watermarking (NNW-TEC): developing the new Neural Network Watermarking (MPAI-NNW) – Technologies (NNW-TEC) specification, including assessments of Neural Network Traceability Technologies.
Object and Scene Description (MPAI-OSD): discussing the impact of MPAI standards planned or under development on MPAI-OSD V1.4.
Portable Avatar Format (MPAI-PAF): discussing the impact of MPAI standards planned or under development on MPAI-PAF V1.5.
AI Module Profiles (MPAI-PRF): extending the scope of the current version of AI Module Profiles.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): exploring new standard opportunities in the domain.
Data Types, Formats, and Attributes (MPAI-TFA): extending the standard to data types used by MPAI standards that are planned or under development.
XR Venues (XRV-LTP): developing the standard for improved execution of Live Theatrical Performances using AI.


Leonardo Chiariglione
2025-12-12

A-User Formation: Building the A-User

If Personality Alignment gives the A-User its style, A-User Formation AIM gives the A-User its body and its voice, the avatar and the speech for the A-User Control to embed in the metaverse. The A-User stops being an abstract brain controlling various types of processing and becomes a visible, interactive entity. It’s not just about projecting a face on a bot; it’s about creating a coherent representation that matches the personality, the context, and the expressive cues.

We have already presented the system diagram of the Autonomous User (A-User), an autonomous agent able to move and interact (walk, converse, do things, etc.) with another User in a metaverse. The latter User may be an A-User or be under the direct control of a human and is thus called a Human-User (H-User). The A-User acts as a “conversation partner in a metaverse interaction” with the User.

This is the tenth and last of a sequence of posts aiming to illustrate in more depth the architecture of an A-User and provide an easy entry point for those who wish to respond to the MPAI Call for Technology on Autonomous User Architecture. The first nine dealt with 1) the Control performed by the A-User Control AI Module on the other components of the A-User; 2) how the A-User captures the external metaverse environment using the Context Capture AI Module; 3) how it listens, localises, and interprets sound not just as data, but as data having a spatially anchored meaning; 4) how it makes sense of what it sees by understanding objects’ geometry, relationships, and salience; 5) how it takes raw sensory input and the User State and turns them into a well-formed prompt that Basic Knowledge can actually understand and respond to; 6) how it taps into domain-specific intelligence for deeper understanding of user utterances and operational context; 7) the core language model of the Autonomous User – the “knows-a-bit-of-everything” brain, the first responder to a prompt of a sequence of four; 8) converting a “blurry photo” of the User in the environment taken at the onset of the process into a focused picture; and 9) providing not only a generic bot but a character with intent, tone, and flair – not only a matter of what the avatar utters but how its words land, how the avatar moves, and how the whole interaction feels.

A-User Formation AIM gives the A-User a body and a voice. These are the results of a chain of AI Modules composing the A-User pipeline, and they enable a perceptible and coherent representation that matches the personality, the context, and the expressive cues.

The inputs driving A-User Formation are:

A-User Entity Status: The personality blueprint from Personality Alignment (tone, gestures, behavioural traits).
Final Response: Personality-tuned content from Basic Knowledge – what the avatar will utter.
A-User Control Command: Directives for rendering and positioning in the metaverse (e.g., MM-Add, MM-Move).
Rendering Parameters: Synchronisation cues for speech, facial expressions, and gestures.

What comes out of the box: Formation Status

A multimodal representation of the A-User (Speaking Avatar) that talks, moves, and reacts in sync with the A-User’s intent – the best expression the A-User can give of itself in the circumstances.
Structured report on the processing that led to the result.
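
The inputs and outputs listed above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names, field names, and the `form_a_user` function are hypothetical and are not taken from any MPAI specification, and the "avatar" is a plain dictionary standing in for the real multimodal representation.

```python
from dataclasses import dataclass, field


@dataclass
class FormationInputs:
    """Hypothetical container for the four inputs listed above."""
    entity_status: dict         # personality blueprint: tone, gestures, traits
    final_response: str         # what the avatar will utter
    control_command: str        # e.g. "MM-Add" or "MM-Move"
    rendering_parameters: dict  # synchronisation cues


@dataclass
class FormationOutputs:
    """Hypothetical container for the two outputs listed above."""
    speaking_avatar: dict                                 # stand-in for the avatar
    formation_status: list = field(default_factory=list)  # processing report


def form_a_user(inputs: FormationInputs) -> FormationOutputs:
    """Combine the inputs into a placeholder avatar and a processing report."""
    report = [f"applied command {inputs.control_command}"]
    # Defaults used when the blueprint omits a trait (an assumption of this sketch).
    if "tone" not in inputs.entity_status:
        report.append("tone missing, defaulted to neutral")
    avatar = {
        "utterance": inputs.final_response,
        "tone": inputs.entity_status.get("tone", "neutral"),
        "gestures": inputs.entity_status.get("gestures", "minimal"),
        "command": inputs.control_command,
        "sync": dict(inputs.rendering_parameters),
    }
    return FormationOutputs(speaking_avatar=avatar, formation_status=report)
```

A call such as `form_a_user(FormationInputs({"tone": "calm"}, "Hello", "MM-Add", {"lip_sync": True}))` returns both the placeholder avatar and the report, mirroring the pairing of Speaking Avatar and Formation Status described above.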

What Makes A-User Formation Special?

Expressive Coherence: Speech, gestures, and facial cues match the chosen personality.
Contextual Fit: Avatar appearance and behaviour align with domain norms (e.g., formal in a medical setting, casual in a social lounge).
Technical Precision: Synchronisation across Personal Status modalities for natural and consistent interaction.

Key Points to Take Away about A-User Formation

Purpose: Turns the A-User’s personality and reasoning into a visible and audible interactive avatar.
Inputs: Personality-aligned final response, control commands, and rendering parameters.
Outputs: Speaking avatar, formation status.
Goal: Deliver a coherent, expressive, and context-aware representation that feels natural and engaging in response to how the User was perceived at the beginning and processed during the pipeline.


Leonardo Chiariglione
2025-12-12

Personality Alignment: The Style Engine of A-User

This is the ninth of a sequence of posts aiming to illustrate in more depth the architecture of an A-User and provide an easy entry point for those who wish to respond to the MPAI Call for Technology on Autonomous User Architecture. The first eight dealt with 1) the Control performed by the A-User Control AI Module on the other components of the A-User; 2) how the A-User captures the external metaverse environment using the Context Capture AI Module; 3) how it listens, localises, and interprets sound not just as data, but as data having a spatially anchored meaning; 4) how it makes sense of what it sees by understanding objects’ geometry, relationships, and salience; 5) how it takes raw sensory input and the User State and turns them into a well-formed prompt that Basic Knowledge can actually understand and respond to; 6) how it taps into domain-specific intelligence for deeper understanding of user utterances and operational context; 7) the core language model of the Autonomous User – the “knows-a-bit-of-everything” brain, the first responder to a prompt of a sequence of four; and 8) converting a “blurry photo” of the User in the environment taken at the onset of the process into a focused picture.

This is possible because the A-User receives the right inputs driving the Alignment of the A-User Personality with the refined User’s Entity State:

Personality Context Guide: Domain-specific hints from Domain Access (e.g., “medical setting → professional tone”).
Expressive State Guide: Emotional and attentional posture of the User (e.g., stressed → calming personality).
Refined Response: Text from Basic Knowledge in response to User State Refinement prompt.
Personality Alignment Directive: Commands to tweak or override the personality profile (e.g., “switch to negotiator mode”) from the A-User Control AI Module (AIM).

A smart integration of these inputs enables the A-User to deliver the following outputs:

A-User Entity State: The complete internal state of the A-User’s synthetic personality (tone, gestures, behavioural traits).
PA-Prompt: New prompt formulation including the final A-User personality (so the words sound right).
Personality Alignment Status: A structured report of personality and expressive alignment to the A-User Control AIM.

Here are some examples of personality profiles that Personality Alignment could use or blend:

Mentor Mode: Calm tone, structured answers, moderate gestures, empathy cues.
Entertainer Mode: Upbeat tone, humour, wide gestures, animated expressions.
Negotiator Mode: Firm tone, controlled gestures, strategic phrasing.
Assistant Mode: Neutral tone, minimal gestures, clarity-first responses.
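
To make the interplay of inputs and profiles more concrete, here is a minimal Python sketch of how a profile might be selected. Everything in it is an assumption made for illustration: the function and table names are hypothetical, the mapping of "medical" and "stressed" to particular profiles is a guess based on the examples above, and the precedence (directive over user state over domain context) is one plausible choice rather than anything the Call for Technologies prescribes. The sketch selects a single profile and does not attempt blending.

```python
# The four example profiles, transcribed from the list above.
PROFILES = {
    "mentor": {"tone": "calm", "gestures": "moderate"},
    "entertainer": {"tone": "upbeat", "gestures": "wide"},
    "negotiator": {"tone": "firm", "gestures": "controlled"},
    "assistant": {"tone": "neutral", "gestures": "minimal"},
}

# Assumed mappings from guide values to profiles (illustrative only).
CONTEXT_TO_PROFILE = {"medical": "assistant"}
STATE_TO_PROFILE = {"stressed": "mentor"}


def align_personality(context=None, user_state=None, directive=None):
    """Return (entity_state, reason) for the selected profile.

    Assumed precedence: directive > user state > domain context > default.
    """
    name, reason = "assistant", "default"
    if context in CONTEXT_TO_PROFILE:
        name, reason = CONTEXT_TO_PROFILE[context], f"context: {context}"
    if user_state in STATE_TO_PROFILE:
        name, reason = STATE_TO_PROFILE[user_state], f"user state: {user_state}"
    if directive in PROFILES:
        name, reason = directive, "directive override"
    entity_state = dict(PROFILES[name], profile=name)
    return entity_state, reason
```

For instance, `align_personality(context="medical", user_state="stressed")` selects the mentor profile because, in this sketch, the User's emotional state outranks the domain hint, while passing `directive="negotiator"` overrides both.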

Key Points to Take Away about Personality Alignment

Purpose: Makes A-User’s delivery context-aware and emotionally tuned.
Inputs: Domain context, user emotional state, refined semantic response, and directives.
Outputs: Personality blueprint (Entity Status), PA-Prompt for expressive rendering, and alignment status.
Profiles: For example, Mentor, Entertainer, Negotiator, Assistant – each with tone, gesture style, and behavioural traits.
Goal: Coherent, adaptive interaction that feels natural and persuasive in the metaverse.

