All posts Archives - Page 3 of 18

Leonardo Chiariglione
2024-08-25

MPAI propounds the development of Collaborative Immersive Laboratories (CIL)

Collaborative Immersive Laboratory (XRV-CIL) is project designed to enable researchers in network-connected physical venues equipped with devices that create an immersive virtual environment to manipulate and visualise laboratory data with other researchers located at different places and having a simultaneous immersive experience.

XR Venue (MPAI-XRV) is an MPAI project addressing a multiplicity of use cases enabled by Extended Reality (XR) and enhanced by Artificial Intelligence (AI) technologies. MPAI-XRV specifies design methods for AI Workflows and AI Modules that automate complex processes in a variety of application domains. Venue is used as a synonym for Real and Virtual Environment. CIL is one of the XRV projects.

One use case for CIL would be to work with medical data such as scans to discover patterns within cellular data to facilitate therapy identification as part of the following workflow:

Start from a file (e.g., a LIF file for data from a confocal microscope) that contains slices of a 3D Object (+ time) produced by machines from different manufacturers and enable real time navigation of the 3D object starting from slices.
Use AI trained filter to filter out the noise. Noise is information not part of the scanned object that is found in the slices.
Preserve the slices by applying specific processes, e.g., dehydration.
Enhance some specific features of the object by using appropriate contrasting agents, e.g., monoclonal antibodies.
Use the slices in sufficient number to train a Machine:
1. To count the cells in a human tissue from different organs, different living bodies, and anatomical features presenting different health conditions.
2. To identify the typology and functions of the cells caused by the influence of genomics and environment, i.e., phenotyping.
Request the trained Machine to produces “inferences” used to count and identify the cells having specified features.
Generate statistics of the inferences produced by the Machine.
Human navigates the cleaned (noise-filtered) slices as an object and verify whether the inferences of the Machine can be trusted.
A trajectory of possible outcomes can plot towards multiple decision paths for desired outcomes based on change in how the living body changes over time guiding proactive decisions in habit and therapeutical interventions.
After a certain time, redo steps 1 to 9.

For instance, Figure 1 shows a CT or MRI dataset being normalised, analysed and the result rendered with e.g., a renderer that is common to the participating labs. Each Lab may enter annotations to the dataset or apply rendering controls that enhance appropriate parts of the rendered dataset.

Figure 1 – An example of data analysed and rendered in an XRV-CIL

Figure 1 represents a specific case of the full XRV-CIL project while Figure 2 represents the more general case.

Figure 2 – The multi-technology, multi-location XRV-CIL case

Let us assume that there are N geographically distributed labs providing datasets acquired with different technologies at different times related to a particular application domain (where each lab may provide more than one dataset). Technology-specific AI Modules normalise the datasets. A Fusion AI Module controlled by Fusion Parameters from each lab provides M Fused Data (number of Fused Data is independent of the number of input datasets).

Fused Data are processed by Analysis AI Modules driven by Analysis Parameters possibly coming from one or more labs. They produce Desired Results which are then Rendered specifically for each Lab either locally or in the cloud.

The model of Figure 2 is applicable to various domains for scientific, industrial, and educational applications such as:

Medical
Anthropological
Multi- and hyper-spectral Imaging
Spectroscopy
Chemistry
Geology and Material Science
Non-destructive testing
Oceanography
Astronomy

XRV-CIL promises to dramatically improve the way data is collaboratively acquired, processes, and shared among laboratories.

MPAI, the international, unaffiliated, non-profit organisation developing standards for AI-based data coding might contribute to the areas of dataset normalisation, specification of input/output and metadata of processing elements, interaction protocols with rendered processing results. MPAI could also contribute to identification of specific AI technologies to process datasets, e.g., cell counting mentioned above.

No Comments InAll posts

Leonardo Chiariglione
2024-08-21

MPAI publishes a new version of Context-based Audio Enhancement (MPAI-CAE) and a new standard for Data Qualifiers (MPAI-TFA)

Geneva, Switzerland – 21^st August 2024. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, and unaffiliated organisation developing AI-based data coding standards – has concluded its 47^th General Assembly (MPAI-47) by approving for publication the Context-based Audio Enhancement (MPAI-CAE) V2.2 standard and the new Data Types, Formats, and Attributes (MPAI-TFA) V1.0 standard for Community Comments. The new versions are released using the new full web-based publication method.

Technical Specification: Context-based Audio Enhancement (MPAI-CAE) V2-2 improves the user experience for different audio-related applications, such as entertainment, restoration, and communication in a variety of different contexts such as in the home, in the office, and in the studio. V2.2 extends the capabilities of several data formats used across MPAI standard.

Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA) V1.0 specifies Qualifiers – a Data Type containing Sub-Types, Formats, and Attributes – associated to “media” Data Types – currently Text, Speech, Audio, and Visual – that facilitate-enable the operation of an AI Module receiving a Data Type instance.

The capabilities of the standards will be presented online on September 24 at 16:00 UTC for MPAI-CAE V2.2 and August 27 at T14:00 UTC for MPAI-TFA). To attend, please register at https://tinyurl.com/2wj8e4bn for MPAI-CAE V2.2 and at https://tinyurl.com/3p8j74st for MPAI-TFA V1.0.

MPAI is continuing its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing open-source applications based on the AI Framework.
AI for Health (MPAI-AIH): developing the specification of a system enabling clients to improve models processing health data and federated learning to share the training.
Context-based Audio Enhancement (CAE-DC): waiting for response to the Audio Six Degrees of Freedom (CAE-6DF).
Connected Autonomous Vehicle (MPAI-CAV): developing the new MPAI-CAV Technologies (CAV-TEC) part of the standard.
Compression and Understanding of Industrial Data (MPAI-CUI): developing use cases and functional requirements for MPAI-CUI V2.0 supporting more corporate risks.
End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC): working on a Call for Technologies.
Governance of the MPAI Ecosystem (MPAI-GME): working on version 2.0 of the Specification.
Human and Machine Communication (MPAI-HMC): developing reference software.
Multimodal Conversation (MPAI-MMC): finalising V2.2 and developing Performance Assessment of some important AI Modules.
MPAI Metaverse Model (MPAI-MMM): developing the new MPAI-MMM Technologies (MMM-TEC) part of the standard.
Neural Network Watermarking (MPAI-NNW): developing reference software for enhanced applications.
Object and Scene Description (MPAI-PAF): finalising V1.2 and developing reference software, conformance testing and new areas for digital humans.
Portable Avatar Format (MPAI-PAF): finalising V1.2 and developing reference software, conformance testing and new areas for digital humans.
AI Module Profiles (MPAI-PRF): specifying which features an AI Workflow or and AI Module supports.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): developing technical report on mitigation of data loss and cheating.
Data Types, Formats, and Attributes (MPAI-TFA): extending he standard to data types used by other MPAI standards.
XR Venues (MPAI-XRV): developing the standard enabling improved development and execution of Live Theatrical Performances.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

No Comments InAll posts

Leonardo Chiariglione
2024-07-11

MPAI publishes Version 1.1 of the Human and Machine Communication standard

Geneva, Switzerland – 10 July 2024. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 46^th General Assembly (MPAI-46) by approving for publication the new Version 1.1 of the Human and Machine Communication standard.

Technical Specification: Human and Machine Communication (MPAI-HMC) V1.1 enables an Entity to hold a multimodal communication with another Entity possibly in a different Context. The standard is agnostic of the parties in a communication as an Entity can be a human in an audio-visual scene of a real space or a Machine in an Audio-Visual Scene of a Virtual Space. Humans and Machines can operate in different Contexts, e.g., language and culture. MPAI-HMC references a range of technologies specified in five MPAI Standards.

MPAI-HMC will be presented online on 22 July at 15 UTC (8 PDT, 11 EDT, 23 CST, 24 KST). To attend the presentation, register at https://us06web.zoom.us/meeting/register/tZEtde-orTwqE9x9sSkauN9CxKsLvbJrIeSF.

At previous meetings, MPAI has published four draft standards for Community Comments: Context-base Audio Enhancement V2.2, Multimodal Conversation V2.2, Object and Scene Description V1.1, and Portable Avatar Format V1.2. Interested parties should check the mentioned links and make comments as the deadline for submission has not been reached yet.

MPAI is continuing its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing open-source applications based on the AI Framework.
AI for Health (MPAI-AIH): developing the specification of a system enabling clients to improve models processing health data and federated learning to share the training.
Context-based Audio Enhancement (CAE-DC): preparing new projects.
Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of the data used by the MPIA-CAV – Architecture standard.
Compression and Understanding of Industrial Data (MPAI-CUI): preparation for an extension to existing standard that includes support for more corporate risks.
End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
Human and Machine Communication (MPAI-HMC): developing reference software.
Multimodal Conversation (MPAI-MMC): developing reference software and exploring new areas.
MPAI Metaverse Model (MPAI-MMM): developing reference software specification and identifying metaverse technologies requiring standards.
Neural Network Watermarking (MPAI-NNW): developing reference software for enhanced applications.
Portable Avatar Format (MPAI-PAF): developing reference software, conformance testing and new areas for digital humans.
AI Module Profiles (MPAI-PRF): to specify which features an AI Module supports.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): developing technical report on mitigation of data loss and cheating.
XR Venues (MPAI-XRV): developing the standard enabling improved development and execution of Live Theatrical Performance.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Please visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter, and follow MPAI on social media: LinkedIn, Twitter, Facebook, Instagram, and YouTube.

No Comments InAll posts

Leonardo Chiariglione
2024-06-15

MPAI publishes new Standard, Reference Software, and Conformance Testing Specification

Geneva, Switzerland – 12 June 2024. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 45^th General Assembly (MPAI-45) by approving for publication the new AI Module Profiles, Neural Network Watermarking Reference Software, and the Multimodal Conversation Reference Software.

Technical Specification: AI Module Profiles (MPAI-PRF) V1.0 is an important addition to the MPAI architecture because it enables an AI Module to signal its capabilities in terms of input and output data and specific functionalities.

Reference Software Specification: Neural Network Watermarking (MPAI-NNW) V1.2 makes available to the community software implementing the functionalities of the Neural Network Watermarking Standard when implemented in an AI Framework and using limited capability Microcontroller Units.

Conformance Testing Specification: Multimodal Conversation (MPAI-MMC) V2.1 publishes methods and data sets to enable a developer or a user to ascertain the claims of an implementation to conform with the specification of the Conversation with Emotion, Multimodal Question Answering, and Unidirectional Speech Translation AI Workflows.

At its previous meeting, MPAI has published three Calls for Technologies on Six Degrees of Freedom Audio, Connected Autonomous Vehicle – Technologies, and MPAI Metaverse Model – Technologies. Interested parties should check the mentioned links for update as the deadline for submission has not been reached yet.

MPAI is happy to announce that the Institute if Electrical and Electronic Engineers has adopted the companion Connected Autonomous Vehicle – Architecture standard as an IEEE standard identified as 3307-2024.

MPAI is continuing its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing open-source applications based on the AI Framework.
AI for Health (MPAI-AIH): developing the specification of a system enabling clients to improve models processing health data and federated learning to share the training.
Context-based Audio Enhancement (CAE-DC): preparing new projects.
Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of the data used by the MPIA-CAV – Architecture standard.
Compression and Understanding of Industrial Data (MPAI-CUI): preparation for an extension to existing standard that includes support for more corporate risks.
End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
Human and Machine Communication (MPAI-HMC): developing reference software.
Multimodal Conversation (MPAI-MMC): developing reference software and exploring new areas.
MPAI Metaverse Model (MPAI-MMM): developing reference software specification and identifying metaverse technologies requiring standards.
Neural Network Watermarking (MPAI-NNW): developing reference software for enhanced applications.
Portable Avatar Format (MPAI-PAF): developing reference software, conformance testing and new areas for digital humans.
AI Module Profiles (MPAI-PRF): to specify which features an AI Module supports.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): developing technical report on mitigation of data loss and cheating.
XR Venues (MPAI-XRV): developing the standard enabling improved development and execution of Live Theatrical Performance.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

Visit the MPAI website, contact the MPAI secretariat for specific information, subscribe to the MPAI Newsletter, and follow MPAI on social media LinkedIn, Twitter, Facebook, Instagram, and YouTube.

No Comments InAll posts

Leonardo Chiariglione
2024-06-03

A standard for autonomous vehicle componentisation

The 44^th MPAI General Assembly has published three Calls for Technologies. The Connected Autonomous Vehicle – Technologies (CAV-TEC) Call requests parties having rights to technologies satisfying the CAV-TEC Use Cases and Functional Requirements and the CAV-TEC Framework Licence to respond to the Call preferably using the CAV-TEC Template for Responses. An online presentation of this Call will be held on 2024/06/6 (Thursday) at 16 UTC. Please register, if you wish to attend the presentation (recommended if you intend to respond).

MPAI kicked off the Connected Autonomous Vehicle (MPAI-CAV)) project in the first days after its establishment. The project was particularly challenging and only in September 2023, MPAI was ready to publish Version 1.0 of Technical Specification: Connected Autonomous Vehicle (MPAI-CAV) – Architecture (CAV-ARC). This specified a CAV as a system composed of Subsystems for each of which functions, input/output data, and topology of components were specified. In its turn each subsystem was broken down into components of which functions and input/output data were specified. Each subsystem was assumed to be implemented as an AI Workflow (AIW) made of Components implemented as AI Modules (AIM) executed in an AI Framework (AIF) as specified by the AI Framework (MPAI-AIF) standard.

This is illustrated in Figure 1 where a human staying outside of a CAV interacts with it via the Human-CAV Interaction Subsystem (HCI) requesting the Autonomous Motion Subsystem (AMS) to take the human to a destination. The AMS requests spatial information from the Environment Sensing Subsystem (ESS), decides a route (possibly after consulting with the HCI and the human) and starts the travel. The spatial information provided by the ESS is used to create a model of the external environment and is possibly integrated with other environment models obtained from CAVs in range. Finally, the AMS has enough information to issue a command to the Motion Actuation Subsystem (MAS) to move the CAV to the desired place that can be close if the environment is “complex” or rather far if it is “simple”.

Figure 1 – The reference model of MPAI Connected Autonomous Vehicle

CAV-ARC is a functional specification in the sense that it identifies subsystems and components and their functions, but not the precise functions of the data exchanged. This can be seen in Figure 2 where, say, the Road State data type is identified and its functions generally described but without a full specification of functional requirements.

Figure 2 – The Autonomous Motion Subsystem.

The purpose of the CAV-TEC Call is to identify and characterise all the data types required by the CAV reference model and stimulate specific technology proposals.

Because many components of the HCI are shared with other MPAI standards, in September 2023 MPAI has published Multimodal Conversation (MPAI-MMC) V2.0 that includes the HCI specification whose scope goes beyond the general CAV-ARC scope. Most of the CAV-ARC data types of the CAV-HCI reference model of Figure 3 are fully specified.

Figure 4 – The Human-CAV Interaction Subsystem

What is still missing – and is part of the Call – is the full specification of the messages exchanged by the HCI with the AMS and its peer HCIs in remote CAVs.

The Use Cases and Functional Requirements document attached to the Call contains an initial form of JSON syntax and semantics of all data types and requests comments on their appropriateness and proposals for data type formats and attributes.

It is interesting to note that MPAI assumes that a CAV generates a “private” metaverse used to plan decisions to move the CAV in a real environment. A CAV may request – and the requested CAV may decide to share – part of their private metaverses to facilitate understanding of the common real space(s) they traverse. MPAI investigations have shown that a CAV’s private metaverse can be represented _and_ shared by the same or slightly extended MPAI-MMM metaverse technologies.

This observation has been put in practice and part of the Technical Specification: MPAI Metaverse Model (MPAI-CAV) – Architecture (MPAI-TEC) V1.1 is referenced by the MPAI-TEC Call. It should be noted that the parallel MMM-TEC Call for Technologies seeks to enhance the current MMM-ARC specification by providing an initial form of JSON syntax and semantics of all data types and requesting comments on their appropriateness and proposals for data type formats and attributes.

Bringing to reality the dream of autonomous vehicles will be a major contribution to improving our life and environment. Standards can greatly contribute to the conversion of CAVs from siloed systems to systems made of standard components that are more reliable, explainable, and affordable.

No Comments InAll posts

Leonardo Chiariglione
2024-05-29

Achieving metaverse interoperability

The 44^th MPAI General Assembly has published three Calls for Technologies. The MPAI Metaverse Model – Technologies. requests parties having rights to technologies satisfying the MMM-TEC Use Cases and Functional Requirements and the MMM-TEC Framework Licence to respond to the Call preferably using the MMM-TEC Template for Responses. An online presentation of this Call will be held on 2024/05/31 (Tuesday) at 15 UTC. Please register, if you wish to attend the presentation (recommended if you intend to respond).

MPAI kicked off the MPAI Metaverse Model (MPAI-MMM) project some 30 months ago. The project has already produced two Technical Reports exploring the field, one on Functionalities and one on Functional Profiles. In September 2023, MPAI published Version 1.0 of Technical Specification: MPAI Metaverse Model (MPAI-MMM) – Architecture (MMM-TEC). This specified the MMM Operation Model, composed of interacting Processes (specifically, Devices, Services, and Users representing humans), exchanging Items (data) and performing Actions on the Items. Two metaverse instances implementing the MMM Operation Model can interoperate – i.e., exchange and perform Actions on Items – if they satisfy the MMM-ARC specified Functional Requirements to enable Conversion Services to overcome possible technology incompatibility.

This is illustrated in Figure 3 where there are three humans (green rectangles) staying outside of an MMM instance and communicating with the MMM instance via Devices located half-way between the real world (universe) and the virtual world (metaverse). Each of human1 and human3 has a Device connected to a User while human2 has two Devices connected to one User each. The first User of the human2 is rendered as two Personae (Avatars) and the User of the third human is not rendered (i.e., it is just a Process performing Actions in the MMM).

Figure 3 – MPAI-MMM Operation Model

The links in the Figure represent possible interactions between MMM Processes. While not represented here for simplicity, Processes in different Metaverse Instances (M-Instances) may also interact. While MMM-ARC provided an initial form of interoperability, the MMM-TEC Call for Technologies published on 15 May seeks to provide a stronger form of interoperability.

The Use Cases and Functional Requirements attached to the Call contains an initial form of JSON syntax and semantics of Items and requests comments on their appropriateness and proposals for Items formats and attributes.

It is interesting to note that MPAI assumes that a CAV generates a “private” metaverse used to plan decision to move. A CAV may request and a CAV may decide to share part of their private metaverses to facilitate understanding of the common real space(s) they traverse. Investigations carried out by MPAI have shown that a CAV’s private metaverse can be represented _and_ shared by the same MPAI-MMM metaverse technologies.

This is the link to the next online presentation on the third MPAI Call for Technologies on the 6^th of June at 16 UTC.

No Comments InAll posts

Leonardo Chiariglione
2024-05-25

An MPAI standard for new dimensions of experience

What are the dimensions targeted by the new MPAI standard? To enable humans to experience virtual replicas of an audio scene of the real world from different perspectives while moving in it and orienting their heads. The standard will be called Six Degrees of Freedom Audio, and the acronym will be CAE-6DF.

As a rule, before developing a new standard, MPAI publishes a Call for Technologies where it describes the purpose of the Call and what a respondent to the Call should do to have it accepted for consideration. The Call is complemented by two documents – one specifying the functional requirements and one the commercial requirements the planned standard should satisfy.

The 44^th MPAI General Assembly has published three Calls for Technology, one for Six Degrees of Freedom Audio. The standard will be developed by the Context-based Audio Enhancement Development Committee (CAE-DC). An online presentation of this Call will be held on 2024/05/28 (Tuesday) at 16 UTC. If you wish to attend the presentation (recommended if you intend to respond), please register.

State-of-the-art VR headsets provide high-quality realistic visual content by tracking both the user’s orientation and position in 3D space. This capability opens new opportunities for enhancing the degree of immersion in VR experiences. VR games have become increasingly immersive over the years based on these developments.

However, despite the success of synthetic virtual environments such as 3D first-person games, those that feature content dynamically captured from the real world are yet to be widely deployed. Recent developments, such as dynamic Neural Radiance Fields (NERFs) and 4D Gaussian splatting, promise to give users the ability to be fully immersed in visual scenes populated by both static and dynamic entities.

Capturing audiovisual scenes with both static and dynamic entities promises a full immersion experience, but visual immersion alone is not sufficient without an equally convincing auditory immersion. CAE-6DF should enable users to experience an immersive theatre production through a VR headset, for example walking around actors and getting closer to different conversations, or a concert where a user can choose different seats to experience the performance with a 360° video associated with those viewing positions. Additionally, CAE-6DF should enable experiencing the acoustics of the concert hall from different perspectives.

The CAE-6DF Call is seeking innovative technologies that enable and support such experiences, specifically looking for technologies to efficiently represent content of scene-based or object-based formats, or a mixture of these, process it with low latency and provide high responsiveness to user movements. It should be possible to render the audio scene over loudspeakers or headphones. These technologies should also consider audio-visual cross-modal effects to present a high level of auditory immersion that complements the visual immersion provided by state-of-the-art volumetric environments.

Figure 1 depicts a reference model of the planned CAE-6DF standard where a lowercase or Capital initial letter of a term implies that the term represents an entity that is either part of the real space or of a Virtual Space.

Figure 1 – real spaces and Virtual Spaces in CAE-6DF

On the left-hand side there are real audio spaces. In the middle there is a Virtual Space generated by a computing platform which host digital representations of acoustical scenes and synthetic Audio Objects generated by the platform. Rendering of arbitrary user-selected Points of Views of the Audio Scene is performed on the right-hand side real space in a perceptually veridical fashion.

The Use Cases and Functional Requirements document attached to the Call considers four use cases:

Immersive Concert Experience (Music plus Video).
Immersive Radio Drama (Speech plus Foley/Effects).
Virtual lecture (Audio plus Video).
Immersive Opera/Ballet/Dance/Theatre experience (Music, Drama with 360° Video/6DoF Visual).

From these, a set of Functional Requirements are derived.

Audio experience and impact of visual conditions on the Audio experience:
1. Audio-Visual Contract, i.e. alignment of audio scenes with visual scenes.
2. Effects of locomotion on human audio-visual perception.
3. Orientation response, i.e., turning toward a sound source of interest.
4. Distance perception where visual and auditory experiences affect each other.
Content profiles:
1. Scene-based: the captured Audio Scene, for example using Ambisonics, is accurately reconstructed with a high degree of correspondence to the audio scene’s acoustic ambient characteristics.
2. Object-based: the Audio Scene comprises Audio Objects and associated metadata to allow synthesising a perceptually veridical, but not necessarily physically accurate, representation of the captured Audio Scene.
3. Mixed: a combination of scene-based and object-based profiles where Audio Objects can be overlaid or mixed with Scene-based Content.
Rendering modalities:
1. Loudspeaker-based, i.e., the content is rendered through at least two loudspeakers.
2. Headphone-based, i.e., the content is rendered through headphones.
Characteristics of rendering space when content is rendered through loudspeakers:
1. Shape and dimensions: Not larger than the captured space.
2. Acoustic ambient characteristics:
  1. Early decay time (EDT) lower than the captured space.
  2. Frequency mode density lower than the captured space.
  3. Echo density lower than the captured space.
  4. Reverberation time (T60) lower than the captured space.
  5. Energy decay curve characteristics same or lower than the captured space.
  6. Background noise less than 50dB(A) SPL.
The rendering space, if the headphones block ambient acoustical characteristics of the rendering space, should have the following characteristics:
1. Shape and dimensions: Not larger than the captured space.
2. Acoustic ambient characteristics: No constraints on the ambient characteristics as defined in point 2.2
User movement in the rendering space:
1. May be the result of actual locomotion/orientation of the User as tracked by sensors.
2. May be the result of virtual locomotion/orientation as actuated by controlling devices.
3. The maximum responsive latency of the audio system to user movement should be 20 ms or less (some applications may have higher latency).

A comment about the mentioned “Commercial Requirements” is that this is a misnomer because MPAI is not “selling” anything. Even the MPAI standards are freely downloadable from the MPAI web site. Indeed, the formal name used by MPAI is Framework Licence and it is a document that includes a set of guidelines that a submitter of a proposal commits to adopt when the standard will be approved and a licence for the use of patented items is issued. The CAE-6DF Framework Licence is available.

Finally, to facilitate the work of those submitting a response, MPAI is providing a document called Template for Responses.

CAE-6DF will join the growing list of MPAI standards: eleven standards have already been published – on application environment, audio, connected autonomous vehicle, company performance prediction, ecosystem governance, human and machine communication, object and scene description, and portable avatar format – and is about to publish a new one on AI Module Profiles. Reference software and conformance testing specifications are in the course of being published. The standards are revised and extended and new versions published when necessary. New standards are under development such as online gaming, AI for health, and XR venues and several projects in new areas such as AI-based video coding are being investigated.

No Comments InAll posts

Leonardo Chiariglione
2024-05-20

More members for the MPAI Standards Family

The latest – 15 June 2024 – MPAI General Assembly (MPAI-44) proved, if ever there was a need, that MPAI produces non only standards, but also promising new projects. MPAI-44 launched three new projects on 6 degrees of freedom audio (CSAE-6DF), technologies for connected autonomous vehicle components, and technologies for the MPAI Metaverse Model.

The CAE-6DF Call is seeking innovative technologies that enable users to walk in a virtual space representing a remote real space and enjoy the same experience as if they were in the remote space. Visit the CAE-6DF page and register to attend the online presentation of the Call of the 28^th of May, communicate your intention to respond to the Call by the 4^th of June, and submit your response by the 16^th of September.

The CAV-TEC Call requests technologies that build on the Reference Architecture of the Connected Autonomous Vehicle (CAV-ARC) to achieve a componentisation of connected autonomous vehicles. Visit the CAV-TEC page and register to attend the online presentation of the Call of the 6^th of July, communicate your intention to respond to the Call by the 14^th of June, and submit your response by the 5^th of July.

The MMM-TEC Call requests technologies that build on the Reference Architecture of the MPAI Metaverse Model (MMM-ARC) and enable a metaverse instance to interoperate with other metaverse instances. Visit the MMM-TEC page and register to attend the online presentation of the Call of the 31^st of July, communicate your intention to respond to the Call by the 7^th of June, and submit your response by the 6^th of July.

MPAI-44 has brought more results.

Four existing standards have been republished with significant new material that extends the current functionalities of the four and supports the needs of the three new projects in the form of extensions of existing standards that are published for Community Comments:

Object and Scene Description (MPAI-OSD) V1.1 adds a new Use Case for automatic audio-visual analysis of television programs and new functionalities for Visual and Audio-Visual Objects and Scenes.
The Cases of Context-based Audio Experience (MPAI-CAE) V2.2 standard (CAE-USC) supports new functionalities for Audio Object and Scene Descriptors.
The Multimodal Conversation (MPAI-MMC) V2.2 standard introduces new AI Modules and new Data Formats to support the new MPAI-OSD Television Media Analysis Use Case.
The Portable Avatar Format (MPAI-PAF) V1.2 standard extends the specification of the Portable Avatar to support new functionality requested by the CAV-TEC and MMM-TEC Calls.

Everybody is welcome to review the draft standards (new versions of existing standards) and send comments to the MPAI Secretariat by 23:59 UTC of the relevant day. Comments will be considered when the standards will be published in final form.

The standards mentioned above cover a significant share of the MPAI portfolio, but your navigation need not stop here. If you wish to delve into the other MPAI standards, you can go to their appropriate web pages where you can read overviews and find many links to relevant web pages. Each MPAI page contains a web version and other support material such as PowerPoint presentations and video recordings.

No Comments InAll posts

Leonardo Chiariglione
2024-05-19

MPAI publishes a record three Calls for Technologies, four new Standards for Community Comments and one Reference Software Specification

Geneva, Switzerland – 15 May 2024. MPAI – Moving Picture, Audio and Data Coding by Artificial Intelligence – the international, non-profit, and unaffiliated organisation developing AI-based data coding standards has concluded its 44^th General Assembly (MPAI-44) with an unprecedented display of productivity.

Three Calls for Technologies:

Context-based Audio Enhancement (MPAI-CAE): Six Degrees of Freedom (CAE-6DF) to develop an ambitious standard that will enable users to have the spatial experience of a remote audio environment by “walking into it”. Register for the online presentation on 2024/05/28 16:00 UTC.
Connected Autonomous Vehicle (MPAI-CAV) – Technologies to extend the existing MPAI-CAV – Architecture standard by supporting reference to specific technologies that are the target of the Call. Register for the online presentation on 2024/06/06 16:00 UTC.
MPAI Metaverse Model (MPAI-MMM) – Technologies to extend the existing MPAI-MMM – Architecture standard by supporting reference to specific technologies that are the target of the Call. Register for the online presentation on 2024/05/31 16:00 UTC.

Four new Standards published with a request for Community Comments:

Object and Scene Description (MPAI-OSD) V1.1 includes a new Use Case for automatic audio-visual analysis of television programs and new functionalities for Visual and Audio-Visual Objects and Scenes.
Context-based Audio Experience (MPAI-CAE) Use Cases (CAE-USC) V2.2 supports new functionalities in Audio Object and Audio Scene Descriptors.
Multimodal Conversation (MPAI-MMC) V2.2 introduces new AI Modules and new Data Formats to support the new MPAI-OSD Television Media Analysis Use Case.
Portable Avatar Format (MPAI-PAF) V1.2 extends the specification of the Portable Avatar to support new functionality requested by the MPAI-CAV and MPAI-MMM Calls.

One Reference Software Specification:

Context-based Audio Experience (MPAI-CAE) Use Cases (CAE-USC) V1.1.

MPAI’s scope of activities is quite wide as shown by its 11 already published standards that include 18 Use Cases and some 75 AI Modules, and 65 Data Types shared across different standards. MPAI has succeeded in developing a common layer of technologies supporting a variety of application domains.

MPAI is continuing its work plan that involves the following activities:

AI Framework (MPAI-AIF): developing open-source applications based on the AI Framework.
AI for Health (MPAI-AIH): developing the specification of a system enabling clients to improve models processing health data and federated learning to share the training.
Context-based Audio Enhancement (CAE-DC): preparing new projects.
Connected Autonomous Vehicle (MPAI-CAV): Functional Requirements of the data used by the MPIA-CAV – Architecture standard.
Compression and Understanding of Industrial Data (MPAI-CUI): preparation for an extension to existing standard that includes support for more corporate risks.
End-to-End Video Coding (MPAI-EEV): video coding using AI-based End-to-End Video coding.
AI-Enhanced Video Coding (MPAI-EVC). video coding with AI tools added to existing tools.
Human and Machine Communication (MPAI-HMC): developing reference software.
Multimodal Conversation (MPAI-MMC): developing reference software and conformance testing and exploring new areas.
MPAI Metaverse Model (MPAI-MMM): developing reference software specification and identifying metaverse technologies requiring standards.
Neural Network Watermarking (MPAI-NNW): developing reference software for enhanced applications.
Portable Avatar Format (MPAI-PAF): developing reference software, conformance testing and new areas for digital humans.
AI Module Profiles (MPAI-PRF): to specify which features an AI Module supports.
Server-based Predictive Multiplayer Gaming (MPAI-SPG): developing technical report on mitigation of data loss and cheating.
XR Venues (MPAI-XRV): developing the standard enabling improved development and execution of Live Theatrical Performance.

Legal entities and representatives of academic departments supporting the MPAI mission and able to contribute to the development of standards for the efficient use of data can become MPAI members.

No Comments InAll posts

Leonardo Chiariglione
2024-04-20

A new brick for the MPAI architecture

At its 43^rd General Assembly of 2024 April17, MPAI approved the publication of the draft AI Module Profiles (MPAI-PRF) standard with a request for Community Comments. The scope of MPAI-PRF is to provide a solution to the problem that MPAI finds more and more often: the AI Modules (AIM) it specifies in different standards have the same basic functionality but may have different features.

First two words about AIMs. MPAI develops application-oriented standards for applications that MPAI calls AI Workflows (AIW) that can be broken down into components called Ai Modules. AIWs are specified by what they do (functions), by the input and output data and by how its AIMs are interconnected (Topology). Similarly, AIMs are specified by what they do (functions) and by the input and output data. AIMs are Composite if they include interconnected AIMs or Basic if its internal structure is unknown.

Let’s look at the Natural Language Understanding (MMC-NLU) AIM of Figure 1.

Figure 1 – The Natural Language Understanding (MMC-NLU) AIM

The NLU AIM’s basic function is to receive a Text Object – directly from a keyboard or through an Automatic Speech Recognition (ASR) AIM (in which case it is called Recognised Text) and produce a Text Object that can be Refined Text in case it is the output of an ASR AIM and the Meaning of the text. The NLU AIM, however, can also receive “spatial information” about the Audio and/or Visual Objects in terms of their position and orientation in the Scene that the machine is processing. Obviously, this additional information helps the machine produce a response that is more attuned to the context.

This a case shows that there is a need to unambiguously name these two functionally equivalent but very different instances of the same NLU AIMs.

The notion of Profiles, originally developed by MPEG in the summer of 1992 for the MPEG-2 standard and then universally adopted in the media domain comes in handy. An AIM Profile is a label that uniquely identifies the set of AIM Attributes of an AIM instance where Attribute is “input data, output data, or functionality that uniquely characterises an AIM instance”. In the case of the NLU AIM, Text Object (TXO), Recognised Text (TXR), Object Instance Identifier (OII), Audio-Visual Scene Geometry (AVG), and Meaning (TXD or Text Descriptors).

The Draft AI Module Profiles (MPAI-PRF) Standard offers two ways to signal the Attributes of an AIM: those that are supported or those that are not supported. Both can be used, but likely the first (list of those that are supported) if it is shorter than the second (list of those that are not supported) and vice versa. The Profile of an NLU AIM instance that does not handle spatial information can thus be labelled in two ways:

List of supported Attributes	MMC-NLU-V2.1(ALL-AVG-OII)
List of unsupported Attributes	MMC-NLU-V2.1(NUL+TXO+TXR)

V2.1 refers to the version of the Multimodal Conversation MPAI-MMC standards that specifies the NLU AIM. ALL signals that the Profile is expressed in “negative logic” in the sense that the removed Attributes are AVG for Audio-Visual Scene Geometry and OII. NUL signals that the Profile is expressed in “positive logic” in the sense that the added Attributes are TXO for Text Object from a keyboard and TXR for Recognised Text.

The Profile story does not end here. Attributes are not always sufficient to identify the capabilities of an AIM instance. Let’s take the Entity Dialogue Processing (MMC-EDP) of Figure 2 an AIM that uses different information sources derived from the information issued by an Entity, typically a human – but potentially also a machine – with which this machine is communicating.

Figure 2 – The Entity Dialogue Processing (MPAI-EDP)

The input data is Text Object and Meaning (output of the NLU), Audio or Visual Instance ID and Scene Geometry (already used by the NLU AIM) and Personal Status, a data type that represents the internal state of the Entity in terms of three Factors (Cognitive State, Emotion, and Social Attitude) and four Modalities (Text, Speech, Face, and Body) for each Factor.

The output of the EDP AIM is Text that can be fed to a regular Text-To-Speech AIM, but can additionally be the machine’s Personal Status, obviously pretended by the machine, but of great value for the Personal Status Display (PAF-PSD) AIM depicted in Figure 3.

Figure 3 – The Personal Status Display AIM

This uses the machine’s Text and Personal Status (IPS) to synthesise the machine using an Avatar Model (AVM) as a speaking avatar. An AIM instance of the PSD AIM may support the Personal Status, but only its Speech (PS-Speech, PSS) and Face (PS-Face, PSF) Factors, as in the case of a PSD AIM designed for sign language. This is formally represented by the following two expressions:

List of supported Attributes	PAF-PSD-V1.1(ALL@IPS#PSS#PSF)
List of unsupported Attributes	PAF-PSD-V1.1(NUL+TXO+AVM@IPS#PSF#PSG

@IPS#PSS#PSF in the first expression indicates that the PSD AIM supports all Attributes, but the Personal Status only includes the Speech and Face Factors. In the second expression +TXO+AVM indicates that the PSD AIM supports Text and Avatar Model and @IPS#PSF#PSG that the Personal Status Factors supported are Face (PSF) and Gesture (PSG).

AI Module Profiles is another element of the AI application infrastructure that MPAI is building with its standards. Read the AI Module Profiles standard for an in-depth understanding. Anybody can submit comments to the draft by sending an email to the MPAI secretariat by 2024/05/08T23:59. MPAI will consider each comment received for possible inclusion in the final version of MPAI-PRF.

No Comments InAll posts

Cookie	Duration	Description
cookielawinfo-checkbox-necessary	1 year	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Technical".
CookieLawInfoConsent	1 year	The cookie is set by the GDPR Cookie Consent plug-in and is used to store whether the user has consented to the use of cookies or not. It does not store any personal data.
viewed_cookie_policy	1 year	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.
_pk_id.6.08a8	13 months	Used to store a few details about the user such as the unique visitor ID
_pk_ses.6.08a8	30 minutes	Short lived cookies used to temporarily store data for the visit

Category All posts

Notice