Highlights

Online presentations of new and revised Technical Specifications

Stay up to date with the developments in MPAI by attending the online presentations of the latest standards and versions!

Name | Acronym | Version | Registration link
Context-based Audio Enhancement | MPAI-CAE | V2.2 | https://tinyurl.com/2wj8e4bn
Data Types, Formats, and Attributes | MPAI-TFA | V1.0 | https://tinyurl.com/3p8j74st

MPAI approves Data Types, Formats, and Attributes (MPAI-TFA) V1.0

A key element of the MPAI approach to AI-based Data Coding standards is Technical Specification: AI Framework (MPAI-AIF) V2.0. It is based on the notion of a framework (the AI Framework, AIF) that runs applications (AI Workflows, AIW) in which interconnected components (AI Modules, AIM) apply specific functions and communicate the resulting Data to other AIMs in the AIW.
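To make the AIF/AIW/AIM vocabulary concrete, here is a minimal Python sketch, assuming a simple linear chain of modules. The class names `AIModule` and `AIWorkflow` and the stub ASR/NLU functions are hypothetical illustrations, not the MPAI-AIF API; real AIWs can connect AIMs in more complex topologies.

```python
from typing import Callable, Dict, List


class AIModule:
    """Hypothetical stand-in for an AIM: a named component applying one function."""

    def __init__(self, name: str, function: Callable[[Dict], Dict]):
        self.name = name
        self.function = function

    def process(self, data: Dict) -> Dict:
        return self.function(data)


class AIWorkflow:
    """Hypothetical AIW: a linear chain here; real AIWs may use other topologies."""

    def __init__(self, modules: List[AIModule]):
        self.modules = modules

    def run(self, data: Dict) -> Dict:
        # The output of each AIM is communicated as the input of the next AIM.
        for module in self.modules:
            data = module.process(data)
        return data


# Stub ASR: ignores the audio and returns fixed text (placeholder only).
asr = AIModule("ASR", lambda d: {"text": "hello world"})
# Stub NLU: passes the text on and derives a trivial "Meaning" (a word count).
nlu = AIModule("NLU", lambda d: {"text": d["text"], "meaning": {"words": len(d["text"].split())}})

workflow = AIWorkflow([asr, nlu])
result = workflow.run({"speech": b"\x00\x01"})
print(result)  # {'text': 'hello world', 'meaning': {'words': 2}}
```

The point of the sketch is only the data flow: each AIM sees nothing but the Data handed to it by the previous AIM, which is why knowing more about that Data and its producer matters.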

Figure 1 – The AI Framework (MPAI-AIF)

AIMs can improve the effectiveness of the functions they perform if they know more about the capabilities of the AIMs they are connected to and about the data they receive. The Natural Language Understanding example highlights some issues related to AIM capability. An instance of the MPAI Natural Language Understanding (MMC-NLU) AIM produces refined text and Meaning from text, possibly received from an Automatic Speech Recognition (MMC-ASR) AIM, and can have at least three levels of sophistication: 1) it may only be able to process the recognised text and extract Meaning from that text alone; 2) it may also process information identifying an object referenced by the text being analysed; or 3) it may additionally use information about the position of that object in the relevant space. The accuracy of the refined text and Meaning is expected to improve when moving from the first to the third case.

Technical Specification: AI Module Profiles (MPAI-PRF) V1.0 enables an AIM instance to signal its Attributes, such as input data, output data, and functionality, and its Sub-Attributes, such as the languages supported by a Text and Speech Translation AIM, that uniquely characterise the AIM. Currently, MPAI-PRF defines the Attributes of eight AIMs, but Profiles for more AIMs are likely to be defined in the future.
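The idea of signalling Attributes and Sub-Attributes can be sketched as a small profile record that a connecting AIM queries. This is a minimal Python sketch under stated assumptions: `AIMProfile`, `supports`, and the attribute names (`Translation`, `Language`, etc.) are hypothetical and do not reproduce MPAI-PRF syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class AIMProfile:
    """Hypothetical profile record; field names are illustrative, not MPAI-PRF syntax."""

    aim: str
    attributes: FrozenSet[str]  # e.g. input data, output data, functionality
    sub_attributes: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def supports(self, attribute: str, **required: str) -> bool:
        """True if the AIM has the Attribute and every requested Sub-Attribute value."""
        if attribute not in self.attributes:
            return False
        return all(
            value in self.sub_attributes.get(key, frozenset())
            for key, value in required.items()
        )


# Hypothetical Text and Speech Translation AIM supporting two languages.
translator = AIMProfile(
    aim="TextAndSpeechTranslation",
    attributes=frozenset({"TextInput", "TextOutput", "Translation"}),
    sub_attributes={"Language": frozenset({"eng", "ita"})},
)

print(translator.supports("Translation", Language="ita"))  # True
print(translator.supports("Translation", Language="fra"))  # False
print(translator.supports("SpeechSynthesis"))  # False
```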

The effectiveness of the functions performed by an AIM can also be enabled or enhanced if the AIM knows more about the characteristics of the Data it receives. Examples of such characteristics include the CIE 1931 colour space of an instance of a Visual Data Type; the MP3 digital representation of an instance of a Speech Data Type; the WAV file format of an instance of an Audio Data Type; the gamma correction applied by the device that produced an instance of a Visual Data Type; the Instance ID of an object in an instance of an Audio Data Type; or the Text conveyed by an instance of a Speech Data Type.

Technical Specification: Data Types, Formats, and Attributes (MPAI-TFA) V1.0 specifies a new Data Type called Qualifier: a container that can be used to represent, for instance, that a Visual Data Type instance uses a given colour space (Sub-Type), is digitally represented as AVC video (Format), and is described by Dublin Core Metadata (Attribute). The current versions of the MPAI Technical Specifications generally assume that most Media Objects exchanged by AIMs are composed of “Content” and “Qualifiers”.

Qualifiers are therefore a specialised type of metadata that supports the operation of AIMs receiving data from other AIMs by conveying information on the Sub-Types, Formats, and Attributes of the Content. Although Qualifiers are human-readable, they are intended to be used only by AIMs.
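The Content plus Qualifier split can be illustrated with two small data classes: the receiving AIM reads the Qualifier to decide whether it can handle the Content at all. A minimal Python sketch, assuming hypothetical names (`Qualifier`, `MediaObject`, `can_process`) and example values ("BT.709", "dc:title") chosen only for illustration; this is not the MPAI-TFA data format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class Qualifier:
    """Hypothetical Qualifier holding the three kinds of information named in MPAI-TFA."""

    sub_type: Optional[str] = None  # e.g. the colour space of a Visual instance
    format: Optional[str] = None  # e.g. "AVC"
    attributes: Dict[str, Any] = field(default_factory=dict)  # e.g. Dublin Core metadata


@dataclass
class MediaObject:
    content: bytes  # the Content itself, opaque to this sketch
    qualifier: Qualifier


def can_process(obj: MediaObject, supported_formats: Set[str]) -> bool:
    """A receiving AIM inspects the Qualifier before touching the Content."""
    return obj.qualifier.format in supported_formats


video = MediaObject(
    content=b"\x00\x00\x00\x01",  # placeholder bytes, not a real AVC bitstream
    qualifier=Qualifier(
        sub_type="BT.709",
        format="AVC",
        attributes={"dc:title": "Example clip"},
    ),
)

print(can_process(video, {"AVC", "HEVC"}))  # True
print(can_process(video, {"HEVC"}))  # False
```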

Future versions of MPAI-TFA are likely to be published, because the large variety of application needs will require Qualifiers to be specified for additional Data Types. For this reason, versioning of Qualifiers is a critical component of MPAI-TFA. MPAI-TFA users are invited to communicate to the MPAI Secretariat their needs for extensions of existing Data Types and for the specification of additional Data Types.
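Because Qualifiers are versioned, a receiving AIM needs some way to read the version and decide whether it understands it. The Speech Qualifier example in this article carries the Header value "TFA-SPQ-V1.0"; the sketch below assumes, on the basis of that single example, that Headers follow a STANDARD-TYPE-Vmajor.minor pattern. The pattern, the names `parse_header` and `is_supported`, and the "accept any known minor version" policy are all hypothetical, not rules stated by MPAI-TFA.

```python
import re
from typing import Dict, NamedTuple, Tuple


class QualifierHeader(NamedTuple):
    standard: str
    qualifier_type: str
    major: int
    minor: int


# Assumed pattern, inferred only from the example Header "TFA-SPQ-V1.0".
HEADER_RE = re.compile(r"^([A-Z]+)-([A-Z]+)-V(\d+)\.(\d+)$")


def parse_header(header: str) -> QualifierHeader:
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognised Qualifier header: {header!r}")
    standard, qualifier_type, major, minor = match.groups()
    return QualifierHeader(standard, qualifier_type, int(major), int(minor))


def is_supported(header: str, known: Dict[Tuple[str, int], int]) -> bool:
    """Hypothetical policy: accept minor versions up to the highest known for that major."""
    parsed = parse_header(header)
    highest_minor = known.get((parsed.qualifier_type, parsed.major))
    return highest_minor is not None and parsed.minor <= highest_minor


known = {("SPQ", 1): 0}  # this AIM understands Speech Qualifier V1.0 only
print(parse_header("TFA-SPQ-V1.0"))
print(is_supported("TFA-SPQ-V1.0", known))  # True
print(is_supported("TFA-SPQ-V2.0", known))  # False
```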

The specification of Types, Formats, and Attributes can be found here.

Here is an example of a Speech Qualifier extracted from the Audio-Visual Scene Descriptors representing the analysis of a series of video frames and accompanying speech (see the figure here).


{
    "SpeechObjectQualifier": {
        "Header": "TFA-SPQ-V1.0",
        "Format": {
            "Transport": "WAV"
        },
        "Attributes": {
            "Source": "Real",
            "Metadata": {
                "Language": "eng",
                "SpeakerIdentity": {
                    "InstanceLabel": "Leonardo",
                    "LabelConfidenceLevel": 0.86
                },
                "ContentDescription": {
                    "TextObject": {
                        "TextObjectPayload": "I don't think Lance gave it to me.",
                        "TextObjectQualifier": {
                            "Format": {
                                "Static": "UTF-8"
                            },
                            "Attributes": {
                                "LanguageFormats": "ISO639-3"
                            }
                        }
                    }
                }
            }
        }
    }
}
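To show how an AIM might consume such a Qualifier, the Python sketch below parses a copy of the Speech Qualifier example (wrapped in outer braces so that it is a standalone JSON document) with the standard `json` module and extracts the fields a downstream AIM would likely need. The function name `summarise_speech_qualifier` and the confidence-threshold policy for trusting the speaker label are hypothetical illustrations, not behaviour specified by MPAI-TFA.

```python
import json

# Copy of the Speech Qualifier example, wrapped in outer braces so it parses.
EXAMPLE = """
{
  "SpeechObjectQualifier": {
    "Header": "TFA-SPQ-V1.0",
    "Format": {"Transport": "WAV"},
    "Attributes": {
      "Source": "Real",
      "Metadata": {
        "Language": "eng",
        "SpeakerIdentity": {"InstanceLabel": "Leonardo", "LabelConfidenceLevel": 0.86},
        "ContentDescription": {
          "TextObject": {
            "TextObjectPayload": "I don't think Lance gave it to me.",
            "TextObjectQualifier": {
              "Format": {"Static": "UTF-8"},
              "Attributes": {"LanguageFormats": "ISO639-3"}
            }
          }
        }
      }
    }
  }
}
"""


def summarise_speech_qualifier(doc: dict, min_confidence: float = 0.8) -> dict:
    """Pull out the fields a downstream AIM (e.g. an NLU AIM) would likely use."""
    qualifier = doc["SpeechObjectQualifier"]
    metadata = qualifier["Attributes"]["Metadata"]
    speaker = metadata["SpeakerIdentity"]
    text_object = metadata["ContentDescription"]["TextObject"]
    # Hypothetical policy: trust the speaker label only above a confidence threshold.
    trusted = speaker["LabelConfidenceLevel"] >= min_confidence
    return {
        "version": qualifier["Header"],
        "transport": qualifier["Format"]["Transport"],
        "language": metadata["Language"],
        "speaker": speaker["InstanceLabel"] if trusted else None,
        "text": text_object["TextObjectPayload"],
    }


doc = json.loads(EXAMPLE)
summary = summarise_speech_qualifier(doc)
print(summary["speaker"], "said:", summary["text"])
```

With the default threshold of 0.8 the label "Leonardo" (confidence 0.86) is kept; raising `min_confidence` to 0.9 would make the sketch discard it.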

MPAI approves a new version of MPAI-CAE Technical Specification

MPAI Technical Specifications are both static and dynamic. They are static because, once approved, a Technical Specification with a given name and version remains unchanged forever. They are also dynamic because MPAI produces new versions of a Technical Specification (as well as of its Reference Software, Conformance Testing, and Performance Assessment).

Development of Context-based Audio Enhancement (MPAI-CAE) V2.2 was initiated several months ago. When it was considered mature, it was posted for comments from the MPAI Community, and the 47th General Assembly (MPAI-47) then considered it ready for publication.

The Context-based Audio Enhancement project seeks to improve the user experience of audio-related applications, such as entertainment, restoration, and communication, in a variety of contexts such as the home, the office, and the studio. MPAI-CAE V2.2 refines two use cases (Emotion Enhanced Speech and Speech Restoration System), confirms two use cases (Audio Recording Preservation and Enhanced Audioconference Experience), and provides a new specification of the Audio Object and Audio Scene Descriptors, aligned with the Object and Scene Descriptors for Speech, Visual, and Audio-Visual.

Figure 2 shows the reference model of the Audio Scene Description AI Module.

Figure 2 – Audio Scene Descriptors V2.2 Reference Model

Two innovations are visible here. First, an Audio Scene can include not only Audio Objects but also other Audio Scenes. Second, the data flowing through the workflow are mostly Audio Objects (in addition to Spatial Attitudes and Audio Scene Geometry), and appropriate Qualifiers (specified in the new MPAI-TFA standard) describe what each Audio Object specifically is. A third innovation, not visible in the Figure, is that the Audio Scene Geometry is strictly a subset of the Audio Scene Descriptors.
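Since an Audio Scene can contain both Audio Objects and nested Audio Scenes, it behaves like a recursive tree. The minimal Python sketch below assumes hypothetical classes `AudioObject` and `AudioScene` (with a plain tuple standing in for a Spatial Attitude) and shows a recursive traversal that collects every Audio Object; it illustrates the nesting only and is not the MPAI-CAE descriptor syntax.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class AudioObject:
    label: str  # hypothetical; a real Audio Object would carry Content and a Qualifier
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # stand-in for a Spatial Attitude


@dataclass
class AudioScene:
    name: str
    children: List[Union[AudioObject, "AudioScene"]] = field(default_factory=list)


def flatten(scene: AudioScene) -> List[AudioObject]:
    """Collect every Audio Object, descending into nested Audio Scenes."""
    objects: List[AudioObject] = []
    for child in scene.children:
        if isinstance(child, AudioScene):
            objects.extend(flatten(child))
        else:
            objects.append(child)
    return objects


# A room containing a speaker and a nested "stage" scene with two instruments.
stage = AudioScene("stage", [AudioObject("violin"), AudioObject("cello")])
room = AudioScene("room", [AudioObject("speaker"), stage])
print([o.label for o in flatten(room)])  # ['speaker', 'violin', 'cello']
```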


A reminder of Call for Technologies: Audio Six Degrees of Freedom (MPAI-6DF)

MPAI has published a Call for Audio Six Degrees of Freedom Technologies with a deadline for responses set to 19 September 2024, just before the next edition of this Newsletter.

The Call for Technologies invites any party able and wishing to contribute to the development of the planned CAE-6DF Technical Specification (TS) to submit a response. Respondents who own technologies relevant to this Call are required to license them according to the Framework Licence if their technologies are selected by MPAI for possible modification and inclusion in the planned CAE-6DF TS.

Any respondent who is not an MPAI member and wishes to participate in the development of CAE-6DF TS shall join MPAI. If they own accepted technologies and do not join MPAI, they lose the opportunity to have their technologies included in the planned CAE-6DF TS.

Respondents are required to deliver their submissions to the MPAI secretariat by 2024/08/19 T23:59 UTC. The secretariat will acknowledge receipt of the submission via email.

Meetings in the coming August-September meeting cycle

(*) Public Presentation

This newsletter serves the purpose of keeping the expanding and diverse MPAI community connected.

We are keen to hear from you, so don’t hesitate to give us your feedback.