1      Version

V2.1

2     Functions

Natural Language Understanding (MMC-NLU):

  1. Receives:
    1. Input Text
    2. Recognised Text
    3. Audio Instance ID
    4. Audio Scene Geometry
    5. Visual Instance ID
    6. Visual Scene Geometry.
  2. Performs the following:
    1. Refines Recognised Text
    2. Extracts Meaning from Input Text or Recognised Text, considering:
      1. The spatial position of the selected Audio Instance and Visual Instance
      2. The semantics of the two Instances obtained from Audio Instance ID and Visual Instance ID.
  3. Produces:
    1. Refined Text
    2. Text Descriptors (Meaning).
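
The receive, perform, produce flow listed above can be sketched in Python as follows. This is a minimal illustration, not the normative interface. The class names `NLUInput` and `NLUOutput`, the functions `refine` and `understand`, and all field types are hypothetical. The sketch also assumes that Input Text takes precedence when both texts are present, and that Meaning is extracted from the refined form of Recognised Text. The refinement and meaning-extraction steps are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class NLUInput:
    """Inputs listed under "Receives"; all field types are assumptions."""
    input_text: Optional[str] = None
    recognised_text: Optional[str] = None
    audio_instance_id: Optional[str] = None
    audio_scene_geometry: Optional[Dict[str, Any]] = None
    visual_instance_id: Optional[str] = None
    visual_scene_geometry: Optional[Dict[str, Any]] = None


@dataclass
class NLUOutput:
    """Outputs listed under "Produces"."""
    refined_text: Optional[str]
    meaning: Dict[str, Any]


def refine(recognised_text: str) -> str:
    # Placeholder refinement: only collapses runs of whitespace.
    # A real AIM would correct recognition errors using context.
    return " ".join(recognised_text.split())


def understand(data: NLUInput) -> NLUOutput:
    if data.input_text is None and data.recognised_text is None:
        raise ValueError("either Input Text or Recognised Text is required")

    # Step 1: refine Recognised Text when it is present.
    refined = None
    if data.recognised_text is not None:
        refined = refine(data.recognised_text)

    # Step 2: pick the text Meaning is extracted from.
    # Assumption: Input Text wins when both are supplied.
    source = data.input_text if data.input_text is not None else refined

    # Placeholder Meaning: carries the text plus the instance and
    # geometry context that a real extractor would take into account.
    meaning = {
        "text": source,
        "context": {
            "audio_instance_id": data.audio_instance_id,
            "audio_scene_geometry": data.audio_scene_geometry,
            "visual_instance_id": data.visual_instance_id,
            "visual_scene_geometry": data.visual_scene_geometry,
        },
    }
    return NLUOutput(refined_text=refined, meaning=meaning)
```

For example, `understand(NLUInput(recognised_text="  turn   on the  lamp "))` returns an `NLUOutput` whose `refined_text` is `"turn on the lamp"`.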

3      Reference Architecture

Figure 1 depicts the Reference Architecture of the Natural Language Understanding AIM.

Figure 1 – The Natural Language Understanding AIM

Note that Output Data shown in italics are passed through unchanged from the Input Data of the same name.

4      I/O Data

Table 1 specifies the Input and Output Data of the Natural Language Understanding AIM.

Table 1 – I/O Data of the Natural Language Understanding AIM

| Input | Description |
| --- | --- |
| Text Object | ID of the Entity emitting an Audio-Visual Scene or a Communication Item. |
| Recognised Text | Input from the Automatic Speech Recognition AIM. |
| Audio Scene Geometry | The digital representation of the spatial arrangement of the Audio Objects of the Scene. |
| Audio Instance Identifier | The Identifier of the specific Audio Object belonging to a level in the taxonomy. |
| Visual Scene Geometry | The digital representation of the spatial arrangement of the Visual Objects of the Scene. |
| Visual Instance Identifier | The Identifier of the specific Visual Object belonging to a level in the taxonomy. |
| **Output** | **Description** |
| Meaning | Descriptors of the Refined Text. |
| Refined Text | The refined version of the Recognised Text. |
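
As a small illustration of how the names in Table 1 might be used, the sketch below checks a payload's keys against the table's input names. The constants `TABLE_1_INPUTS` and `TABLE_1_OUTPUTS` and the helper `unknown_keys` are hypothetical. Reading the first row's name as "Text Object" is an assumption, and the dictionary-based payload is not the format defined by the JSON Metadata.

```python
from typing import Any, Dict, Iterable, List

# Names as they appear in Table 1. Treating the first input row's
# name as "Text Object" is an assumption.
TABLE_1_INPUTS = (
    "Text Object",
    "Recognised Text",
    "Audio Scene Geometry",
    "Audio Instance Identifier",
    "Visual Scene Geometry",
    "Visual Instance Identifier",
)
TABLE_1_OUTPUTS = ("Meaning", "Refined Text")


def unknown_keys(payload: Dict[str, Any], allowed: Iterable[str]) -> List[str]:
    """Return the payload keys that are not named in Table 1, sorted."""
    allowed_set = set(allowed)
    return sorted(key for key in payload if key not in allowed_set)
```

A caller could reject a payload whenever `unknown_keys(payload, TABLE_1_INPUTS)` is non-empty.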

5     SubAIMs

No SubAIMs.

6     JSON Metadata

https://schemas.mpai.community/CAE/V2.1/AIMs/AudioObjectIdentification.json