1 Version
V2.1
2 Functions
Natural Language Understanding (MMC-NLU):
- Receives
  - Input Text
  - Recognised Text
  - Audio Instance ID
  - Audio Scene Geometry
  - Visual Instance ID
  - Visual Scene Geometry.
- Performs the following:
  - Refines Recognised Text
  - Extracts Meaning from Input Text or Recognised Text considering
    - The spatial position of the selected Audio Instance and Visual Instance
    - The semantics of the two Instances obtained from Audio Instance ID and Visual Instance ID.
- Produces
  - Refined Text
  - Text Descriptors (Meaning).
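The data flow listed above can be sketched as a small Python interface. This is an illustration only: the names `NLUInputs`, `NLUOutputs` and `understand` are hypothetical and not defined by the specification, the geometry fields are left untyped, and the refinement and meaning-extraction steps are placeholders (whitespace normalisation and a token list), not the methods the AIM actually uses. Preferring Input Text over Recognised Text when both are present is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class NLUInputs:
    """Hypothetical container for the six inputs the AIM receives."""
    input_text: Optional[str] = None
    recognised_text: Optional[str] = None
    audio_instance_id: Optional[str] = None
    audio_scene_geometry: Any = None
    visual_instance_id: Optional[str] = None
    visual_scene_geometry: Any = None


@dataclass
class NLUOutputs:
    """Hypothetical container for the two outputs the AIM produces."""
    refined_text: Optional[str]
    meaning: dict = field(default_factory=dict)


def understand(inputs: NLUInputs) -> NLUOutputs:
    # Assumption: at least one of the two text inputs must be present.
    if inputs.input_text is None and inputs.recognised_text is None:
        raise ValueError("either Input Text or Recognised Text is required")

    # Placeholder refinement: collapse runs of whitespace in Recognised Text.
    refined = None
    if inputs.recognised_text is not None:
        refined = " ".join(inputs.recognised_text.split())

    # Assumption: Input Text takes precedence; otherwise use the Refined Text.
    source = inputs.input_text if inputs.input_text is not None else refined

    # Placeholder Meaning: tokens plus the Instance IDs that a real
    # implementation would resolve against the two Scene Geometries.
    meaning = {
        "tokens": source.split(),
        "audio_instance_id": inputs.audio_instance_id,
        "visual_instance_id": inputs.visual_instance_id,
    }
    return NLUOutputs(refined_text=refined, meaning=meaning)
```

A call such as `understand(NLUInputs(recognised_text="put  that there", visual_instance_id="obj-7"))` returns the normalised text together with a Meaning record that carries the Visual Instance ID alongside the tokens.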
3 Reference Architecture
Figure 1 depicts the Reference Architecture of the Natural Language Understanding AIM.
Figure 1 – The Natural Language Understanding AIM
Note that Output Data shown in italics are passed through unchanged from the Input Data of the same name.
4 I/O Data
Table 1 specifies the Input and Output Data of the Natural Language Understanding AIM.
Table 1 – I/O Data of the Natural Language Understanding AIM
Input | Description |
Text Object | The Input Text from which Meaning is extracted. |
Recognised Text | Input from the Automatic Speech Recognition AIM |
Audio Scene Geometry | The digital representation of the spatial arrangement of the Audio Objects of the Scene. |
Audio Instance Identifier | The Identifier of the specific Audio Object belonging to a level in the taxonomy. |
Visual Scene Geometry | The digital representation of the spatial arrangement of the Visual Objects of the Scene. |
Visual Instance Identifier | The Identifier of the specific Visual Object belonging to a level in the taxonomy. |
Output | Description |
Meaning | Descriptors of the Refined Text. |
Refined Text | The refined version of the Recognised Text. |
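As a rough illustration of how the names in Table 1 might be used to check a message before it reaches the AIM, the sketch below validates a plain dictionary keyed by those names. The dictionary-based message format, the function name `check_inputs`, and the rule that at least one text input must be present are assumptions made for this example; the specification does not define them.

```python
# Input and Output names copied from Table 1.
INPUT_NAMES = {
    "Text Object",
    "Recognised Text",
    "Audio Scene Geometry",
    "Audio Instance Identifier",
    "Visual Scene Geometry",
    "Visual Instance Identifier",
}
OUTPUT_NAMES = {"Meaning", "Refined Text"}


def check_inputs(message: dict) -> list:
    """Return a list of problems found in a hypothetical input message."""
    problems = []

    # Flag any key that is not an Input named in Table 1.
    for name in sorted(set(message) - INPUT_NAMES):
        problems.append(f"unknown input: {name}")

    # Assumption: Meaning needs a Text Object or a Recognised Text to work on.
    if "Text Object" not in message and "Recognised Text" not in message:
        problems.append("no text: supply Text Object or Recognised Text")

    return problems
```

An empty list means the message uses only names from Table 1 and carries at least one text input.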
5 SubAIMs
No SubAIMs.
6 JSON Metadata
https://schemas.mpai.community/CAE/V2.1/AIMs/AudioObjectIdentification.json