Multimodal Question Answering (MMC-MQA):
- Selector – communicating use of Text or Speech.
- Input Text – replacing Speech, where appropriate.
- Input Visual – the object for which a question is asked
- Input Speech – the question asked.
- Produces Text or Speech conveying the answer.
3 Reference Module
Figure 1 depicts the Reference Module of the Multimodal Question Answering AIW.
Figure 1 – The Multimodal Question Answering AIW
4 I/O Data
Table 1 specifies the Input and Output Data of the Multimodal Question Answering AIW.
Table 1 – I/O Data of Multimodal Question Answering
|Text typed by the human as a replacement for Input Speech.
|Data determining the use of Speech or Text.
|Video of the human showing an object held in hand.
|Speech of the human asking a question the Machine.
|The Text generated by Machine in response to human input.
|The Speech generated by Machine in response to human input.
|Visual Scene Description
|Visual Object Identification
|Automatic Speech Recognition
|Natural Language Understanding
|Question Analysis Module
|Answer to Question Module Answering