Human-CAV Interaction (HCI)


Index

4.1 Functions of Subsystem

4.2 Reference Architecture of Subsystem

4.3 Input/Output Data of Subsystem

4.4 Functions of the AI Modules

4.5 Input/Output Data of AI Modules

4.1 Functions of Subsystem

The Human-CAV Interaction (HCI) Subsystem performs the following high-level functions:

Authenticates humans, e.g., for the purpose of letting them into the CAV.
Interprets and executes commands provided by humans, possibly after a dialogue, e.g., to go to a Waypoint, turn off the air conditioning, open a window, call a person, or search for information.
Displays Full Environment Representation to passengers via a viewer and allows passengers to control the display.
Interprets conversation utterances with the support of the extracted Personal Statuses of the humans, e.g., on the fastest way to reach a Waypoint because of an emergency, or during a casual conversation.
Displays itself as a Body and Face with a mouth uttering Speech showing a Personal Status comparable to the Personal Status that a human counterpart (e.g., driver, tour guide, interpreter) would display in similar circumstances.

The HCI operation is highly influenced by the notion of Personal Status, the set of internal characteristics of conversing humans and machines. See Annex 1 Section 1.

4.2 Reference Architecture of Subsystem

Figure 3 gives the Human-CAV Interaction (HCI) Reference Model. It supports both the case of a group of humans approaching the CAV from outside and the case of a group of humans sitting inside the CAV.

Figure 3 – Human-CAV Interaction Reference Model

The HCI operation is considered in two human-CAV interaction scenarios, outdoor and indoor:

When a group of humans approaches the CAV from outside the CAV:
1. The Audio Scene Description AIM creates the Audio Scene Descriptions in the form of Audio (Speech) Objects corresponding to each speaking human in the Environment (close to the CAV).
2. The Visual Scene Description AIM creates the Visual Scene Description and provides 1) the Face and Physical Objects and 2) the Body and Face Descriptors corresponding to each human in the Environment (close to the CAV).
3. The Speaker Recognition and Face Recognition AIMs authenticate the humans the HCI is interacting with.
4. The Speech Recognition AIM recognises the speech of each human.
5. The Language Understanding AIM produces the refined Text (Language Understanding) and extracts the Meaning.
6. The Personal Status Extraction AIM extracts the Personal Status of the humans from 1) Speech, 2) Face and Body Descriptors, 3) Text (Language Understanding) and 4) Meaning.
7. The Dialogue Processing AIM 1) validates the human Identities, 2) responds to human utterances, 3) displays the Face and Body of the HCI Personal Status, and 4) issues commands to the Autonomous Motion Subsystem.
When a group of humans sits inside the CAV:
1. The Audio Scene Description AIM creates the Audio Scene Descriptions in the form of Audio (Speech) Objects corresponding to each speaking human in the cabin.
2. The Visual Scene Description AIM creates the Visual Scene Descriptors in the form of Face Descriptors corresponding to each human in the cabin.
3. The Speaker Recognition and Face Recognition AIMs identify the humans the HCI is interacting with.
4. The Speech Recognition AIM recognises the speech of each human.
5. The Language Understanding AIM extracts the Meaning and produces the refined Text (Language Understanding).
6. The Personal Status Extraction AIM extracts the Personal Status of the humans.
7. The Dialogue Processing AIM validates the human Identities, responds to human utterances, displays the HCI Personal Status, and issues commands to the Autonomous Motion Subsystem.
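The seven steps above form a per-human processing chain. The following Python sketch illustrates only the order of the steps and the data passed between them. Every class, function, and field name is hypothetical, each AIM is replaced by a trivial stub, and the one-word status strings stand in for a real Personal Status, so this is not an implementation of any AIM.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HumanObservation:
    """Hypothetical per-human output of steps 1-3 (scene description and recognition)."""
    voice_id: str    # what Speaker Recognition would derive from the Speech Object
    face_id: str     # what Face Recognition would derive from the Face Object
    utterance: str   # stands in for the Speech Object itself


@dataclass
class DialogueResult:
    identity_valid: bool
    reply_text: str
    hci_personal_status: str
    ams_command: Optional[str] = None


def understand_language(recognised_text: str) -> Tuple[str, str]:
    """Step 5 stub: return (refined text, meaning)."""
    refined = recognised_text.strip().lower()
    meaning = "go_to_waypoint" if "go to" in refined else "conversation"
    return refined, meaning


def extract_personal_status(obs: HumanObservation) -> str:
    """Step 6 stub: uses only the utterance, not Face/Body Descriptors or Meaning."""
    return "urgent" if obs.utterance.endswith("!") else "neutral"


def process_dialogue(obs: HumanObservation, refined: str,
                     meaning: str, status: str) -> DialogueResult:
    """Step 7 stub: validate identity, reply, choose HCI status, command the AMS."""
    if obs.voice_id != obs.face_id:
        return DialogueResult(False, "Sorry, I could not verify who you are.", "neutral")
    hci_status = "attentive" if status == "urgent" else "relaxed"
    if meaning == "go_to_waypoint":
        return DialogueResult(True, "Understood, setting off.", hci_status, refined)
    return DialogueResult(True, "Happy to talk about that.", hci_status)


def run_hci(observations: List[HumanObservation]) -> List[DialogueResult]:
    """Run steps 4-7 for each human delivered by steps 1-3."""
    results = []
    for obs in observations:
        recognised_text = obs.utterance  # step 4 stub: the speech is already text
        refined, meaning = understand_language(recognised_text)
        status = extract_personal_status(obs)
        results.append(process_dialogue(obs, refined, meaning, status))
    return results
```

In this sketch a human whose voice and face identities disagree receives a reply but never produces a command for the Autonomous Motion Subsystem, which mirrors the role of identity validation in step 7.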

Notes related to the two scenarios:

HCI interacts with the humans sitting in the cabin in three ways:
1. By responding to commands/queries from one or more humans at the same time, e.g.:
  1. Commands to go to or park at a Waypoint, etc.
  2. Commands with an effect on the cabin, e.g., turn off air conditioning, turn on the radio, call a person, open window or door, search for information, etc.
2. By conversing with and responding to questions from one or more humans at the same time about travel-related issues (in-depth domain-specific conversation), e.g.:
  1. Humans request information, e.g., time to destination, route conditions, weather at destination, etc.
  2. Humans ask questions about objects in the cabin or held by humans.
  3. The CAV offers alternatives to humans, e.g., a long but safe route or a short route that is likely to have interruptions.
3. By following the conversation on travel matters held by humans in the cabin if:
  1. The passengers allow the HCI to follow the conversation, and
  2. The processing is carried out inside the CAV and is held confidential.
While in the cabin, passengers can become aware of the external Environment by issuing Full Environment Representation (FER) Commands to navigate the Full Environment Representation.
When conversing with the humans in the cabin, the HCI displays itself as a speaking avatar via the Personal Status Display AI Module.
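The notes above distinguish commands that affect motion, commands that affect the cabin, and conversation, and they state two conditions under which the HCI may follow a conversation between passengers. The sketch below shows one naive way to express both ideas. The keyword lists, the enum, and both function names are assumptions made for illustration only; a real HCI would rely on the Meaning produced by Language Understanding rather than substring matching.

```python
from enum import Enum


class RequestKind(Enum):
    AMS_COMMAND = "ams_command"      # e.g., go to or park at a Waypoint
    CABIN_COMMAND = "cabin_command"  # e.g., air conditioning, radio, window
    CONVERSATION = "conversation"    # anything else


# Hypothetical keyword lists taken from the examples in the notes.
AMS_KEYWORDS = ("go to", "park at")
CABIN_KEYWORDS = ("air conditioning", "radio", "call", "window", "door", "search")


def classify_request(text: str) -> RequestKind:
    """Route an utterance by naive substring matching (illustration only)."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in AMS_KEYWORDS):
        return RequestKind.AMS_COMMAND
    if any(keyword in lowered for keyword in CABIN_KEYWORDS):
        return RequestKind.CABIN_COMMAND
    return RequestKind.CONVERSATION


def may_follow_conversation(passengers_consent: bool, processed_on_board: bool) -> bool:
    """Both conditions from the notes must hold before the HCI listens in."""
    return passengers_consent and processed_on_board
```

Substring matching misfires easily (for instance, "call" also matches "recall"), which is one reason the sketch should not be read as a recommended design.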

4.3 Input/Output Data of Subsystem

Table 3 gives the input/output data of the Human-CAV Interaction Subsystem.

Table 3 – I/O data of Human-CAV Interaction

Input data	From	Comment
Audio (ESS)	Environment Sensing Subsystem	User authentication; user command; user conversation
Audio	Cabin Passengers	User’s social life; commands/interaction with HCI
Video (ESS)	Environment Sensing Subsystem	Commands/interaction with HCI
Video	Cabin Passengers	User’s social life; commands/interaction with HCI
Full Environment Representation	Autonomous Motion Subsystem	Rendered by Full Environment Representation Viewers
Full Environment Representation Commands	Cabin Passengers	To control rendering of Full Environment Representation
Output data	To	Comment
Output Speech	Humans in Environment; Cabin Passengers	HCI’s response to passengers
Output Face	Cabin Passengers	HCI’s face when conversing
Output Body	Cabin Passengers	HCI’s body when conversing
Output Text	Cabin Passengers	HCI’s response to passengers
Full Environment Representation Audio	Passenger Cabin	For passengers to hear external Environment
Full Environment Representation Video	Passenger Cabin	For passengers to view external Environment

4.4 Functions of the AI Modules

Table 4 gives the functions of all Human-CAV Interaction AIMs.

Table 4 – AI Modules of the Human-CAV Interaction Subsystem

AIM	Function
Audio Scene Description	Produces the Audio Scene Descriptors using the Audio captured by the appropriate (indoor or outdoor) Microphone Array.
Visual Scene Description	Produces the Visual Scene Descriptors using the visual information captured by the appropriate (indoor or outdoor) visual sensors.
Speech Recognition	Converts speech into Text.
Physical Object Identification	Provides the ID of the class of objects of which the Physical Object is an Instance.
Full Environment Representation Viewer	Converts the Full Environment Representation produced by the Autonomous Motion Subsystem into Audio-Visual Scene Descriptors that can be perceptibly rendered.
Language Understanding	Improves the Text from Speech Recognition by using context information (e.g., Instance ID of object).
Speaker Recognition	Provides Speaker ID from Speech.
Personal Status Extraction	Provides the Personal Status of a human.
Face Recognition	Provides Face ID from Face.
Dialogue Processing	Provides 1) Text containing the response of the HCI to the human and 2) the Personal Status of the HCI congruous with that Text.
Personal Status Display	Produces Speech, and Machine Face and Body.

4.5 Input/Output Data of AI Modules

Table 5 gives the input/output data of the Human-CAV Interaction AIMs.

Table 5 – I/O data of Human-CAV Interaction AIMs

AIM	Input	Output
Audio Scene Description	Environment Audio (outdoor); Environment Audio (indoor)	Speech Objects
Visual Scene Description	Environment Video (outdoor); Environment Video (indoor)	Face Objects; Physical Objects; Body Descriptors; Face Descriptors
Speech Recognition	Speech Object	Recognised Text
Physical Object Identification	Physical Object; Human Object	Object ID
Full Environment Representation Viewer	FER Commands	FER Audio; FER Visual
Language Understanding	Recognised Text; Personal Status; Object ID	Meaning; Personal Status; Text (Language Understanding)
Speaker Recognition	Speech Descriptors	Speaker ID
Personal Status Extraction	Recognised Text; Speech Object; Face Object; Human Object	Personal Status
Face Recognition	Face Object	Face ID
Dialogue Processing	Speaker ID; Meaning; Text (Language Understanding); Personal Status; Face ID; AMS-HCI Response	AMS-HCI Commands; Output Text; Output Personal Status
Personal Status Display	Machine Text; Output Personal Status	Machine Avatar; Machine Text; Machine Speech
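Table 5 can be read as a data-flow graph: an AIM depends on every other AIM that produces one of its inputs. The sketch below encodes the table as a Python dictionary and derives one valid execution order with the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later). Two assumptions are made here and are not part of the table: singular and plural data names are treated as the same item (for example, "Speech Objects" and "Speech Object"), and the indoor and outdoor variants of a data type are merged. The resulting order reflects Table 5 only and may differ from the step order listed in Section 4.2.

```python
from graphlib import TopologicalSorter

# AIM name -> (inputs, outputs), transcribed from Table 5 with names normalised
# to the singular form (an assumption made for this sketch).
AIM_IO = {
    "Audio Scene Description": ({"Environment Audio"}, {"Speech Object"}),
    "Visual Scene Description": (
        {"Environment Video"},
        {"Face Object", "Physical Object", "Body Descriptors", "Face Descriptors"},
    ),
    "Speech Recognition": ({"Speech Object"}, {"Recognised Text"}),
    "Physical Object Identification": ({"Physical Object", "Human Object"}, {"Object ID"}),
    "Full Environment Representation Viewer": ({"FER Commands"}, {"FER Audio", "FER Visual"}),
    "Language Understanding": (
        {"Recognised Text", "Personal Status", "Object ID"},
        {"Meaning", "Personal Status", "Text (Language Understanding)"},
    ),
    "Speaker Recognition": ({"Speech Descriptors"}, {"Speaker ID"}),
    "Personal Status Extraction": (
        {"Recognised Text", "Speech Object", "Face Object", "Human Object"},
        {"Personal Status"},
    ),
    "Face Recognition": ({"Face Object"}, {"Face ID"}),
    "Dialogue Processing": (
        {"Speaker ID", "Meaning", "Text (Language Understanding)",
         "Personal Status", "Face ID", "AMS-HCI Response"},
        {"AMS-HCI Commands", "Output Text", "Output Personal Status"},
    ),
    "Personal Status Display": (
        {"Machine Text", "Output Personal Status"},
        {"Machine Avatar", "Machine Text", "Machine Speech"},
    ),
}


def build_dependencies(aim_io):
    """Map each AIM to the set of other AIMs that produce one of its inputs."""
    producers = {}
    for aim, (_, outputs) in aim_io.items():
        for item in outputs:
            producers.setdefault(item, set()).add(aim)
    dependencies = {}
    for aim, (inputs, _) in aim_io.items():
        dependencies[aim] = set()
        for item in inputs:
            # An AIM that re-emits one of its own inputs does not depend on itself.
            dependencies[aim] |= producers.get(item, set()) - {aim}
    return dependencies


def execution_order(aim_io):
    """Return one order in which every AIM runs after the AIMs it depends on."""
    return list(TopologicalSorter(build_dependencies(aim_io)).static_order())
```

Calling `execution_order(AIM_IO)` returns a list of the eleven AIM names. Several valid orders exist, so only relative positions are meaningful, for example that Dialogue Processing comes before Personal Status Display.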
