Human-CAV Interaction operates on the assumption that the CAV is impersonated by an avatar, selected or produced by the CAV owner, with which the users interact. The CAV avatar features are:

  1. Visible: head, face and torso.
  2. Audible: speech embedding the expression, e.g., emotion, that a human driver would display.
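The two feature channels above can be pictured as a simple data structure; the class and field names below are illustrative assumptions, not part of the MPAI-CAV specification:

```python
from dataclasses import dataclass

# Hypothetical sketch of the avatar's visible and audible channels.
@dataclass
class AvatarFrame:
    """One rendered instant of the owner-selected CAV avatar."""
    head_pose: tuple          # visible: (yaw, pitch, roll) in degrees
    face_expression: str      # visible: e.g. "neutral", "smile"
    torso_pose: str           # visible: e.g. "upright", "leaning"
    speech_text: str          # audible: the utterance to synthesise
    speech_emotion: str       # audible: emotion embedded in the speech

frame = AvatarFrame((0.0, -5.0, 0.0), "smile", "upright",
                    "We will arrive in ten minutes.", "cheerful")
```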

The CAV’s avatar is reactive to:

  1. The environment, e.g., it shows an altered face if a human driver has performed what it considers an improper action.
  2. Humans, e.g., it shows an appropriate face, gazing at a human in the cabin who has made a joke.
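The avatar's reactivity can be sketched as a mapping from observed events to a facial expression and an utterance; the event labels and reactions below are hypothetical, chosen only to mirror the two examples above:

```python
# Illustrative event-to-reaction table; all labels are assumptions.
REACTIONS = {
    "improper_driver_action": ("stern", "That manoeuvre was not safe."),
    "passenger_joke":         ("amused", "Good one!"),
}

def react(event):
    """Return (face expression, spoken reply) for an observed event."""
    return REACTIONS.get(event, ("neutral", ""))
```

Unknown events fall back to a neutral face and no utterance, so the avatar degrades gracefully rather than failing on unmodelled situations.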

Other forms of interaction are:

  1. The CAV authenticates humans interacting with it by their speech and face.
  2. A human issues commands to a CAV, e.g.:
    1. Commands to the Autonomous Motion Subsystem, e.g.: go to a waypoint, display the Full World Representation, etc.
    2. Other commands, e.g.: turn off the air conditioning, turn on the radio, call a person, open a window or door, search for information, etc.
  3. A human holds a dialogue with a CAV, e.g.:
    1. The CAV offers the human a selection of options (e.g., a long but safe route vs. a short route likely to have interruptions).
    2. The human requests information, e.g., time to destination, route conditions, weather at destination, etc.
    3. The human holds a casual conversation.
  4. A CAV monitors the passenger cabin, e.g.:
    1. Physical conditions, e.g., temperature, media being played, sound level, noise level, anomalous noise, etc.
    2. Passenger data, e.g., number of passengers, their identities, estimated ages, and destinations.
    3. Passenger activity, e.g., activity level, level of passenger-generated sound, level of passenger movement, emotion on the passengers' faces.
    4. Passenger-to-passenger interaction, e.g., two passengers shake hands, or passengers hold an everyday conversation.
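As a rough sketch of interaction form 2, a recognised command can be routed either to the Autonomous Motion Subsystem (AMS) or to cabin controls, with anything else handed to the dialogue function; the command names and routing strings below are assumptions for illustration only:

```python
# Hypothetical command router; the command vocabulary is illustrative.
AMS_COMMANDS = {"go_to_waypoint", "display_full_world_representation"}
CABIN_COMMANDS = {"turn_off_air_conditioning", "turn_on_radio",
                  "open_window", "open_door"}

def dispatch(command, argument=None):
    """Route a recognised human command to the appropriate subsystem."""
    if command in AMS_COMMANDS:
        return "AMS <- {}({})".format(command, argument)
    if command in CABIN_COMMANDS:
        return "cabin <- {}".format(command)
    # Anything else is treated as dialogue (requests, casual conversation).
    return "dialogue <- {}".format(command)
```

For example, `dispatch("go_to_waypoint", "home")` would reach the AMS, while `dispatch("turn_on_radio")` stays within the cabin controls.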


Reference architecture

Figure 1 represents the Human-CAV Interaction (HCI) Reference Model.

Figure 1 – Human-CAV Interaction Reference Model

The operation of the HCI is as follows:

  1. A human approaches the CAV and is identified as follows:
    1. The speech of the human is separated from the environment audio.
    2. The human is identified by their speech.
    3. The human object is separated from the environment.
    4. The human is identified by their face.
  2. In the cabin:
    1. The locations and identities of the passengers are determined.
    2. Meaning and emotion are extracted from speech, face and gesture.
    3. Object information is extracted from video.
    4. Emotions are fused.
    5. Intention is derived.
    6. Expression (Speech) and Expression (Face) are produced to animate the CAV avatar with realistic gazing.
    7. Human commands are issued and responses from the Autonomous Motion Subsystem are processed.
    8. The Full World Representation is presented to give humans a complete view of the Environment.
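The identification steps above can be sketched as a minimal pipeline; every function is a hypothetical stand-in for an AIM, and fusing the speech- and face-based identities into a single decision is an illustrative choice, not mandated by the reference model:

```python
# Stub AIMs so the sketch runs; real implementations would use
# audio source separation and speaker/face recognition models.
def separate_speech(audio):        # 1.1: speech vs. environment audio
    return audio

def identify_by_speech(speech):    # 1.2: speaker identification
    return "alice"

def separate_human_object(video):  # 1.3: human object vs. environment
    return video

def identify_by_face(face):        # 1.4: face identification
    return "alice"

def identify_approaching_human(audio, video):
    """Run steps 1.1-1.4 and accept only if both identities agree."""
    speaker_id = identify_by_speech(separate_speech(audio))
    face_id = identify_by_face(separate_human_object(video))
    return speaker_id if speaker_id == face_id else None
```

Requiring agreement between the two modalities is one simple fusion policy; a deployed system might instead weight the two recognisers by their confidence scores.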

Depending on the technology used (data processing or AI), the AIMs in Figure 1 may need to access external information, such as Knowledge Bases, to perform their functions.
