1 Introduction
The CAV-TEC standard assumes that a land-based Connected Autonomous Vehicle (CAV) is composed of the following elements:
- Connected Autonomous Operation (CAO), a CAV component that:
  - Receives a request from an authorised human or process to start from a specified Position and Orientation (called Point of View) and reach a specified destination Point of View.
  - Uses onboard sensors to capture data from the environment, which is populated by humans carrying devices, other CAVs, and objects such as vehicles, roadside units, and traffic lights. These entities may be “CAV-aware”, i.e., capable of transmitting information understood by the CAV.
  - Exchanges relevant data with peer CAOs onboard other CAVs within range.
  - Issues control instructions.
  - Complies with applicable traffic laws, properly represented in digital form.
- Three subsystems:
  - Motors, to increase and decrease the vehicle’s speed.
  - Steering wheels, to change the vehicle’s direction of motion.
  - Brakes, to substantially reduce the speed of the vehicle.
- The rest of the vehicle.
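A Point of View, as used in the list above, pairs a Position with an Orientation. The following minimal sketch shows one way such a pair might be represented. The field names, the units, and the `distance_to` helper are illustrative assumptions and are not the CAV-TEC data format.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class PointOfView:
    """Position plus Orientation (hypothetical layout, not the CAV-TEC format)."""
    x: float            # assumed: metres in a local Cartesian frame
    y: float
    z: float
    yaw: float          # assumed: radians
    pitch: float = 0.0
    roll: float = 0.0

    def distance_to(self, other: "PointOfView") -> float:
        """Straight-line distance between the two positions, ignoring orientation."""
        return math.dist((self.x, self.y, self.z), (other.x, other.y, other.z))


start = PointOfView(0.0, 0.0, 0.0, yaw=0.0)
destination = PointOfView(3.0, 4.0, 0.0, yaw=math.pi / 2)
print(start.distance_to(destination))  # 5.0
```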
A Connected Autonomous Operation instance is implemented as an AI Module executed in an AI Framework instance, according to the AI Framework (MPAI-AIF) standard. After trust has been established as specified by the Process Instance Trust Framework (MPAI-PTF) standard, the AI Modules can interact by exchanging data enriched with Data Exchange Metadata.
Figure 1 depicts the interaction of CAV subsystems with infrastructure, other vehicles, and the environment.

Figure 1 – CAV subsystems, infrastructure/CAVs, and environment
The basic steps of a CAO workflow are:
- A human or service interacts with the Human–CAV Interaction (HCI) subsystem to activate and communicate with the Autonomous Motion Subsystem (AMS).
- The AMS acts as the central intelligence, coordinating perception, decision-making, and control, and activating the Environment Sensing Subsystem (ESS).
- The ESS acquires sensory data from the environment, receives spatial information from the AMS, consolidates it, and provides a raw scene description to the AMS.
- The AMS exchanges information with infrastructure and other CAVs, and generates motion commands directed to the Motion Actuation Subsystem (MAS).
- The MAS performs low-level control and manages execution through actuators such as brakes, motors, and steering, which affect the environment and return status information to the MAS and upstream subsystems.
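The order of the steps above can be sketched as a chain of calls between the four subsystems. This is only a toy illustration: the class and method names, the placeholder scene description, and the "advance"/"hold" commands are assumptions introduced here and do not come from CAV-TEC.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects the order in which the subsystems are invoked."""
    steps: list = field(default_factory=list)


class ESS:
    def describe_scene(self, trace):
        trace.steps.append("ESS: sense environment")
        return {"objects": []}            # placeholder scene description


class MAS:
    def execute(self, command, trace):
        trace.steps.append(f"MAS: actuate {command}")
        return {"status": "done"}         # placeholder status report


class AMS:
    def __init__(self, ess, mas):
        self.ess, self.mas = ess, mas

    def handle(self, destination, trace):
        trace.steps.append(f"AMS: plan towards {destination}")
        scene = self.ess.describe_scene(trace)
        # Placeholder decision rule, standing in for real motion planning.
        command = "hold" if scene["objects"] else "advance"
        return self.mas.execute(command, trace)


class HCI:
    def __init__(self, ams):
        self.ams = ams

    def request(self, destination, trace):
        trace.steps.append("HCI: receive request")
        return self.ams.handle(destination, trace)


trace = Trace()
status = HCI(AMS(ESS(), MAS())).request("destination", trace)
print(trace.steps)
```

The trace lists the HCI first, then the AMS, the ESS, and finally the MAS, mirroring the activation order described in the steps.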
The Motion Actuation Subsystem (MAS) issues commands to and receives responses from brakes, motors, and steering wheels. These components operate as processes rather than AI Modules (AIMs).
Figure 2 depicts the four Subsystems composing a CAO. Each Subsystem is implemented as a Composite AIM conforming with Technical Specification AI Framework (MPAI-AIF) V3.0 and Process Instance Trust Framework (MPAI-PTF) V1.0.

Figure 2 – The Reference Model of a CAV
AI Modules (AIMs) may be located in different subsystems, provided that the interfaces specified by CAV-TEC are preserved.
A human approaching a CAV can request, via the Human–CAV Interaction Subsystem (HCI), to be taken to a specified Point of View; the request may be conveyed through a combination of audio, visual, and LiDAR signals. A remote process can issue a similar request to the CAV.
In both cases, the request is passed to the Autonomous Motion Subsystem (AMS), which queries the Environment Sensing Subsystem (ESS) to obtain the current Point of View of the CAV. Using this information from the ESS, together with the destination Point of View and access to offline maps, the AMS can propose one or more routes, from which the human or process can select.
With the human aboard, the AMS continuously receives environmental information from the ESS – possibly complemented by data received from other CAVs within range – and instructs the Motion Actuation Subsystem to move the vehicle accordingly.
2 Human-CAV Interaction
The Human–CAV Interaction Subsystem (HCI) has been designed to enable the following operations:
- Use visual and speech information to recognise a human addressing the CAV as a legitimate user of the vehicle.
- Recognise the request made by the human to be taken to a specific Point of View and establish a dialogue with the Autonomous Motion Subsystem (AMS) to respond to and actuate the request.
- Recognise the identities of multiple humans, if present.
- Recognise objects indicated by humans.
- Conduct a dialogue with human passengers, including the ability to understand their emotional state and to present itself as an avatar with a congruent emotional expression.
- Respond to human requests to see or hear what the AMS perceives outside the CAV by requesting and rendering the information received from the AMS.
Figure 3 depicts all the connected components (AI Modules) required to satisfy the design objectives.

Figure 3 – Reference Model of CAV-HCI
The Audio-Visual Scene Description (AVS) monitors the environment and produces Audio–Visual Scene Descriptors. It extracts Speech Scene Descriptors from the scene and, from these, derives Speech Objects corresponding to any speaking human. Visual Scene Descriptors may also be extracted in the form of face and body descriptors of all humans involved.
The CAV activates Automatic Speech Recognition (ASR) to recognise the speech of each human and convert it into text. Natural Language Understanding (NLU) processes this text and extracts the meaning of each utterance.
Speaker Identity Recognition (SIR) and Face Identity Recognition (FIR) enable the CAV to reliably determine the identities of the humans with whom the HCI interacts. If the identities provided by FIR correspond to those provided by SIR, the CAV may proceed to handle further requests.
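The agreement check between the two recognisers could look like the sketch below. It assumes, hypothetically, that FIR and SIR each return a mapping from a per-human track label to an identifier; neither the mapping nor the function name is defined by CAV-TEC.

```python
def identities_agree(face_ids: dict, speaker_ids: dict) -> bool:
    """Return True when every tracked human has the same identity from FIR and SIR.

    Both arguments map a hypothetical per-human track label to the identifier
    reported by the respective recogniser.
    """
    if not face_ids or face_ids.keys() != speaker_ids.keys():
        return False
    return all(face_ids[track] == speaker_ids[track] for track in face_ids)


fir = {"human-1": "alice", "human-2": "bob"}
sir_ok = {"human-1": "alice", "human-2": "bob"}
sir_bad = {"human-1": "alice", "human-2": "carol"}

print(identities_agree(fir, sir_ok))   # True
print(identities_agree(fir, sir_bad))  # False
```

Treating an empty or mismatched set of tracks as disagreement is a conservative design choice made for this sketch only.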
The CAV interacts with humans through Entity Dialogue Processing (EDP). When a human requests to be taken to a destination, the EDP interprets the request and communicates it to the Autonomous Motion Subsystem (AMS). A dialogue may then ensue, in which the AMS offers alternative options to satisfy user preferences (e.g., a longer but more comfortable route or a shorter but less predictable one).
While the CAV moves towards the destination, the HCI may converse with the passengers, present the Full Environment Descriptors generated by the AMS, and provide information about the CAV from the Ego AMS or, more generally, from the HCIs of remote CAVs.
3 Environment Sensing Subsystem
The Environment Sensing Subsystem (ESS) receives data captured by a variety of Environment Sensing Technologies (EST). CAV-TEC assumes that these sensors may operate in the visual or audio range, at microwave frequencies (RADAR), and at near-infrared frequencies (LiDAR). Ultrasound sensors may also be used. Digital maps, called offline maps, can be accessed as well, although they may not represent the current state of the environment. Figure 4 depicts the sensors listed above.

Figure 4 – Reference Model of CAV-ESS
When the CAV is activated in response to a request from a human or a process, Spatial Attitude Generation continuously computes the CAV’s spatial attitude based on the initial Point of View provided by the Motion Actuation Subsystem and on information from Global Navigation Satellite Systems (GNSS), if available.
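The text does not say how the initial Point of View and the GNSS information are combined. One common approach, shown here purely as an assumption, is dead reckoning from the initial value with a weighted correction whenever a GNSS fix is available. The two-dimensional state, the `speed` and `yaw_rate` inputs, and the blending weight are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Attitude2D:
    x: float        # metres
    y: float        # metres
    heading: float  # radians, 0 along +x


def update_attitude(prev: Attitude2D, speed: float, yaw_rate: float, dt: float,
                    gnss_xy: Optional[Tuple[float, float]] = None,
                    gnss_weight: float = 0.5) -> Attitude2D:
    """Advance the attitude by dead reckoning, then blend in a GNSS fix if present."""
    heading = prev.heading + yaw_rate * dt
    x = prev.x + speed * math.cos(heading) * dt
    y = prev.y + speed * math.sin(heading) * dt
    if gnss_xy is not None:
        # Simple linear blend; a real system would more likely use a filter.
        x = (1 - gnss_weight) * x + gnss_weight * gnss_xy[0]
        y = (1 - gnss_weight) * y + gnss_weight * gnss_xy[1]
    return Attitude2D(x, y, heading)


initial = Attitude2D(0.0, 0.0, 0.0)   # stands in for the initial Point of View
step1 = update_attitude(initial, speed=10.0, yaw_rate=0.0, dt=1.0)
step2 = update_attitude(step1, speed=10.0, yaw_rate=0.0, dt=1.0, gnss_xy=(22.0, 0.0))
print(step1, step2)
```

Without a GNSS fix the estimate relies on dead reckoning alone, which matches the "if available" qualifier in the text.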
An EST-specific Scene Description AI Module produces EST-specific Scene Descriptors, which are integrated into the Basic Environment Descriptors (BED) by the Basic Environment Description AI Module. This integration uses all available sensing technologies, weather data, road state information, and Full Environment Descriptors from previous instants.
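As a rough illustration of this integration step, the sketch below merges per-EST object lists into a single structure. It assumes, hypothetically, that objects detected by different ESTs already share an identifier and that positions can simply be averaged. The function name and dictionary layout are not taken from CAV-TEC, and the Full Environment Descriptors from previous instants are left out for brevity.

```python
from collections import defaultdict


def build_bed(est_descriptors, weather=None, road_state=None):
    """Merge per-EST object lists into one descriptor set (illustrative only)."""
    positions = defaultdict(list)
    sources = defaultdict(list)
    for est_name, objects in est_descriptors.items():
        for obj in objects:
            positions[obj["id"]].append(obj["position"])
            sources[obj["id"]].append(est_name)

    merged = {}
    for obj_id, reported in positions.items():
        # Average the position estimates reported by the different ESTs.
        mean = tuple(sum(axis) / len(reported) for axis in zip(*reported))
        merged[obj_id] = {"position": mean, "sources": sorted(sources[obj_id])}
    return {"objects": merged, "weather": weather, "road_state": road_state}


bed = build_bed(
    {
        "lidar": [{"id": "car-7", "position": (10.0, 2.0)}],
        "radar": [{"id": "car-7", "position": (12.0, 2.0)},
                  {"id": "sign-3", "position": (30.0, -1.0)}],
    },
    weather={"rain": False},
)
print(bed["objects"]["car-7"])
```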
Although Figure 4 represents each sensing technology as being processed by an individual EST-specific Scene Description AI Module, an implementation may combine two or more Scene Description AI Modules to handle multiple ESTs, provided that the relevant interfaces are preserved. An EST-specific Scene Description AI Module may also produce alerts that are immediately communicated to the Autonomous Motion Subsystem (AMS).
The objects included in the Basic Environment Descriptors may carry Annotations specifically related to traffic signaling, such as: the Point of View of traffic signals in the environment; traffic police officers; road signs (e.g., lane markings, turn indications, one-way signs, stop signs, and words painted on the road); vertical traffic signs (e.g., overhead signs, signs on objects, and poles with signs); traffic lights; walkways; and traffic sounds (e.g., sirens, whistles, and horns).
4 Autonomous Motion Subsystem
Figure 5 depicts the reference model of the Autonomous Motion Subsystem (AMS).

Figure 5 – Reference Model of CAV-AMS
When the Human–CAV Interaction Subsystem (HCI) sends the Autonomous Motion Subsystem (AMS) a request from a human or a process to move the CAV to a destination Point of View, Route Selection Planning uses the Basic Environment Descriptors provided by the Environment Sensing Subsystem (ESS) to generate a set of waypoints starting from the current Point of View.
When the CAV is in motion, Route Selection Planning triggers Path Selection Planning to generate a sequence of Points of View to reach the next waypoint. The Full Environment Description AI Module may request the AMSs of remote CAVs to provide subsets of their scene descriptors. It integrates all environment descriptor sources into the Full Environment Descriptors (FED) and may also respond to similar requests from remote CAVs.
Motion Selection Planning generates a trajectory to reach the next Point of View along each path. Traffic Obstacle Avoidance receives the trajectory and checks whether any received alert indicates that following the current trajectory could lead to a collision. If such a condition is detected, Traffic Obstacle Avoidance requests Motion Selection Planning to generate a new trajectory. Otherwise, Traffic Obstacle Avoidance issues an AMS-MAS message to the Motion Actuation Subsystem (MAS).
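The check-and-replan loop described here can be sketched as follows. The callables `plan_trajectory` and `conflicts` stand in for Motion Selection Planning and for the collision check; they, the retry limit, and the lane-based toy data are assumptions made for illustration only.

```python
def select_safe_trajectory(plan_trajectory, alerts, conflicts, max_attempts=3):
    """Return the first planned trajectory that no alert conflicts with, or None.

    plan_trajectory(attempt) stands in for Motion Selection Planning and
    conflicts(trajectory, alert) for the collision check; both are hypothetical.
    """
    for attempt in range(max_attempts):
        trajectory = plan_trajectory(attempt)
        if not any(conflicts(trajectory, alert) for alert in alerts):
            return trajectory      # would be wrapped in an AMS-MAS message
    return None                    # no safe trajectory found within the limit


# Toy data: a trajectory is a list of lane indices, an alert blocks one lane.
candidates = [[0, 0, 0], [0, 1, 1], [1, 1, 1]]


def plan(attempt):
    return candidates[attempt]


def blocks(trajectory, alert):
    return alert["lane"] in trajectory


print(select_safe_trajectory(plan, [{"lane": 0}], blocks))  # [1, 1, 1]
print(select_safe_trajectory(plan, [], blocks))             # [0, 0, 0]
```

Bounding the number of replanning attempts is a choice made for this sketch; the text does not state what happens when no safe trajectory is found.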
The MAS sends back to the AMS an AMS-MAS message containing information about the execution of the received command. Based on these messages, the AMS may discontinue the execution of a previous command, issue a new AMS-MAS message, and inform Traffic Obstacle Avoidance. The decisions made by each element in this chain may be recorded in the AMS memory (“black box”).
5 Motion Actuation Subsystem
When the AMS Message Interpretation AIM receives an AMS-MAS Message from the AMS, it interprets the message, partitions it into commands, and sends commands to the Brake, Motor, and Wheel subsystems as depicted in Figure 6.

Figure 6 – Reference Model of CAV-MAS
CAV-TEC does not specify how the three mechanical subsystems process commands. However, it defines the format of the commands issued by, and of the responses received by, the AMS Message Interpretation AI Module. The result of interpreting these responses is sent to the Autonomous Motion Subsystem (AMS) as an AMS-MAS message.
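To illustrate what partitioning a message into per-subsystem commands might involve, the sketch below splits a hypothetical motion request into brake, motor, and wheel parts. The message fields and command keys are invented for this example; CAV-TEC defines the actual formats.

```python
from dataclasses import dataclass


@dataclass
class AmsMasMessage:
    """Hypothetical payload; CAV-TEC defines the real message format."""
    target_speed: float      # m/s
    current_speed: float     # m/s
    steering_angle: float    # radians


def partition(message: AmsMasMessage) -> dict:
    """Split one message into per-subsystem commands."""
    delta = message.target_speed - message.current_speed
    return {
        "brake": {"speed_reduction": max(0.0, -delta)},
        "motor": {"speed_increase": max(0.0, delta)},
        "wheel": {"angle": message.steering_angle},
    }


commands = partition(AmsMasMessage(target_speed=8.0, current_speed=12.0, steering_angle=0.1))
print(commands)
```

In this toy split, only one of the brake and motor commands is non-zero for any given message.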
The Motion Actuation Subsystem includes additional AI Modules:
- Inertial Sensing, which includes devices such as odometers, speedometers, accelerometers, and inclinometers, and produces spatial data.
- Spatial Attitude Generation, which computes the initial spatial attitude of the ego CAV using the spatial data provided by Inertial Sensing. This initial spatial attitude is sent to the Environment Sensing Subsystem (ESS).
- Weather Sensing, which includes devices such as thermometers, hygrometers, and anemometers, and produces weather data.
- Ice Condition Analysis, which augments the weather data by analysing the responses of the Brakes, Motors, and Wheels, and sends the augmented weather data to the ESS.
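How Ice Condition Analysis augments the weather data is not detailed here. One plausible approach, offered only as an assumption, is to compare wheel speed with vehicle speed and flag a slippery surface when the two diverge near freezing temperatures. The field names, the slip threshold, and the temperature cut-off below are hypothetical.

```python
def augment_weather(weather: dict, wheel_speed: float, vehicle_speed: float,
                    slip_threshold: float = 0.2) -> dict:
    """Add a slip ratio and an ice-risk flag to a copy of the weather data."""
    reference = max(abs(wheel_speed), abs(vehicle_speed))
    slip = 0.0 if reference == 0 else abs(wheel_speed - vehicle_speed) / reference
    # A missing temperature is treated as warm, so no ice risk is reported.
    near_freezing = weather.get("temperature_c", 99.0) <= 1.0
    return {**weather, "slip_ratio": slip,
            "ice_risk": near_freezing and slip > slip_threshold}


augmented = augment_weather({"temperature_c": -2.0}, wheel_speed=10.0, vehicle_speed=5.0)
print(augmented)
```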