1 AI Workflows
1.1 Television Media Analysis
OSD-TMA is a PAAI including a variety of collaborating PAAIs:
| TV Splitting | Separates the audio and visual (video) components of a multiplexed audio-visual (television) stream, with the help of Text related to the contents of the audio-visual file. | 
| Visual Change Detection | Splits the audio-visual file into files that each represent a different scene. | 
| Audio Segmentation | Splits the audio file into files that each have a different Speaker ID. | 
| Face ID Recognition | Extracts the human participants in a scene as bounding boxes, using the input Auxiliary Text. | 
| Speaker ID Recognition | Provides the ID of a Speaker. | 
| Automatic Speech Recognition | Converts speech into text facilitated by Auxiliary Text. | 
| Audio-Visual Alignment | Associates a face with a speaker. | 
| Audio-Visual Scene Description | Combines all elements generated upstream into AV Scene Descriptors. | 
| Audio-Visual Event Description | Stacks all the Scene Descriptors for the duration of the analysis. | 

Figure 1 – Reference Model of OSD-TMA
The following links analyse the AI Modules:
- Audio Segmentation
- Audio-Visual Alignment
- Audio-Visual Event Description
- Audio-Visual Scene Description
- Automatic Speech Recognition
- Face Identity Recognition
- Speaker Identity Recognition
- Television Splitting
- Visual Change Detection
- Visual Object Identification
- Visual Scene Description

OSD-TMA performs Descriptors-Interpretation Level Operations.
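The last stage of the workflow, Audio-Visual Event Description, stacks the Scene Descriptors produced over the duration of the analysis. The following is a minimal sketch of that stacking step only. The class names, field names, and the choice to order scenes by start time are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class AVSceneDescriptors:
    """Hypothetical per-scene record; all field names are illustrative."""
    start_s: float
    end_s: float
    face_id: Optional[str] = None
    speaker_id: Optional[str] = None
    text: str = ""


@dataclass
class AVEventDescriptors:
    """Hypothetical container: the scene descriptors of one analysis run."""
    scenes: List[AVSceneDescriptors] = field(default_factory=list)

    def push(self, scene: AVSceneDescriptors) -> None:
        # Assumption: the stack is kept ordered by scene start time.
        self.scenes.append(scene)
        self.scenes.sort(key=lambda s: s.start_s)

    def duration(self) -> float:
        # Span covered by the stacked scenes, in seconds.
        if not self.scenes:
            return 0.0
        first = min(s.start_s for s in self.scenes)
        last = max(s.end_s for s in self.scenes)
        return last - first


def describe_event(scenes: Iterable[AVSceneDescriptors]) -> AVEventDescriptors:
    """Stack all scene descriptors into one event-level structure."""
    event = AVEventDescriptors()
    for scene in scenes:
        event.push(scene)
    return event
```

In this sketch the upstream stages (splitting, segmentation, recognition, alignment) are assumed to have already filled in each `AVSceneDescriptors` record.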
2 AI Modules
2.1 Audio-Visual Alignment
OSD-AVA
| Receives | Speech Scene Descriptors | of a present Speech Scene. | 
| | Audio Scene Descriptors | of a present Audio Scene. | 
| | Visual Scene Descriptors | of a present Visual Scene. | 
| Identifies | Speech-Audio-Visual Objects | that share the same Position. | 
| Produces | Audio-Visual Scene Descriptors | whose Speech, Audio, and Visual Objects at the same Position have compatible Identifiers. | 
OSD-AVA performs Descriptors Level Operations. More complex situations may require Reasoning Level Operations.
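The alignment described above can be sketched as a nearest-neighbour match on Position. This is only an illustration under stated assumptions: Positions are 3D points, "the same Position" means "within a distance tolerance", the Visual Objects are used as anchors, and all names (`SceneObject`, `AlignedObject`, `align`, `tolerance`) are hypothetical. The sketch does not prevent one Speech or Audio Object from being matched to two Visual Objects, which a real implementation would need to handle.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Sequence, Tuple

Position = Tuple[float, float, float]


@dataclass
class SceneObject:
    """Hypothetical object carried by Speech, Audio, or Visual Scene Descriptors."""
    object_id: str
    position: Position


@dataclass
class AlignedObject:
    """Hypothetical Audio-Visual object with mutually compatible identifiers."""
    av_id: str
    position: Position
    speech_id: Optional[str]
    audio_id: Optional[str]
    visual_id: Optional[str]


def _nearest(objects: Sequence[SceneObject], position: Position,
             tolerance: float) -> Optional[SceneObject]:
    # Return the closest object within the tolerance, or None if there is none.
    best, best_d = None, tolerance
    for obj in objects:
        d = dist(obj.position, position)
        if d <= best_d:
            best, best_d = obj, d
    return best


def align(speech: Sequence[SceneObject], audio: Sequence[SceneObject],
          visual: Sequence[SceneObject], tolerance: float = 0.5) -> List[AlignedObject]:
    """Attach to each Visual Object the Speech and Audio Objects at its Position."""
    aligned = []
    for i, v in enumerate(visual):
        s = _nearest(speech, v.position, tolerance)
        a = _nearest(audio, v.position, tolerance)
        aligned.append(AlignedObject(
            av_id=f"av-{i}",
            position=v.position,
            speech_id=s.object_id if s else None,
            audio_id=a.object_id if a else None,
            visual_id=v.object_id,
        ))
    return aligned
```

A purely geometric match like this corresponds to the Descriptors Level case. Ambiguous scenes, such as two speakers standing close together, are the kind of situation where further reasoning would be needed.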
2.2 Visual Object Identification
OSD-VOI
| Receives | Visual Scene Geometry | The arrangement of the Visual Objects in the Scene. | 
| | Visual Objects | The Visual Objects in the Scene. | 
| | Body Descriptors | The Descriptors of the Body indicating the object. | 
| Produces | Visual Instance ID | The ID of the Visual Object in the Scene crossed by the line of the Point of View of the Body. | 
OSD-VOI is a PAAI including three collaborating PAAIs:
- OSD-VDI
  - Analyses the Body.
  - Finds the index finger that points to an object.
  - Assigns the direction of the finger to the Point of View.
- OSD-VOE follows the line until it crosses a Visual Object.
- OSD-VII recognises the identity of the Visual Object.

Figure 2 – The Visual Object Identification (OSD-VOI) Composite AIM
OSD-VOI performs Descriptors Level (OSD-VDI and OSD-VOE) and Interpretation Level (OSD-VII) Operations.
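The three steps above (derive a pointing direction, follow the line until it crosses a Visual Object, report that object's identity) can be sketched geometrically. The sketch assumes that the Body Descriptors supply two 3D points on the index finger (`knuckle` and `tip`) and that each Visual Object is approximated by a bounding sphere. Both are simplifications chosen for illustration, and all function and field names are hypothetical. The identity step is reduced to returning a stored ID, whereas OSD-VII would recognise the identity.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class VisualObject:
    instance_id: str
    centre: Vec3
    radius: float  # assumption: bounding sphere instead of real geometry


def pointing_ray(knuckle: Vec3, tip: Vec3) -> Tuple[Vec3, Vec3]:
    """Stand-in for OSD-VDI: ray starting at the fingertip along the finger."""
    d = _sub(tip, knuckle)
    norm = sqrt(_dot(d, d))
    if norm == 0.0:
        raise ValueError("knuckle and tip coincide; direction is undefined")
    return tip, (d[0] / norm, d[1] / norm, d[2] / norm)


def first_hit(origin: Vec3, direction: Vec3,
              objects: Sequence[VisualObject]) -> Optional[VisualObject]:
    """Stand-in for OSD-VOE: nearest object whose sphere the ray crosses."""
    best, best_t = None, float("inf")
    for obj in objects:
        oc = _sub(obj.centre, origin)
        t_mid = _dot(oc, direction)            # ray parameter at closest approach
        d2 = _dot(oc, oc) - t_mid * t_mid      # squared centre-to-line distance
        r2 = obj.radius * obj.radius
        if d2 > r2:
            continue                           # the line misses the sphere
        half = sqrt(max(r2 - d2, 0.0))
        if t_mid + half < 0.0:
            continue                           # sphere lies entirely behind the origin
        t_hit = max(t_mid - half, 0.0)         # 0 if the origin is inside the sphere
        if t_hit < best_t:
            best, best_t = obj, t_hit
    return best


def identify(knuckle: Vec3, tip: Vec3,
             objects: Sequence[VisualObject]) -> Optional[str]:
    """Chain the three stages and return the Visual Instance ID, if any."""
    origin, direction = pointing_ray(knuckle, tip)
    hit = first_hit(origin, direction, objects)
    return hit.instance_id if hit else None
```

For example, with objects placed along the z axis, a finger pointing in the +z direction selects the closer of the two objects on that axis, and a finger pointing in the -z direction selects nothing.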
2.3 Visual Scene Description
Visual Scene Description (OSD-VSD) is a PAAI that produces the Descriptors of a Scene composed of Visual Objects and Visual Scenes:
| Receives | Space-Time | of the input Visual Objects having the same time base. | 
| | Visual Objects | The Visual Object(s) to be converted to Scene Descriptors. | 
| | Scene Descriptors | Scene Descriptors, possibly of different media, co-located with the Visual Objects. | 
| Merges | Visual Scene Descriptors | Into a single set of Visual Scene Descriptors. | 
| Produces | Visual Scene Descriptors | Output#1 of AIM | 
| | Alert | Output#2 of AIM signalling potential anomalies in Object. | 
OSD-VSD:
- Receives the Visual Objects with their Space-Time information. As exemplified by CAV-ESS, OSD-VSD may rely on a collaborative process with other AIMs that produce EST-specific Scene Descriptors (not necessarily visual). Those AIMs are in turn assisted by the Visual Scene Descriptors that OSD-VSD produces.
- Analyses the individual Visual Objects and may
  - Discover sudden changes in the Visual Scene (e.g., the sudden appearance of a previously occluded Traffic Signal or traffic agent).
  - Extract portions and add Annotations to them, such as the text of a Traffic Sign, the paddle of a traffic agent, etc.

OSD-VSD performs Descriptors Level Operations.
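The merge-and-alert behaviour described above can be sketched as follows. This is a simplified illustration, not the specified behaviour: it assumes that Objects carry stable identifiers across time steps, that a "sudden change" means an identifier not present in the previous scene, and that the AIM's own observations take precedence over externally supplied descriptors. The names `SceneDescriber`, `describe`, and the `appeared:` alert format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class VisualObject:
    object_id: str
    position: Tuple[float, float, float]
    # Annotations such as the text read from a Traffic Sign.
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class VisualSceneDescriptors:
    time_s: float
    objects: Dict[str, VisualObject]


class SceneDescriber:
    """Hypothetical stand-in for OSD-VSD: merges inputs and flags new objects."""

    def __init__(self) -> None:
        self._previous_ids: Set[str] = set()
        self._has_history = False

    def describe(
        self,
        time_s: float,
        objects: Iterable[VisualObject],
        external: Iterable[VisualSceneDescriptors] = (),
    ) -> Tuple[VisualSceneDescriptors, List[str]]:
        merged: Dict[str, VisualObject] = {}
        # Start from descriptors supplied by other AIMs ...
        for descriptors in external:
            merged.update(descriptors.objects)
        # ... then let this AIM's own observations override them (assumption).
        for obj in objects:
            merged[obj.object_id] = obj

        current_ids = set(merged)
        alerts: List[str] = []
        if self._has_history:
            # An object absent from the previous scene counts as a sudden change.
            alerts = [f"appeared:{i}" for i in sorted(current_ids - self._previous_ids)]
        self._previous_ids = current_ids
        self._has_history = True
        return VisualSceneDescriptors(time_s, merged), alerts
```

The first call never raises an alert because there is no earlier scene to compare against. The returned pair corresponds to the two outputs of the AIM: the Visual Scene Descriptors and the Alert.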