CAE CAV HMC MMC OSD PAF
AI Workflows AI Modules

1       AI Workflows

1.1 Television Media Analysis

OSD-TMA is a PAAI including a variety of collaborating PAAIs:

TV Splitting Separates the audio and visual (video) components of a multiplexed audio-visual (television) stream also using Text related to the contents of the audio-visual file.
Visual Change Detection Separates the audio-visual file in files that represents different scenes.
Audio Segmentation Separates the audio file in files where the Speaker IDs are different.
Face ID Recognition Extracts human participants in a scene in bounding boxes using the input Auxiliary Text.
Speaker ID Recognition Provides the ID of a Speaker.
Automatic Speech Recognition Converts speech into text facilitated by Auxiliary Text.
Audio-Visual Alignment Combines a face with a speaker.
Audio-Visual Scene Description Combines all elements generated upstream into AV Scene Descriptors.
Audio-Visual Event Description Stacks all the Scene Descriptors for the duration of the analysis.

Figure 1 – Reference Model of OSD-TMA

The following links analyse the AI Modules:

OSD-TMA performs Descriptors-Interpretation Level Operations.

2       AI Modules

2.1      Audio-Visual Alignment

OSD-AVA

Receives Speech Scene Descriptors of a present Speech Scene.
Audio Scene Descriptors of a present Audio Scene.
Visual Scene Descriptors of a present Visual Scene.
Identifies Speech-Audio-Visual Objects that share the same Position.
Produces Audio-Visual Scene Descriptors whose Speech, Audio, and Visual Objects at the same Position have compatible Identifiers.

OSD-AVA performs Descriptors Level Operations. More complex situations may require Reasoning Level Operations.

2.2   Visual Object Identification

OSD-VOI

Receives Visual Scene Geometry The arrangement of the Visual Objects in the Scene.
Visual Objects The Visual Objects in the Scene.
Body Descriptors The Descriptors of the Body indicating the object.
Produces Visual Instance ID The ID of the Visual Object in the Scene crossed by he line of the Point of View of the Body.

OSD-VOI is a PAAI including three collaborating PAAIs:

  1. OSD-VDI
    1. Analyses the Body.
    2. Finds the index fingers that points to an object.
    3. Assigns the direction of the finger to the Point of View.
  2. OSD-VOE follows the line until it crosses a Visual Object.
  3. OSD-VII recognises the identity of the Visual Object.

Figure 2 – The Visual Object Identification (OSD-VOI) Composite AIM

OSD-VOI performs Descriptors (OSD-VDI and OSD-VOE) and Interpretation Operations.

2.3   Visual Scene Description

Visual Scene Description (OSD-VSD) is a PAAI produces the Descriptors of a Scene composed by Visual Objects and Visual Scenes:

Receives Space-Time of the input Visual Objects having the same time base.
Visual Objects The Visual Object(s) to be converted to Scene Descriptors
Scene Descriptors Scene Descriptors possibly of different media co-located with the Visual Objects
Merges Visual Scene Descriptors Into a single set of Visual Scene Descriptors.
Produces Visual Scene Descriptors Output#1 of AIM
Alert Output#2 of AIM signalling potential anomalies in Object.

OSD-VSD

  1. Receives the Visual Objects with their Space-Time information. As exemplified by CAV-ESS, the OSD-VSD may rely on a collaborative process with other AIMs producing Scene Descriptors provided by other EST-specific Scene Descriptions (not necessarily visual). In turn, they are assisted by OSD-VSD-produced Visual Scene Descriptions.
  2. Analyses the individual Visual Objects and may
    1. Discover sudden changes in the Visual Scene (e.g., the sudden appearance of a previously occluded Traffic Signal or traffic agent).
    2. Extract portions and add Annotations to it, such as the text of a Traffic Sign, the paddle of a traffic agent, etc.

OSD-VSD performs Descriptors Level Operations.