MPAI-WMG V1.0 Object and Scene Description V1.0

1 AI Workflows

OSD-TMA is a PAAI including a variety of collaborating PAAIs:

TV Splitting	Separates the audio and visual (video) components of a multiplexed audio-visual (television) stream, also using Text related to the contents of the audio-visual file.
Visual Change Detection	Separates the audio-visual file into files that represent different scenes.
Audio Segmentation	Separates the audio file into files where the Speaker IDs are different.
Face ID Recognition	Extracts human participants in a scene in bounding boxes using the input Auxiliary Text.
Speaker ID Recognition	Provides the ID of a Speaker.
Automatic Speech Recognition	Converts speech into text facilitated by Auxiliary Text.
Audio-Visual Alignment	Combines a face with a speaker.
Audio-Visual Scene Description	Combines all elements generated upstream into AV Scene Descriptors.
Audio-Visual Event Description	Stacks all the Scene Descriptors for the duration of the analysis.

Figure 1 – Reference Model of OSD-TMA

The following links analyse the AI Modules:

OSD-TMA performs Descriptors-Interpretation Level Operations.
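
The Audio Segmentation step listed above can be illustrated with a minimal sketch. It assumes that a Speaker ID is already available for each fixed-length audio frame; the function name `segment_by_speaker`, the `frame_duration` parameter, and the tuple layout are hypothetical and not part of the specification.

```python
from itertools import groupby


def segment_by_speaker(frame_speaker_ids, frame_duration=1.0):
    """Split a sequence of per-frame Speaker IDs into contiguous segments.

    Hypothetical helper: returns a list of (speaker_id, start_time, end_time)
    tuples, one per run of consecutive frames with the same Speaker ID.
    """
    segments = []
    start = 0  # index of the first frame of the current run
    for speaker_id, run in groupby(frame_speaker_ids):
        length = sum(1 for _ in run)
        segments.append(
            (speaker_id, start * frame_duration, (start + length) * frame_duration)
        )
        start += length
    return segments


if __name__ == "__main__":
    print(segment_by_speaker(["A", "A", "B", "A"], frame_duration=0.5))
```

A new segment starts whenever the Speaker ID changes, so a speaker who talks twice yields two separate segments.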

OSD-AVA

Receives	Speech Scene Descriptors	of a present Speech Scene.
	Audio Scene Descriptors	of a present Audio Scene.
	Visual Scene Descriptors	of a present Visual Scene.
Identifies	Speech-Audio-Visual Objects	that share the same Position.
Produces	Audio-Visual Scene Descriptors	whose Speech, Audio, and Visual Objects at the same Position have compatible Identifiers.

OSD-AVA performs Descriptors Level Operations. More complex situations may require Reasoning Level Operations.
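
As an illustration of the position-based identification described above, the following sketch groups Speech, Audio, and Visual Objects whose Positions coincide within a tolerance. The `SceneObject` class, the `tolerance` value, and the output dictionary layout are assumptions made for this example; the specification does not define them in this form.

```python
from dataclasses import dataclass
from math import dist
from typing import Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical stand-in for an object carried by Scene Descriptors."""
    object_id: str
    position: Tuple[float, float, float]


def align_objects(speech, audio, visual, tolerance=0.1):
    """Attach to each Visual Object the Speech and Audio Objects at the same Position.

    "Same Position" is approximated as a Euclidean distance <= tolerance.
    """
    av_objects = []
    for v in visual:
        near_speech = [o.object_id for o in speech
                       if dist(o.position, v.position) <= tolerance]
        near_audio = [o.object_id for o in audio
                      if dist(o.position, v.position) <= tolerance]
        av_objects.append({
            "av_id": f"av-{v.object_id}",  # shared identifier for the group
            "position": v.position,
            "visual": v.object_id,
            "speech": near_speech,
            "audio": near_audio,
        })
    return av_objects


if __name__ == "__main__":
    speech = [SceneObject("s1", (1.0, 0.0, 2.0))]
    audio = [SceneObject("a1", (5.0, 5.0, 5.0))]
    visual = [SceneObject("v1", (1.0, 0.05, 2.0)), SceneObject("v2", (-3.0, 0.0, 4.0))]
    for av in align_objects(speech, audio, visual):
        print(av)
```

A fixed distance threshold is the simplest choice; ambiguous cases, such as two faces near one speech source, are the kind of situation that may call for Reasoning Level Operations.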

OSD-VOI

Receives	Visual Scene Geometry	The arrangement of the Visual Objects in the Scene.
	Visual Objects	The Visual Objects in the Scene.
	Body Descriptors	The Descriptors of the Body indicating the object.
Produces	Visual Instance ID	The ID of the Visual Object in the Scene crossed by the line of the Point of View of the Body.

OSD-VOI is a PAAI including three collaborating PAAIs:

OSD-VDI
1. Analyses the Body.
2. Finds the index finger that points to an object.
3. Assigns the direction of the finger to the Point of View.
OSD-VOE follows the line until it crosses a Visual Object.
OSD-VII recognises the identity of the Visual Object.

Figure 2 – The Visual Object Identification (OSD-VOI) Composite AIM

OSD-VOI performs Descriptors Level Operations (OSD-VDI and OSD-VOE) and Interpretation Level Operations.
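
The three steps of OSD-VOI can be sketched geometrically: derive a pointing line from the index finger, follow it until it crosses an object, and return that object's ID. The sketch below is a simplification under stated assumptions: Visual Objects are approximated by bounding spheres, the finger is given by two 3D points, and the function names `pointing_ray` and `first_object_crossed` are hypothetical.

```python
import math


def pointing_ray(finger_base, finger_tip):
    """Return (origin, unit direction) of the line along the index finger."""
    d = [t - b for b, t in zip(finger_base, finger_tip)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0:
        raise ValueError("finger base and tip coincide")
    return tuple(finger_tip), tuple(c / norm for c in d)


def first_object_crossed(origin, direction, objects):
    """Return the ID of the nearest object crossed by the ray, or None.

    objects: iterable of (instance_id, center, radius), i.e. bounding spheres
    (an assumption of this sketch, not a format defined by the specification).
    """
    best_id, best_t = None, math.inf
    for instance_id, center, radius in objects:
        oc = [c - o for o, c in zip(origin, center)]
        t_closest = sum(a * b for a, b in zip(oc, direction))
        # Squared distance from the sphere centre to the pointing line.
        d2 = sum(a * a for a in oc) - t_closest ** 2
        if d2 > radius ** 2:
            continue  # the line misses this sphere
        half_chord = math.sqrt(radius ** 2 - d2)
        t_hit = t_closest - half_chord  # distance to the entry point
        if t_hit < 0:
            if t_closest + half_chord < 0:
                continue  # sphere lies entirely behind the fingertip
            t_hit = 0.0  # fingertip is inside the sphere
        if t_hit < best_t:
            best_id, best_t = instance_id, t_hit
    return best_id


if __name__ == "__main__":
    origin, direction = pointing_ray((0.0, 0.0, 0.0), (0.0, 0.0, 0.1))
    scene = [("lamp", (0.0, 0.0, 5.0), 0.5), ("door", (0.0, 0.2, 2.0), 0.5)]
    print(first_object_crossed(origin, direction, scene))
```

When several objects lie on the line, the one closest to the fingertip is returned, matching the idea of following the line until it first crosses a Visual Object.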

Visual Scene Description (OSD-VSD) is a PAAI that produces the Descriptors of a Scene composed of Visual Objects and Visual Scenes:

Receives	Space-Time	of the input Visual Objects having the same time base.
	Visual Objects	The Visual Object(s) to be converted to Scene Descriptors.
	Scene Descriptors	Scene Descriptors possibly of different media co-located with the Visual Objects
Merges	Visual Scene Descriptors	Into a single set of Visual Scene Descriptors.
Produces	Visual Scene Descriptors	Output#1 of AIM
	Alert	Output#2 of AIM signalling potential anomalies in Object.

OSD-VSD

Receives the Visual Objects with their Space-Time information. As exemplified by CAV-ESS, the OSD-VSD may rely on a collaborative process with other AIMs producing Scene Descriptors provided by other EST-specific Scene Descriptions (not necessarily visual). In turn, they are assisted by OSD-VSD-produced Visual Scene Descriptions.
Analyses the individual Visual Objects and may
1. Discover sudden changes in the Visual Scene (e.g., the sudden appearance of a previously occluded Traffic Signal or traffic agent).
2. Extract portions and add Annotations to them, such as the text of a Traffic Sign, the paddle of a traffic agent, etc.

OSD-VSD performs Descriptors Level Operations.
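
One simple way to picture the "sudden change" check and the Alert output is to compare the object IDs seen at consecutive time steps. This is only a sketch: the function `describe_visual_scene`, the dictionary layouts, and the rule "any newly appeared object raises an Alert" are assumptions for illustration, not the method defined by the specification.

```python
def describe_visual_scene(previous_object_ids, current_objects):
    """Return (visual_scene_descriptors, alert) for the current time step.

    previous_object_ids: IDs of the Visual Objects seen at the previous step.
    current_objects: dict mapping object ID -> (x, y, z) position.
    alert is None when no object has newly appeared.
    """
    appeared = sorted(set(current_objects) - set(previous_object_ids))
    descriptors = {
        "objects": [
            {"id": oid, "position": current_objects[oid]}
            for oid in sorted(current_objects)
        ]
    }
    # Hypothetical alert payload listing the objects that were not present before.
    alert = {"appeared": appeared} if appeared else None
    return descriptors, alert


if __name__ == "__main__":
    previous = {"car-1"}
    current = {"car-1": (0.0, 0.0, 10.0), "signal-7": (2.0, 3.0, 15.0)}
    descriptors, alert = describe_visual_scene(previous, current)
    print(descriptors)
    print(alert)
```

In the example, the previously occluded `signal-7` appears in the current step and is reported in the alert, while the descriptors list both objects.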
