Audio-Visual Scene Description (OSD-AVS)

V1.0

The Audio-Visual Scene Description Composite AIM (OSD-AVS):

Receives the Audio-Visual Scene composed of:
- Text
- Audio Objects that are either Speech Objects or generic Audio Objects whose source is assumed to be a point.
- Visual Objects that are either Entities or generic Objects.
Produces the Audio-Visual Scene Descriptors.
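The input scene described above can be pictured as a small typed data model. The sketch below is a hypothetical Python rendering, not the normative data format of this specification: the class names, the `samples` and `label` fields, and the `Point3D` coordinate tuple are assumptions made for illustration. Giving each Audio Object a single `position` reflects the point-source assumption stated above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical coordinate type: (x, y, z). Units and axes are assumptions.
Point3D = Tuple[float, float, float]


@dataclass
class SpeechObject:
    """Audio Object carrying speech; its source is modelled as a single point."""
    samples: List[float]  # hypothetical field: raw audio samples
    position: Point3D


@dataclass
class GenericAudioObject:
    """Non-speech Audio Object, also modelled as a point source."""
    samples: List[float]
    position: Point3D


@dataclass
class Entity:
    """Visual Object classified as an Entity (fields are hypothetical)."""
    label: str
    position: Point3D


@dataclass
class GenericVisualObject:
    """Visual Object that is not an Entity (fields are hypothetical)."""
    label: str
    position: Point3D


AudioObject = Union[SpeechObject, GenericAudioObject]
VisualObject = Union[Entity, GenericVisualObject]


@dataclass
class AudioVisualScene:
    """The scene received by the Composite AIM: Text, Audio and Visual Objects."""
    text: List[str] = field(default_factory=list)
    audio_objects: List[AudioObject] = field(default_factory=list)
    visual_objects: List[VisualObject] = field(default_factory=list)

    def speech_objects(self) -> List[SpeechObject]:
        # Select only the Audio Objects that are Speech Objects.
        return [o for o in self.audio_objects if isinstance(o, SpeechObject)]

    def entities(self) -> List[Entity]:
        # Select only the Visual Objects that are Entities.
        return [o for o in self.visual_objects if isinstance(o, Entity)]
```

For example, a scene with one speaker, one background sound and one visible person holds two Audio Objects but only one Speech Object; separating the two kinds this way lets a downstream component treat speech differently from other sounds.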

Figure 10 depicts the Reference Architecture.

Figure 10 – Audio-Visual Scene Description

Table 5 specifies the Input and Output Data of the Audio-Visual Scene Description Composite AIM.

Table 5 – I/O Data of the Audio-Visual Scene Description Composite AIM

Input	Description
Input Audio	The audio scene captured by the Machine.
Input Visual	The visual scene captured by the Machine.
Output	Description
Audio-Visual Scene Descriptors	The Descriptors of all Audio, Visual, and Audio-Visual Objects.
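The I/O contract in Table 5 can be expressed as an abstract interface: Input Audio and Input Visual go in, and a collection of Descriptors covering Audio, Visual and Audio-Visual Objects comes out. This is a hypothetical sketch, not a normative API: the names `AudioVisualSceneDescription`, `describe`, `ObjectDescriptor` and `of_kind`, and the use of plain strings for the object kind, are assumptions for illustration. The actual analysis performed by the AIM is left to concrete implementations.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ObjectDescriptor:
    """Hypothetical Descriptor record for one Object in the scene."""
    object_id: str
    kind: str  # assumed values: "audio", "visual" or "audio-visual"
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AudioVisualSceneDescriptors:
    """Output of the Composite AIM: Descriptors of all Objects in the scene."""
    descriptors: List[ObjectDescriptor] = field(default_factory=list)

    def of_kind(self, kind: str) -> List[ObjectDescriptor]:
        # Filter the Descriptors by object kind.
        return [d for d in self.descriptors if d.kind == kind]


class AudioVisualSceneDescription(ABC):
    """Hypothetical interface mirroring Table 5 (Input Audio + Input Visual in)."""

    @abstractmethod
    def describe(self, input_audio: Any, input_visual: Any) -> AudioVisualSceneDescriptors:
        """Return the Audio-Visual Scene Descriptors for the captured scene."""
        raise NotImplementedError
```

Because the interface is abstract, it cannot be instantiated directly; a concrete subclass supplies `describe`, which keeps the I/O contract separate from any particular audio or visual analysis technique.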
