Audio

Definitions

This page contains definitions of Audio-related Data Types.

Audio: A Data Type that represents analogue signals sampled at a frequency between 8 and 192 kHz with a number of bits per sample between 8 and 32, and that is rendered in the human-audible range (16 Hz to 20 kHz).
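
The numeric limits in the Audio definition can be checked mechanically. The following is a minimal sketch, assuming both ranges are inclusive; the class name AudioFormat and its fields are hypothetical and are not part of the definitions on this page.

```python
from dataclasses import dataclass

# Limits taken from the Audio definition; treating them as inclusive is an assumption.
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 192_000
MIN_BITS_PER_SAMPLE = 8
MAX_BITS_PER_SAMPLE = 32


@dataclass(frozen=True)
class AudioFormat:
    """Hypothetical container for the two parameters named in the definition."""
    sample_rate_hz: int
    bits_per_sample: int

    def is_valid(self) -> bool:
        # True only when both parameters fall inside the stated ranges.
        rate_ok = MIN_SAMPLE_RATE_HZ <= self.sample_rate_hz <= MAX_SAMPLE_RATE_HZ
        bits_ok = MIN_BITS_PER_SAMPLE <= self.bits_per_sample <= MAX_BITS_PER_SAMPLE
        return rate_ok and bits_ok
```

For instance, AudioFormat(48_000, 16).is_valid() holds, whereas a 4 kHz sampling frequency or 64 bits per sample would fall outside the stated ranges.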

Audio Block: A set of consecutive samples without time code.

Audio File: A wave file conforming to the WAV RF64 file format.

Audio Segment: An Audio Block with Time Labels.

Emotionless Speech: An Audio File containing only speech in which music and other sounds are absent, and in which little or no identifiable emotion is perceptible by native listeners.

Enhanced Audio: Multichannel Audio whose samples are Enhanced Audio samples.

Enhanced Transform Audio: Transform Multichannel Audio whose samples are samples of Transform Enhanced Audio.

Input Audio: Multichannel Audio as provided by a Microphone Array.

Microphone Array Audio: Interleaved Multichannel Audio whose channels are organised in blocks lasting from a minimum of 5.33 ms (i.e., 256 samples at 48 kHz) to a maximum of 85.33 ms (i.e., 4096 samples at 48 kHz), and in which each sample is a single- or double-precision float.
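
The two durations quoted in the Microphone Array Audio definition follow from dividing the number of samples by the sampling frequency. A minimal sketch of that arithmetic is shown here; the function name block_duration_ms is hypothetical.

```python
def block_duration_ms(samples_per_block: int, sample_rate_hz: int) -> float:
    """Return the duration in milliseconds of a block of samples."""
    if samples_per_block <= 0 or sample_rate_hz <= 0:
        raise ValueError("samples_per_block and sample_rate_hz must be positive")
    return 1000.0 * samples_per_block / sample_rate_hz


# The two limits quoted in the definition, both at 48 kHz.
shortest_ms = block_duration_ms(256, 48_000)   # about 5.33 ms
longest_ms = block_duration_ms(4096, 48_000)   # about 85.33 ms
```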

Model Utterance: An Audio Segment used as a model or demonstration of the Emotion to be added to Emotionless Speech in order to produce Speech with Emotion (Emotion Enhanced Speech Use Case).

Multichannel Audio: A Data Type whose structure contains between 4 and 256 time-aligned interleaved Audio Channels organised in blocks.
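
To illustrate what interleaving means in practice, the sketch below splits one interleaved block into per-channel sample lists. It assumes the common frame-by-frame layout (sample 0 of every channel, then sample 1 of every channel, and so on), which this page does not spell out; the function name deinterleave is hypothetical.

```python
from typing import List, Sequence


def deinterleave(block: Sequence[float], num_channels: int) -> List[List[float]]:
    """Split an interleaved block into one list of samples per channel.

    Assumed layout: [ch0_s0, ch1_s0, ..., chN_s0, ch0_s1, ch1_s1, ...].
    """
    # The channel-count limits come from the Multichannel Audio definition.
    if not 4 <= num_channels <= 256:
        raise ValueError("Multichannel Audio has between 4 and 256 channels")
    if len(block) % num_channels != 0:
        raise ValueError("block length must be a multiple of num_channels")
    # Channel ch owns every num_channels-th sample, starting at offset ch.
    return [list(block[ch::num_channels]) for ch in range(num_channels)]
```

With a 4-channel block of eight samples numbered 0 to 7, channel 0 would receive samples 0 and 4, channel 1 samples 1 and 5, and so on. Because the channels are time-aligned, every per-channel list has the same length.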

Multichannel Audio Stream: Interleaved Multichannel Audio packaged with Time Code.

Neural Network Speech Model: A Neural Network Model trained on Speech Segments for Modelling and used to synthesise replacements for the entire Damaged Segment or Damaged Sections within it.

Output Audio: Audio information such as provided by the Audio-Visual Rendering AIM.

Speech: A Data Type representing an analogue audio signal sampled at a frequency between 8 and 192 kHz with a number of bits per sample between 8 and 32 and with uniform or non-uniform quantisation.

Spherical Harmonic Decomposition: Data Type representing the captured sound field in the spatial frequency domain.

Synthesised Speech: Speech produced by a Text-To-Speech AIM.

Transform Audio: A frequency representation of Audio.

Transform Multichannel Audio: Data Type obtained from the transformation of Multichannel Audio.

Utterance: An Audio Segment.
