This page contains definitions of Audio-related Data Types.
Audio: A Data Type that
- Represents analogue signals sampled at a frequency between 8 and 192 kHz with between 8 and 32 bits/sample.
- Is rendered in the human-audible range (16 Hz – 20 kHz).
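As an illustration of the ranges in the Audio definition, the sketch below checks whether a stream's sampling frequency and bits/sample fall inside them. The function and constant names are hypothetical, and treating both bounds as inclusive is an assumption; the definition only says "between".

```python
# Hypothetical helper: checks a stream's parameters against the ranges in the
# Audio definition (8-192 kHz sampling frequency, 8-32 bits/sample).
# Assumption: both bounds are treated as inclusive.
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 192_000
MIN_BITS = 8
MAX_BITS = 32


def is_valid_audio_format(sample_rate_hz: int, bits_per_sample: int) -> bool:
    """Return True if both parameters lie within the (inclusive) ranges."""
    return (MIN_RATE_HZ <= sample_rate_hz <= MAX_RATE_HZ
            and MIN_BITS <= bits_per_sample <= MAX_BITS)


print(is_valid_audio_format(48_000, 24))   # True
print(is_valid_audio_format(384_000, 24))  # False: rate above 192 kHz
```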
Audio Block: A set of consecutive samples without Time Code.
Audio File: A WAV file conforming to the RF64 file format.
Audio Segment: An Audio Block with Time Labels.
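The difference between an Audio Block and an Audio Segment can be sketched as two small data structures: the Segment wraps a Block and adds timing. This page does not specify how Time Labels are encoded, so the single `start_s` field (start time in seconds) and all class and field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioBlock:
    """Consecutive samples with no timing information attached."""
    samples: List[float]
    sample_rate_hz: int


@dataclass
class AudioSegment:
    """An AudioBlock plus time labels (hypothetical representation)."""
    block: AudioBlock
    start_s: float  # hypothetical Time Label: time of the first sample, in seconds

    @property
    def end_s(self) -> float:
        """Time just after the last sample, derived from the sample count."""
        return self.start_s + len(self.block.samples) / self.block.sample_rate_hz


segment = AudioSegment(AudioBlock([0.0] * 480, 48_000), start_s=2.0)
print(segment.end_s)  # about 2.01 s: 480 samples at 48 kHz last 10 ms
```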
Emotionless Speech: An Audio File containing only speech in which music and other sounds are absent, and in which little or no identifiable emotion is perceptible by native listeners.
Enhanced Audio: Multichannel Audio whose samples are Enhanced Audio samples.
Enhanced Transform Audio: Transform Multichannel Audio whose samples are samples of Transform Enhanced Audio.
Input Audio: Multichannel Audio as provided by a Microphone Array.
Microphone Array Audio: Interleaved Multichannel Audio organised in blocks whose duration ranges from a minimum of 5.33 ms (i.e., 256 samples at 48 kHz) to a maximum of 85.33 ms (i.e., 4096 samples at 48 kHz), and whose samples are single- or double-precision floating-point values.
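The block durations quoted for Microphone Array Audio follow from dividing the number of samples per channel by the sampling frequency. The sketch below reproduces both figures and also estimates the memory size of one interleaved block, assuming 4 bytes per single-precision and 8 bytes per double-precision sample (IEEE 754 binary32/binary64). The function names are illustrative only.

```python
def block_duration_ms(samples_per_block: int, sample_rate_hz: int) -> float:
    """Duration of one block (per channel) in milliseconds."""
    return 1000.0 * samples_per_block / sample_rate_hz


def block_size_bytes(samples_per_block: int, num_channels: int,
                     double_precision: bool = False) -> int:
    """Size of one interleaved block holding every channel's samples."""
    bytes_per_sample = 8 if double_precision else 4  # binary64 vs binary32
    return samples_per_block * num_channels * bytes_per_sample


for n in (256, 4096):
    print(f"{n} samples at 48 kHz -> {block_duration_ms(n, 48_000):.2f} ms")
# 256 samples at 48 kHz -> 5.33 ms
# 4096 samples at 48 kHz -> 85.33 ms
```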
Model Utterance: An Audio Segment used as a model or demonstration of the Emotion to be added to Emotionless Speech in order to produce Speech with Emotion (Emotion Enhanced Speech Use Case).
Multichannel Audio: A Data Type whose structure contains between 4 and 256 time-aligned, interleaved Audio Channels organised in blocks.
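Interleaving places the samples of all channels for one time instant next to each other, so with C channels the sample of channel c at frame n sits at index n·C + c. The sketch below shows this layout with plain Python lists; the function names are illustrative, and the 4-256 channel check mirrors the definition above rather than any specific API.

```python
from typing import List, Sequence


def interleave(channels: Sequence[Sequence[float]]) -> List[float]:
    """Merge time-aligned per-channel lists into one frame-major list."""
    if not 4 <= len(channels) <= 256:
        raise ValueError("Multichannel Audio has between 4 and 256 channels")
    num_frames = len(channels[0])
    if any(len(ch) != num_frames for ch in channels):
        raise ValueError("channels must be time-aligned (same length)")
    # Outer loop over frames, inner loop over channels: ch0[0], ch1[0], ...
    return [ch[n] for n in range(num_frames) for ch in channels]


def deinterleave(samples: Sequence[float], num_channels: int) -> List[List[float]]:
    """Split an interleaved list back into per-channel lists."""
    if len(samples) % num_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    # Channel c occupies indices c, c + C, c + 2C, ...
    return [list(samples[c::num_channels]) for c in range(num_channels)]


print(interleave([[0, 1], [10, 11], [20, 21], [30, 31]]))
# [0, 10, 20, 30, 1, 11, 21, 31]
```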
Multichannel Audio Stream: Interleaved Multichannel Audio packaged with Time Code.
Neural Network Speech Model: A Neural Network Model trained on Speech Segments for Modelling and used to synthesise replacements for the entire Damaged Segment or Damaged Sections within it.
Output Audio: Audio information such as that provided by the Audio-Visual Rendering AIM.
Speech: A Data Type representing an analogue audio signal sampled at a frequency between 8 and 192 kHz, with between 8 and 32 bits/sample and uniform or non-uniform quantisation.
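One common form of non-uniform quantisation for speech is μ-law companding: the signal is passed through a logarithmic curve and then quantised uniformly, which gives quiet samples finer resolution than loud ones. The sketch below uses the continuous μ-law curve with μ = 255 as an example; it is not the bit-exact segmented encoder of ITU-T G.711, and the definition above does not mandate any particular companding law.

```python
import math

MU = 255  # mu value of the mu-law curve (illustrative choice)


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Continuous mu-law curve: maps x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantise(y: float, bits: int = 8) -> int:
    """Uniformly quantise y in [-1, 1] to a signed integer code."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels)


# A quiet sample gets many more code steps after companding than without it.
quiet = 0.01
print(quantise(quiet), quantise(mu_law_compress(quiet)))
```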
Spherical Harmonic Decomposition: Data Type representing the captured sound field in the spatial frequency domain.
Synthesised Speech: Speech produced by a Text-To-Speech AIM.
Transform Audio: A frequency representation of Audio.
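The definition does not name a specific transform. As an assumption for illustration only, the sketch below uses a naive discrete Fourier transform (DFT) to turn one block of samples into a frequency representation; practical systems would use an FFT or a different transform such as the MDCT. Bin k corresponds to the frequency k × (sample rate) / (block size).

```python
import cmath
import math
from typing import List, Sequence


def dft(block: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform of one block of samples."""
    n_samples = len(block)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(block))
        for k in range(n_samples)
    ]


# A 1 kHz sine sampled at 8 kHz in blocks of 8 samples: bin spacing is
# 8000 / 8 = 1000 Hz, so the energy lands in bin 1 and its mirror, bin 7.
rate, size = 8000, 8
block = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(size)]
magnitudes = [abs(c) for c in dft(block)]
```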
Transform Multichannel Audio: Data Type obtained from the transformation of Multichannel Audio.
Utterance: An Audio Segment.