This page contains definitions of Audio-related Data Types.

Audio: A Data Type that

  • Represents analogue signals sampled at a frequency between 8 and 192 kHz with between 8 and 32 bits per sample.
  • Is rendered in the human-audible range (16 Hz – 20 kHz).
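
The two numeric ranges above can be checked mechanically. The following is a minimal sketch in Python; the function name `is_valid_audio_format` and its parameters are hypothetical and not defined by this page, and the range bounds are assumed to be inclusive.

```python
# Hypothetical helper (not part of this page's definitions): checks the
# sampling-frequency and bits/sample ranges stated for the Audio Data Type.
# Assumption: both ranges are inclusive at their end points.
MIN_RATE_HZ = 8_000
MAX_RATE_HZ = 192_000
MIN_BITS = 8
MAX_BITS = 32


def is_valid_audio_format(sample_rate_hz: int, bits_per_sample: int) -> bool:
    """Return True if both values fall inside the ranges given for Audio."""
    rate_ok = MIN_RATE_HZ <= sample_rate_hz <= MAX_RATE_HZ
    bits_ok = MIN_BITS <= bits_per_sample <= MAX_BITS
    return rate_ok and bits_ok
```

For example, `is_valid_audio_format(48_000, 16)` would be accepted, while a 4 kHz sampling frequency or 64 bits per sample would be rejected.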

Audio Block: A set of consecutive samples without Time Code.

Audio File: A wave file conforming to the WAV RF64 file format.

Audio Segment: An Audio Block with Time Labels.

Emotionless Speech: An Audio File containing only speech in which music and other sounds are absent, and in which little or no identifiable emotion is perceptible by native listeners.

Enhanced Audio: Multichannel Audio whose samples are Enhanced Audio samples.

Enhanced Transform Audio: Transform Multichannel Audio whose samples are samples of Transform Enhanced Audio.

Input Audio: Multichannel Audio as provided by a Microphone Array.

Microphone Array Audio: Interleaved Multichannel Audio whose channels are organised in blocks lasting from a minimum of 5.33 ms (i.e., 256 samples at 48 kHz) to a maximum of 85.33 ms (i.e., 4096 samples at 48 kHz), with each sample represented as a single- or double-precision floating-point number.
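
The two durations quoted above follow directly from dividing the number of samples by the sampling frequency. A short sketch of that arithmetic, in which the function name `block_duration_ms` is a hypothetical helper introduced only for illustration:

```python
def block_duration_ms(samples_per_block: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds of a block holding the given number of samples."""
    return 1000.0 * samples_per_block / sample_rate_hz


# The two limits quoted in the definition, both at 48 kHz.
shortest_ms = block_duration_ms(256, 48_000)    # 256 / 48000 s, about 5.33 ms
longest_ms = block_duration_ms(4096, 48_000)    # 4096 / 48000 s, about 85.33 ms
```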

Model Utterance: An Audio Segment used as a model or demonstration of the Emotion to be added to Emotionless Speech in order to produce Speech with Emotion (Emotion Enhanced Speech Use Case).

Multichannel Audio: A Data Type whose structure contains between 4 and 256 time-aligned interleaved Audio Channels organised in blocks.
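
As an illustration of what an interleaved block looks like in memory, the sketch below assumes the common frame-by-frame layout, in which one sample from each channel is stored in turn (channel 0, channel 1, ..., then the next time instant). This page does not specify the layout, so that ordering, like the function names `deinterleave` and `interleave`, is an assumption made for the example.

```python
from typing import List, Sequence


def deinterleave(block: Sequence[float], num_channels: int) -> List[List[float]]:
    """Split an interleaved block into one list of samples per channel.

    Assumed layout: [c0[0], c1[0], ..., cN[0], c0[1], c1[1], ...].
    """
    if num_channels <= 0 or len(block) % num_channels != 0:
        raise ValueError("block length must be a positive multiple of num_channels")
    # Every num_channels-th sample, starting at offset ch, belongs to channel ch.
    return [list(block[ch::num_channels]) for ch in range(num_channels)]


def interleave(channels: Sequence[Sequence[float]]) -> List[float]:
    """Merge time-aligned channels back into a single interleaved block."""
    if len({len(ch) for ch in channels}) > 1:
        raise ValueError("all channels must contain the same number of samples")
    # zip(*channels) yields one frame (one sample per channel) per time instant.
    return [sample for frame in zip(*channels) for sample in frame]


# Four channels, two time instants per channel.
example_block = [0.0, 1.0, 2.0, 3.0, 0.1, 1.1, 2.1, 3.1]
example_channels = deinterleave(example_block, 4)
```

Here `example_channels[0]` holds the two samples of the first channel, and `interleave(example_channels)` reproduces `example_block`.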

Multichannel Audio Stream: Interleaved Multichannel Audio packaged with Time Code.

Neural Network Speech Model: A Neural Network Model trained on Speech Segments for Modelling and used to synthesise replacements for the entire Damaged Segment or Damaged Sections within it.

Output Audio: Audio information such as that provided by the Audio-Visual Rendering AIM.

Speech: Data Type representing an analogue audio signal sampled at a frequency between 8 and 192 kHz with between 8 and 32 bits per sample and uniform or non-uniform quantisation.

Spherical Harmonic Decomposition: Data Type representing the captured sound field in the spatial frequency domain.

Synthesised Speech: Speech produced by a Text-To-Speech AIM.

Transform Audio: A frequency representation of Audio.

Transform Multichannel Audio: Data Type obtained from the transformation of Multichannel Audio.

Utterance: An Audio Segment.