
1       AI Workflows

1.1      Audio Recording Preservation

USC-ARP is a PAAI that restores an open-reel audio tape by detecting audio and visual irregularities.

USC-ARP is composed of a set of collaborating PAAIs:

Audio Analysis for Preservation – Receives the Video Irregularity File from Video Analysis for Preservation.
– Detects irregularities.
– Extracts an Audio File for each detected/received Audio Irregularity.
– Sends the Irregularity Audio File & Audio Irregularity File for each irregularity to Tape Irregularity Classification.
Video Analysis for Preservation – Receives the Audio Irregularity File from Audio Analysis for Preservation and the offset between the Preservation Audio File and the Preservation Audio-Visual File.
– Detects irregularities.
– Extracts an Image for each detected/received Video Irregularity.
– Sends the Irregularity Image & Video Irregularity File for each irregularity to Tape Irregularity Classification.
Tape Irregularity Classification – Produces Irregularity Files from:
– Irregularity Files of the Audio component and corresponding Irregularity Audio Files.
– Irregularity Files of the Video component and corresponding Irregularity Images.
Tape Audio Restoration – Produces the Restored Audio Files and the Editing List used to restore portions of the Preservation Audio File using the Irregularity Files.
Packaging for Audio Preservation – Assembles the output files.

Figure 8 – Reference Model of USC-ARP
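The data flow of USC-ARP can be sketched as plain Python callables wired together. Every function body below is a placeholder, none of the names are normative, and the mutual exchange of Irregularity Files between Audio Analysis and Video Analysis is linearised for readability:

```python
# Minimal, non-normative dataflow sketch of the USC-ARP workflow.

def audio_analysis(preservation_audio, video_irregularity_file):
    """Detect audio irregularities; extract an audio snippet per irregularity."""
    irregularities = [{"time": 12.5, "type": "click"}]   # placeholder detection
    snippets = [b"audio-bytes"] * len(irregularities)
    return irregularities, snippets

def video_analysis(preservation_av, audio_irregularity_file, offset):
    """Detect visual irregularities; extract an image per irregularity."""
    irregularities = [{"time": 12.5, "type": "splice"}]  # placeholder detection
    images = [b"image-bytes"] * len(irregularities)
    return irregularities, images

def tape_irregularity_classification(audio_irr, audio_files, video_irr, images):
    # Merge and confirm candidates from both components (placeholder rule).
    return audio_irr + video_irr

def tape_audio_restoration(preservation_audio, irregularity_files):
    editing_list = [(i["time"], "repair") for i in irregularity_files]
    restored_audio = preservation_audio  # placeholder: edits would be applied here
    return restored_audio, editing_list

def packaging(restored_audio, editing_list, irregularity_files):
    return {"audio": restored_audio, "edits": editing_list,
            "irregularities": irregularity_files}

# Wire the workflow together.
audio_irr, audio_snips = audio_analysis(b"PAF", None)
video_irr, images = video_analysis(b"PAVF", audio_irr, offset=0.0)
irr_files = tape_irregularity_classification(audio_irr, audio_snips, video_irr, images)
restored, edits = tape_audio_restoration(b"PAF", irr_files)
package = packaging(restored, edits, irr_files)
```

The sketch only shows who feeds whom; the real PAAIs exchange files, not in-memory objects.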

The following links analyse the AI Modules:

Audio Analysis for Preservation

Packaging for Audio Preservation

Tape Audio Restoration

Tape Irregularity Classification

Video Analysis for Preservation

1.2      Emotion-Enhanced Speech

USC-EES is a PAAI that inserts a prescribed emotion into an Emotionless Speech. It has two configurations that use different collaborating PAAIs:

#1 Speech Feature Analysis 1 – Extracts Prosodic Speech Features from a Model Utterance.
Prosodic Emotion Insertion – Adds the Prosodic Speech Features to an Emotionless Speech, producing Speech with an Emotion whose type is indicated by the user.
#2 Speech Feature Analysis 2 – Extracts Emotionless Speech Features from an Emotionless Speech segment.
Emotion Feature Production – Produces Neural Speech Features from the Emotionless Speech Features based on an Emotion List and an indication of the language.
Neural Emotion Insertion – Adds the Neural Speech Features to the Emotionless Speech.

Figure 9 – Reference Model of USC-EES
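The two configurations can be contrasted in a short sketch. All function bodies are illustrative placeholders and none of the names or data shapes are normative:

```python
# Non-normative sketch of the two USC-EES configurations.

def speech_feature_analysis_1(model_utterance):
    # Prosodic Speech Features extracted from an Emotion-carrying utterance.
    return {"pitch_contour": [220, 240, 230]}

def prosodic_emotion_insertion(emotionless_speech, prosodic_features, emotion):
    return {"speech": emotionless_speech, "emotion": emotion,
            "prosody": prosodic_features}

def speech_feature_analysis_2(emotionless_speech):
    # Emotionless Speech Features (placeholder embedding).
    return {"embedding": [0.1, 0.2]}

def emotion_feature_production(features, emotion_list, language):
    # features/language are placeholders here; a real module would use both.
    return {"neural_features": [0.5, 0.6], "emotion": emotion_list[0]}

def neural_emotion_insertion(emotionless_speech, neural_features):
    return {"speech": emotionless_speech, **neural_features}

# Configuration #1: prosody transfer from a Model Utterance.
prosody = speech_feature_analysis_1("model_utterance.wav")
out1 = prosodic_emotion_insertion("flat_speech.wav", prosody, emotion="joy")

# Configuration #2: neural features derived from an Emotion List and language.
feats = speech_feature_analysis_2("flat_speech.wav")
neural = emotion_feature_production(feats, emotion_list=["joy"], language="en")
out2 = neural_emotion_insertion("flat_speech.wav", neural)
```

Note the asymmetry: configuration #1 needs a Model Utterance carrying the target Emotion, while configuration #2 needs only the Emotion List and language indication.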

The following links analyse the AI Modules:

Emotion Feature Production

Neural Emotion Insertion

Prosodic Emotion Insertion

Speech Feature Analysis 1

Speech Feature Analysis 2

1.3      Enhanced Audioconference Experience

USC-EAE is a PAAI that produces a Multichannel Audio Stream acting on an input Microphone Array Audio.

USC-EAE is composed of the following PAAIs:

Audio Analysis Transform – Represents the input Multichannel Audio in a new form amenable to further processing by the subsequent PAAIs in the architecture.
Sound Field Description – Produces the Spherical Harmonic Decomposition Coefficients of the Transformed Multichannel Audio.
Speech Detection and Separation – Separates speech and non-speech signals in the Spherical Harmonic Decomposition, producing Transform Speech and Audio Scene Geometry.
Noise Cancellation Module – Removes noise and/or suppresses reverberation in the Transform Speech, producing Enhanced Transform Audio.
Audio Synthesis Transform – Effects the inverse transform of the Enhanced Transform Audio, producing Enhanced Audio Objects ready for packaging.
Audio Description Packaging – Multiplexes the Enhanced Audio Objects and the Audio Scene Geometry.

Figure 10 – Reference Model of USC-EAE
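The six stages form a strict chain, which the following non-normative sketch makes explicit; every return value is a placeholder standing in for the real transform, coefficient set, or geometry:

```python
# Non-normative sketch of the USC-EAE processing chain.

def audio_analysis_transform(mic_array_audio):
    return {"transformed": mic_array_audio}

def sound_field_description(transformed):
    # Placeholder Spherical Harmonic Decomposition Coefficients.
    return {"sh_coeffs": [1.0, 0.0, 0.0, 0.0]}

def speech_detection_and_separation(sh_coeffs):
    # Split speech from non-speech and estimate where sources are.
    return {"transform_speech": [0.9], "scene_geometry": {"azimuth": 30}}

def noise_cancellation(transform_speech):
    return {"enhanced": transform_speech}

def audio_synthesis_transform(enhanced):
    # Inverse transform back to Enhanced Audio Objects.
    return {"audio_objects": enhanced}

def audio_description_packaging(audio_objects, scene_geometry):
    return {"objects": audio_objects, "geometry": scene_geometry}

stage1 = audio_analysis_transform([0.1, 0.2])
stage2 = sound_field_description(stage1["transformed"])
stage3 = speech_detection_and_separation(stage2["sh_coeffs"])
stage4 = noise_cancellation(stage3["transform_speech"])
stage5 = audio_synthesis_transform(stage4["enhanced"])
packaged = audio_description_packaging(stage5["audio_objects"],
                                       stage3["scene_geometry"])
```

The Audio Scene Geometry bypasses the enhancement stages and is recombined only at packaging time, which is why it is produced by Speech Detection and Separation but consumed by Audio Description Packaging.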

The following links analyse the AI Modules:

Audio Analysis Transform

Audio Description Packaging

Audio Synthesis Transform

Noise Cancellation Module

Sound Field Description

Speech Detection and Separation

1.4      Speech Restoration System

CAE-SRS is a PAAI that collects speech segments of a particular speaker, trains a Neural Network Speech Model, synthesises Speech from Text with the so-trained Model, and uses the synthesised Speech to replace damaged Speech Segments.

CAE-SRS is composed of three collaborating PAAIs:

Speech Model Creation – Trains a Neural Network Model with Speech Segments.
Speech Synthesis for Restoration – Uses the Neural Network Speech Model to synthesise a Speech Object from a Text Object.
Speech Restoration Assembly – Replaces a Damaged Segment indexed by the Damaged List with the Synthesised Speech.

Figure 11 – Reference Model of CAE-SRS

Note that the Neural Network used by Speech Synthesis for Restoration is trained before the restoration process begins, not while the Speech is being restored.
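The offline-training / online-restoration split can be sketched as follows; the "model" is a trivial stand-in for a trained Neural Network and all names are illustrative:

```python
# Non-normative sketch of CAE-SRS: train offline, synthesise, splice.

def speech_model_creation(speech_segments):
    # Stand-in for offline Neural Network training on the speaker's segments.
    return {"speaker_model": len(speech_segments)}

def speech_synthesis_for_restoration(model, text):
    # Stand-in for text-to-speech with the trained speaker model.
    return f"synthesised:{text}"

def speech_restoration_assembly(segments, damaged_list, synthesised):
    # Replace each segment indexed by the Damaged List with synthesised speech.
    restored = list(segments)
    for idx, speech in zip(damaged_list, synthesised):
        restored[idx] = speech
    return restored

# Offline phase: build the speaker model from collected segments.
model = speech_model_creation(["seg_a.wav", "seg_b.wav", "seg_c.wav"])

# Restoration phase: synthesise the missing text and splice it in.
synth = [speech_synthesis_for_restoration(model, "hello world")]
restored = speech_restoration_assembly(["ok", "DAMAGED", "ok"],
                                       damaged_list=[1], synthesised=synth)
```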

The following links analyse the AI Modules:

Speech Model Creation

Speech Restoration Assembly

Speech Synthesis for Restoration

2       AI Modules

2.1      Audio Analysis for Preservation

CAE-AAP is a PAAI that:

Receives:
– Preservation Audio File
– Preservation Audio-Visual File
– Video Irregularity File from Video Analysis for Preservation
Produces:
– Audio Irregularity File to Tape Irregularity Classification
– Irregularity Audio File to Tape Irregularity Classification

CAE-AAP detects irregularities in the Preservation Audio File weighing them against the Video Irregularity received from CAE-VAP. This process may be performed with regular data processing techniques or with a Neural Network trained with a sufficiently large training dataset.
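As one example of the "regular data processing" option, a toy detector can flag samples whose magnitude jumps far above the local short-term average (a crude click detector). The window size and threshold below are arbitrary assumptions, not normative values:

```python
# Toy energy-jump irregularity detector (illustrative only).

def detect_audio_irregularities(samples, window=4, threshold=4.0):
    """Flag samples much larger than the average magnitude of the
    preceding `window` samples."""
    irregularities = []
    for i in range(window, len(samples)):
        local = samples[i - window:i]
        avg = sum(abs(x) for x in local) / window
        if avg > 0 and abs(samples[i]) > threshold * avg:
            irregularities.append({"index": i, "value": samples[i]})
    return irregularities

signal = [0.1, -0.1, 0.1, -0.1, 0.1, 5.0, 0.1, -0.1]  # a "click" at index 5
irrs = detect_audio_irregularities(signal)
```

A Neural Network implementation would replace the fixed threshold rule with a learned decision, at the cost of needing the large training dataset mentioned above.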

CAE-AAP performs Descriptors-Interpretation Level Operations.

2.2      Emotion Feature Production

CAE-EFP is a PAAI that:

Receives:
– Emotionless Speech Features from Speech Feature Analysis 2
– Emotion List: the Emotion that the Neural Speech Features should convey
– Language Selector: the language of the Emotionless Speech
Produces:
– Neural Speech Features to feed Neural Emotion Insertion

CAE-EFP is implemented as a Neural Network trained to extract Speech Features from an Emotionless Speech.

CAE-EFP performs Descriptors Level Operations.

2.3      Neural Emotion Insertion

CAE-NEI is a PAAI that:

Receives:
– Emotionless Speech
– Neural Speech Features from CAE-EFP
Produces:
– Speech With Emotion, by adding the Neural Speech Features to the Emotionless Speech

CAE-NEI is implemented as a Neural Network cognizant of the semantics of the Neural Speech Features it receives for insertion into the Emotionless Speech so that it carries the desired Emotion.

CAE-NEI performs Data Processing Level Operations.

2.4      Prosodic Emotion Insertion

CAE-PEI is a PAAI that:

Receives:
– Prosodic Speech Features from Speech Feature Analysis 1
– Emotion List: the Emotion that the Speech with Emotion should convey
– Emotionless Speech: the input to be made emotional
Produces:
– Speech with Emotion: the resulting Emotion-carrying Speech

CAE-PEI must be cognizant of the semantics of the Prosodic Speech Features so that it can add Emotion to the Emotionless Speech.
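One simple way to be "cognizant of the semantics" of a prosodic feature is to apply it frame by frame. The toy sketch below transfers only an energy contour; real prosodic insertion would also modify pitch and timing, and every name and number here is an illustrative assumption:

```python
# Toy prosody transfer: rescale each frame of the emotionless signal to
# follow the Model Utterance's per-frame energy contour (illustrative only).

def insert_prosody(emotionless, energy_contour, frame=2):
    out = []
    for i, target in enumerate(energy_contour):
        seg = emotionless[i * frame:(i + 1) * frame]
        current = max(sum(abs(x) for x in seg) / frame, 1e-9)
        gain = target / current          # match the target frame energy
        out.extend(x * gain for x in seg)
    return out

speech = [0.1, -0.1, 0.1, -0.1]   # flat, low-energy speech
contour = [0.1, 0.4]              # rising energy from the Model Utterance
emotional = insert_prosody(speech, contour)
```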

CAE-PEI performs Data Processing Level Operations.

2.5      Speech Feature Analysis 1

CAE-SF1 is a PAAI that:

Receives a Model Utterance containing an Emotion.
Extracts Speech Features from the Model Utterance.
Produces Prosodic Speech Features for insertion into the Emotionless Speech.

CAE-SF1 can be implemented with data processing techniques or with a Neural Network trained to extract Prosodic Speech Features from Emotion-carrying utterances.
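As an example of the data-processing route, crude prosodic proxies such as per-frame energy and zero-crossing rate can be computed directly; a real CAE-SF1 would extract pitch, duration, and intensity contours. Frame size and the features themselves are illustrative assumptions:

```python
# Toy prosodic-feature extraction: per-frame energy and zero-crossing rate.

def prosodic_features(samples, frame=4):
    feats = []
    for start in range(0, len(samples) - frame + 1, frame):
        f = samples[start:start + frame]
        energy = sum(x * x for x in f) / frame
        # Fraction of adjacent sample pairs that change sign.
        zcr = sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / (frame - 1)
        feats.append({"energy": round(energy, 4), "zcr": round(zcr, 4)})
    return feats

utterance = [0.2, -0.2, 0.2, -0.2, 0.5, 0.5, 0.5, 0.5]
feats = prosodic_features(utterance)
```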

CAE-SF1 performs Descriptors Level Operations.

2.6      Speech Feature Analysis 2

CAE-SF2 is a PAAI that:

Receives Emotionless Speech to be made Emotion-carrying.
Extracts Emotionless Speech Features from the Emotionless Speech.
Produces Emotionless Speech Features to feed Emotion Feature Production.

CAE-SF2 is implemented as a Neural Network trained to extract Speech Features.

CAE-SF2 performs Descriptors Level Operations.

2.7      Speech Model Creation

CAE-SMC is a PAAI that:

Collects Speech Segments in sufficient number for Neural Network training.
Produces a Neural Network Speech Model for text-to-speech synthesis.

CAE-SMC can only be implemented with a Neural Network training set-up.

CAE-SMC performs Training Level Operations.

2.8      Tape Irregularity Classification

CAE-TIC is a PAAI that:

Receives:
– Audio Irregularity Files
– Irregularity Audio Files
– Video Irregularity Files
– Irregularity Images
Produces:
– Irregularity Files

CAE-TIC is implemented as a Neural Network trained to confirm an Audio and/or Video Irregularity as an Irregularity using the corresponding Irregularity Audio File or Irregularity Image.
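The confirmation logic can be illustrated with a rule-based stand-in for the trained Neural Network: an audio candidate is confirmed when a video candidate falls close enough in time, or when its own evidence is strong. The times, tolerance, and `evidence` field are all invented for the example:

```python
# Rule-based stand-in for CAE-TIC's trained classifier (illustrative only).

def classify(audio_candidates, video_candidates, tolerance=0.5):
    confirmed = []
    for cand in audio_candidates:
        # Corroborated if a video candidate is within `tolerance` seconds.
        corroborated = any(abs(cand["time"] - v["time"]) <= tolerance
                           for v in video_candidates)
        if corroborated or cand["evidence"] == "strong":
            confirmed.append({"time": cand["time"], "component": "audio"})
    return confirmed

audio = [{"time": 3.2, "evidence": "weak"}, {"time": 9.0, "evidence": "strong"}]
video = [{"time": 3.4, "evidence": "weak"}]
result = classify(audio, video)
```

The trained network replaces these hand-written rules with a learned decision over the Irregularity Files and their associated audio snippets and images.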

CAE-TIC performs Reasoning-Level Operations.

2.9      Video Analysis for Preservation

CAE-VAP is a PAAI that:

Receives:
– Preservation Audio-Visual File, input to CAE-ARP
– Audio Irregularity File from Audio Analysis for Preservation
Produces:
– Video Irregularity File to Tape Irregularity Classification
– Irregularity Image to Tape Irregularity Classification

CAE-VAP detects irregularities in the Preservation Audio-Visual File weighing them against the corresponding Audio Irregularity received from CAE-AAP.

This process may be performed with regular data processing techniques or with a Neural Network trained with a sufficiently large training dataset.

CAE-VAP performs Descriptors-Interpretation Level Operations.