1 AI Workflows
1.1 Audio Recording Preservation
CAE-ARP is an AIW that restores an open-reel audio tape by detecting audio and visual irregularities.
CAE-ARP is composed of a set of collaborating AIMs:
Audio Analysis for Preservation | Receives Video Analysis for Preservation’s Video Irregularity File; detects irregularities; extracts an Audio File for each detected/received Audio Irregularity; sends the Irregularity Audio File and Audio Irregularity File of each irregularity to Tape Irregularity Classification. |
Video Analysis for Preservation | Receives Audio Analysis for Preservation’s Audio Irregularity File and the offset between the Preservation Audio File and the Preservation Audio-Visual File; detects irregularities; extracts an Image for each detected/received Video Irregularity; sends the Irregularity Image and Video Irregularity File of each irregularity to Tape Irregularity Classification. |
Tape Irregularity Classification | Produces Irregularity Files from: the Irregularity Files of the Audio component with the corresponding Irregularity Audio Files; the Irregularity Files of the Video component with the corresponding Irregularity Images. |
Tape Audio Restoration | Produces the Restored Audio Files and the Editing List used to restore portions of the Preservation Audio File using the Irregularity Files. |
Packaging for Audio Preservation | Assembles the output files. |
Figure 8 – Reference Model of CAE-ARP
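The data flow of the workflow above can be sketched in Python. Everything below — function names, the toy detection results, and the in-memory objects standing in for the exchanged Files — is a hypothetical illustration of how the AIMs collaborate, not an MPAI-specified API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an Irregularity record; the real AIW
# exchanges files, and the detectors are not toys.
@dataclass
class Irregularity:
    source: str        # "audio" or "video"
    time_s: float      # position on the tape, in seconds
    clip: bytes = b""  # extracted audio snippet or image

def audio_analysis_for_preservation(audio_file, video_irregularities):
    # Placeholder detector: pretend two audio irregularities were found.
    return [Irregularity("audio", t) for t in (1.0, 4.5)]

def video_analysis_for_preservation(av_file, audio_irregularities):
    # Placeholder detector: pretend one video irregularity was found.
    return [Irregularity("video", t) for t in (1.0,)]

def tape_irregularity_classification(audio_irr, video_irr):
    # Merge the two components' findings into one Irregularity list.
    return sorted(audio_irr + video_irr, key=lambda i: i.time_s)

def tape_audio_restoration(audio_file, irregularities):
    editing_list = [i.time_s for i in irregularities]
    restored_audio = audio_file  # a real AIM would repair these spans
    return restored_audio, editing_list

def packaging_for_audio_preservation(restored, editing_list, irregularities):
    return {"audio": restored, "edits": editing_list,
            "irregularities": irregularities}

# Wire the AIMs together as in Figure 8.
audio_irr = audio_analysis_for_preservation("tape.wav", [])
video_irr = video_analysis_for_preservation("tape.mov", audio_irr)
confirmed = tape_irregularity_classification(audio_irr, video_irr)
restored, edits = tape_audio_restoration("tape.wav", confirmed)
package = packaging_for_audio_preservation(restored, edits, confirmed)
```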
The following links analyse the AI Modules:
Audio Analysis for Preservation
Packaging for Audio Preservation
Tape Audio Restoration
Tape Irregularity Classification
Video Analysis for Preservation
1.2 Emotion-Enhanced Speech
CAE-EES is an AIW that inserts a prescribed Emotion into an Emotionless Speech. It comes in two configurations that use different sets of collaborating AIMs:
#1 | Speech Feature Analysis 1 | Extracts Prosodic Speech Features from a Model Utterance. |
 | Prosodic Emotion Insertion | Adds the Prosodic Speech Features to an Emotionless Speech, producing a Speech with an Emotion whose type is indicated by the user. |
#2 | Speech Feature Analysis 2 | Extracts Emotionless Speech Features from an Emotionless Speech segment. |
 | Emotion Feature Production | Produces Neural Speech Features from the Emotionless Speech Features based on an Emotion List and an indication of the language. |
 | Neural Emotion Insertion | Adds the Neural Speech Features to the Emotionless Speech. |
Figure 9 – Reference Model of CAE-EES
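The two configurations can be contrasted in a minimal Python sketch. All function bodies below are hypothetical stand-ins for the AIMs named in the table; a real implementation operates on audio, and configuration #2 uses a trained Neural Network rather than the trivial tagging shown here.

```python
# Configuration #1: copy prosody from an emotional Model Utterance.
def speech_feature_analysis_1(model_utterance):
    # Hypothetical prosodic features; a real AIM derives them from audio.
    return {"pitch_hz": 220.0, "rate": 1.2}

def prosodic_emotion_insertion(speech, prosodic_features, emotion):
    return f"{speech}+{emotion}@pitch={prosodic_features['pitch_hz']}"

# Configuration #2: produce Neural Speech Features with a trained network.
def speech_feature_analysis_2(speech):
    return {"frames": len(speech)}

def emotion_feature_production(features, emotion_list, language):
    # Stand-in for the trained network of CAE-EFP: just tags the features.
    return {**features, "emotion": emotion_list[0], "lang": language}

def neural_emotion_insertion(speech, neural_features):
    return f"{speech}+{neural_features['emotion']}"

out1 = prosodic_emotion_insertion(
    "hello", speech_feature_analysis_1("model.wav"), "joy")
neural = emotion_feature_production(
    speech_feature_analysis_2("hello"), ["joy"], "en")
out2 = neural_emotion_insertion("hello", neural)
```

Note the structural difference: #1 transfers features measured from a Model Utterance, while #2 generates them from the Emotionless Speech itself plus an Emotion List.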
The following links analyse the AI Modules:
Emotion Feature Production
Neural Emotion Insertion
Prosodic Emotion Insertion
Speech Feature Analysis 1
Speech Feature Analysis 2
1.3 Enhanced Audioconference Experience
CAE-EAE is an AIW that produces a Multichannel Audio Stream acting on an input Microphone Array Audio.
CAE-EAE is composed of the following AIMs:
Audio Analysis Transform | Represents the input Multichannel Audio in a new form amenable to further processing by the subsequent AIMs in the architecture. |
Sound Field Description | Produces the Spherical Harmonic Decomposition Coefficients of the Transformed Multichannel Audio. |
Speech Detection and Separation | Separates speech and non-speech signals in the Spherical Harmonic Decomposition Coefficients, producing Transform Speech and Audio Scene Geometry. |
Noise Cancellation Module | Removes noise and/or suppresses reverberation in the Transform Speech, producing Enhanced Transform Audio. |
Audio Synthesis Transform | Effects the inverse transform of the Enhanced Transform Audio, producing Enhanced Audio Objects ready for packaging. |
Audio Description Packaging | Multiplexes the Enhanced Audio Objects and the Audio Scene Geometry. |
Figure 10 – Reference Model of CAE-EAE
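The chain above can be sketched end to end in Python. The sound-field stage below uses a deliberately simplified first-order spherical-harmonic (ambisonic-style) encoding of a single plane-wave source at a known azimuth; real Spherical Harmonic Decomposition Coefficients are estimated from the microphone array, and every function name here is a hypothetical stand-in for the corresponding AIM.

```python
import math

def audio_analysis_transform(mic_array_audio):
    # Stand-in for an analysis transform (identity here).
    return mic_array_audio

def sound_field_description(transformed, azimuth):
    # First-order components W (omni), X, Y for a plane wave at
    # `azimuth`, elevation 0 — a simplification, not the normative math.
    return {"W": list(transformed),
            "X": [s * math.cos(azimuth) for s in transformed],
            "Y": [s * math.sin(azimuth) for s in transformed]}

def speech_detection_and_separation(shd):
    # Recover a source direction from the first-order components and
    # hand the omnidirectional channel on as Transform Speech.
    geometry = {"azimuth": math.atan2(shd["Y"][0], shd["X"][0])}
    return shd["W"], geometry

def noise_cancellation(transform_speech, floor=0.01):
    # Crude noise gate standing in for noise/reverberation suppression.
    return [s for s in transform_speech if abs(s) > floor]

def audio_synthesis_transform(enhanced):
    return enhanced  # inverse-transform placeholder

def audio_description_packaging(audio_objects, geometry):
    return {"objects": audio_objects, "geometry": geometry}

signal = [1.0, 0.001, 0.5]
shd = sound_field_description(audio_analysis_transform(signal), math.pi / 4)
speech, geometry = speech_detection_and_separation(shd)
stream = audio_description_packaging(
    audio_synthesis_transform(noise_cancellation(speech)), geometry)
```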
The following links analyse the AI Modules:
Audio Analysis Transform
Audio Description Packaging
Audio Synthesis Transform
Noise Cancellation Module
Sound Field Description
Speech Detection and Separation
1.4 Speech Restoration System
CAE-SRS is an AIW that collects Speech Segments of a particular speaker, trains a Neural Network Model to synthesise Speech from Text, and uses the so-synthesised Speech to replace damaged Speech Segments.
CAE-SRS is composed of three collaborating AIMs:
Speech Model Creation | Trains a Neural Network Model with the Speech Segments. |
Speech Synthesis for Restoration | Uses the Neural Network Speech Model to synthesise a Speech Object from a Text Object. |
Speech Restoration Assembly | Replaces a Damaged Segment indexed by the Damaged List with the Synthesised Speech. |
Figure 11 – Reference Model of CAE-SRS
Note that the Neural Network Speech Model used by Speech Synthesis for Restoration is trained before the restoration process begins, not while restoring the Speech.
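The three stages can be sketched as follows. The "model" and the `synth(...)` token are hypothetical placeholders for a trained Neural Network Speech Model and its audio output; only the wiring between the AIMs is taken from the description above.

```python
def speech_model_creation(speech_segments):
    # Stand-in for Neural Network training: the "model" just remembers
    # the speaker's material.
    return {"speaker_data": list(speech_segments)}

def speech_synthesis_for_restoration(model, text):
    # A trained model would synthesise audio in the speaker's voice;
    # here we return a tagged token instead.
    return f"synth({text})"

def speech_restoration_assembly(segments, damage_list, model, texts):
    # Replace each Damaged Segment, indexed by the Damaged List, with
    # Speech synthesised from the corresponding Text.
    restored = list(segments)
    for index, text in zip(damage_list, texts):
        restored[index] = speech_synthesis_for_restoration(model, text)
    return restored

model = speech_model_creation(["seg0", "seg1", "seg2"])
restored = speech_restoration_assembly(
    ["seg0", "<damaged>", "seg2"], damage_list=[1], model=model,
    texts=["hello"])
```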
The following links analyse the AI Modules:
Speech Model Creation
Speech Restoration Assembly
Speech Synthesis for Restoration
2 AI Modules
2.1 Audio Analysis for Preservation
CAE-AAP is an AIM that:
Receives | Preservation Audio File | Input to CAE-ARP |
 | Preservation Audio-Visual File | Input to CAE-ARP |
 | Video Irregularity File | From Video Analysis for Preservation |
Produces | Audio Irregularity File | To Tape Irregularity Classification |
 | Irregularity Audio File | To Tape Irregularity Classification |
CAE-AAP detects irregularities in the Preservation Audio File, weighing them against the Video Irregularity File received from CAE-VAP. This process may be performed with regular data processing techniques or with a Neural Network trained with a sufficiently large training dataset.
CAE-AAP performs Descriptors-Interpretation Level Operations.
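A possible data-processing realisation of this weighing can be illustrated with a toy energy detector. The heuristic below — frame-energy thresholding with a score boost for frames that coincide with a video-detected irregularity — is purely an assumption for illustration, not the normative detector.

```python
def detect_audio_irregularities(samples, frame, threshold,
                                video_times=(), sample_rate=1000,
                                boost=2.0):
    """Flag frames whose mean absolute level exceeds `threshold`.
    Frames that coincide with a video-detected irregularity get their
    score boosted, mirroring how CAE-AAP weighs audio evidence against
    CAE-VAP's findings. Illustrative heuristic only."""
    flagged = []
    for start in range(0, len(samples) - frame + 1, frame):
        score = sum(abs(s) for s in samples[start:start + frame]) / frame
        t = start / sample_rate
        if any(abs(t - vt) < frame / sample_rate for vt in video_times):
            score *= boost
        if score > threshold:
            flagged.append(t)
    return flagged

samples = [0.0] * 10 + [0.9] * 10 + [0.2] * 10
audio_only = detect_audio_irregularities(samples, frame=10, threshold=0.3)
with_video = detect_audio_irregularities(samples, frame=10, threshold=0.3,
                                         video_times=(0.02,))
```

With the video evidence, the borderline frame at t = 0.02 s is also flagged, which a purely audio-driven pass would miss.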
2.2 Emotion Feature Production
CAE-EFP is an AIM that:
Receives | Emotionless Speech Features | From Speech Feature Analysis 2 |
 | Emotion List | Emotion that the Neural Speech Features should convey |
 | Language Selector | Language of the Emotionless Speech |
Produces | Neural Speech Features | To feed Neural Emotion Insertion |
CAE-EFP is implemented as a Neural Network trained to produce Neural Speech Features from the Emotionless Speech Features.
CAE-EFP performs Descriptors Level Operations.
2.3 Neural Emotion Insertion
CAE-NEI is an AIM that:
Receives | Emotionless Speech | Input to be made emotional |
 | Neural Speech Features | From CAE-EFP |
Produces | Speech With Emotion | Obtained by adding the Neural Speech Features to the Emotionless Speech |
CAE-NEI is implemented as a Neural Network cognizant of the semantics of the Neural Speech Features it receives for insertion into the Emotionless Speech so that it carries the desired Emotion.
CAE-NEI performs Data Processing Level Operations.
2.4 Prosodic Emotion Insertion
CAE-PEI is a PAAI
Receiving | Prosodic Speech Features | From Speech Feature Analysis2 |
Emotion List | Emotion that the Neural Speech Features should convey. | |
Emotionless Speech | Input to be made emotional. | |
Producing | Speech with Emotion | The resulting Emotion-carrying Speech. |
CAE-PEI must be cognizant of the semantics of the Prosodic Speech Features so that it can add Emotion to the Emotionless Speech.
CAE-PEI performs Data Processing Level Operations.
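The kind of signal manipulation such a Data Processing Level module performs can be hinted at with a toy transformation. The function below applies only two elementary prosodic changes — time-scaling and amplitude scaling — and is an illustrative assumption; real prosodic insertion also reshapes the pitch contour.

```python
def apply_prosody(samples, rate=1.0, gain=1.0):
    """Time-scale an emotionless signal by `rate` (naive resampling)
    and scale its amplitude by `gain` — two of the prosodic dimensions
    an Emotion can modulate. Toy sketch, not the actual CAE-PEI."""
    n_out = int(len(samples) / rate)
    return [gain * samples[min(int(i * rate), len(samples) - 1)]
            for i in range(n_out)]

# Speak twice as fast and at half the level (toy "emotion").
emotional = apply_prosody([1.0, 2.0, 3.0, 4.0], rate=2.0, gain=0.5)
```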
2.5 Speech Feature Analysis 1
CAE-SF1 is an AIM that:
Receives | Model Utterance | An Emotion-carrying utterance |
Extracts | Speech Features | From the Model Utterance |
Produces | Prosodic Speech Features | For insertion into the Emotionless Speech |
CAE-SF1 can be implemented with data processing techniques or with a Neural Network trained to extract Prosodic Speech Features from Emotion-carrying utterances.
CAE-SF1 performs Descriptors Level Operations.
2.6 Speech Feature Analysis 2
CAE-SF2 is an AIM that:
Receives | Emotionless Speech | To be made Emotion-carrying |
Extracts | Emotionless Speech Features | From the Emotionless Speech |
Produces | Emotionless Speech Features | To feed Emotion Feature Production |
CAE-SF2 is implemented as a Neural Network trained to extract Emotionless Speech Features.
CAE-SF2 performs Descriptors Level Operations.
2.7 Speech Model Creation
CAE-SMC is an AIM that:
Collects | Speech Segments | In sufficient number for Neural Network training |
Produces | Neural Network Speech Model | For text-to-speech synthesis |
CAE-SMC can only be implemented with a Neural Network training set-up.
CAE-SMC performs Training Level Operations.
2.8 Tape Irregularity Classification
CAE-TIC is an AIM that:
Receives | Audio Irregularity Files | From Audio Analysis for Preservation |
 | Irregularity Audio Files | From Audio Analysis for Preservation |
 | Video Irregularity Files | From Video Analysis for Preservation |
 | Irregularity Images | From Video Analysis for Preservation |
Produces | Irregularity Files | To Tape Audio Restoration |
CAE-TIC is implemented as a Neural Network trained to confirm an Audio and/or Video Irregularity as an Irregularity using the corresponding Irregularity Audio File or Irregularity Image.
CAE-TIC performs Reasoning-Level Operations.
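One simple rule such a classifier could learn is to confirm an irregularity when audio and video evidence agree in time. The hand-written matching rule and the `tolerance` parameter below are assumptions for illustration; in the actual AIM a trained Neural Network replaces this logic.

```python
def classify_irregularities(audio_times, video_times, tolerance=0.05):
    """Confirm an irregularity when an audio-detected and a
    video-detected irregularity lie within `tolerance` seconds of
    each other; unmatched detections are kept but marked unconfirmed.
    Toy rule standing in for CAE-TIC's trained classifier."""
    entries, used = [], set()
    for ta in audio_times:
        match = next((tv for tv in video_times
                      if abs(ta - tv) <= tolerance and tv not in used),
                     None)
        if match is not None:
            used.add(match)
        entries.append({"time": ta, "confirmed": match is not None})
    for tv in video_times:
        if tv not in used:
            entries.append({"time": tv, "confirmed": False})
    return sorted(entries, key=lambda e: e["time"])

report = classify_irregularities([1.0, 4.5], [1.02, 7.0])
```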
2.9 Video Analysis for Preservation
CAE-VAP is an AIM that:
Receives | Preservation Audio-Visual File | Input to CAE-ARP |
 | Audio Irregularity File | From Audio Analysis for Preservation |
Produces | Video Irregularity File | To Tape Irregularity Classification |
 | Irregularity Image | To Tape Irregularity Classification |
CAE-VAP detects irregularities in the Preservation Audio-Visual File, weighing them against the corresponding Audio Irregularity File received from CAE-AAP. This process may be performed with regular data processing techniques or with a Neural Network trained with a sufficiently large training dataset.
CAE-VAP performs Descriptors-Interpretation Level Operations.