This chapter specifies the steps that enable a User to design a neural network able to up-sample a video sequence to a resolution higher than its current one:

  1. Selection of video sequences for use in the development of the Training Dataset.
  2. Creation of the Training Dataset.
  3. Selection of Training Approach.
  4. Selection of the NN Architecture (initial architecture and modification).
  5. Training of the Neural Network (e.g., the number of epochs for SDtoHD).

Data Preparation

Each pair of frames in the Training Dataset consists of an input frame of resolution n/2 by m/2 and an output frame of resolution n by m. If the input frame is not available, it may be obtained from the output frame by applying a down-sampling filter.
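As an illustration of deriving the input frame from the output frame, the following minimal sketch halves both dimensions by averaging each 2 by 2 block of pixels. The box filter, the NumPy representation of frames, and the function names downsample_by_2 and make_training_pair are assumptions made for this example; this chapter does not mandate a specific down-sampling filter.

```python
import numpy as np


def downsample_by_2(frame: np.ndarray) -> np.ndarray:
    """Halve height and width by averaging each 2x2 block (box filter).

    The box filter is an assumption of this sketch, not a requirement.
    Works for grayscale (n, m) and multi-channel (n, m, c) frames.
    """
    n, m = frame.shape[:2]
    if n % 2 or m % 2:
        raise ValueError("frame height and width must be even")
    # Split rows and columns into groups of two, then average each group.
    blocks = frame.reshape(n // 2, 2, m // 2, 2, *frame.shape[2:])
    return blocks.astype(np.float32).mean(axis=(1, 3))


def make_training_pair(output_frame: np.ndarray):
    """Return (input_frame, output_frame) with the input at n/2 by m/2."""
    return downsample_by_2(output_frame), output_frame
```

Other filters (e.g., bicubic) may be substituted; the only property relied on here is that the input frame has exactly half the height and width of the output frame.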

To reduce the computation time required for training and inference, and to avoid memory limitations of the graphics processing unit, patch extraction may be used. The resolution of the patches is h by k, where h and k are much smaller than the dimensions of the frames in the pair.

Patch extraction may follow different approaches, e.g., random or feature-based.
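The random approach can be sketched as follows. The sketch assumes that h by k refers to the patch taken from the input (low-resolution) frame, so that the co-located patch in the output frame is 2h by 2k; the function name random_patch_pairs and its parameters are hypothetical.

```python
import numpy as np


def random_patch_pairs(low, high, h, k, count, rng, scale=2):
    """Extract `count` co-located (low, high) patch pairs at random positions.

    Assumption of this sketch: `high` is exactly `scale` times larger than
    `low` in both dimensions, and h x k is the low-resolution patch size.
    """
    lh, lw = low.shape[:2]
    if high.shape[0] != lh * scale or high.shape[1] != lw * scale:
        raise ValueError("high must be `scale` times the size of low")
    if h > lh or k > lw:
        raise ValueError("patch is larger than the low-resolution frame")

    pairs = []
    for _ in range(count):
        # Top-left corner chosen uniformly so the patch stays inside the frame.
        top = int(rng.integers(0, lh - h + 1))
        left = int(rng.integers(0, lw - k + 1))
        low_patch = low[top:top + h, left:left + k]
        # The same region in the high-resolution frame, scaled accordingly.
        high_patch = high[top * scale:(top + h) * scale,
                          left * scale:(left + k) * scale]
        pairs.append((low_patch, high_patch))
    return pairs
```

A seeded generator such as np.random.default_rng(0) can be passed as rng to make the extraction reproducible. A feature-based approach would replace the uniform choice of top and left with positions selected by a feature detector.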

To increase the generalization capabilities of the network, augmentation may be used: modified copies of the patches or of the frames are created to enlarge the Training Dataset.
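A minimal sketch of augmentation is given below. The choice of transformations (rotations by multiples of 90 degrees and horizontal flips) and the function name augment_pair are assumptions made for this example; the essential point is that the same transformation is applied to both members of a pair so that they stay aligned.

```python
import numpy as np


def augment_pair(low_patch, high_patch):
    """Return 8 variants of a pair: 4 rotations, each with and without a flip.

    The same geometric transformation is applied to both patches so the
    low-resolution and high-resolution content remain co-located.
    """
    variants = []
    for turns in range(4):
        low_rot = np.rot90(low_patch, turns)
        high_rot = np.rot90(high_patch, turns)
        variants.append((low_rot, high_rot))
        # Horizontal flip of the rotated pair.
        variants.append((np.fliplr(low_rot), np.fliplr(high_rot)))
    return variants
```

The first variant is the original pair. For non-square patches, rotations by 90 and 270 degrees swap h and k, which matters if the network expects a fixed patch shape.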

Training

The deep-learning model can be trained either by fine tuning, starting from a pre-trained model, or from scratch.
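At the start of training, the two approaches differ in how the parameters are initialized. The framework-independent sketch below illustrates this: parameters found in a pre-trained model with a matching shape are copied, and all others are drawn at random. The function name init_parameters, the layer names, and the Gaussian initialization with standard deviation 0.01 are hypothetical choices made for this example.

```python
import numpy as np


def init_parameters(shapes, rng, pretrained=None):
    """Initialize parameters for training.

    shapes:     dict mapping a layer name to its parameter shape.
    pretrained: optional dict of arrays from a pre-trained model.
                None means training from scratch.
    """
    params = {}
    for name, shape in shapes.items():
        source = None if pretrained is None else pretrained.get(name)
        if source is not None and source.shape == tuple(shape):
            # Fine tuning: start from the pre-trained values (copied so the
            # pre-trained model itself is not modified by later updates).
            params[name] = source.copy()
        else:
            # From scratch, or a layer absent from the pre-trained model:
            # small random values (an assumption of this sketch).
            params[name] = rng.normal(0.0, 0.01, size=shape)
    return params
```

In this sketch, layers added or resized during modification of the architecture fall back to random initialization, while unchanged layers reuse the pre-trained values.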

(Do we want to say what they are?)

What do we want to add? The training strategies, i.e., the type of optimizer and the various training parameters?