Sarfi, Amirmohammad (2023) On Using Simulated Annealing in Training Deep Neural Networks. Masters thesis, Concordia University.
Text (PDF, 947kB): Sarfi_MSc_S2023.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
In deep learning, overfitting is a major problem that makes it difficult for a neural network to perform well on new data. The issue is especially prevalent in low-data regimes or when training for too many epochs. Iterative learning methods have been devised to improve the generalization performance of neural networks trained for a prolonged duration. These techniques periodically reduce a network's training accuracy, a step called forgetting. The primary objective of the forgetting stage is to let the network learn more from the same data and surpass its previous performance over the long run.
In this thesis, we propose a new forgetting technique motivated by simulated annealing. Although simulated annealing is a powerful tool in optimization, its application in deep learning has been overlooked. In our study, we highlight the potential of this method in deep learning and illustrate its usefulness through experiments. Essentially, we select a subset of layers to undergo brief periods of gradient ascent, followed by gradient descent. In the first scenario, we utilize Simulated Annealing in Early Layers (SEAL) during the training process. Through extensive experiments on the Tiny-ImageNet dataset, we demonstrate that our method has a much better prediction depth, in-distribution, and transfer learning performance compared to the state-of-the-art works in iterative training. In the second scenario, we expand the application of simulated annealing beyond the realms of classification and computer vision, by employing it in text-to-3D generative methods. In this scenario, we apply simulated annealing to the entire network and illustrate its effectiveness compared to normal training. These two scenarios collectively demonstrate the potential of simulated annealing as a valuable tool for optimizing deep neural networks and emphasize the need for further exploration of this technique in the literature.
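The forgetting cycle described above — brief gradient ascent on a selected subset of parameters, followed by ordinary gradient descent on all of them — can be sketched on a toy problem. This is a minimal illustration, not the thesis' implementation: it assumes a simple quadratic loss and plain SGD, where the actual method operates on the early layers of a deep network.

```python
def loss(w):
    """Toy quadratic loss L(w) = 0.5 * sum(w_i^2), minimized at w = 0."""
    return 0.5 * sum(x * x for x in w)

def grad(w):
    """Gradient of the toy loss: dL/dw_i = w_i."""
    return list(w)

def seal_cycle(w, early, lr=0.1, ascent_steps=5, descent_steps=50):
    """One SEAL-style forgetting cycle (hypothetical sketch):
    brief gradient ascent on the 'early' parameter subset,
    then gradient descent on every parameter."""
    w = list(w)
    # Forgetting phase: ascend (deliberately increase the loss)
    # on the selected subset only.
    for _ in range(ascent_steps):
        g = grad(w)
        for i in early:
            w[i] += lr * g[i]
    # Relearning phase: ordinary descent on all parameters.
    for _ in range(descent_steps):
        g = grad(w)
        for i in range(len(w)):
            w[i] -= lr * g[i]
    return w
```

For example, starting from `w = [1.0, 1.0]` with `early = [0]`, the ascent phase pushes the first parameter away from the optimum (raising the loss), and the subsequent descent phase drives both parameters back toward zero, ending with a lower loss than the starting point.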
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Sarfi, Amirmohammad |
| Institution: | Concordia University |
| Degree Name: | M. Sc. |
| Program: | Computer Science |
| Date: | 3 April 2023 |
| Thesis Supervisor(s): | Mudur, Sudhir and Belilovsky, Eugene |
| Keywords: | Machine Learning, Deep Learning, Computer Vision, Computer Graphics, 3D Graphics, Transfer Learning, Simulated Annealing, Optimization, Few-shot Transfer Learning, Self-Distillation, Diffusion Models, Generative Models |
| ID Code: | 992094 |
| Deposited By: | Amirmohammad Sarfi |
| Deposited On: | 21 Jun 2023 14:43 |
| Last Modified: | 21 Jun 2023 14:43 |
| Related URLs: | |