On Using Simulated Annealing in Training Deep Neural Networks


Sarfi, Amirmohammad (2023) On Using Simulated Annealing in Training Deep Neural Networks. Masters thesis, Concordia University.


Sarfi_MSc_S2023.pdf (application/pdf, 947 kB), Accepted Version
Available under License Spectrum Terms of Access.

Abstract

In deep learning, overfitting is a major problem that makes it difficult for a neural network to perform well on new data. This issue is especially prevalent in low-data regimes or when training for too many epochs. Iterative learning methods have been devised to improve the generalization performance of neural networks trained for a prolonged duration. These techniques periodically reduce the network's training accuracy, a step called forgetting. The primary objective of the forgetting stage is to allow the network to learn more from the same data and, in the long run, surpass its previous performance.

In this thesis, we propose a new forgetting technique motivated by simulated annealing. Although simulated annealing is a powerful optimization tool, its application in deep learning has been overlooked. In our study, we highlight the potential of this method in deep learning and illustrate its usefulness through experiments. Essentially, we select a subset of layers to undergo brief periods of gradient ascent followed by gradient descent. In the first scenario, we apply Simulated Annealing in Early Layers (SEAL) during training. Through extensive experiments on the Tiny-ImageNet dataset, we demonstrate that our method achieves much better prediction depth, in-distribution performance, and transfer learning performance than state-of-the-art iterative training methods. In the second scenario, we extend simulated annealing beyond classification and computer vision by employing it in text-to-3D generative methods. Here, we apply simulated annealing to the entire network and demonstrate its effectiveness compared to normal training. Together, these two scenarios demonstrate the potential of simulated annealing as a valuable tool for optimizing deep neural networks and highlight the need for further exploration of this technique in the literature.
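The update rule summarized in the abstract (brief gradient ascent on a chosen subset of layers, followed by ordinary gradient descent) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `update_direction` and `annealed_sgd_step`, the parameters `num_annealed`, `cycle_len`, `ascent_steps`, and `lr`, the fixed periodic schedule, and the use of plain SGD with one shared learning rate are all hypothetical. Layers are assumed to be ordered from input to output, so `num_annealed=1` anneals only the earliest layer (in the spirit of SEAL), while setting `num_annealed` to the total number of layers anneals the entire network, as in the text-to-3D scenario.

```python
from typing import List


def update_direction(step: int, layer: int, num_annealed: int,
                     cycle_len: int, ascent_steps: int) -> float:
    """Return -1.0 (gradient ascent) or +1.0 (gradient descent) for one layer.

    Hypothetical schedule: after every `cycle_len` steps of ordinary training,
    the first `num_annealed` layers take `ascent_steps` steps of gradient
    ascent, then return to descent; all other layers always descend.
    Assumes ascent_steps < cycle_len.
    """
    in_ascent_phase = step >= cycle_len and step % cycle_len < ascent_steps
    return -1.0 if in_ascent_phase and layer < num_annealed else 1.0


def annealed_sgd_step(params: List[List[float]], grads: List[List[float]],
                      step: int, lr: float = 0.1, num_annealed: int = 1,
                      cycle_len: int = 100, ascent_steps: int = 5) -> None:
    """Apply one in-place plain-SGD update, flipping the sign for annealed layers.

    `params[k]` and `grads[k]` hold the flattened weights and loss gradients of
    layer k; the gradients are assumed to come from elsewhere (e.g. autograd).
    """
    for layer, (p, g) in enumerate(zip(params, grads)):
        sign = update_direction(step, layer, num_annealed, cycle_len, ascent_steps)
        for i in range(len(p)):
            # Descent (sign = +1): p -= lr * g.  Ascent (sign = -1): p += lr * g.
            p[i] -= lr * sign * g[i]


if __name__ == "__main__":
    params = [[1.0, 1.0], [1.0]]  # layer 0 (early) and layer 1 (later)
    grads = [[0.5, -0.5], [0.5]]
    annealed_sgd_step(params, grads, step=100)  # first step of an ascent phase
    print(params)  # layer 0 moved along +grad, layer 1 along -grad
```

In a framework with automatic differentiation, a comparable effect could be obtained by negating the selected layers' gradients before a plain SGD step; with momentum or adaptive optimizers, that trick only approximates pure gradient ascent.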

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:	Thesis (Masters)
Authors:	Sarfi, Amirmohammad
Institution:	Concordia University
Degree Name:	M. Sc.
Program:	Computer Science
Date:	3 April 2023
Thesis Supervisor(s):	Mudur, Sudhir and Belilovsky, Eugene
Keywords:	Machine Learning, Deep Learning, Computer Vision, Computer Graphics, 3D Graphics, Transfer Learning, Simulated Annealing, Optimization, Few-shot Transfer Learning, Self-Distillation, Diffusion Models, Generative Models
ID Code:	992094
Deposited By:	Amirmohammad Sarfi
Deposited On:	21 Jun 2023 14:43
Last Modified:	21 Jun 2023 14:43


Spectrum Research Repository
