
On Using Simulated Annealing in Training Deep Neural Networks



Sarfi, Amirmohammad (2023) On Using Simulated Annealing in Training Deep Neural Networks. Masters thesis, Concordia University.

Text (application/pdf)
Sarfi_MSc_S2023.pdf - Accepted Version
Available under License Spectrum Terms of Access.


In deep learning, overfitting is a major problem that makes it difficult for a neural network to perform well on new data. This issue is especially prevalent in low-data regimes or when training for too many epochs. Iterative learning methods have been devised to improve the generalization performance of neural networks trained for a prolonged duration. These techniques periodically reduce the training accuracy of a network, a step called forgetting. The primary objective of the forgetting stage is to allow the network to learn more from the same data and surpass its previous performance over the long run.

In this thesis, we propose a new forgetting technique motivated by simulated annealing. Although simulated annealing is a powerful tool in optimization, its application in deep learning has been overlooked. In our study, we highlight the potential of this method in deep learning and illustrate its usefulness through experiments. Essentially, we select a subset of layers to undergo brief periods of gradient ascent, followed by gradient descent. In the first scenario, we apply Simulated Annealing in Early Layers (SEAL) during the training process. Through extensive experiments on the Tiny-ImageNet dataset, we demonstrate that our method achieves much better prediction depth, in-distribution performance, and transfer learning performance than state-of-the-art iterative training methods. In the second scenario, we extend simulated annealing beyond classification and computer vision by employing it in text-to-3D generative methods. Here, we apply simulated annealing to the entire network and illustrate its effectiveness compared to normal training. Together, these two scenarios demonstrate the potential of simulated annealing as a valuable tool for optimizing deep neural networks and emphasize the need for further exploration of this technique in the literature.
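
The core idea in the abstract, brief periods of gradient ascent on a selected subset of layers followed by gradient descent, can be illustrated with a minimal sketch. This is not the thesis implementation: the two-scalar "network", the toy data, the helper `in_ascent_phase`, and the values of `period`, `ascent_steps`, and `lr` are all hypothetical choices made for illustration. The sketch also assumes that the remaining (late) layer keeps doing ordinary gradient descent during the ascent phase, which the abstract does not specify.

```python
def in_ascent_phase(step, period, ascent_steps):
    """True for the first `ascent_steps` steps of every period after the first.

    Hypothetical schedule: the abstract only says the ascent periods are brief.
    """
    return step >= period and step % period < ascent_steps


def train(steps=200, period=50, ascent_steps=5, lr=0.002):
    """Train y_hat = w_late * (w_early * x) with periodic ascent on w_early."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy targets: y = 2x
    w_early, w_late = 0.2, 1.0  # two scalar "layers"
    losses = []
    for step in range(steps):
        # Full-batch mean squared error and its gradients, computed by hand.
        loss = g_early = g_late = 0.0
        for x, y in data:
            err = w_late * w_early * x - y
            loss += err * err / len(data)
            g_early += 2.0 * err * w_late * x / len(data)
            g_late += 2.0 * err * w_early * x / len(data)
        losses.append(loss)  # loss measured before this step's update

        # Selected (early) layer: gradient ascent during the forgetting
        # phase, gradient descent otherwise.
        direction = 1.0 if in_ascent_phase(step, period, ascent_steps) else -1.0
        w_early += direction * lr * g_early

        # Remaining (late) layer: plain gradient descent at every step
        # (an assumption of this sketch).
        w_late -= lr * g_late
    return losses


if __name__ == "__main__":
    losses = train()
    print(f"before first ascent: {losses[50]:.4f}")
    print(f"after first ascent:  {losses[55]:.4f}")
    print(f"final:               {losses[-1]:.6f}")
```

In this toy setting the training loss rises over each ascent phase (the forgetting stage) and then falls again once descent resumes. Whether such a schedule improves generalization is an empirical question that the sketch does not address; the thesis evaluates it on Tiny-ImageNet and on text-to-3D generation.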

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Sarfi, Amirmohammad
Institution: Concordia University
Degree Name: M. Sc.
Program: Computer Science
Date: 3 April 2023
Thesis Supervisor(s): Mudur, Sudhir and Belilovsky, Eugene
Keywords: Machine Learning, Deep Learning, Computer Vision, Computer Graphics, 3D Graphics, Transfer Learning, Simulated Annealing, Optimization, Few-shot Transfer Learning, Self-Distillation, Diffusion Models, Generative Models
ID Code: 992094
Deposited By: Amirmohammad Sarfi
Deposited On: 21 Jun 2023 14:43
Last Modified: 21 Jun 2023 14:43
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
