Martins Gomes, Damien (2025) Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis. Masters thesis, Concordia University.
Text (application/pdf)
18MB, MartinsGomes_Msc_S2025.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
First-order optimization methods remain the standard for training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by preconditioning the stochastic gradient with a diagonal matrix. Despite the widespread adoption of first-order methods, second-order optimization algorithms often exhibit superior convergence compared to methods like Adam and SGD. However, their practicality in training DNNs is still limited by a significantly higher per-iteration computational cost compared to first-order methods. In this thesis, we present AdaFisher, a novel adaptive second-order optimizer that leverages a diagonal block-Kronecker approximation of the Fisher information matrix to adaptively precondition gradients. AdaFisher aims to bridge the gap between the improved convergence and generalization of second-order methods and the computational efficiency needed for training DNNs. Despite the traditionally slower speed of second-order optimizers, AdaFisher is effective for tasks such as image classification and language modeling, exhibiting remarkable stability and robustness during hyperparameter tuning. We demonstrate that AdaFisher outperforms state-of-the-art optimizers in both accuracy and convergence speed. The code is available from https://github.com/AtlasAnalyticsLab/AdaFisher.
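To make the idea of preconditioning a gradient with a diagonal matrix concrete, the following is a minimal sketch of a simplified RMSProp-like update, the kind of diagonal scaling that Adam builds on. It is not AdaFisher and not the code from the repository above: the function names, the toy quadratic, and the hyperparameter values are all hypothetical choices made for illustration, and momentum and bias correction are deliberately left out.

```python
import math

# Hypothetical toy problem: an ill-conditioned quadratic
# f(x, y) = 0.5 * (100 * x^2 + y^2), whose curvature differs by 100x per axis.
CURVATURE = [100.0, 1.0]


def loss(params):
    return 0.5 * sum(c * p * p for c, p in zip(CURVATURE, params))


def grad(params):
    return [c * p for c, p in zip(CURVATURE, params)]


def diagonal_preconditioned_step(params, grads, second_moment,
                                 lr=0.01, beta=0.9, eps=1e-8):
    """One step of a simplified RMSProp-like update (an assumption, not AdaFisher).

    second_moment is a running average of squared gradients. Dividing each
    gradient entry by its square root is the same as multiplying the gradient
    by a diagonal preconditioning matrix.
    """
    new_moment = [beta * m + (1.0 - beta) * g * g
                  for m, g in zip(second_moment, grads)]
    new_params = [p - lr * g / (math.sqrt(m) + eps)
                  for p, g, m in zip(params, grads, new_moment)]
    return new_params, new_moment


def run(steps, start=(1.0, 1.0)):
    """Run the update for a number of steps and return the final parameters."""
    params = list(start)
    second_moment = [0.0 for _ in params]
    for _ in range(steps):
        params, second_moment = diagonal_preconditioned_step(
            params, grad(params), second_moment)
    return params
```

In this sketch the first step moves both coordinates by nearly the same amount even though their gradients differ by a factor of 100, which is the rescaling effect a diagonal preconditioner is meant to provide. A full second-order method would replace the running average of squared gradients with a richer curvature estimate.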
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Martins Gomes, Damien |
Institution: | Concordia University |
Degree Name: | M.A. |
Program: | Computer Science |
Date: | 26 March 2025 |
Thesis Supervisor(s): | Hosseini, Mahdi S. |
Keywords: | Second Order Optimization, Fisher Information, Kronecker-factored Approximate Curvature, Deep Learning, Computer Vision, Natural Language Processing |
ID Code: | 995445 |
Deposited By: | Damien Martins Gomes |
Deposited On: | 17 Jun 2025 17:34 |
Last Modified: | 17 Jun 2025 17:34 |