Early Layer Optimization

Karimpour, Zahra (2025) Early Layer Optimization. Master's thesis, Concordia University.

Full text: Karimpour_MCompSc_S2025.pdf (PDF, 2 MB), Accepted Version. Available under License Spectrum Terms of Access.

Abstract

In deep learning, early layers play a fundamental role in building general and transferable representations. In this thesis, we demonstrate how improving early layer features can consistently enhance generalization across diverse training settings.
First, we propose a novel iterative training method called Simulated Annealing in Early Layers (SEAL), which applies intermittent gradient ascent followed by descent to the early layers during training. This enables the early layers to escape local minima and refine their representations over time. Doing so reduces overfitting and leads to state-of-the-art in-distribution and transfer generalization in the iterative training regime.
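The ascent-then-descent idea behind SEAL can be illustrated with a minimal plain-Python sketch. It is not the thesis's implementation. The parameter-group names (`layer1`, `layer2`), the set `EARLY_LAYERS`, the cycle length `period`, the number of ascent steps `ascent_steps`, and the learning rate are all hypothetical. In each cycle, the early-layer parameters first take gradient-ascent steps, moving uphill on the loss. They then return to ordinary descent. Later layers always descend.

```python
from typing import Dict, List

# Flat lists of floats stand in for each layer's weights in this sketch.
Params = Dict[str, List[float]]

# Assumption: which parameter groups count as "early" is hypothetical here.
EARLY_LAYERS = {"layer1"}


def in_ascent_phase(step: int, period: int, ascent_steps: int) -> bool:
    """True during the first `ascent_steps` steps of every `period`-step cycle."""
    return step % period < ascent_steps


def ascent_then_descent_update(
    params: Params,
    grads: Params,
    step: int,
    lr: float = 0.1,
    period: int = 10,
    ascent_steps: int = 2,
) -> Params:
    """One SGD-style update in which early layers ascend during the ascent phase.

    Descent uses w - lr * g. Ascent flips the sign to w + lr * g, which moves
    the early layers uphill on the loss. Later layers always descend.
    Returns new lists and leaves `params` unmodified.
    """
    ascend = in_ascent_phase(step, period, ascent_steps)
    updated: Params = {}
    for name, weights in params.items():
        sign = 1.0 if (ascend and name in EARLY_LAYERS) else -1.0
        updated[name] = [w + sign * lr * g for w, g in zip(weights, grads[name])]
    return updated
```

At step 0, the default cycle places `layer1` in its ascent phase, so its weights move in the direction of the gradient while `layer2` moves against it. From step 2 to step 9, both layers descend, and the pattern repeats every 10 steps.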
Second, we observe poor transfer generalization in greedy learning, which we attribute to a lack of generic information, especially in the early layers of the network. To address this, we use class-wise self-knowledge distillation (CS-KD) regularization to encourage information gain in the early layers. Our results show that this adjustment mitigates the transfer performance drop typically observed in greedy training while maintaining in-distribution accuracy.
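For reference, the CS-KD objective as commonly formulated can be sketched in plain Python. It adds a consistency term to the usual cross-entropy. That term is the KL divergence between the temperature-softened predictions for two different samples of the same class, scaled by the squared temperature. The abstract does not describe how the thesis attaches this term to the early layers of a greedily trained network, so the sketch shows only the loss itself. The function names are hypothetical, and the default temperature of 4.0 and weight of 1.0 are illustrative placeholders.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy of the true label, computed with a stable log-sum-exp."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[label]


def cs_kd_loss(
    logits_x: List[float],
    logits_same_class: List[float],
    label: int,
    temperature: float = 4.0,
    weight: float = 1.0,
) -> float:
    """Cross-entropy on x plus a class-wise self-distillation consistency term.

    `logits_same_class` are the model's logits for a different sample that
    shares x's label. They act as a fixed target. In an autodiff framework,
    they would be detached so that no gradient flows through them.
    """
    target = softmax(logits_same_class, temperature)
    prediction = softmax(logits_x, temperature)
    consistency = temperature ** 2 * kl_divergence(target, prediction)
    return cross_entropy(logits_x, label) + weight * consistency
```

When both samples produce identical logits, the consistency term is zero and the loss reduces to plain cross-entropy. Disagreement between same-class predictions is what the regularizer penalizes.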
Finally, we extend our investigation to federated learning, where early layer divergence due to gradient accumulation across clients can lead to poor representation learning, even under IID data distributions. We demonstrate that greedy training, by avoiding end-to-end backpropagation, mitigates divergence in the early layers and improves overall performance, particularly in challenging scenarios with deeper models or many clients.
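To make early layer divergence concrete, the sketch below pairs FedAvg aggregation (a data-size-weighted average of client parameters) with one simple way to measure how far clients' copies of a layer sit from the aggregated model. Both are assumptions for illustration. The abstract does not say which aggregation rule or divergence measure the thesis uses. The function `layer_divergence`, the layer names, and the flat-list parameter representation are all hypothetical.

```python
import math
from typing import Dict, List

# Flat lists of floats stand in for each layer's weights in this sketch.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], num_samples: List[int]) -> Params:
    """Federated averaging: per-parameter mean weighted by each client's data size."""
    total = sum(num_samples)
    averaged: Params = {}
    for name, first in client_params[0].items():
        averaged[name] = [
            sum(n * p[name][i] for n, p in zip(num_samples, client_params)) / total
            for i in range(len(first))
        ]
    return averaged


def layer_divergence(
    client_params: List[Params], global_params: Params, layer: str
) -> float:
    """Hypothetical drift metric: mean Euclidean distance between each client's
    copy of `layer` and the aggregated global copy."""
    distances = [math.dist(p[layer], global_params[layer]) for p in client_params]
    return sum(distances) / len(distances)
```

Computing this quantity for each layer after every round would show whether the early layers drift further apart across clients than the later ones.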
Overall, this thesis highlights the importance of early layer learning in building models that generalize well, and introduces practical strategies for improving it across iterative, greedy, and federated learning paradigms.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:	Thesis (Masters)
Authors:	Karimpour, Zahra
Institution:	Concordia University
Degree Name:	M. Comp. Sc.
Program:	Computer Science
Date:	April 2025
Thesis Supervisor(s):	Mudur, Sudhir
ID Code:	995516
Deposited By:	Zahra Karimpour
Deposited On:	04 Nov 2025 15:39
Last Modified:	04 Nov 2025 15:39

Spectrum Research Repository
