Fulleringer, Alexander (2023) Manipulating Explanations: Modifying Feature Visualization in Artificial Neural Networks. Masters thesis, Concordia University.
Text (application/pdf): Fulleringer_MCompSc_S2024.pdf (15 MB), Accepted Version. Available under License: Spectrum Terms of Access.
Abstract
As deep neural networks become ever more ubiquitous and ever larger, concern over their uninterpretable nature has grown, along with a push toward stronger interpretation techniques.
Feature visualization is one of the most popular techniques to interpret the internal behavior of individual units of trained deep neural networks. Based on activation maximization, it consists of finding synthetic or natural inputs that maximize neuron activations.
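For concreteness, below is a minimal sketch of activation maximization, assuming a PyTorch setup; the layer, channel index, step count, and learning rate are illustrative choices rather than the thesis's configuration, and practical tools add regularization and input transformations to obtain cleaner visualizations.

```python
# A minimal sketch of activation maximization, assuming a PyTorch setup.
# The layer, channel, step count, and learning rate are illustrative.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input image is optimized

# Capture the activation of a chosen layer via a forward hook.
activation = {}
def hook(module, inputs, output):
    activation["value"] = output

handle = model.layer3.register_forward_hook(hook)

channel = 7  # hypothetical target unit
x = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([x], lr=0.05)

for _ in range(256):
    optimizer.zero_grad()
    model(x)
    # Gradient ascent on the unit's mean activation (descent on its negative).
    loss = -activation["value"][0, channel].mean()
    loss.backward()
    optimizer.step()

handle.remove()
```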
This work introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation.
It consists of fine-tuning a pre-trained model with a purpose-built loss that maintains model performance while significantly changing its feature visualizations.
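The thesis's exact objective is not reproduced on this page; the sketch below only illustrates the general shape such a loss could take, assuming a cross-entropy task term plus a manipulation term weighted by a hypothetical coefficient `lam`. All function and argument names are illustrative.

```python
# A hedged sketch of the general shape such an objective could take; the
# thesis's exact loss is not reproduced here. `unit_act` is assumed to be
# the target unit's activation captured via a forward hook (as in the
# previous sketch), `decoy_act` a fixed activation pattern to steer it
# toward, and `lam` a hypothetical trade-off weight.
import torch.nn.functional as F

def manipulation_loss(logits, labels, unit_act, decoy_act, lam=1.0):
    # Task term: keep classification performance on the original task.
    task = F.cross_entropy(logits, labels)
    # Manipulation term (illustrative): pull the unit's response toward a
    # decoy so activation maximization recovers a different visualization.
    manip = F.mse_loss(unit_act, decoy_act)
    return task + lam * manip
```

During fine-tuning, a combined loss of this kind would replace the usual task loss, so each gradient update trades off classification accuracy against how far the unit's visualization moves.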
We provide evidence of the success of this manipulation on several pre-trained models for the ImageNet classification task.
Additionally, several model pruning strategies are tested as potential defences against the manipulations developed, with the aim of producing resilient and performant models.
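As one illustration of what such a defence might look like, the sketch below applies global L1-magnitude pruning with PyTorch's `torch.nn.utils.prune`; the pruning strategies actually evaluated in the thesis may differ.

```python
# A sketch of one possible pruning defence, assuming global L1-magnitude
# pruning; the strategies evaluated in the thesis may differ.
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Prune the 20% smallest-magnitude conv weights across the whole network.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Conv2d)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.2)

# Bake the masks into the weights, removing the reparameterization.
for module, name in to_prune:
    prune.remove(module, name)
```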
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
| --- | --- |
| Item Type: | Thesis (Masters) |
| Authors: | Fulleringer, Alexander |
| Institution: | Concordia University |
| Degree Name: | M. Comp. Sc. |
| Program: | Computer Science |
| Date: | 7 December 2023 |
| Thesis Supervisor(s): | Belilovsky, Eugene |
| ID Code: | 993282 |
| Deposited By: | Alexander Fulleringer |
| Deposited On: | 04 Jun 2024 15:04 |
| Last Modified: | 04 Jun 2024 15:04 |