Fulleringer, Alexander (2023) Manipulating Explanations: Modifying Feature Visualization in Artificial Neural Networks. Masters thesis, Concordia University.
Text (application/pdf): Fulleringer_MCompSc_S2024.pdf (15 MB), Accepted Version. Available under License: Spectrum Terms of Access.
Abstract
As deep neural networks become ever more ubiquitous and ever larger, concern over their uninterpretable nature has grown, along with a push toward stronger interpretation techniques.
Feature visualization is one of the most popular techniques to interpret the internal behavior of individual units of trained deep neural networks. Based on activation maximization, it consists of finding synthetic or natural inputs that maximize neuron activations.
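For concreteness, below is a minimal sketch of activation maximization, assuming a PyTorch setup; the layer, channel index, step count, and learning rate are illustrative choices rather than the thesis's configuration, and practical tools add regularization and input transformations to obtain cleaner visualizations.

```python
# A minimal sketch of activation maximization, assuming a PyTorch setup.
# The layer, channel, step count, and learning rate are illustrative.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input image is optimized

# Capture the activation of a chosen layer via a forward hook.
activation = {}
def hook(module, inputs, output):
    activation["value"] = output

handle = model.layer3.register_forward_hook(hook)

channel = 7  # hypothetical target unit
x = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
optimizer = torch.optim.Adam([x], lr=0.05)

for _ in range(256):
    optimizer.zero_grad()
    model(x)
    # Gradient ascent on the unit's mean activation (descent on its negative).
    loss = -activation["value"][0, channel].mean()
    loss.backward()
    optimizer.step()

handle.remove()
```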
This work introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation.
It consists of fine-tuning a pre-trained model with a purpose-built loss that maintains model performance while significantly changing its feature visualizations.
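The thesis's exact objective is not reproduced on this page; the sketch below only illustrates the general shape such a loss could take, assuming a cross-entropy task term plus a manipulation term weighted by a hypothetical coefficient `lam`. All function and argument names are illustrative.

```python
# A hedged sketch of the general shape such an objective could take; the
# thesis's exact loss is not reproduced here. `unit_act` is assumed to be
# the target unit's activation captured via a forward hook (as in the
# previous sketch), `decoy_act` a fixed activation pattern to steer it
# toward, and `lam` a hypothetical trade-off weight.
import torch.nn.functional as F

def manipulation_loss(logits, labels, unit_act, decoy_act, lam=1.0):
    # Task term: keep classification performance on the original task.
    task = F.cross_entropy(logits, labels)
    # Manipulation term (illustrative): pull the unit's response toward a
    # decoy so activation maximization recovers a different visualization.
    manip = F.mse_loss(unit_act, decoy_act)
    return task + lam * manip
```

During fine-tuning, a combined loss of this kind would replace the usual task loss, so each gradient update trades off classification accuracy against how far the unit's visualization moves.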
We provide evidence of the success of this manipulation on several pre-trained models for the ImageNet classification task.
Additionally, several model pruning strategies are tested as potential defences against the manipulations developed, with the aim of producing resilient and performant models.
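As one illustration of what such a defence might look like, the sketch below applies global L1-magnitude pruning with PyTorch's `torch.nn.utils.prune`; the pruning strategies actually evaluated in the thesis may differ.

```python
# A sketch of one possible pruning defence, assuming global L1-magnitude
# pruning; the strategies evaluated in the thesis may differ.
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Prune the 20% smallest-magnitude conv weights across the whole network.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Conv2d)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.2)

# Bake the masks into the weights, removing the reparameterization.
for module, name in to_prune:
    prune.remove(module, name)
```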
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
| --- | --- |
| Item Type: | Thesis (Masters) |
| Authors: | Fulleringer, Alexander |
| Institution: | Concordia University |
| Degree Name: | M. Comp. Sc. |
| Program: | Computer Science |
| Date: | 7 December 2023 |
| Thesis Supervisor(s): | Belilovsky, Eugene |
| ID Code: | 993282 |
| Deposited By: | Alexander Fulleringer |
| Deposited On: | 04 Jun 2024 15:04 |
| Last Modified: | 04 Jun 2024 15:04 |