Ahmadi, Sana (2025) Scaling up Machine Learning Models for fMRI Brain Encoding. PhD thesis, Concordia University.
Text (application/pdf), 8MB: Ahmadi_PhD_F2025.pdf, Accepted Version. Available under License Spectrum Terms of Access.
Abstract
This thesis investigates techniques for optimizing brain encoding models, with an emphasis on computational efficiency and on scaling both data and models for large-scale functional magnetic resonance imaging (fMRI) datasets. Brain encoding aims to predict neural responses to complex stimuli, such as video frames, from latent feature representations extracted by artificial neural networks. The first study explores how to accelerate ridge regression, a widely used predictive model in brain encoding, on large fMRI datasets such as the CNeuroMod Friends dataset. Using a novel batch-parallelization strategy implemented with Dask, we achieved speedups of up to 33× with 8 compute nodes and 32 threads, relative to single-threaded scikit-learn.
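Ridge regression with many target voxels is separable across targets. In the closed-form solution W = (X^T X + alpha*I)^(-1) X^T Y, each column of W depends only on the matching column of Y, which is what makes splitting the targets into independent batches possible. The sketch below is a minimal illustration, assuming batches are formed over target voxels (one plausible reading of batch parallelization). It uses `concurrent.futures.ThreadPoolExecutor` on a single machine as a stand-in for the Dask-based multi-node setup described above; the function names `ridge_fit` and `batched_ridge_fit` are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha I)^{-1} X^T Y.

    X: (n_samples, n_features) stimulus features.
    Y: (n_samples, n_targets) fMRI time series, one column per voxel.
    """
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def batched_ridge_fit(X, Y, alpha, n_batches=4, max_workers=4):
    """Hypothetical batch-parallel ridge: split voxel columns into batches.

    Each batch is fitted independently (here in a thread pool, standing in
    for distributed workers), then the per-batch weights are concatenated
    back in the original column order.
    """
    batches = np.array_split(np.arange(Y.shape[1]), n_batches)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(ridge_fit, X, Y[:, idx], alpha) for idx in batches]
        parts = [f.result() for f in futures]
    return np.hstack(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))    # synthetic features
    Y = rng.standard_normal((200, 1000))  # synthetic "voxels"
    W_batched = batched_ridge_fit(X, Y, alpha=1.0, n_batches=8)
    W_full = ridge_fit(X, Y, alpha=1.0)
    print("max abs difference:", np.abs(W_batched - W_full).max())
```

Each batch recomputes X^T X so that batches stay fully independent, as they would be on separate workers; on a single machine one could instead compute the Gram matrix once and share it across batches.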
The second study investigates how dataset size and model size affect brain encoding performance with vision Transformers. To this end, the VideoGPT model was trained end-to-end to extract spatiotemporal features from the Shinobi video game dataset, with varying dataset sizes (10K, 100K, 1M, and 6M samples) and model sizes (number of trainable parameters). Ridge regression was then used to predict fMRI brain activity from the features extracted from the video game stimuli. Our results show that larger training datasets lead to significantly improved encoding accuracy, with the 6M-sample dataset producing the highest Pearson correlation coefficients across subjects. Additionally, while increasing the hidden layer dimensions of the Transformer greatly improves performance, the number of attention heads appears to have a minimal effect. These findings emphasize the importance of data scaling for improving brain encoding and offer practical insights for optimizing neural network architectures in the context of large-scale stimulus data.
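The encoding step described above (ridge regression from extracted features to fMRI, scored with Pearson correlation) can be sketched as follows. This is a minimal illustration, assuming the VideoGPT features have already been extracted into a time-by-feature matrix aligned with a time-by-voxel fMRI matrix; synthetic arrays stand in for both. The function names, the chronological 80/20 split, the fixed `alpha`, and the absence of an intercept (which assumes z-scored features and fMRI time series) are illustrative assumptions, not the thesis's evaluation protocol.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    # Closed-form ridge solution W = (X^T X + alpha I)^{-1} X^T Y (no intercept)
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    # Pearson r computed independently for each column (voxel)
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den


def encode_and_score(features, fmri, alpha=1.0, train_frac=0.8):
    # Chronological split: first train_frac of time points train, the rest test
    n_train = int(train_frac * features.shape[0])
    W = ridge_fit(features[:n_train], fmri[:n_train], alpha)
    pred = features[n_train:] @ W
    return voxelwise_pearson(fmri[n_train:], pred)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((500, 20))    # stand-in for extracted features
    W_true = rng.standard_normal((20, 30))
    fmri = feats @ W_true + 0.1 * rng.standard_normal((500, 30))
    r = encode_and_score(feats, fmri)
    print("mean voxelwise r:", r.mean())
```

In a scaling study of the kind described, only the `features` matrix would change across VideoGPT variants, so the voxelwise correlations from a fixed encoding pipeline can be compared directly across dataset and model sizes.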
This research advances computationally efficient brain encoding, improving both computational speed and encoding accuracy. These advances are essential not only for improving our understanding of brain function but also for enabling scalable machine learning models on high-dimensional data and sophisticated stimuli, with applications including neuroprosthetics and clinical neuroscience.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Ahmadi, Sana |
Institution: | Concordia University |
Degree Name: | Ph.D. |
Program: | Computer Science |
Date: | 22 January 2025 |
Thesis Supervisor(s): | Glatard, Tristan and Bellec, Pierre Lune |
ID Code: | 995062 |
Deposited By: | Sana Ahmadi |
Deposited On: | 17 Jun 2025 13:57 |
Last Modified: | 17 Jun 2025 13:57 |