Low-shot learning of substrate specificity on transmembrane transport proteins

Ataei, Sima (2024) Low-shot learning of substrate specificity on transmembrane transport proteins. PhD thesis, Concordia University.

Text (application/pdf)
Ataei_PhD_S2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.

10MB

Abstract

Transmembrane transport proteins are essential for cellular processes, selectively moving substrates across membranes. Traditional wet-lab methods for detecting substrate specificity, such as binding and uptake assays, are costly and impractical for large-scale studies. Computational approaches, particularly machine learning (ML), could offer efficient alternatives. State-of-the-art (SOTA) models to date can predict general groups of substrates carried by transporters, such as organic ions. The SOTA model TooT-SC achieves a Matthews correlation coefficient (MCC) of 0.82 when predicting 11 general substrate classes.
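For readers unfamiliar with the metric, the following is a minimal sketch of how a multiclass MCC can be computed from true and predicted labels, using the standard confusion-matrix generalization of the binary formula. The function name and the toy labels are illustrative assumptions and are not taken from the thesis or from TooT-SC.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (hypothetical helper).

    Uses the standard generalization:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the sample count, c the number of correct predictions,
    t_k the true count of class k, and p_k the predicted count of class k.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    var_true = s * s - sum(v * v for v in t_counts.values())
    var_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(var_true * var_pred)

    # By convention, return 0.0 when the score is undefined
    # (for example, when every prediction is the same class).
    return numerator / denominator if denominator else 0.0


# Toy labels, invented purely for illustration.
truth = ["cation", "cation", "anion", "anion"]
guess = ["cation", "anion", "anion", "anion"]
score = multiclass_mcc(truth, guess)
```

An MCC of 1 indicates perfect agreement and 0 indicates chance-level prediction, which is why the metric is often preferred over accuracy on imbalanced class distributions.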
This research presents novel computational methods for predicting transported substrates. The methods leverage large language models (LLMs) and use few-shot, one-shot, and zero-shot learning techniques to handle imbalanced datasets. These low-shot learning models enhance substrate specificity prediction.
An automatic pipeline is introduced to create ML-ready datasets for specific substrate groups. It integrates the Chemical Entities of Biological Interest (ChEBI) and Gene Ontology (GO) databases to address the lack of annotated protein sequence data. Initial studies confirm the effectiveness of transformer-based Protein Language Models (PLMs), adapted from natural language processing (NLP), in this context.
The research focuses on three key projects: TooT-Open-ICAT (Open-world classification of Inorganic Cations and Anions Transporters) predicts inorganic ion transport using open-world classification; TooT-Triplet-SPEC (Triplet training for substrate SPECificity prediction) predicts specific substrates through metric learning; TooTranslator shifts from classification to regression to predict substrates of uncharacterized proteins.
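As background on triplet training, the sketch below shows a standard triplet margin loss over embedding vectors. The abstract does not state which loss, distance, or margin TooT-Triplet-SPEC uses, so the Euclidean distance, the margin value, and all names here are assumptions chosen for illustration only.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (hypothetical helper, assumed form).

    The loss is zero once the anchor is closer to the positive
    (same substrate class) than to the negative (different class)
    by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings, invented purely for illustration.
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]    # distance 1 from the anchor
other_class = [3.0, 4.0]   # distance 5 from the anchor

well_separated = triplet_margin_loss(anchor, same_class, other_class)
badly_separated = triplet_margin_loss(anchor, other_class, same_class)
```

Minimizing a loss of this kind pulls embeddings of proteins with the same substrate together and pushes others apart, so that a rare class can be recognized from a handful of examples by distance in the embedding space.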
The TooTranslator model advances the SOTA by improving predictions for fine-grained classes, accurately predicting 93 specific substrates with an MCC of 0.92. Furthermore, the models show promise in predicting true labels for unseen classes.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:	Thesis (PhD)
Authors:	Ataei, Sima
Institution:	Concordia University
Degree Name:	Ph. D.
Program:	Computer Science
Date:	2 October 2024
Thesis Supervisor(s):	Butler, Gregory
ID Code:	995090
Deposited By:	Sima Ataei
Deposited On:	17 Jun 2025 14:00
Last Modified:	17 Jun 2025 14:00
