Alballa, Munira, Aplop, Faizah and Butler, Gregory (2020) TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information. PLOS ONE, 15 (1). e0227683. ISSN 1932-6203
Text (application/pdf): butler-trancep-journal.pone.0227683.pdf (1 MB), Published Version. Available under License Spectrum Terms of Access.
Official URL: http://dx.doi.org/10.1371/journal.pone.0227683
Abstract
Transporters mediate the movement of compounds across the membranes that separate the cell from its environment and across the inner membranes surrounding cellular compartments. It is estimated that one third of a proteome consists of membrane proteins, and many of these are transport proteins. Given the increase in the number of genomes being sequenced, there is a need for computational tools that predict the substrates that are transported by the transmembrane transport proteins. In this paper, we present TranCEP, a predictor of the type of substrate transported by a transmembrane transport protein. TranCEP combines the traditional use of the amino acid composition of the protein, with evolutionary information captured in a multiple sequence alignment (MSA), and restriction to important positions of the alignment that play a role in determining the specificity of the protein. Our experimental results show that TranCEP significantly outperforms the state-of-the-art predictors. The results quantify the contribution made by each type of information used.
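The three kinds of information named in the abstract (composition, evolutionary information from an MSA, and restriction to selected alignment positions) can be illustrated with a minimal sketch. This is not TranCEP's implementation: the function names, the toy alignment, the hand-picked column indices, and the choice to pool residues across all aligned sequences are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(residues):
    """Return a 20-dimensional amino acid composition (fractions summing to 1).

    Gap characters and any non-standard symbols are ignored.
    """
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def msa_composition(alignment, keep_columns=None):
    """Composition pooled over every sequence in an alignment.

    alignment    -- list of equal-length aligned sequences ('-' marks a gap)
    keep_columns -- optional column indices to restrict to (hypothetical
                    stand-in for "important positions"); None keeps all columns
    """
    if keep_columns is None:
        keep_columns = range(len(alignment[0]))
    pooled = [row[c] for row in alignment for c in keep_columns]
    return composition(pooled)


# Toy data, invented for this example.
toy_alignment = ["MKT-LLA", "MRTALLA", "MKS-LIA"]
toy_columns = [1, 2, 5]  # assumed "informative" columns, chosen by hand

single_sequence_features = composition(toy_alignment[0])
msa_features = msa_composition(toy_alignment)
filtered_features = msa_composition(toy_alignment, toy_columns)
```

Each call returns a fixed-length vector that could be passed to any standard classifier. Comparing the three vectors shows how the alignment and the column filter change the features derived from the same query protein.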
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Article |
Refereed: | Yes |
Authors: | Alballa, Munira and Aplop, Faizah and Butler, Gregory |
Journal or Publication: | PLOS ONE |
Date: | 14 January 2020 |
Digital Object Identifier (DOI): | 10.1371/journal.pone.0227683 |
ID Code: | 986462 |
Deposited By: | GREGORY BUTLER |
Deposited On: | 17 Mar 2020 15:24 |
Last Modified: | 17 Mar 2020 15:24 |