Gurpur, Ashwin Bhat (2005) The development of a tool for mapping protein mutations to sequence structures. Masters thesis, Concordia University.
- Accepted Version
Related work in the NLP area extracts protein mutation information directly from PubMed papers and stores it in an XML file. This thesis describes a tool that processes this NLP output for the purpose of visualizing the mutations. The tool takes the NLP output file as input and extracts the details of the protein being discussed along with the mutation information; these details are used to retrieve the sequence information from the NCBI protein database. Next, for each protein, it retrieves the conserved domain information from the NCBI conserved domain database. Each extracted sequence is split into its respective conserved domains, and these are placed sequentially. ClustalW and Alistat are used to remove sequences that fall below a particular threshold. For the remaining sequences, a consensus sequence is generated and the structure that best matches it is selected. Mutations corresponding to the remaining sequences are mapped onto the structure and a reliability score is calculated. All this information is written to a visualization file, which is the final output of the tool. This file can be uploaded to the PROSAT protein visualization tool, where the mutations can be visualized. Results are presented for the three protein families on which the tool was tested: xylanases, dehalogenases and biphenyl dioxygenase.
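The filtering, consensus and mapping steps named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes the sequences have already been aligned (the thesis uses ClustalW and Alistat for this), it assumes the threshold is a percent identity against a reference sequence (the abstract does not say which measure is used), and all function names and the toy sequences are hypothetical.

```python
from collections import Counter


def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def consensus(aligned):
    """Most common non-gap residue per column; '-' if the column is all gaps."""
    out = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        out.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(out)


def residue_to_column(aligned_seq, position):
    """Map a 1-based residue position in the ungapped sequence
    to a 0-based column in the alignment."""
    seen = 0
    for col, residue in enumerate(aligned_seq):
        if residue != "-":
            seen += 1
            if seen == position:
                return col
    raise ValueError("position beyond sequence length")


# Toy alignment (hypothetical data); the first sequence acts as the reference.
aligned = ["MKT-AY", "MKTGAY", "MRT-AY", "QQQQQQ"]
threshold = 50.0  # assumed cutoff, in percent identity

# Drop sequences that fall below the threshold, then build the consensus.
kept = [s for s in aligned if percent_identity(aligned[0], s) >= threshold]
cons = consensus(kept)

# A mutation reported at residue 4 of the first sequence lands in this column.
column = residue_to_column(aligned[0], 4)
```

Mapping through alignment columns rather than raw residue numbers is what lets mutations reported for different sequences be placed on a single shared structure, since gaps shift residue numbering between sequences.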
|Divisions:||Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering|
|Item Type:||Thesis (Masters)|
|Authors:||Gurpur, Ashwin Bhat|
|Pagination:||x, 102 leaves ; 29 cm.|
|Degree Name:||M. Comp. Sc.|
|Program:||Computer Science and Software Engineering|
|Thesis Supervisor(s):||Butler, Gregory|
|Deposited By:||Concordia University Libraries|
|Deposited On:||18 Aug 2011 18:26|
|Last Modified:||18 Aug 2011 18:26|