Lazar, Iustin (1998) A multi-level nearest-neighbour algorithm for predicting protein secondary structure. Masters thesis, Concordia University.
Text (application/pdf), 3MB: MQ39987.pdf
Abstract
A thesis on machine learning and prediction of protein secondary structure. We develop a variation of the nearest-neighbour algorithm that adopts a multi-level strategy together with a variable window size. The algorithm is applied to the problem of predicting the secondary structure of a protein given its primary structure: that is, given a sequence of amino acids, output a sequence of secondary structures (helix, sheet, or coil). A new training set is developed that is orthogonal and covers the known classes of proteins. Overall accuracy is 65.0%, with 68.7% accuracy for helices, 66.3% for sheets, and 61.4% for coils. This compares well with existing methods, in that the best result for a single nearest-neighbour classifier is 65.1%, reported by Salzberg and Cost in 1992. Our accuracy rate for sheets is better than that of known methods, but our accuracy rate for coils is much lower than that of existing methods.
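To make the problem setup concrete, the following is a minimal sketch of a plain, single-level nearest-neighbour predictor with a fixed window. It is not the thesis's multi-level, variable-window algorithm. The function names, the Hamming distance, the `-` padding character, the one-letter labels (`H` helix, `E` sheet, `C` coil), and the toy sequence used below are all assumptions made for illustration.

```python
from collections import Counter


def windows(sequence, half_width, pad="-"):
    """Return one fixed-width window per residue, centred on that residue.

    The sequence is padded at both ends (an illustrative choice) so that
    residues near the termini still get a full-width window.
    """
    padded = pad * half_width + sequence + pad * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(sequence))]


def hamming(a, b):
    """Count positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))


def train(examples, half_width):
    """Build a lookup table of (window, label) pairs.

    `examples` is a list of (amino-acid string, structure string) pairs,
    where each structure string has one label per residue.
    """
    table = []
    for sequence, labels in examples:
        if len(sequence) != len(labels):
            raise ValueError("sequence and labels must have equal length")
        table.extend(zip(windows(sequence, half_width), labels))
    return table


def predict(table, sequence, half_width, k=3):
    """Label each residue by majority vote among its k nearest windows."""
    predicted = []
    for window in windows(sequence, half_width):
        nearest = sorted(table, key=lambda entry: hamming(window, entry[0]))[:k]
        votes = Counter(label for _, label in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return "".join(predicted)


def per_class_accuracy(predicted, actual):
    """Return the fraction of correctly labelled residues for each true class."""
    correct, total = Counter(), Counter()
    for p, a in zip(predicted, actual):
        total[a] += 1
        correct[a] += p == a
    return {label: correct[label] / total[label] for label in total}
```

For example, `train([("AEELLKKGSVTVYV", "HHHHHHHCCEEEEE")], half_width=2)` builds a table from a single made-up labelled sequence, and `predict(table, query, half_width=2)` then returns one label per residue of `query`. Per-class figures like those quoted in the abstract would come from comparing predictions with known structures on held-out proteins, as `per_class_accuracy` does for a single pair of strings.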
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Lazar, Iustin |
| Pagination: | viii, 120 leaves : ill. ; 29 cm. |
| Institution: | Concordia University |
| Degree Name: | M. Comp. Sc. |
| Program: | Computer Science and Software Engineering |
| Date: | 1998 |
| Thesis Supervisor(s): | Butler, Gregory |
| Identification Number: | QP 551 L39 1998 |
| ID Code: | 507 |
| Deposited By: | lib-batchimporter |
| Deposited On: | 27 Aug 2009 17:12 |
| Last Modified: | 13 Jul 2020 19:46 |