Login | Register

Reverse engineering and optimisation of the BLASTP program

Title:

Reverse engineering and optimisation of the BLASTP program

Delaney, Shawn (1998) Reverse engineering and optimisation of the BLASTP program. Masters thesis, Concordia University.

[thumbnail of MQ39484.pdf]
Preview
Text (application/pdf)
MQ39484.pdf
3MB

Abstract

The BLASTP (Basic Local Alignment Search Tool for Proteins) is a popular protein database search program. The BLASTP algorithm consists of three steps: (1) Constructing the set of neighbourhood words, (2) Scanning a database sequence for word hits, and (3) Extending a word hit. By using a reverse engineering approach, the architecture of the program that implements the algorithm is obtained and described. It was found that approximately 90% of the program's execution time is spent doing the third step of the algorithm, extending word hits. This work describes three optimisations to the algorithm that significantly decrease the program's execution time. The optimisations are of two types: (1) Two new sequence representations that facilitate the extension step and are only used in the extension procedure and (2) A scanning procedure that restricts the number of invocations of the extension procedure. The first optimisation represents the query as a sequence of memory addresses. These addresses are those of the rows of the residue pair score matrix which correspond to a particular residue. This representation reduces the number of instructions executed per matrix access. The second optimisation represents the query and database sequence as a sequence of residue-doublets. A doublet consists of two adjacent residues. Extensions are done in steps of aligned residue-doublet pairs. The third optimisation counts the number of word hits per aligned segments of the query and database sequence. The extension procedure is invoked only if the number of hits per alignment meets a threshold criteria. Each optimised algorithm is implemented in the BLASTP version 1.4 program. A comparison of performance data between each of the optimised programs and the unmodified program shows a performance increase of 15%, 48% and 63% respectively.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:Thesis (Masters)
Authors:Delaney, Shawn
Pagination:ix, 89 leaves ; 29 cm.
Institution:Concordia University
Degree Name:M. Comp. Sc.
Program:Dept. of Computer Science
Date:1998
Thesis Supervisor(s):Butler, Gregory
Identification Number:QP 551 D43 1998
ID Code:595
Deposited By: Concordia University Library
Deposited On:27 Aug 2009 17:13
Last Modified:13 Jul 2020 19:47
Related URLs:
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Research related to the current document (at the CORE website)
- Research related to the current document (at the CORE website)
Back to top Back to top