Delaney, Shawn (1998) Reverse engineering and optimisation of the BLASTP program. Masters thesis, Concordia University.
The BLASTP (Basic Local Alignment Search Tool for Proteins) is a popular protein database search program. The BLASTP algorithm consists of three steps: (1) Constructing the set of neighbourhood words, (2) Scanning a database sequence for word hits, and (3) Extending a word hit. By using a reverse engineering approach, the architecture of the program that implements the algorithm is obtained and described. It was found that approximately 90% of the program's execution time is spent doing the third step of the algorithm, extending word hits. This work describes three optimisations to the algorithm that significantly decrease the program's execution time. The optimisations are of two types: (1) Two new sequence representations that facilitate the extension step and are only used in the extension procedure and (2) A scanning procedure that restricts the number of invocations of the extension procedure. The first optimisation represents the query as a sequence of memory addresses. These addresses are those of the rows of the residue pair score matrix which correspond to a particular residue. This representation reduces the number of instructions executed per matrix access. The second optimisation represents the query and database sequence as a sequence of residue-doublets. A doublet consists of two adjacent residues. Extensions are done in steps of aligned residue-doublet pairs. The third optimisation counts the number of word hits per aligned segments of the query and database sequence. The extension procedure is invoked only if the number of hits per alignment meets a threshold criteria. Each optimised algorithm is implemented in the BLASTP version 1.4 program. A comparison of performance data between each of the optimised programs and the unmodified program shows a performance increase of 15%, 48% and 63% respectively.
|Divisions:||Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering|
|Item Type:||Thesis (Masters)|
|Pagination:||ix, 89 leaves ; 29 cm.|
|Degree Name:||Theses (M.Comp.Sc.)|
|Program:||Dept. of Computer Science|
|Thesis Supervisor(s):||Butler, Gregory|
|Deposited By:||Concordia University Libraries|
|Deposited On:||27 Aug 2009 13:13|
|Last Modified:||08 Dec 2010 10:15|
Repository Staff Only: item control page