Login | Register

Fast computation of supermaximal repeats in DNA sequences


Fast computation of supermaximal repeats in DNA sequences

Lian, Chen Na (2009) Fast computation of supermaximal repeats in DNA sequences. Masters thesis, Concordia University.

[thumbnail of MR63238.pdf]
Text (application/pdf)
MR63238.pdf - Accepted Version


Searching for repetitive structures in DNA sequences is a major problem in bioinformatics research. We propose a novel index structure, called Parent-of-Leaves (POL) index and an algorithm for finding supermaximal repeats (SMR) which uses the index. The index is derived from and designed to replace the more versatile, but considerably larger suffix tree index STTD64. The results of our experiments using 24 homo sapiens chromosomes indicate that SMR significantly outperforms the Vmatch tool, the best known software package. Using constructed POL index, SMR is 2 times faster than Vmatch in searching for supermaximal repeats of size at least 10 bases. SMR is 7 times faster for repeats of minimum length of 25 nucleotide bases, and about an order of magnitude faster for repeats of length at least 200 basis. We also studied the cost of constructing the POL index, and the number of times we need to run SMR in order for the cost to payoff. The results indicate that our proposed technique outperforms Vmatch after two runs on a particular sequence using the POL25 index which has minimum index length (MIL) of 25 nucleotides, 3 runs with POL10, 5 runs with POL100, and 10 runs with POL200. The storage requirements of various POL indexes are much less than the suffix tree index used, about 200 times smaller for POL200 and POL100, and 25 times smaller for POL25. POL10 requires the largest storage space, which is one quarter the size of the STTD64 index.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:Thesis (Masters)
Authors:Lian, Chen Na
Pagination:xi, 80 leaves : ill. ; 29 cm.
Institution:Concordia University
Degree Name:M. Comp. Sc.
Program:Computer Science and Software Engineering
Thesis Supervisor(s):Shiri, Nematollaah
Identification Number:LE 3 C66C67M 2009 L52
ID Code:976315
Deposited By: Concordia University Library
Deposited On:22 Jan 2013 16:23
Last Modified:13 Jul 2020 20:09
Related URLs:
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Research related to the current document (at the CORE website)
- Research related to the current document (at the CORE website)
Back to top Back to top