Kakumani, Raja Sekhar (2013) Analysis of Genomic and Proteomic Sequences using DSP Techniques. PhD thesis, Concordia University.
Text (application/pdf), 6MB: Kakumani_PhD_S2013.pdf - Accepted Version
Abstract
Analysis of biological sequences by detecting hidden periodicities and symbolic patterns has been an active area of research for the past couple of decades. The hidden periodic components and patterns help locate biologically relevant motifs such as protein coding regions (exons), CpG islands (CGIs), and hot-spots that characterize various biological functions. The discrete nature of biological sequences has prompted many researchers to use digital signal processing (DSP) techniques for their analysis. After the biological sequences are mapped to numerical sequences, various DSP techniques using digital filters, wavelets, neural networks, filter banks, etc. have been developed to detect the hidden periodicities and recurring patterns in these sequences. This thesis attempts to develop effective DSP-based techniques to solve some of the important problems in biological sequence analysis. Specifically, DSP techniques such as statistically optimal null filters (SONFs), matched filters, and neural-network-based algorithms are developed for the analysis of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein sequences.
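As an illustration of the general idea of mapping a sequence to numbers and then looking for a hidden periodicity, the sketch below uses binary indicator sequences (one per base) and measures spectral power at a period of three bases, a periodicity commonly associated with protein coding regions. This is a generic textbook-style example and not the SONF method developed in the thesis; the function names `indicator_sequences` and `period3_power` are hypothetical.

```python
import numpy as np

BASES = "ACGT"

def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base.

    Assumption: a simple indicator mapping is used here for illustration;
    the thesis may use a different numerical mapping.
    """
    seq = np.array(list(dna.upper()))
    return {b: (seq == b).astype(float) for b in BASES}

def period3_power(dna):
    """Return the total spectral power at a frequency of 1/3 cycles per base."""
    n = len(dna)
    # Complex exponential at the period-3 frequency.
    phase = np.exp(-2j * np.pi * np.arange(n) / 3.0)
    indicators = indicator_sequences(dna)
    # Sum the squared magnitudes of the four single-frequency DFT values.
    return float(sum(abs(np.dot(x, phase)) ** 2 for x in indicators.values()))
```

A sequence with a strict three-base repeat, such as `"ATG" * 50`, yields a large value, while a sequence with a two-base repeat, such as `"AT" * 75`, yields a value near zero. In practice such a measure would be evaluated over a sliding window to localize regions of interest.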
In the first part of this study, DNA sequences are investigated in order to identify the locations of CGIs and protein coding regions, i.e., exons. SONFs, which are known for their ability to efficiently estimate short-duration signals embedded in noise by combining the maximum signal-to-noise ratio and the least squares optimization criteria, are utilized to solve these problems. Basis sequences characterizing CGIs and exons are formulated for use in the SONF technique.
In the second part of this study, RNA sequences are analyzed to predict their secondary structures. For this purpose, matched filters based on 2-dimensional convolution are developed to identify the locations of stem and loop patterns in the RNA secondary structure. The knowledge of the stem and loop patterns thus obtained is then used to predict the presence of pseudoknots, leading to the determination of the entire RNA secondary structure.
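The following is a minimal sketch of how a 2-dimensional matched filter could locate stem patterns, under stated assumptions: a base-pairing matrix is built from the sequence, and it is correlated with an anti-diagonal template of ones so that a run of consecutive stacked pairs produces a peak. The pairing rules (Watson-Crick plus G-U wobble), the minimum loop size, and the names `pairing_matrix`, `stem_scores`, and `find_stems` are assumptions for illustration and are not taken from the thesis.

```python
import numpy as np

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pairing_matrix(rna):
    """Return an upper-triangular 0/1 matrix with m[i, j] = 1 if bases i < j can pair."""
    n = len(rna)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if (rna[i], rna[j]) in PAIRS:
                m[i, j] = 1.0
    return m

def stem_scores(m, stem_len):
    """Correlate m with an anti-diagonal template of ones of length stem_len.

    scores[i, j] counts how many of the pairs (i, j), (i+1, j-1), ...,
    (i+stem_len-1, j-stem_len+1) are allowed by the pairing matrix.
    """
    n = m.shape[0]
    scores = np.zeros_like(m)
    for k in range(stem_len):
        # Adds m[i + k, j - k] to scores[i, j] wherever both indices are valid.
        scores[: n - k, k:] += m[k:, : n - k]
    return scores

def find_stems(rna, stem_len, min_loop=3):
    """Return (i, j) outer-pair positions of full-length stems enclosing >= min_loop bases."""
    scores = stem_scores(pairing_matrix(rna), stem_len)
    hits = []
    for i, j in np.argwhere(scores == stem_len):
        # Number of bases enclosed by the innermost pair of the stem.
        enclosed = j - i - 2 * (stem_len - 1) - 1
        if enclosed >= min_loop:
            hits.append((int(i), int(j)))
    return hits
```

For the hairpin-like sequence `"GGGAAAACCC"`, `find_stems("GGGAAAACCC", 3)` reports a single stem whose outer pair is at positions 0 and 9. Detecting loops and pseudoknots would require further steps that are not sketched here.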
Finally, in the third part of this thesis, protein sequences are analyzed to solve the problems of predicting protein secondary structure and identifying the locations of hot-spots. For predicting the protein secondary structure, a two-stage neural network scheme is developed, whereas for predicting the locations of hot-spots, an SONF-based approach is proposed. Hot-spots in proteins exhibit a characteristic frequency corresponding to their biological function. A basis function is formulated from this characteristic frequency and used in SONFs to detect the locations of hot-spots belonging to the corresponding functional group.
Extensive experiments are performed throughout the thesis to demonstrate the effectiveness and validity of the various schemes and techniques developed in this investigation. The performance of the proposed techniques is compared with that of previously reported techniques for the analysis of biological sequences. For this purpose, the results obtained are validated using databases with known annotations. It is shown that the proposed schemes achieve performance superior to that of some of the existing techniques.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Kakumani, Raja Sekhar |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Electrical and Computer Engineering |
Date: | 12 March 2013 |
Thesis Supervisor(s): | Ahmad, M. Omair and Devabhaktuni, Vijay |
ID Code: | 977152 |
Deposited By: | RAJASEKHAR KAKUMANI |
Deposited On: | 17 Jun 2013 16:02 |
Last Modified: | 11 Aug 2023 17:07 |