ElBakry, Ola (2013) Analysis of Gene Expression Microarray Time Series Data. PhD thesis, Concordia University.
Text (application/pdf), 1MB: ElBakry_PhD_S2013.pdf. Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Regulatory interactions among genes and gene products are dynamic processes, and modeling these dynamics is therefore essential. In recent years, research in microarray data analysis has grown steadily, driven by the rapid growth of microarray technology and by increasing interest in understanding complex diseases. It is of vital importance to identify and characterize changes in gene expression over time. Since genes work in a cascade of networks, reconstruction of gene regulatory networks is crucial for a thorough understanding of the underlying biological interactions. Analysis of large-scale microarray data is challenging: most microarray time series have only five to ten time points, so conventional time series analysis techniques are not applicable.
The present study focuses on two important aspects of microarray data analysis. The first part is concerned with the identification of differentially expressed genes, and the second with the reconstruction of gene regulatory networks. New computational methods for time course microarray data that assist in analyzing and modeling the dynamics of gene regulation are developed in this study.
The main challenges in the identification of differentially expressed genes arise from the very small number of replicated samples (usually two or three) relative to the huge number of genes (thousands). Further, most previous work in this area has focused on static gene expression, and only a limited number of methods exist for selecting genes that exhibit changes over time. In the first part of this study, a general statistical method for detecting changes in microarray expression over time, within a single biological group or across multiple groups, is presented. The method is based on repeated measures (RM) ANOVA, in which, unlike the classical F-statistic, statistical significance is determined by taking into account the time dependency of the microarray data. A correction factor for this RM F-statistic that leads to higher sensitivity while maintaining high specificity is introduced. The two resampling-based approaches for calculating p-values that exist in the literature, namely gene-wise p-values and pooled p-values, are investigated. It is shown that the pooled p-values method is more powerful and computationally less expensive than the gene-wise p-values method, and hence it is applied, along with the proposed correction factor, to various synthetic data sets and a real data set. The results from the synthetic data sets show that the proposed technique outperforms the state-of-the-art methods, whereas those from the real data set are found to be consistent with the existing knowledge concerning the presence of the genes.
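To make the pooled p-value idea concrete, the following is a minimal Python sketch and not the method developed in the thesis. It assumes a single biological group, a standard one-way repeated measures F-statistic (the thesis's correction factor is not reproduced because the abstract does not define it), and a null distribution obtained by permuting time points within each replicate. The function names `rm_f_statistic` and `pooled_p_values` and all parameter choices are hypothetical.

```python
import numpy as np


def rm_f_statistic(x):
    """Repeated measures F-statistic for the time effect, one value per gene.

    x has shape (genes, replicates, timepoints). This is the textbook RM
    ANOVA statistic; the thesis's correction factor is NOT included.
    """
    _, n, t = x.shape
    grand = x.mean(axis=(1, 2), keepdims=True)
    time_mean = x.mean(axis=1, keepdims=True)   # mean over replicates
    subj_mean = x.mean(axis=2, keepdims=True)   # mean over time points
    ss_time = n * ((time_mean - grand) ** 2).sum(axis=(1, 2))
    resid = x - time_mean - subj_mean + grand
    ss_err = (resid ** 2).sum(axis=(1, 2))
    return (ss_time / (t - 1)) / (ss_err / ((t - 1) * (n - 1)))


def pooled_p_values(x, n_perm=200, seed=0):
    """Permutation p-values using one null distribution pooled over all genes.

    Assumption: under the null of no time effect, time points are
    exchangeable within each replicate, so they are shuffled independently
    per gene and replicate.
    """
    rng = np.random.default_rng(seed)
    observed = rm_f_statistic(x)
    null = np.concatenate(
        [rm_f_statistic(rng.permuted(x, axis=2)) for _ in range(n_perm)]
    )
    null.sort()
    # Number of pooled null statistics >= each observed statistic.
    exceed = null.size - np.searchsorted(null, observed, side="left")
    return (exceed + 1) / (null.size + 1)
```

Pooling gives every gene a null sample of size genes × permutations, which is why far fewer permutations are needed than when each gene is compared only against its own permutations. That is in line with the lower computational cost reported above, although this sketch says nothing about statistical power.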
As for the reconstruction of gene regulatory networks, the large number of genes relative to the small number of time points results in an underdetermined problem. Additional constraints and information are needed to capture the gene regulatory dynamics. Since gene regulatory interactions involve underlying biological processes, such as transcription and translation, that take place at different time points, accounting for different delays is crucial yet demanding. In the second part of this study, an approach based on pair-wise correlations and the lasso, which takes into account the different time delays between various genes, is presented to infer gene regulatory networks. The proposed method is applied to both synthetic and real data sets. The results from the synthetic data show that the proposed approach outperforms the existing methods, and the results from the real data are found to be more consistent with the existing knowledge concerning the possible gene interactions.
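As an illustration of how pair-wise correlations and the lasso can be combined with gene-specific delays, here is a minimal Python sketch. It is not the algorithm of the thesis: it assumes that each candidate regulator's delay is chosen as the lag maximizing the absolute Pearson correlation with the target gene, and that a plain coordinate-descent lasso is then fitted on the delay-aligned expression profiles. The names `best_delays`, `lasso_cd`, and `infer_regulators`, and the values of `max_delay` and `lam`, are hypothetical.

```python
import numpy as np


def best_delays(x, target, max_delay):
    """For each gene j, pick the lag d in 1..max_delay that maximizes
    |corr(x[j, s], x[target, s + d])|. x has shape (genes, timepoints)."""
    g, t = x.shape
    delays = np.ones(g, dtype=int)
    for j in range(g):
        best = -1.0
        for d in range(1, max_delay + 1):
            r = abs(np.corrcoef(x[j, : t - d], x[target, d:])[0, 1])
            if r > best:
                best, delays[j] = r, d
    return delays


def lasso_cd(a, y, lam, n_iter=500):
    """Coordinate descent for (1 / 2n) * ||y - a w||^2 + lam * ||w||_1."""
    n, p = a.shape
    w = np.zeros(p)
    col_sq = (a ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0:
                continue
            partial = y - a @ w + a[:, k] * w[k]   # residual without gene k
            rho = a[:, k] @ partial / n
            w[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
    return w


def infer_regulators(x, target, max_delay=2, lam=0.1):
    """Return (delays, weights); nonzero weights suggest regulators.

    The target gene itself is kept as a candidate (delayed self-regulation).
    """
    g, t = x.shape
    delays = best_delays(x, target, max_delay)
    y = x[target, max_delay:]
    # Column j holds gene j shifted back by its own delay, aligned with y.
    a = np.stack(
        [x[j, max_delay - delays[j] : t - delays[j]] for j in range(g)], axis=1
    )
    a = (a - a.mean(axis=0)) / (a.std(axis=0) + 1e-12)
    return delays, lasso_cd(a, y - y.mean(), lam)
```

The L1 penalty is what makes the underdetermined setting tractable in this sketch: with more candidate regulators than aligned time points, it selects a sparse subset instead of fitting every gene. Repeating `infer_regulators` with each gene as the target would yield one row of a network adjacency matrix per call.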
The study on the identification of differentially expressed genes and the reconstruction of gene regulatory networks, undertaken in this thesis, is directed towards a better understanding of cellular dynamics.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering | 
|---|---|
| Item Type: | Thesis (PhD) | 
| Authors: | ElBakry, Ola | 
| Institution: | Concordia University | 
| Degree Name: | Ph. D. | 
| Program: | Electrical and Computer Engineering | 
| Date: | 26 April 2013 | 
| Thesis Supervisor(s): | Ahmad, M.O. and Swamy, M.N.S. | 
| ID Code: | 977193 | 
| Deposited By: | OLA EL BAKRY | 
| Deposited On: | 17 Jun 2013 15:57 | 
| Last Modified: | 18 Jan 2018 17:44 | 