Login | Register

Statistical Analysis of Spherical Data: Clustering, Feature Selection and Applications


Statistical Analysis of Spherical Data: Clustering, Feature Selection and Applications

Amayri, Ola (2014) Statistical Analysis of Spherical Data: Clustering, Feature Selection and Applications. PhD thesis, Concordia University.

[thumbnail of Amayri_PhD_F2014.pdf]
Text (application/pdf)
Amayri_PhD_F2014.pdf - Accepted Version


In the light of interdisciplinary applications, data to be studied and analyzed have witnessed a growth in volume and change in their intrinsic structure and type. In other words, in practice the diversity of resources generating objects have imposed several challenges for decision maker to determine informative data in terms of time, model capability, scalability and knowledge discovery.
Thus, it is highly desirable to be able to extract patterns of interest that support the decision of data management. Clustering, among other machine learning approaches, is an important data engineering technique that empowers the automatic discovery of similar object’s clusters and the
consequent assignment of new unseen objects to appropriate clusters. In this context, the majority of current research does not completely address the true structure and nature of data for particular application at hand. In contrast to most previous research, our proposed work focuses on the modeling and classification of spherical data that are naturally generated in many data mining and knowledge discovery applications. Thus, in this thesis we propose several estimation and feature selection frameworks based on Langevin distribution which are devoted to spherical patterns in offline and online settings.
In this thesis, we first formulate a unified probabilistic framework, where we build probabilistic kernels based on Fisher score and information divergences from finite Langevin mixture for Support Vector Machine. We are motivated by the fact that the blending of generative and discriminative approaches has prevailed by exploring and adopting distinct characteristic of each approach
toward constructing a complementary system combining the best of both.
Due to the high demand to construct compact and accurate statistical models that are automatically adjustable to dynamic changes, next in this thesis, we propose probabilistic frameworks for high-dimensional spherical data modeling based on finite Langevin mixtures that allow simultaneous clustering and feature selection in offline and online settings. To this end, we adopted finite mixture models which have long been heavily relied on deterministic learning approaches such as maximum likelihood estimation. Despite their successful utilization in wide spectrum of areas, these approaches have several drawbacks as we will discuss in this thesis. An alternative approach is the adoption of Bayesian inference that naturally addresses data uncertainty while ensuring good generalization. To address this issue, we also propose a Bayesian approach for finite Langevin mixture model estimation and selection.
When data change dynamically and grow drastically, finite mixture is not always a feasible solution.
In contrast with previous approaches, which suppose an unknown finite number of mixture components, we finally propose a nonparametric Bayesian approach which assumes an infinite
number of components. We further enhance our model by simultaneously detecting informative features in the process of clustering. Through extensive empirical experiments, we demonstrate the merits of the proposed learning frameworks on diverse high dimensional datasets and challenging real-world applications.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type:Thesis (PhD)
Authors:Amayri, Ola
Institution:Concordia University
Degree Name:Ph. D.
Program:Electrical and Computer Engineering
Date:28 September 2014
Thesis Supervisor(s):Bouguila, Nizar
ID Code:978684
Deposited On:26 Nov 2014 13:47
Last Modified:18 Jan 2018 17:47
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Research related to the current document (at the CORE website)
- Research related to the current document (at the CORE website)
Back to top Back to top