
Towards Better Clustering: From Quality Criteria to Advanced Hierarchical Algorithms

Yao, Jinli (2025) Towards Better Clustering: From Quality Criteria to Advanced Hierarchical Algorithms. PhD thesis, Concordia University.

Text (application/pdf)
Yao_PhD_F2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.
23MB

Abstract

Clustering is a cornerstone of unsupervised learning, offering powerful tools for uncovering patterns and natural groupings in unlabeled data. Despite its extensive applications across diverse fields, clustering research faces persistent challenges, including inconsistent definitions, varied evaluation criteria, and difficulties in handling complex data characteristics. This thesis addresses these challenges by combining theoretical insights with algorithmic innovations to improve clustering methods and broaden their applicability.
The first part of this work explores the fundamental question “What defines a good cluster?” Through a systematic review of clustering criteria, principles, and evaluation metrics, it highlights the diversity of clustering algorithms and the challenges posed by high-dimensional, overlapping, and varied-density data. This foundational analysis establishes a structured understanding of clustering quality and its implications for algorithm design.
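As a concrete example of the kind of evaluation metric such a review covers, the silhouette coefficient is a widely used internal criterion: it rewards clusters that are compact and well separated without needing ground-truth labels. The sketch below is a plain-Python illustration assuming Euclidean distance; the function name `silhouette` is our own, and the abstract does not say which metrics the thesis favours.

```python
import math
from typing import Sequence


def silhouette(points: list[Sequence[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient over all points, using Euclidean distance.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    # Group point indices by cluster label.
    clusters: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # Cohesion: average distance to the rest of the own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # Separation: average distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

For two tight, distant pairs such as `[(0, 0), (0, 1), (10, 0), (10, 1)]` labelled `[0, 0, 1, 1]`, the score is about 0.90, while the crossed labelling `[0, 1, 0, 1]` gives a negative score.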
Building on these principles, the thesis introduces Gauging-δ, a nonparametric hierarchical clustering algorithm capable of handling clusters of diverse shapes. Using an adaptive mergeability function, the algorithm iteratively merges clusters based on local data statistics and environmental factors. Experiments on synthetic and real-world datasets demonstrate that it robustly identifies well-separated clusters, and also show that its results are sensitive to the choice of features and distance metric.
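The abstract does not give Gauging-δ's mergeability function, so the following is only a generic sketch of the idea it describes: agglomerative merging where the decision to merge adapts to local data statistics. Here, as a hypothetical stand-in, each point's local scale is its distance to its k-th nearest neighbour, and two clusters may merge when their single-linkage gap is at most `alpha` times the smaller of their mean local scales. The names `knn_scale`, `merge_clusters`, `k`, and `alpha` are our own; unlike the nonparametric Gauging-δ, this sketch exposes illustrative parameters.

```python
import math
from itertools import combinations


def knn_scale(points, k):
    """Distance from each point to its k-th nearest neighbour (a local statistic)."""
    scales = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scales.append(d[min(k, len(d)) - 1])
    return scales


def merge_clusters(points, k=2, alpha=1.5):
    """Hypothetical adaptive agglomerative merging (NOT the thesis's Gauging-delta).

    Start from singletons and repeatedly merge the closest pair of clusters
    that passes the locally scaled mergeability test; stop when none does.
    """
    if len(points) < 2:
        return [0] * len(points)
    scale = knn_scale(points, k)
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single-linkage distance between two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    def local(c):
        # Mean local scale of a cluster's members.
        return sum(scale[i] for i in c) / len(c)

    while True:
        best = None
        for (ia, a), (ib, b) in combinations(enumerate(clusters), 2):
            g = gap(a, b)
            if g <= alpha * min(local(a), local(b)) and (best is None or g < best[0]):
                best = (g, ia, ib)
        if best is None:
            break
        _, ia, ib = best
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # ia < ib, so index ia is unaffected

    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels
```

Because the threshold scales with local neighbour distances rather than a single global cut, a sparse cluster and a dense cluster can each merge at their own natural spacing; that adaptivity is the property the sketch is meant to illustrate.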
The thesis further presents Gauging-β, a density-aware hierarchical clustering algorithm that targets datasets whose clusters are difficult to separate. The algorithm uses density-based methods to identify and remove border points, which increases the separation between clusters. Gauging-δ is then applied to the remaining points to form the main clusters, and finally the border points are reintegrated into these clusters. Experimental results show that the algorithm handles both convex and non-convex clusters, in datasets that are well separated as well as poorly separated. The impact of parameter settings on clustering outcomes is investigated in detail. Further experiments on real-world datasets reveal that the agreement between clustering results and classification labels depends strongly on choosing an appropriate measure of sample similarity.
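The three stages described above (remove border points, cluster the rest, reintegrate the border points) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's method: density is scored by the k-th nearest-neighbour distance, the sparsest `border_quantile` fraction of points is treated as border, and a simple stand-in (connected components of points within `eps` of each other) replaces Gauging-δ in stage 2. The function and parameter names are hypothetical.

```python
import math


def border_aware_clustering(points, k=3, border_quantile=0.2, eps=1.5):
    """Hypothetical three-stage pipeline mirroring the description of Gauging-beta.

    1. Flag the sparsest points (largest k-NN distance) as border points.
    2. Cluster the core points; as a stand-in for Gauging-delta, take the
       connected components of the graph linking core points within eps.
    3. Give each border point the label of its nearest core point.
    """
    n = len(points)
    if n < 2:
        return [0] * n

    # k-th nearest-neighbour distance: large values mean low local density.
    kdist = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kdist.append(d[min(k, len(d)) - 1])

    # Stage 1: remove border points, always keeping at least one core point.
    order = sorted(range(n), key=lambda i: kdist[i], reverse=True)
    n_border = min(int(border_quantile * n), n - 1)
    border = set(order[:n_border])
    core = [i for i in range(n) if i not in border]

    # Stage 2: depth-first search over the eps-graph of core points.
    labels = [-1] * n
    next_label = 0
    for start in core:
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in core:
                if labels[j] == -1 and math.dist(points[i], points[j]) <= eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    # Stage 3: reintegrate border points via their nearest core point.
    for b in border:
        nearest = min(core, key=lambda c: math.dist(points[b], points[c]))
        labels[b] = labels[nearest]
    return labels
```

On two dense 3×3 grids joined by a single sparse "bridge" point, clustering every point at once chains the grids into one cluster, whereas removing the bridge as a border point first keeps the two grids apart before the bridge is reattached. This is the separation effect the abstract attributes to border-point removal.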
Together, these three components offer a coherent approach to clustering, from clarifying theoretical concepts of cluster quality to developing algorithms capable of identifying meaningful clusters in various synthetic and complex real-world datasets.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (PhD)
Authors: Yao, Jinli
Institution: Concordia University
Degree Name: Ph.D.
Program: Information and Systems Engineering
Date: 7 July 2025
Thesis Supervisor(s): Zeng, Yong
ID Code: 996054
Deposited By: Jinli Yao
Deposited On: 04 Nov 2025 16:47
Last Modified: 04 Nov 2025 16:47
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
