Nemalhabib, Aida (2006) A cohesion-based clustering technique for categorical data. Masters thesis, Concordia University.
Text (application/pdf): MR14331.pdf (3MB) - Accepted Version
Abstract
Clustering is a technique which aims to partition a given dataset of objects into groups of similar objects. In this work, we consider categorical data, which are unordered unlike numerical data. This makes clustering such data a more challenging task. We propose a clustering technique for categorical data, which uses a novel similarity function, called cohesion, to measure the degree to which objects "stick" to clusters. We have implemented this technique, to which we refer as CLUC (CLUstering with Cohesion). To evaluate CLUC, we compared its results with those produced by well-known clustering algorithms. The results of our extensive experiments on real and synthetic datasets show that CLUC generates high-quality clusters which conform better to clusterings by human experts. For some well-known real datasets, CLUC even discovers clusterings identical to those provided by experts. Our results also indicate that CLUC is order-insensitive in general and is scalable when the dataset grows in size (the number of objects) and/or dimensions (attributes).
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Nemalhabib, Aida |
Pagination: | x, 87 leaves : ill. ; 29 cm. |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science and Software Engineering |
Date: | 2006 |
Thesis Supervisor(s): | Shiri, Nematollaah |
Identification Number: | LE 3 C66C67M 2006 N46 |
ID Code: | 8846 |
Deposited By: | Concordia University Library |
Deposited On: | 18 Aug 2011 18:37 |
Last Modified: | 13 Jul 2020 20:05 |