Hasan, Tahira (2009) Finding usage patterns from generalized weblog data. Masters thesis, Concordia University.
Text (application/pdf): MR63172.pdf (3MB), Accepted Version
Abstract
Buried in the enormous, heterogeneous, and distributed information contained in web server access logs is knowledge of great potential value. As websites continue to grow in number and complexity, web usage mining systems face two significant challenges: scalability and accuracy. This thesis develops a web data generalization technique and incorporates it into the web usage mining framework in an attempt to exploit this information-rich source of data for effective and efficient pattern discovery. Given a concept hierarchy on the web pages, generalization replaces actual page clicks with their more general concepts. Existing methods do this by taking a level-based cut through the concept hierarchy. This adversely affects the quality of mined patterns because, depending on the depth of the chosen level, either significant pages of user interest are coalesced or many insignificant concepts are retained. We present a usage-driven concept ascension algorithm, which preserves only significant items, possibly at different levels in the hierarchy. Concept usage is estimated using a small stratified sample of the large weblog data. A usage threshold is then used to determine which nodes in the hierarchy are pruned for generalization. Our experiments on large real weblog data demonstrate improved performance in terms of both the quality and the computation time of the pattern discovery process. Our algorithm yields an effective and scalable tool for web usage mining.
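The idea of usage-driven generalization described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function name `generalize`, the `parent` dictionary encoding of the concept hierarchy, the rule that a click counts toward a page and all of its ancestors, and the use of a usage fraction as the threshold are all assumptions made for illustration. The sample is taken as given, so the stratified sampling step is not shown.

```python
from collections import Counter


def generalize(sessions, parent, sample, threshold):
    """Replace each page click with its nearest sufficiently used concept.

    sessions  -- list of sessions, each a list of page identifiers
    parent    -- dict mapping each node to its parent (root maps to None);
                 a hypothetical encoding of the concept hierarchy
    sample    -- a small sample of sessions used to estimate usage
    threshold -- minimum fraction of sampled clicks a node needs to be kept
    """
    # Estimate usage: a click on a page also counts for every ancestor concept.
    usage = Counter()
    total = 0
    for session in sample:
        for page in session:
            total += 1
            node = page
            while node is not None:
                usage[node] += 1
                node = parent.get(node)
    total = max(total, 1)  # avoid division by zero on an empty sample

    def ascend(page):
        # Climb the hierarchy until the node is used often enough or the root is reached.
        node = page
        while usage[node] / total < threshold and parent.get(node) is not None:
            node = parent[node]
        return node

    return [[ascend(page) for page in session] for session in sessions]


# Hypothetical hierarchy and sample, for illustration only.
parent = {
    "root": None,
    "sports": "root",
    "news": "root",
    "sports/hockey": "sports",
    "sports/golf": "sports",
    "news/local": "news",
}
sample = [
    ["sports/hockey", "sports/hockey", "news/local"],
    ["sports/hockey", "sports/golf"],
]
result = generalize(
    [["sports/golf", "news/local", "sports/hockey"]], parent, sample, 0.4
)
print(result)  # [['sports', 'root', 'sports/hockey']]
```

In this toy run the frequently visited page `sports/hockey` is kept as is, `sports/golf` ascends one level to `sports`, and `news/local` ascends to `root`, so the retained items sit at different levels of the hierarchy rather than on a single level-based cut.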
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Hasan, Tahira |
Pagination: | x, 86 leaves ; 29 cm. |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science and Software Engineering |
Date: | 2009 |
Thesis Supervisor(s): | Mudur, P and Shiri, Nematollaah |
Identification Number: | LE 3 C66C67M 2009 H37 |
ID Code: | 976367 |
Deposited By: | Concordia University Library |
Deposited On: | 22 Jan 2013 16:24 |
Last Modified: | 13 Jul 2020 20:10 |