Login | Register

Finding usage patterns from generalized weblog data


Finding usage patterns from generalized weblog data

Hasan, Tahira (2009) Finding usage patterns from generalized weblog data. Masters thesis, Concordia University.

[thumbnail of MR63172.pdf]
Text (application/pdf)
MR63172.pdf - Accepted Version


Buried in the enormous, heterogeneous and distributed information, contained in the web server access logs, is knowledge with great potential value. As websites continue to grow in number and complexity, web usage mining systems face two significant challenges - scalability and accuracy. This thesis develops a web data generalization technique and incorporates it into the web usage mining framework in an attempt to exploit this information-rich source of data for effective and efficient pattern discovery. Given a concept hierarchy on the web pages, generalization replaces actual page-clicks with their general concepts. Existing methods do this by taking a level-based cut through the concept hierarchy. This adversely affects the quality of mined patterns since, depending on the depth of the chosen level, either significant pages of user interests get coalesced, or many insignificant concepts are retained. We present a usage driven concept ascension algorithm, which only preserves significant items, possibly at different levels in the hierarchy. Concept usage is estimated using a small stratified sample of the large weblog data. A usage threshold is then used to define the nodes to be pruned in the hierarchy for generalization. Our experiments on large real weblog data demonstrate improved performance in terms of quality and computation time of the pattern discovery process. Our algorithm yields an effective and scalable tool for web usage mining.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:Thesis (Masters)
Authors:Hasan, Tahira
Pagination:x, 86 leaves ; 29 cm.
Institution:Concordia University
Degree Name:M. Comp. Sc.
Program:Computer Science and Software Engineering
Thesis Supervisor(s):Mudur, P and Shiri, Nematollaah
Identification Number:LE 3 C66C67M 2009 H37
ID Code:976367
Deposited By: Concordia University Library
Deposited On:22 Jan 2013 16:24
Last Modified:13 Jul 2020 20:10
Related URLs:
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Research related to the current document (at the CORE website)
- Research related to the current document (at the CORE website)
Back to top Back to top