
Measures and adjustments of pattern frequency distributions

Wang, Tongyuan (2010) Measures and adjustments of pattern frequency distributions. PhD thesis, Concordia University.

Text (application/pdf)
NR67321.pdf - Accepted Version
6MB

Abstract

Frequent pattern mining over large databases is fundamental to many data mining applications, and the pattern frequency distribution plays a central role in it. Various approaches have been proposed for pattern mining with respectable computational performance. However, the appropriate evaluation of pattern frequentness and the refinement of the mining result set have been somewhat neglected. This has created a set of problems in conventional mining approaches, which are identified in this thesis. Most conventional mining approaches evaluate pattern frequentness with an ill-formed "support" measure and generate patterns in full enumeration mode, which produces an excessive number of patterns in an application. Consequently, the mining result sets exhibit, among other issues, overfitting and underfitting, probability anomaly, and a bias for generated against original observations. Even worse, these results are delivered to users without any refinement. Overcoming these drawbacks is challenging, since the problems are philosophical rather than computational, and hence their resolution demands a well-established theory to reform the mining foundations and to pursue graceful knowledge degeneration. Based on the problems identified, this thesis first proposes a reformulation of the frequentness measure, which effectively resolves the probability anomaly and other related issues. To deal with the profound full enumeration mode, we first explore a set of properties governing raw pattern frequency distributions, such that a number of important mining parameters can be predetermined. Based on these explorations, an approach to adjust the raw pattern frequency distributions is established and its theoretical merits are justified. This refinement theory shows that unconditional pattern reduction is achievable before domain constraints are imposed. The thesis then presents a maximum likelihood pattern sampling model and strategies to realize the adjustment.
Findings presented in this thesis are based on known set theory, combinatorics, and probability theory; they are theoretically fundamental and applicable to all item-based or keyword-based pattern mining and to the improvement of mining effectiveness. We expect that these findings will pave the way to replacing full enumeration pattern generation with a selective generation mode, which would then radically change the state of the art of pattern mining.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Wang, Tongyuan
Pagination: viii, 175 leaves : ill. ; 29 cm.
Institution: Concordia University
Degree Name: Ph. D.
Program: Computer Science and Software Engineering
Date: 2010
Thesis Supervisor(s): Desai, Bipin C
ID Code: 979289
Deposited By: Concordia University Library
Deposited On: 09 Dec 2014 17:56
Last Modified: 18 Jan 2018 17:48
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
