Efficiently mining frequent itemsets from very large databases

Title:

Efficiently mining frequent itemsets from very large databases

Zhu, Jianfei (2004) Efficiently mining frequent itemsets from very large databases. PhD thesis, Concordia University.

Preview

Text (application/pdf)
NQ96957.pdf - Accepted Version

5MB

Abstract

Efficient algorithms for mining frequent itemsets are crucial for mining association rules and for other data mining tasks. Methods for mining frequent itemsets and for iceberg data cube computation have been implemented using a prefix-tree structure, known as a FP-tree, for storing compressed frequency information. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this thesis we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree based algorithms. The technique works especially well for sparse datasets. We then present new algorithms for mining all frequent itemsets, maximal frequent itemsets, and closed frequent item-sets. The algorithms use the FP-tree data structure in combination with the FP-array technique efficiently, and incorporate various optimization techniques. In the algorithm for mining maximal frequent itemsets, a variant FP-tree data structure, called a MFI-tree, and an efficient maximality-checking approach are used. Another variant FP-tree data structure, called a CFI-tree, and an efficient closedness-testing approach are also given in the algorithm for mining closed frequent itemsets. Experimental results show that our methods outperform the existing methods in not only the speed of the algorithms, but also their memory consumption and their scalability. We also notice that most algorithms for mining frequent itemsets assume that the main memory is large enough for the data structures used in the mining, and very few efficient algorithms deal with the cases when the database is very large or the minimum support is very low. We thus investigate approaches to mining frequent itemsets when data structures are too large to fit in main memory. Several divide-and-conquer algorithms are presented for mining from disks. Many novel techniques are introduced. Experimental results show that the techniques reduce the required disk accesses by orders of magnitude, and enable truly scalable data mining.

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:	Thesis (PhD)
Authors:	Zhu, Jianfei
Pagination:	xii, 106 leaves : ill. ; 29 cm.
Institution:	Concordia University
Degree Name:	Ph. D.
Program:	Computer Science and Software Engineering
Date:	2004
Thesis Supervisor(s):	Grahne, Gosta
Identification Number:	QA 76.9 D343Z48 2004
ID Code:	8431
Deposited By:	lib-batchimporter
Deposited On:	18 Aug 2011 18:25
Last Modified:	13 Jul 2020 20:04
Related URLs:	https://concordiauniversity.on.worldcat....

Repository Staff Only: item control page

Download Statistics

Downloads per month over past year

Research related to the current document (at the CORE website)

Spectrum Research Repository

Efficiently mining frequent itemsets from very large databases

Efficiently mining frequent itemsets from very large databases

Abstract