Cueva, David (2007) A Hilbert space compression framework for parallel relational OLAP. Masters thesis, Concordia University.
Preview |
Text (application/pdf)
4MBMR28950.pdf - Accepted Version |
Abstract
The Data Cube is the central abstraction behind the power of On-Line Analytical Processing (OLAP) systems. It enables knowledge workers to analyze vast amounts of enterprise data, in order to make timely and informed decisions. Nevertheless, this potential is accompanied by massive amounts of storage requirements. This fact challenges even the most powerful parallel systems to efficiently pack and manage the data, allowing the delivery of fast responses to user queries. An ample number of methods concentrate on generic data compression, but relatively few have been proposed which fulfill the requirements of distributed OLAP environments. This thesis investigates these opportunities by presenting a compression framework specially tailored for parallel OLAP. New algorithms for data and index compression are proposed. The data encoding method is based on the Hilbert space-filling curve. It eliminates the inherent redundancies in OLAP data and preserves compression granularity at the block level. The index encoding technique is based on the concept of packed R-trees, and when combined with our native query engine provides fast and I/O efficient access to the specific blocks satisfying a user request. Additionally, unlimited scalability is achieved by a set of supporting techniques permitting the framework to handle arbitrarily large data sets. We have performed a broad evaluation of our framework, including comparison benchmarks with several influential published methods. Our compression algorithms deliver state-of-the-art ratios with averages above 80% for data and 95% for indexes. Finally, the number of blocks accessed by the query engine is typically reduced by a factor of 10
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Cueva, David |
Pagination: | xii, 117 leaves : ill. ; 29 cm. |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science and Software Engineering |
Date: | 2007 |
Thesis Supervisor(s): | Eavis, Todd |
Identification Number: | LE 3 C66C67M 2007 C84 |
ID Code: | 975319 |
Deposited By: | Concordia University Library |
Deposited On: | 22 Jan 2013 16:06 |
Last Modified: | 13 Jul 2020 20:07 |
Related URLs: |
Repository Staff Only: item control page