Measurement Framework for Assessing Quality of Big Data (MEGA) in Big Data Pipelines

Bhardwaj, Dave (2021) Measurement Framework for Assessing Quality of Big Data (MEGA) in Big Data Pipelines. Masters thesis, Concordia University.

Text (application/pdf)
Bhardwaj_MSc_F2021.pdf - Accepted Version
Available under License Spectrum Terms of Access.
4MB

Abstract

Dave Bhardwaj
Concordia University, 2021
Big Data is widely used in decision-making, and businesses have seen how powerful data can be, especially in areas such as advertising and marketing. As institutions come to rely on their Big Data systems for more informed and strategic business decisions, the quality of the underlying data becomes critically important. Our research addresses this need by studying the quality characteristics of Big Data, specifically the V’s of Big Data, and automating their measurement.

In this thesis, our aim is not only to present researchers with useful Big Data quality measurements, but also to bridge the gap between theoretical measurement models of Big Data quality characteristics and the application of these metrics to real-world Big Data systems. To that end, the thesis proposes the MEGA framework, which can be applied to Big Data pipelines to facilitate the extraction and interpretation of measurement indicators for the Big Data V’s. The framework allows V’s measurements to be applied at any phase of the architecture process in order to flag quality anomalies in the underlying data before they can negatively affect decision-making. The theoretical quality measurement models for six of the Big Data V’s, namely Volume, Variety, Velocity, Veracity, Validity, and Vincularity, are currently automated.

The novelty of the MEGA approach includes the ability to: i) process both structured and unstructured data, ii) track a variety of quality indicators defined for the V’s, iii) flag datasets that cross a given quality threshold, and iv) define a general infrastructure for collecting, analyzing, and reporting the V’s measurement indicators for trustworthy and meaningful decision-making.
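
As a rough illustration of items ii) and iii), the sketch below measures one indicator on a batch of records at a named pipeline phase and flags the batch when the value falls below a threshold. It is not taken from the thesis: the `Indicator` class, the `non_empty_ratio` measure, the `check_phase` function, the phase name, and the 0.9 threshold are all hypothetical, and the measure is a simple stand-in rather than one of the MEGA measurement models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Record = Mapping[str, object]


@dataclass(frozen=True)
class Indicator:
    """A named measurement and the lowest value treated as acceptable (hypothetical)."""
    name: str
    measure: Callable[[list[Record]], float]
    min_acceptable: float


def non_empty_ratio(records: list[Record]) -> float:
    """Stand-in indicator: share of records in which no field is None."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if all(v is not None for v in r.values()))
    return complete / len(records)


def check_phase(phase: str, records: list[Record],
                indicators: Iterable[Indicator]) -> list[str]:
    """Return one flag message per indicator that falls below its threshold."""
    flags = []
    for ind in indicators:
        value = ind.measure(records)
        if value < ind.min_acceptable:
            flags.append(
                f"{phase}: {ind.name}={value:.2f} below {ind.min_acceptable:.2f}"
            )
    return flags


# Example: one of two records has a missing field, so the ratio is 0.50.
batch = [{"id": 1, "text": "ok"}, {"id": 2, "text": None}]
flags = check_phase(
    "ingestion", batch, [Indicator("non_empty_ratio", non_empty_ratio, 0.9)]
)
# flags == ["ingestion: non_empty_ratio=0.50 below 0.90"]
```

Because the check takes the phase name as an argument, the same function could in principle be called after any stage of a pipeline, which is the property the abstract attributes to the framework.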

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Bhardwaj, Dave
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 15 September 2021
Thesis Supervisor(s): Ormandjieva, Olga
ID Code: 988957
Deposited By: Dave Bhardwaj
Deposited On: 29 Nov 2021 16:27
Last Modified: 01 Sep 2022 00:00
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
