Integration of Multiple Uncertain Data Sources

Han, Wei (2016) Integration of Multiple Uncertain Data Sources. Masters thesis, Concordia University.

Data integration is the problem of combining data from multiple autonomous data
sources and providing users with a unified view. The problem has been studied
extensively over the past two decades, with most work focused on integrating traditional, exact
relational data. Integration over uncertain data sources is a more recent and
more challenging problem. The purpose of this thesis is to understand the semantics
and techniques of data integration over multiple uncertain data sources.
We study existing proposals for uncertain and probabilistic data integration. As the
basis for our work, we consider two integration operations, one in the possible worlds
model and the other in a compact model. We introduce the properties of the integration
operations proposed for two sources and use these properties to develop
a framework for integrating multiple sources. To this end, we also extend and generalize
an algorithm for converting from the possible worlds model to compact probabilistic relations.
We define the integration procedure, the concept of probability consistency,
and a probability adjustment method for cases where consistency is violated. We build a
working prototype of the proposed framework to show its feasibility and to automate
the probability calculation. This thesis takes a step toward better understanding
the challenges and development of uncertain data integration systems.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Han, Wei
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 25 January 2016
Thesis Supervisor(s): Shiri, Nematollaah
Keywords: uncertain data integration
ID Code: 980853
Deposited By: WEI HAN
Deposited On: 16 Jun 2016 14:39
Last Modified: 18 Jan 2018 17:52


