Integration of Multiple Uncertain Data Sources

Han, Wei (2016) Integration of Multiple Uncertain Data Sources. Masters thesis, Concordia University.

Text (application/pdf)
Han_MCompSc_S2016.pdf - Accepted Version
826kB

Abstract

Data integration is the problem of combining data from multiple autonomous data
sources and providing a unified view to users. The problem has been studied
extensively over the past two decades, with most work focused on integrating traditional, exact
relational data. Integration over uncertain data sources is a more recent and
more challenging problem. The purpose of this thesis is to understand the semantics
and techniques of data integration over multiple uncertain sources.
We study existing proposals for uncertain and probabilistic data integration. As the
basis of our work, we consider two integration operations, one in the possible worlds
model and the other in a compact model. We introduce the properties of the integration
operations proposed for two sources and use these properties to develop
a framework for integrating multiple sources. To this end, we also extend and generalize
a conversion algorithm from the possible worlds model to compact probabilistic relations.
We define the integration procedure, the concept of probability consistency,
and a probability adjustment method for cases where consistency is violated. We build a
running prototype of the proposed framework to show its feasibility and to automate
the probability calculation. This thesis takes a step toward a better understanding of
the challenges and development of uncertain data integration systems.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Han, Wei
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 25 January 2016
Thesis Supervisor(s): Shiri, Nematollaah
Keywords: uncertain data integration
ID Code: 980853
Deposited By: WEI HAN
Deposited On: 16 Jun 2016 14:39
Last Modified: 18 Jan 2018 17:52
