Alkhodair, Sarah (2019) Assessing Trust and Veracity of Data in Social Media. PhD thesis, Concordia University.
Full text: Alkhodair_PhD_2019.pdf (PDF, 2 MB), Accepted Version
Abstract
Social media strongly influences our knowledge and perception of the world. Given the tremendous amount of data circulating in social media, generated by a vast number of users from all over the world, extracting useful information from such data and assessing its veracity has become increasingly challenging. Data veracity refers to the trustworthiness and certainty of data. The challenges of handling textual data in social media have created a need for efficient tools that extract, understand, and assess the veracity of the information circulating in social media at a given time. In this thesis, we study three research problems that address major challenges of handling textual data in social media.
First, the huge volumes of short, noisy, and unstructured text that overwhelm users make it difficult to understand which topics users discuss on micro-blogging websites. Topic models were proposed to automatically learn, for each topic covered by a large corpus of text documents, a set of keywords that best describe it, enabling fast and effective browsing and exploration of the corpus. However, for the results of topic modeling algorithms to be useful, they must be interpretable, and obtaining meaningful results by applying topic models to social media data is not a trivial task. In this thesis, we study the problem of improving the interpretability of topic modeling of micro-posts in social media. We propose a new method that combines topic modeling, a lexical database, and the set of hashtags available in the corpus of micro-posts to produce a higher-quality representation of each extracted topic. Extensive experiments on two real-life datasets collected from Twitter show that our method outperforms the state-of-the-art model in terms of perplexity, topic coherence, and topic quality.
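The abstract reports topic coherence without naming the measure used. As an illustration only, and assuming the widely used UMass coherence score, the minimal sketch below scores one topic's top words by how often they co-occur in the same micro-posts: for top words v_1..v_M ordered by probability, it sums log((D(v_m, v_l) + 1) / D(v_l)) over all pairs l < m, where D counts documents. The function name `umass_coherence`, the set-of-tokens document format, and the toy corpus are hypothetical and not taken from the thesis.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of a single topic.

    top_words: the topic's top words, ordered from most to least probable.
    documents: list of documents, each given as a set of tokens.
    """

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for doc in documents if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                # Skip words absent from the corpus to avoid division by zero.
                continue
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1) / d_l)
    return score


if __name__ == "__main__":
    # Hypothetical toy corpus of three tokenized micro-posts.
    docs = [{"storm", "flood", "rain"}, {"storm", "rain"}, {"election", "vote"}]
    print(round(umass_coherence(["storm", "rain", "flood"], docs), 3))  # 0.405
    print(round(umass_coherence(["storm", "vote", "flood"], docs), 3))  # -0.693
```

A higher score means the topic's top words tend to appear together, which is one reason coherence is used as a proxy for how interpretable a topic is to a human reader.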
Second, the nature and flexibility of social media facilitate the posting of unverified information, especially during the rapid diffusion of breaking news. Efficiently detecting and acting upon unverified breaking news rumors throughout social media is essential to minimizing their harmful effects. However, detecting them is not a trivial task, because they belong to unseen topics or events that are not covered in the training dataset. In this thesis, we study the problem of assessing the veracity of information contained in micro-posts about emerging stories and topics of breaking news. We propose a new approach that jointly learns word embeddings and trains a neural network model with two different objectives to automatically identify unverified micro-posts spreading in social media during breaking news. Extensive experiments on real-life datasets show that our proposed model outperforms the state-of-the-art classifier as well as other baseline classifiers in terms of precision, recall, and F1.
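Because breaking news rumors belong to events that are absent from the training data, a natural way to evaluate such a detector is to hold out each event in turn. The sketch below assumes a leave-one-event-out protocol (the abstract does not state the exact protocol) and computes precision, recall, and F1 for the positive, "unverified" class. The `(event, text, label)` tuple format, the function names, and the toy posts are hypothetical.

```python
from collections import defaultdict


def leave_one_event_out(posts):
    """Yield (held_out_event, train, test) splits so that every test post
    comes from an event that never appears in the training data.

    posts: list of (event, text, label) tuples (hypothetical format).
    """
    by_event = defaultdict(list)
    for post in posts:
        by_event[post[0]].append(post)
    for held_out in sorted(by_event):
        train = [p for event, group in by_event.items() if event != held_out for p in group]
        yield held_out, train, by_event[held_out]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (e.g. 'unverified') class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy posts: label 1 = unverified, 0 = verified.
    posts = [
        ("event_a", "bridge closed downtown", 1),
        ("event_a", "officials confirm closure", 0),
        ("event_b", "second explosion reported", 1),
        ("event_c", "power restored in east end", 0),
    ]
    for event, train, test in leave_one_event_out(posts):
        print(event, len(train), len(test))
    scores = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
    print(tuple(round(s, 3) for s in scores))  # (0.667, 0.667, 0.667)
```

In practice, a classifier would be trained on `train` and scored on `test` for each held-out event, and the per-event scores would then be aggregated.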
Finally, the uncertainty and chaos associated with hot and sensitive breaking news and emergencies facilitate the explosive spread of highly engaging breaking news rumors that can be extremely damaging. In such cases, authorities have to prioritize the rumor verification process and act quickly upon highly engaging breaking news rumors to reduce their damaging consequences. However, this is an extremely challenging task. In this thesis, we study the problem of identifying rumor micro-posts that are most likely to go viral and achieve high engagement rates among recipients in social media during breaking news. We propose a multi-task neural network that jointly learns two tasks: breaking news rumor detection and breaking news rumor popularity prediction. Extensive experiments on real-life datasets show that our joint learning model outperforms other baseline classifiers in terms of precision, recall, and F1, and that it identifies highly engaging breaking news rumors with high accuracy.
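The abstract does not describe the network architecture. The pure-Python sketch below only illustrates the general hard-parameter-sharing idea behind multi-task learning, under stated assumptions: a single dense ReLU layer without biases stands in for the shared encoder, popularity prediction is framed as a binary "high engagement" label, and the two tasks are combined as a weighted sum of binary cross-entropy losses with a hypothetical weight `lam`. All function and parameter names are hypothetical, and no training step is shown.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-12):
    # Binary cross-entropy for one prediction p and label y in {0, 1}.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_forward(x, shared_w, rumor_w, popularity_w):
    """Shared ReLU layer feeding two task-specific sigmoid heads (assumed design)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in shared_w]
    p_rumor = sigmoid(sum(w * hi for w, hi in zip(rumor_w, h)))
    p_popular = sigmoid(sum(w * hi for w, hi in zip(popularity_w, h)))
    return p_rumor, p_popular


def joint_loss(p_rumor, p_popular, y_rumor, y_popular, lam=1.0):
    # Weighted sum of the rumor-detection and popularity-prediction losses.
    return bce(p_rumor, y_rumor) + lam * bce(p_popular, y_popular)


if __name__ == "__main__":
    # Hypothetical 2-dimensional post representation and toy weights.
    x = [0.5, -0.5]
    shared_w = [[1.0, 0.0], [0.0, 1.0]]
    p_r, p_p = multitask_forward(x, shared_w, rumor_w=[2.0, 1.0], popularity_w=[0.0, 0.0])
    print(round(p_r, 3), round(p_p, 3))  # 0.731 0.5
    print(round(joint_loss(p_r, p_p, y_rumor=1, y_popular=1), 3))  # 1.006
```

Because both heads read the same hidden vector `h`, gradients of the joint loss from either task would update `shared_w` during training, which is how the two tasks can inform each other; setting `lam=0` reduces the objective to rumor detection alone.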
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Alkhodair, Sarah |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Information and Systems Engineering |
Date: | 4 February 2019 |
Thesis Supervisor(s): | Fung, Benjamin C. M. and Dssouli, Rachida |
ID Code: | 985284 |
Deposited By: | SARAH ALKHODAIR |
Deposited On: | 07 Jun 2019 16:42 |
Last Modified: | 07 Jun 2019 16:42 |