Helali, Mossad (2025) Automation of Data Science Workflows via Knowledge Graphs and Table Representation Learning. PhD thesis, Concordia University.
Text (application/pdf): Helali_PhD_S2026.pdf (7MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
The rapid growth of open and collaborative data science platforms has led to large, disconnected collections of artifacts, namely, datasets and code pipelines. This separation makes it difficult to reuse knowledge and automate complex data science workflows. This thesis shows that by combining the semantic representations of Knowledge Graphs (KGs) and Table Representation Learning (TRL) into a unified foundational layer, these limitations can be overcome.
The presented research demonstrates the versatility of this semantic layer through three distinct paradigms, each addressing a critical stage of the data science lifecycle. First, data discovery is formulated as a scalable, direct query process against the KG, enabling expressive, semantic searches. Second, Automated Machine Learning (AutoML) is approached as a meta-learning task where Graph Neural Networks (GNNs) learn from the collective experience encoded in the graph's structure to recommend optimal pipelines. Third, Automated Exploratory Data Analysis (AutoEDA) is realized via a Retrieval-Augmented Generation (RAG) framework that grounds Large Language Models (LLMs) with factual, verifiable context from the KG.
Through extensive evaluations on standard benchmarks, the systems developed in this research show significant improvements in accuracy, scalability, and cost-effectiveness over state-of-the-art methods. Together, this work presents a new and powerful architectural pattern for data science automation, proving that a unified semantic representation is a practical foundation for building the next generation of intelligent and collaborative data science tools.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (PhD) |
| Authors: | Helali, Mossad |
| Institution: | Concordia University |
| Degree Name: | Ph. D. |
| Program: | Computer Science |
| Date: | 16 October 2025 |
| Thesis Supervisor(s): | Mansour, Essam |
| Keywords: | Knowledge Graphs, Data Science, Graph Neural Networks, Retrieval-Augmented Generation, Large Language Models, Data Discovery, AutoML, Data Exploration |
| ID Code: | 996378 |
| Deposited By: | Mossad Helali |
| Deposited On: | 29 Jun 2026 15:33 |
| Last Modified: | 29 Jun 2026 15:33 |