Mazaheri, Mandana (2021) A recommender system for scientific datasets and analysis pipelines. Masters thesis, Concordia University.
Text (application/pdf): Mazaheri_Master_F2021.pdf (830 kB), Accepted Version
Abstract
Scientific datasets and analysis pipelines are increasingly being shared
publicly in the interest of open science.
However, mechanisms are lacking to reliably identify which pipelines
and datasets can appropriately be used together. Given the increasing number of high-quality public datasets and
pipelines, this lack of clear compatibility threatens the
findability and reusability of these resources. We investigate
the feasibility of a collaborative filtering system to recommend pipelines
and datasets based on provenance records from previous executions.
We evaluate our system using datasets and pipelines extracted from the
Canadian Open Neuroscience Platform, a national initiative for open
neuroscience. The recommendations provided by our system (AUC $= 0.83$) are
significantly better than chance and outperform recommendations made by
domain experts using their previous knowledge as well as pipeline and dataset descriptions (AUC $= 0.63$). In particular, domain experts often neglect
low-level technical aspects of a pipeline-dataset interaction, such as the level of pre-processing, which are
captured by a provenance-based system. We conclude that provenance-based
pipeline and dataset recommenders are feasible and beneficial to
the sharing and usage of open-science resources. Future
work will focus on the collection of more
comprehensive provenance traces, and on deploying the system in production.
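
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the system implemented in the thesis: the matrix encoding (+1 for a successful execution, -1 for a failed one, None for a pipeline-dataset pair with no provenance record), the cosine-weighted neighbourhood predictor, and the function names `cosine`, `predict` and `auc` are all assumptions made for illustration. The sketch scores an unseen pipeline-dataset pair from the outcomes of similar pipelines, and computes AUC as the probability that a compatible pair is ranked above an incompatible one.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity over the positions observed (not None) in both rows."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    num = sum(a * b for a, b in pairs)
    den = sqrt(sum(a * a for a, _ in pairs)) * sqrt(sum(b * b for _, b in pairs))
    return num / den if den else 0.0


def predict(matrix, p, d):
    """Score pipeline p on dataset d from other pipelines' outcomes on d.

    matrix[p][d] is +1 (success), -1 (failure) or None (no record);
    this encoding is an assumption of the sketch.
    """
    num = den = 0.0
    for q, row in enumerate(matrix):
        if q == p or row[d] is None:
            continue
        w = cosine(matrix[p], row)
        num += w * row[d]
        den += abs(w)
    return num / den if den else 0.0  # no evidence: neutral score


def auc(scores, labels):
    """Probability that a positive pair outranks a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy provenance matrix: rows are pipelines, columns are datasets.
toy = [
    [1, 1, -1, None],   # pipeline 0 has never been run on dataset 3
    [1, 1, -1, 1],      # behaves like pipeline 0 on shared datasets
    [-1, -1, 1, -1],    # behaves opposite to pipeline 0
]
score = predict(toy, 0, 3)
```

In this toy matrix, pipeline 1 agrees with pipeline 0 on every shared dataset and pipeline 2 disagrees on every one, so both neighbours push the score for the missing pair towards success. A real evaluation would hold out known pairs, score them with `predict`, and pass the scores and true labels to `auc`.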
| Field | Value |
|---|---|
| Divisions | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
| Item Type | Thesis (Masters) |
| Authors | Mazaheri, Mandana |
| Institution | Concordia University |
| Degree Name | M. Sc. |
| Program | Computer Science |
| Date | October 2021 |
| Thesis Supervisor(s) | Glatard, Tristan |
| ID Code | 989068 |
| Deposited By | Mandana Mazaheri |
| Deposited On | 29 Nov 2021 17:03 |
| Last Modified | 29 Nov 2021 17:03 |