
A recommender system for scientific datasets and analysis pipelines

Mazaheri, Mandana (2021) A recommender system for scientific datasets and analysis pipelines. Masters thesis, Concordia University.

Text (application/pdf): Mazaheri_Master_F2021.pdf, Accepted Version, 830 kB

Abstract

Scientific datasets and analysis pipelines are increasingly being shared
publicly in the interest of open science. However, mechanisms to reliably
identify which pipelines and datasets can appropriately be used together
are lacking. Given the increasing number of high-quality public datasets
and pipelines, this lack of clear compatibility information threatens the
findability and reusability of these resources. We investigate the
feasibility of a collaborative filtering system that recommends pipelines
and datasets based on provenance records from previous executions.
We evaluate our system using datasets and pipelines extracted from the
Canadian Open Neuroscience Platform, a national initiative for open
neuroscience. The recommendations provided by our system (AUC = 0.83) are
significantly better than chance and outperform those made by domain
experts relying on their prior knowledge and on pipeline and dataset
descriptions (AUC = 0.63). In particular, domain experts often neglect
low-level technical aspects of a pipeline-dataset interaction, such as the
level of pre-processing, which a provenance-based system captures. We
conclude that provenance-based pipeline and dataset recommenders are
feasible and beneficial to the sharing and usage of open-science
resources. Future work will focus on collecting more comprehensive
provenance traces and on deploying the system in production.
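To make the idea of recommending from provenance records concrete, here is a minimal sketch of one possible collaborative filtering approach; it is an illustration under stated assumptions, not the thesis's implementation. It assumes each provenance record can be reduced to a hypothetical (pipeline, dataset, succeeded) triple, encodes outcomes as +1 (success) or -1 (failure) with never-executed pairs left empty, and predicts unseen pairs with item-based (dataset-to-dataset) cosine similarity. The record format, all pipeline and dataset names, and the choice of item-based similarity are assumptions; the thesis may use a different technique, such as matrix factorization, and richer provenance features.

```python
import math

# Hypothetical provenance records (assumed format): (pipeline, dataset, succeeded).
# succeeded=True means the execution completed; False means it failed.
# Pipeline-dataset pairs that were never executed simply do not appear.
RECORDS = [
    ("pipeline_A", "dataset_1", True),
    ("pipeline_A", "dataset_2", True),
    ("pipeline_A", "dataset_3", False),
    ("pipeline_B", "dataset_1", True),
    ("pipeline_B", "dataset_3", False),
    ("pipeline_C", "dataset_2", True),
    ("pipeline_C", "dataset_3", True),
]


def build_matrix(records):
    """Return {dataset: {pipeline: +1.0 or -1.0}}; unexecuted pairs are absent."""
    matrix = {}
    for pipeline, dataset, succeeded in records:
        matrix.setdefault(dataset, {})[pipeline] = 1.0 if succeeded else -1.0
    return matrix


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (missing = 0)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(matrix, pipeline, dataset):
    """Item-based CF: average the pipeline's known outcomes on other datasets,
    weighted by each dataset's similarity to the target dataset.
    Returns a score in [-1, 1]; 0.0 means no evidence either way."""
    target = matrix.get(dataset, {})
    num = den = 0.0
    for other, outcomes in matrix.items():
        if other == dataset or pipeline not in outcomes:
            continue
        sim = cosine(target, outcomes)
        num += sim * outcomes[pipeline]
        den += abs(sim)
    return num / den if den else 0.0


def recommend(matrix, pipeline):
    """Rank datasets never run with this pipeline, most promising first."""
    candidates = [d for d, outcomes in matrix.items() if pipeline not in outcomes]
    return sorted(candidates, key=lambda d: predict(matrix, pipeline, d), reverse=True)


if __name__ == "__main__":
    m = build_matrix(RECORDS)
    for p in ("pipeline_B", "pipeline_C"):
        for d in recommend(m, p):
            print(p, d, f"{predict(m, p, d):+.2f}")
    # prints:
    # pipeline_B dataset_2 +1.00
    # pipeline_C dataset_1 -0.24
```

In this toy example, dataset_1 and dataset_3 receive opposite outcomes from the pipelines they share, so their similarity is negative and pipeline_C's success on dataset_3 counts as evidence against dataset_1. The sign of the score is read as a predicted success or failure, and its magnitude as the strength of that evidence.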

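The AUC values reported in the abstract summarize how well a set of scores ranks compatible pipeline-dataset pairs above incompatible ones: 0.5 is chance level and 1.0 is a perfect ranking. As a generic illustration (not the thesis's evaluation code), the sketch below computes AUC in its pairwise form, the fraction of (compatible, incompatible) pairs that the scores order correctly, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the number of pairs. The labels and scores in the example are invented for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as one half.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented labels (1 = compatible pipeline-dataset pair) and recommender scores.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
    print(roc_auc(labels, scores))  # 8 of 9 pairs ordered correctly: 0.8888888888888888
    print(roc_auc([1, 0], [0.5, 0.5]))  # identical scores carry no information: 0.5
```

The double loop is quadratic in the number of examples, which keeps the definition explicit; for large evaluation sets, a sort-based rank-sum computation gives the same value more efficiently.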
Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Mazaheri, Mandana
Institution: Concordia University
Degree Name: M. Sc.
Program: Computer Science
Date: October 2021
Thesis Supervisor(s): Glatard, Tristan
ID Code: 989068
Deposited By: Mandana Mazaheri
Deposited On: 29 Nov 2021 17:03
Last Modified: 29 Nov 2021 17:03
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
