Data Stream Classification with Mondrian Forest under Memory Constraints

Khannouz, Martin (2023) Data Stream Classification with Mondrian Forest under Memory Constraints. PhD thesis, Concordia University.

Text (application/pdf)
Khannouz_PhD_S2023.pdf - Accepted Version
Available under License Spectrum Terms of Access.
4MB

Abstract

Supervised learning algorithms generally assume the availability of enough memory to store data models during the training and test phases. However, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. In this manuscript, we investigate the use of data stream classification methods under memory constraints. Our investigation consists of three steps: a benchmark of models, an update of a model, and an optimization of a trade-off. We evaluate data stream classification models with different criteria such as classification performance or resource usage. The benchmark reveals that the Mondrian forest, despite having state-of-the-art classification performance with unlimited memory, is impacted by a low memory limit. We then adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears as the best out-of-memory strategy in all configurations. We identify that the memory constraint brings a trade-off between the Mondrian forest size and its tree depth. We design an adjusting algorithm to optimize the forest size for the data stream and the memory limit, and we evaluate this algorithm on similar datasets. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects. Overall, the contributions significantly improve the performance of the Mondrian forest under memory constraints.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Khannouz, Martin
Institution: Concordia University
Degree Name: Ph. D.
Program: Computer Science
Date: March 2023
Thesis Supervisor(s): Glatard, Tristan
ID Code: 992192
Deposited By: Martin Yohann Vincent Khannouz
Deposited On: 21 Jun 2023 14:37
Last Modified: 21 Jun 2023 14:37
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
