Khannouz, Martin (2023) Data Stream Classification with Mondrian Forest under Memory Constraints. PhD thesis, Concordia University.
Text (application/pdf), 4MB: Khannouz_PhD_S2023.pdf - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Supervised learning algorithms generally assume the availability of enough memory to store data models during the training and test phases. However, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. In this manuscript, we investigate the use of data stream classification methods under memory constraints. Our investigation consists of three steps: a benchmark of models, an update of a model, and an optimization of a trade-off. We evaluate data stream classification models with different criteria such as classification performance or resource usage. The benchmark reveals that the Mondrian forest, despite having state-of-the-art classification performance with unlimited memory, is impacted by a low memory limit. We then adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears to be the best out-of-memory strategy in all configurations. We identify that the memory constraint introduces a trade-off between the Mondrian forest size and its tree depth. We design an adjusting algorithm to optimize the forest size to the data stream and the memory limit, and we evaluate this algorithm on similar datasets. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects. Overall, the contributions significantly improve the performance of the Mondrian forest under memory constraints.
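The trade-off between forest size and tree depth mentioned in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not taken from the thesis or from OrpailleCC: the function name `max_balanced_depth`, the even split of a fixed node budget across trees, and the assumption that each tree is a complete binary tree are all hypothetical simplifications chosen for illustration.

```python
def max_balanced_depth(node_budget: int, n_trees: int) -> int:
    """Depth of the deepest complete binary tree that fits in one tree's share.

    Illustrative assumptions (not from the thesis): the node budget is split
    evenly between trees, and each tree is a complete binary tree. With the
    root at depth 0, a complete binary tree of depth d has 2**(d + 1) - 1 nodes.
    """
    if n_trees <= 0 or node_budget < n_trees:
        raise ValueError("need at least one node per tree")
    nodes_per_tree = node_budget // n_trees
    # Largest d such that 2**(d + 1) - 1 <= nodes_per_tree.
    return (nodes_per_tree + 1).bit_length() - 2


# Same hypothetical budget of 1000 nodes, shared by more and more trees.
for n_trees in (1, 10, 50):
    print(n_trees, max_balanced_depth(1000, n_trees))
```

Under these simplified assumptions, adding trees to a fixed budget leaves each tree fewer nodes and therefore a shallower maximum depth, which is the kind of tension an algorithm that adjusts the forest size has to balance.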
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Khannouz, Martin |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Computer Science |
Date: | March 2023 |
Thesis Supervisor(s): | Glatard, Tristan |
ID Code: | 992192 |
Deposited By: | Martin Yohann Vincent Khannouz |
Deposited On: | 21 Jun 2023 14:37 |
Last Modified: | 21 Jun 2023 14:37 |