Do, Hoang-Dung (2021) Modeling the Linux page cache for accurate simulation of data-intensive applications. Masters thesis, Concordia University.
Text (application/pdf), 1MB: Do_MSC_S2021.pdf, Accepted Version. Available under License Spectrum Terms of Access.
Abstract
The emergence of Big Data in recent years has led to a growing need for data processing and an increasing number of data-intensive applications. Processing and storing massive amounts of data requires large-scale solutions, so data-intensive applications must be executed on infrastructures such as cloud or High Performance Computing (HPC) clusters. Although advances in the hardware/software stack enable larger computing platforms, relevant challenges remain in resource management, performance, scheduling, and scalability. As a result, there is an increasing demand for optimizing and quantifying the performance of data-intensive applications executed on those platforms. While infrastructures with sufficient computing power and storage capacity are available, disk I/O performance remains a bottleneck. Apart from hardware improvements, the Linux page cache is an efficient architectural approach to reducing I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to the limitations of real-world experiments. Simulation is a popular approach to addressing these issues; however, existing simulation frameworks simulate page caching only partially or not at all. As a result, simulation-based performance studies of data-intensive applications lead to inaccurate results.
This thesis proposes an I/O simulation model that captures the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model under different conditions, including sequential and concurrent applications, as well as local and remote I/Os. The results show that our page cache model reduces the simulation error by up to an order of magnitude compared to state-of-the-art cacheless simulations.
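To illustrate why the writeback/writethrough distinction matters for simulated I/O time, here is a minimal toy sketch in Python. It is not the model proposed in the thesis and does not use the WRENCH or SimGrid APIs. The `ToyPageCache` class, its file-granularity LRU eviction, and the bandwidth parameters are hypothetical simplifications. It also ignores the periodic background flushing of dirty pages that the real Linux kernel performs.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class ToyPageCache:
    """Hypothetical file-granularity page cache with LRU eviction.

    Sizes are in MB and bandwidths in MB/s. This is an illustration only,
    not the model proposed in the thesis.
    """
    capacity: float          # memory available for cached file data
    mem_bw: float            # bandwidth for cache hits and cached writes
    disk_bw: float           # bandwidth for misses, flushes, writethrough
    writeback: bool = True   # False selects writethrough behaviour
    # name -> (size, dirty); the first entry is the least recently used
    files: OrderedDict = field(default_factory=OrderedDict)

    def _used(self) -> float:
        return sum(size for size, _ in self.files.values())

    def _insert(self, name: str, size: float, dirty: bool) -> float:
        """Cache a file, evicting LRU entries; return time spent flushing."""
        if size > self.capacity:
            # Too large to cache: dirty data goes straight to disk.
            return size / self.disk_bw if dirty else 0.0
        flush_time = 0.0
        while self._used() + size > self.capacity:
            _, (old_size, old_dirty) = self.files.popitem(last=False)
            if old_dirty:
                # Dirty data must be written to disk before it is dropped.
                flush_time += old_size / self.disk_bw
        self.files[name] = (size, dirty)
        return flush_time

    def read(self, name: str, size: float) -> float:
        """Return the simulated time to read a whole file."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return size / self.mem_bw     # cache hit: memory speed
        # Cache miss: read from disk and keep a clean copy in memory.
        return size / self.disk_bw + self._insert(name, size, dirty=False)

    def write(self, name: str, size: float) -> float:
        """Return the simulated time to write (or overwrite) a whole file."""
        self.files.pop(name, None)  # drop any stale cached copy
        if self.writeback:
            # Writeback: complete at memory speed, flush to disk on eviction.
            return size / self.mem_bw + self._insert(name, size, dirty=True)
        # Writethrough: complete only once the data has reached disk.
        return size / self.disk_bw + self._insert(name, size, dirty=False)
```

For example, with `capacity=1000`, `mem_bw=4000` and `disk_bw=100`, writing a 500 MB file and then reading it back costs 0.25 s in writeback mode and 5.125 s in writethrough mode under these assumptions, versus 10 s if both accesses went to disk, as in a cacheless simulation.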
| Field | Value |
|---|---|
| Divisions | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
| Item Type | Thesis (Masters) |
| Authors | Do, Hoang-Dung |
| Institution | Concordia University |
| Degree Name | M. Sc. |
| Program | Computer Science |
| Date | April 2021 |
| Thesis Supervisor(s) | Glatard, Tristan |
| ID Code | 988339 |
| Deposited By | Hoang Dung Do |
| Deposited On | 29 Jun 2021 23:16 |
| Last Modified | 29 Jun 2021 23:16 |