Aoun, Alain
ORCID: https://orcid.org/0000-0001-9038-5335
(2025)
Machine Learning based Memory Load Approximation.
PhD thesis, Concordia University.
Abstract
Modern computing applications demand ever-increasing performance and energy efficiency. However, conventional processor architectures frequently stall while waiting for data retrieval from memory, creating a bottleneck known as the memory wall. Over the past decades, various approaches such as speculative prefetching, load value prediction, and hardware caching have been proposed to mitigate this limitation. While these techniques yield moderate gains, they often rely on rigid hardware logic or simple pattern matching, which struggle with the irregular, data-driven workloads typical of contemporary multimedia and machine learning applications.
This thesis proposes using Machine Learning (ML) to speculate on load values and thereby reduce memory accesses. The proposed method is grounded in the principles of Approximate Computing (AC), where minor inaccuracies are accepted in exchange for improvements in performance or efficiency. To this end, we introduce an ML-based Load Value Approximation (ML-LVA) approach, which predicts the values of memory loads to reduce access latency. The ML-LVA is trained offline to generate a compact predictor that captures patterns in image and audio data, enabling accurate value prediction at runtime without continual retraining. By learning spatial correlations among adjacent data values, the proposed ML-LVA effectively anticipates memory contents, thereby reducing stalls and improving overall system performance in online deployment.
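The offline-training/online-prediction idea described above can be illustrated with a minimal sketch. This is not the thesis's actual predictor; it only assumes, for illustration, a linear model fit offline by least squares that approximates a value from its left and upper neighbours, exploiting the spatial correlation typical of image data:

```python
# Illustrative sketch only (not the thesis's ML-LVA model): learn fixed
# weights offline that predict a value from its spatial neighbours, then
# use them online in place of an actual memory load.
import numpy as np

rng = np.random.default_rng(0)

# Offline phase: synthetic "smooth" training data standing in for image
# samples (each value is its left neighbour plus a small increment).
train = np.cumsum(rng.integers(0, 3, size=(64, 64)), axis=1).astype(float)
X, y = [], []
for r in range(1, train.shape[0]):
    for c in range(1, train.shape[1]):
        # Features: left neighbour, upper neighbour, bias term.
        X.append([train[r, c - 1], train[r - 1, c], 1.0])
        y.append(train[r, c])
w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

# Online phase: instead of loading img[10, 20] from memory, approximate
# it from neighbours that are already available.
img = np.cumsum(rng.integers(0, 3, size=(32, 32)), axis=1).astype(float)
pred = w @ np.array([img[10, 19], img[9, 20], 1.0])
err = abs(pred - img[10, 20])  # small, per the AC fidelity trade-off
```

The key property mirrored here is that the weights are learned once offline, so the online "prediction" is a handful of multiply-accumulates, much cheaper than a stalled memory access, at the cost of a bounded approximation error.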
We have implemented the proposed ML-LVA framework in both software and hardware. The software variant targets existing processors lacking reconfigurability, as well as systems with tight area or power constraints that prohibit adding custom hardware. It operates as a callable subroutine designed for seamless integration without modifying the processor architecture, and was evaluated on an x86 processor in the GEM5 simulator. The hardware variant, in contrast, integrates the proposed ML-LVA as a dedicated accelerator accessed via a custom instruction, offering tighter pipeline integration, lower latency, and greater efficiency for newly designed systems. It was implemented in CVA6, an open-source RISC-V processor. Synthesis results obtained with Cadence Innovus show that the overhead of the added accelerator is marginal.
Experimental results on audio and image processing workloads demonstrate that the proposed ML-LVA accelerates memory access by over 6×, resulting in application speedups of up to 2.45×. Even when predicting up to 95% of loads, the output fidelity remains within perceptual thresholds. Consequently, the proposed ML-LVA outperforms state-of-the-art LVAs in both performance and quality, while incurring only a 5% area overhead and less than a 1% power increase in silicon.
| Field | Value |
|---|---|
| Divisions | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
| Item Type | Thesis (PhD) |
| Authors | Aoun, Alain |
| Institution | Concordia University |
| Degree Name | Ph.D. |
| Program | Electrical and Computer Engineering |
| Date | 2 June 2025 |
| Thesis Supervisor(s) | Tahar, Sofiène |
| Keywords | Approximate Computing, Approximate Memory, Approximate Load Value, Machine Learning, Multimedia Processing, Image Processing, Audio Processing |
| ID Code | 996112 |
| Deposited By | Alain Aoun |
| Deposited On | 04 Nov 2025 16:11 |
| Last Modified | 04 Nov 2025 16:11 |