Thiel, Stuart ORCID: https://orcid.org/0000-0002-2172-3916 (2021) The Diverting Fast Radix Algorithm. PhD thesis, Concordia University.
Preview |
Text (application/pdf)
1MBThiel_PhD_S2021.pdf - Accepted Version Available under License Creative Commons Attribution. |
Abstract
Radix sort is a classical algorithm to sort $N$ records with integer keys. The keys are represented using digits of a radix. The classical algorithm executes a pass through the records to count the frequency of each digit to determine the digit’s bucket size, and then a pass to deal the records to the corresponding bucket. The algorithm may proceed from the most significant digit to the least significant, or from the least significant digit to the most significant.
This thesis presents the Fast Radix and the Diverting Fast Radix algorithms, which replace
all but one count pass with estimation of bucket sizes and correction of overflow, and may,
in addition, divert to another sorting algorithm such as insertion sort when the subproblem
size is below a threshold.
Extensive performance experiments compare the implementations of the two algorithms against the native std::sort in the C++ Standard Library, radix sort, and RADULS2, which is currently considered the fastest radix sort available.
We develop a mathematical model of the average-case performance, when keys have a uniform distribution of digits, that support the observed performance improvements. The modeling demonstrates that the costs of estimation and correction quickly approach zero, falling under 5\% of the costs of traditional counting for $N$ as low as 11100. The modeling also provides a formula to calculate the thresholds for diversion that conform to the experimental results.
Our focus is on algorithms, so our implementation is fixed-digit, single-threaded and avoids optimizations such as parallelism, cache, translation lookaside buffer (TLB), or the specific input distribution, which are utilised by RADULS2. Nevertheless, Diverting Fast Radix is more than twice as fast as std::sort on inputs of size up to one billion (and beyond), and approximately 10\% faster than RADULS2 for one million records (even faster on smaller inputs). It is notable that the compiled code size of Diverting Fast Radix is two orders of magnitude smaller than RADULS2.
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (PhD) |
Authors: | Thiel, Stuart |
Institution: | Concordia University |
Degree Name: | Ph. D. |
Program: | Computer Science |
Date: | 29 January 2021 |
Thesis Supervisor(s): | Butler, Gregory |
Keywords: | sort, sorting, algorithm, analysis, radix sort, LSD |
ID Code: | 988207 |
Deposited By: | STUART THIEL |
Deposited On: | 13 Jul 2021 18:30 |
Last Modified: | 13 Jul 2021 18:30 |
Additional Information: | The repository for algorithms developed in this thesis is: https://github.com/ramou/dfr |
Repository Staff Only: item control page