[1] J. Hennessy, D. Patterson, “Computer architecture: a quantitative approach” Elsevier, 2011.
[2] K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, X. Sui, “The tao of parallelism in algorithms,” In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, pp. 12–25, ACM, 2011.
[3] O. Borůvka, “O jistém problému minimálním (About a certain minimal problem),” Práce mor. přírodověd. spol. v Brně III 3, pp. 37-58, 1926.
[4] V. Vineet, P. Harish, S. Patidar, P. J. Narayanan, “Fast minimum spanning tree for large graphs on the GPU,” In Proceeding of the Conference on High Performance Graphics, pp. 167- 171, ACM, 2009.
[5] R. Nasre, M. Burtscher, K. Pingali, “Morph algorithms on GPUs,” In Proceedings of ACM SIGPLAN Notices, 48(8), pp. 147-156, ACM, 2013.
[6] R. Nasre, M. Burtscher, K. Pingali, “Atomic-free irregular computations on GPUs,” In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, pp. 96-107, ACM, 2013.
[7] N. Shavit, D. Touitou, “Software transactional memory”, Distributed Computing, 10(2), pp. 99-116, 1997.
[8] D. Cederman, P. Tsigas, M.T. Chaudhry, “Towards a software transactional memory for graphics processors,” In Proceedings of Euro-graphics Symposium on Parallel Graphics and Visualization, pp. 121-129, 2010.
[9] Y. Xu, R. Wang, N. Goswami, T. Li, L, Gao, D. Qian, ”Software transactional memory for GPU architectures,” In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, ACM, 2014.
80
[10] A. Holey, A. Zhai, “Lightweight software transactions on GPUs, ” In Proceedings of 43rd International Conference on Parallel Processing, pp. 461-470, IEEE, 2014.
[11] Q. Shen, C. Sharp, W. Blewitt, G. Ushaw, G. Morgan, “PR-STM: Priority Rule Based Software Transactions for the GPU,” European Conference on Parallel Processing, pp. 361–372, 2015.
[12] F. Khorasani, R. Gupta Laxmi, N. Bhuyan, “Scalable SIMD-Efficient Graph Processing on GPUs,” In Proceedings of International Conference on Parallel Architecture and Compilation, pp. 39-50, IEEE, 2015.
[13] Nvidia. CUDA. Retrieved September 7, 2016 from http://www.nvidia.com/cuda.
[14] L. G. Valiant. “A bridging model for parallel computation,” Communications of the ACM, 33(8), pp. 103-111, 1990.
[15] M. Herlihy, J. E. B. Moss, “Transactional Memory: Architectural Support for Lock-Free Data Structures,” In Proceedings of the 20th Annual International Symposium on Computer Architecture, pp. 289-300, ACM, 1993.
[16] C. J. Rossbach, O. S. Hofmann, E. Witchel, “Is transactional programming actually easier?” In Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 45(5), pp. 47-56, 2010.
[17] V. Pankratius, A.-R. Adl-Tabatabai, “A study of transactional memory vs. locks in practice,” In Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp 43-52, ACM, 2011.
[18] R. Ennals, “Efficient software transactional memory,” Technical report, Intel Research Cambridge, UK, 2005.
[19] T. Harris, J. Larus, R. Rajwar, “Transactional Memory,” Synthesis Lectures on Computer Architecture, 5(1), pp. 1-263, 2010.
81
[20] T. Harris, K. Fraser, “Language Support for Lightweight Transactions,” In ACM SIGPLAN Notices, 38(11), pp. 388-402, ACM, 2003.
[21] Y.C. Tseng, T.T.Y. Juang, M.C. Du, “Building a multicasting tree in a high-speed network,” IEEE Concurrency, 6(4), pp. 57-67, 1998.
[22] L. An, Q.S. Xiang, S. Chavez, “A fast implementation of the minimum spanning tree method for phase unwrapping,” IEEE Transactions on Medical Imaging, 19(8), pp. 805-808, 2000.
[23] S. Kang, D.A. Bader, “An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs,” In Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, 44(4), pp. 15-24, 2009.
[24] S.V. Adve, K. Gharachorloo, “Shared Memory Consistency Models: A Tutorial”, IEEE Computer Journal 29(12), pp. 66-76, 1995.
[25] DIMACS 10 challenge graph collection – Graph Partitioning and Graph Clustering, Retrieved January 25, 2016 from http://www.cc.gatech.edu/dimacs10/downloads.shtml.
[26] D. A. Bader, K. Madduri, "GTgraph: A Synthetic Graph Generator Suite," Technical Report, 2006.
[27] J. Siek, L. Lee, A. Lumsdaine, "The Boost Graph Library: User Guide and Reference Manual," Addison-Wesley, 2002.
[28] D. Lasalle , G. Karypis, “Multi-threaded Graph Partitioning,” In Proceedings of 27th International Symposium on Parallel and Distributed Processing, pp. 225-236, IEEE, 2013.
[29] T. Bui, C. Jones, “A heuristic for reducing fill in sparse matrix factorization”, In Proceedings of the 6th SIAM Conference, Parallel Processing for Scientific Computing, pp. 445-452, 1993.
[30] R. Leland, B. Hendrickson, “A Multilevel Algorithm for Partitioning Graphs, Technical Report SAND93-1301, Sandia National Laboratories, 1993.
82
[31] G. Karypis, V. Kumar, “Analysis of Multilevel Graph Partitioning,” Technical Report, TR 95-037, Department of Computer Science, University of Minnesota, 1995.
[32] G. Karypis, V. Kumar, “A fast and highly quality multilevel scheme for partitioning irregular graphs”, SIAM Journal on Scientific Computing, pp. 359-392, 1998.
[33] B. W. Kernighan, S. Lin, “An efficient heuristic procedure for partitioning graphs”, Bell System. Technical Journal, 49, pp. 291-307, 1970.
[34] C. M. Fiduccia, R. M. Mattheyses, “A linear time heuristic for improving network partitions”, In Proceedings of 19th IEEE Design Automation Conference, pp. 175-181, 1982.
[35] D. LaSalle, G Karypis, “A parallel hill-climbing refinement algorithm for graph partitioning,” 45th International Conference on Parallel Processing, pp. 236-241,IEEE, 2016.
[36] M. Harris, S. Sengupta, J.D. Owens, “Parallel prefix sum (scan) with CUDA,” GPU gems, 3(39), pp. 851-876, 2007.
[37] J.T., Schwartz, “Ultracomputers,” ACM Transactions on Programming Languages and Systems , 2(4), pp. 484-521, 1980.
[38] S. Sengupta, M. Harris, Y. Zhang, J.D. Owens, "Scan primitives for GPU computing," In Graphics hardware, vol. 2017, pp. 97-106, 2007.
[39] L. Ma, K. Agrawal, R. D. Chamberlain, "A memory access model for highly-threaded many-core architectures," Future Generation Computer Systems, vol. 30, pp. 202-215, 2014.
[40] M. Lin, R.D. Chamberlain, K. Agrawal, "Analysis of classic algorithms on GPUs," In Proceedings of High Performance Computing & Simulation, pp. 65-73, IEEE, 2014.
[41] B. Goodarzi, M. Burtscher, D. Goswami, "Parallel Graph Partitioning on a CPU-GPU Architecture," In Proceedings of Parallel and Distributed Processing Symposium Workshops, pp. 58-66. IEEE, 2016.