Login | Register

Parallelizing Games for Heterogeneous Computing Environments


Parallelizing Games for Heterogeneous Computing Environments

ALBAHNASSI, WESSAM (2012) Parallelizing Games for Heterogeneous Computing Environments. Masters thesis, Concordia University.

Text (application/pdf)
Thesis_Final2.pdf - Accepted Version
Available under License Spectrum Terms of Access.


Game applications nowadays run on platforms that contain multiple and different types of processors. Benefiting from the available processing power requires: (1) Code to be designed to run in parallel; and (2) Efficient mapping techniques to schedule tasks optimally on appropriate processors for increasing performance. The work in this thesis is directed towards improving the execution time performance of game applications on host platforms. The methodology adopted consisted of the following different stages: (1) Studying the main features of game applications; (2) Investigating parallelization opportunities within these applications, (3) Identifying/designing the programming structures needed to realize those opportunities; and (4) Designing, implementing, verifying and testing the parallel programming structures developed to support heterogeneous processing and maximize use of available processing power.

The research reported in this thesis presents a new framework for parallel programming of games. This framework consists of two major components which form the two main contributions of our work. These are (1) Sayl, a new design pattern to enable parallel programming of game applications and to reduce the amount of programmer work when writing parallel code; and (2) Arbiter Work Stealing, a new task scheduler capable of efficiently and automatically scheduling dynamically generated tasks to processors in a heterogeneous processing and memory environment. The framework was used to successfully parallelize the serial code of a sample game application to work in a heterogeneous processing environment. The parallel version shows far superior performance as compared to the serial version.

Divisions:Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type:Thesis (Masters)
Institution:Concordia University
Degree Name:M.A. Sc.
Program:Software Engineering
Date:October 2012
Thesis Supervisor(s):Dhrubajyoti, Goswami and Sudhir, Mudur
Keywords:parallel computing; games; patterns; data-flow; parameters; task graph; heterogeneous processing; work stealing; dynamic scheduling
ID Code:974932
Deposited On:25 Jan 2016 17:22
Last Modified:18 Jan 2018 17:39


Adams, D. A. (1969). A Computation Model with Data-Flow Sequencing. Tech. Report TR CS 177, Stanford Univ., Computer Science Dept., Stanford, Calif.
Agrawal, K., He, Y., & Leiserson, C. E. (2007). Adaptive work stealing with parallelism feedback. The 12th ACM SIGPLAN symposium on Principles and practice of parallel programming. San Jose, California, USA: ACM. doi:10.1145/1229428.1229448
Andrews, J., & Baker, N. (2006, March). Xbox 360 System Architecture. IEEE Micro, 26(2), pp. 25-37. doi:10.1109/MM.2006.45
Augonnet, C., Thibault, S., Namyst, R., & Wacrenier, P.-A. (2011, February). StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. : Pract. Exper., Special Issue: Euro-Par 2009, pp. 187–198.
Best, M. J., Fedorova1, A., Dickie, R., Tagliasacchi, A., Couture-Beil, A., Mustard, C., . . . Brownsword, A. (2009). Searching for Concurrent Design Patterns in Video Games. Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing. London: Springer-Verlag London.
Blow, J. (2004, February). Game Development: Harder Than You Think. Queue - Game Development, 1(10). doi:10.1145/971564.971590
Blumofe, R. D., & Leiserson, C. E. (1999). Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5), 720-748. doi:10.1145/324133.324234
Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., & Zhou, Y. (1995). Cilk: an efficient multithreaded runtime system. Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, (pp. 207-216). Santa Barbara, California, United States. doi:10.1145/209936.209958
Boyd, C. (2008). The DirectX 11 compute shader. SIGGRAPH 2008. Retrieved from http://s08.idav.ucdavis.edu/boyd-dx11-compute-shader.pdf
Cosnard, M., & Jeannot, E. (1999, September). Compact DAG representation and its dynamic scheduling. Journal of Parallel and Distributed Computing, 58(3).
Danial, A. (2009). CLOC: Count lines of code. Retrieved December 2011, from http://cloc.sourceforge.net/
Diamos, G. F., & Yalamanchili, S. (2008). Harmony: an execution model and runtime for heterogeneous many core systems. Proceedings of the 17th international symposium on High performance distributed computing, (pp. 197-200). doi:10.1145/1383422.1383447
Diefendorff, K., Dubey, P. K., Hochsprung, R., & Scales, H. (2000). AltiVec Extension to PowerPC Accelerates Media Processing. IEEE Micro, 20(2), 85-95. doi:10.1109/40.848475
Ericson, C. (2005). Real-Time Collision Detection. Morgan Kaufmann.
Fatahalian, K., & Houston, M. (2008). GPUs: A Closer Look. ACM Queue, 18-28.
Giusti, L. D., Chichizola, F., Naiouf, M., & Giusti, A. D. (2008). Mapping Tasks to Processors in Heterogeneous Multiprocessor Architectures: The MATEHa Algorithm. SCCC '08 Proceedings of the 2008 International Conference of the Chilean Computer Science Society, (pp. 85-91). doi:10.1109/SCCC.2008.11
Giusti, L. D., Luque, E., Chichizola, F., Naiouf, M., & Giusti, A. D. (2009). AMTHA: An Algorithm for Automatically Mapping Tasks to Processors in Heterogeneous Multiprocessor Architectures. Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering, (pp. 481-485). doi:10.1109/CSIE.2009.175
Goetz, B. (2002, July 1). Java theory and practice: Thread pools and work queues. Retrieved from IBM: http://www.ibm.com/developerworks/java/library/j-jtp0730/index.html
Gonina, E., & Chong, J. (2008). Task Queue Implementation Pattern. Retrieved from UC Berkeley ParLab: http://parlab.eecs.berkeley.edu/wiki/_media/patterns/taskqueue.pdf
Gregg, C., & Hazelwood, K. (2011). Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (pp. 134-144). IEEE. doi:10.1109/ISPASS.2011.5762730
Gschwind, M. (2007, August 22). Using data-parallel SIMD architecture in video games and supercomputers. Retrieved from IBM: http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.simd.html
Hamidzadeh, B., Lilja, D. J., & Atif, Y. (1995, October). Dynamic Scheduling Techniques for Heterogeneous Computing Systems. Concurrency: Practice and Experience, 7(7), pp. 633-652.
Hansen, C., Crockett, T., & Whitman, S. (1994). Guest Editors' Introduction: Parallel Rendering. IEEE Concurrency, 2(2), 7. doi:10.1109/MCC.1994.10014
Hovland, R. J. (2008). Latency and Bandwidth Impact on GPU-systems Projet report. Norwegian University of Science and Technology, Department of Computer and Information Science. Retrieved from http://www.idi.ntnu.no/elster/master-studs/runejoho/ms-proj-gpgpu-latency-bandwidth.pdf
Intel Corporation. (2007, April). Intel SSE4 Programming Reference. Retrieved from Intel: http://software.intel.com/sites/default/files/m/9/4/2/d/5/17971-intel_20sse4_20programming_20reference.pdf
Johnson, T., Davis, T. A., & Hadfield, S. M. (1996, February). A concurrent dynamic task graph. Parallel Computing, 22(2), 327-333. doi:10.1016/0167-8191(95)00061-5
Joselli, M., Clua, E., Montenegro, A., Conci, A., & Pagliosa, P. (2008). A new physics engine with automatic process distribution between CPU-GPU. Proceedings of the 2008 ACM SIGGRAPH symposium on Video games. Los Angeles, California: ACM. doi:10.1145/1401843.1401871
Joselli, M., Zamith, M., Clua, E. W., Montenegro, A., Leal-Toledo, R. C., Valente, L., & Feijó, B. (2010). An Architecture with Automatic Load Balancing and Distribution for Digital Games. 2010 Brazilian Symposium on Games and Digital Entertainment.
Joselli, M., Zamith, M., Clua, E., Montenegro, A., Conci, A., Leal-Toledo, R., . . . Pozzer, C. (2008). Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems. 11th IEEE International Conference on Computational Science and Engineering, CSE '08 (pp. 48-55). IEEE. doi:10.1109/CSE.2008.38
Joselli, M., Zamith, M., Clua, E., Montenegro, A., Leal-Toledo, R., Conci, A., . . . Feijó, B. (2009, December). An adaptative game loop architecture with automatic distribution of tasks between CPU and GPU. Computers in Entertainment (CIE), 7(4). doi:10.1145/1658866.1658869
Kahle, J. A., Day, M. N., Hofstee, H. P., Johns, C. R., Maeurer, T. R., & Shippy, D. (2005, July). Introduction to the cell multiprocessor. IBM Journal of Research and Development, 49(4/5), 589-604.
Khronos Group. (1997). Retrieved from OpenGL: http://www.opengl.org/
Kilgariff, E., & Fernando, R. (2005). The GeForce 6 Series GPU Architecture. In M. Pharr, & R. Fernando, GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley Professional.
Kim, C., & Agrawala, A. K. (1989, February). Analysis of the Fork-Join Queue. IEEE Transactions on Computers, 38(2), pp. 250-255. doi:10.1109/12.16501
Lavender, R. G., & Schmidt, D. C. (1996). Active object: an object behavioral pattern for concurrent programming. In Pattern languages of program design 2. Boston, MA: Addison-Wesley Longman Publishing Co. Inc.
Luk, C.-K., Hong, S., & Kim, H. (2009). Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping. MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (pp. 45-55). New York, NY, USA: ACM. doi:10.1145/1669112.1669121
Maheswaran, M., & Siegel, H. J. (1998). A Dynamic Matching and Scheduling Algorithm for Heterogeneous Computing Systems. Proceedings of the Seventh Heterogeneous Computing Workshop, (p. 57).
Manolescu, D.-A. (September, 1997). A Data Flow Pattern Language. Proc. 4th Pattern Languages of Programming. Monticello, IL.
Mattson, T., Sanders, B., & Massingill, B. (2004). Patterns for Parallel Programming. Addison-Wesley Professional.
Microsoft Corp. (2007, November). How to use the QueryPerformanceCounter function to time code in Visual C++. Retrieved 12 2011, from Microsoft: http://support.microsoft.com/kb/815668/en-us/
Microsoft Corp. (2012). C++ AMP (C++ Accelerated Massive Parallelism). Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/hh265137.aspx
Microsoft Corp. (2012). Direct3D. Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/windows/desktop/hh309466%28v=vs.85%29.aspx
Microsoft Corp. (2012, 6 21). XInput Game Controller APIs. Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/windows/desktop/hh405053%28v=vs.85%29.aspx
Miller, A. (2010). The task graph pattern. ParaPLoP '10: Proceedings of the 2010 Workshop on Parallel Programming Patterns. ACM. doi:10.1145/1953611.1953619
NVIDIA Corp. (2007). NVIDIA CUDA: Compute Unified Device Architecture.
Rhalibi, A. E., Merabti, M., & Shen, Y. (2006). Improving game processing in multithreading and multiprocessor architecture. Edutainment'06 Proceedings of the First international conference on Technologies for E-Learning and Digital Entertainment (pp. 669-679). Springer-Verlag Berlin, Heidelberg. doi:10.1007/11736639_81
Rodrigues, J. E., & Bezos, J. E. (1969). A GRAPH MODEL FOR PARALLEL COMPUTATIONS. Cambridge, MA: Massachusetts Institute of Technology.
Stroustrup, B. (1997). The C++ Programming Language Third Edition. Massachusetts: Addison Wesley, Reading.
The OpenCL Specification. (2010). Retrieved from http://www.khronos.org/registry/cl/
Valente, L., Conci, A., & Feijó, B. (2005). Real time game loop models for single-player computer games. Proceedings of the Fourth Brazilian Symposium on Computer Games and Digital Entertainment, (pp. 89-99).
Werth, T., Schreier, S., & Philippsen, M. (2011). CellCilk: Extending Cilk for heterogeneous multicore platforms. The 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2011). Fort Collins, Colorado, USA.
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.

Repository Staff Only: item control page

Downloads per month over past year

Back to top Back to top