Adams, D. A. (1969). A Computation Model with Data-Flow Sequencing. Tech. Report TR CS 177, Stanford Univ., Computer Science Dept., Stanford, Calif. Agrawal, K., He, Y., & Leiserson, C. E. (2007). Adaptive work stealing with parallelism feedback. The 12th ACM SIGPLAN symposium on Principles and practice of parallel programming. San Jose, California, USA: ACM. doi:10.1145/1229428.1229448 Andrews, J., & Baker, N. (2006, March). Xbox 360 System Architecture. IEEE Micro, 26(2), pp. 25-37. doi:10.1109/MM.2006.45 Augonnet, C., Thibault, S., Namyst, R., & Wacrenier, P.-A. (2011, February). StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. : Pract. Exper., Special Issue: Euro-Par 2009, pp. 187–198. Best, M. J., Fedorova1, A., Dickie, R., Tagliasacchi, A., Couture-Beil, A., Mustard, C., . . . Brownsword, A. (2009). Searching for Concurrent Design Patterns in Video Games. Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing. London: Springer-Verlag London. Blow, J. (2004, February). Game Development: Harder Than You Think. Queue - Game Development, 1(10). doi:10.1145/971564.971590 Blumofe, R. D., & Leiserson, C. E. (1999). Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5), 720-748. doi:10.1145/324133.324234 Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., & Zhou, Y. (1995). Cilk: an efficient multithreaded runtime system. Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, (pp. 207-216). Santa Barbara, California, United States. doi:10.1145/209936.209958 Boyd, C. (2008). The DirectX 11 compute shader. SIGGRAPH 2008. Retrieved from http://s08.idav.ucdavis.edu/boyd-dx11-compute-shader.pdf Cosnard, M., & Jeannot, E. (1999, September). Compact DAG representation and its dynamic scheduling. Journal of Parallel and Distributed Computing, 58(3). Danial, A. (2009). CLOC: Count lines of code. Retrieved December 2011, from http://cloc.sourceforge.net/ Diamos, G. F., & Yalamanchili, S. (2008). Harmony: an execution model and runtime for heterogeneous many core systems. Proceedings of the 17th international symposium on High performance distributed computing, (pp. 197-200). doi:10.1145/1383422.1383447 Diefendorff, K., Dubey, P. K., Hochsprung, R., & Scales, H. (2000). AltiVec Extension to PowerPC Accelerates Media Processing. IEEE Micro, 20(2), 85-95. doi:10.1109/40.848475 Ericson, C. (2005). Real-Time Collision Detection. Morgan Kaufmann. Fatahalian, K., & Houston, M. (2008). GPUs: A Closer Look. ACM Queue, 18-28. Giusti, L. D., Chichizola, F., Naiouf, M., & Giusti, A. D. (2008). Mapping Tasks to Processors in Heterogeneous Multiprocessor Architectures: The MATEHa Algorithm. SCCC '08 Proceedings of the 2008 International Conference of the Chilean Computer Science Society, (pp. 85-91). doi:10.1109/SCCC.2008.11 Giusti, L. D., Luque, E., Chichizola, F., Naiouf, M., & Giusti, A. D. (2009). AMTHA: An Algorithm for Automatically Mapping Tasks to Processors in Heterogeneous Multiprocessor Architectures. Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering, (pp. 481-485). doi:10.1109/CSIE.2009.175 Goetz, B. (2002, July 1). Java theory and practice: Thread pools and work queues. Retrieved from IBM: http://www.ibm.com/developerworks/java/library/j-jtp0730/index.html Gonina, E., & Chong, J. (2008). Task Queue Implementation Pattern. Retrieved from UC Berkeley ParLab: http://parlab.eecs.berkeley.edu/wiki/_media/patterns/taskqueue.pdf Gregg, C., & Hazelwood, K. (2011). Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (pp. 134-144). IEEE. doi:10.1109/ISPASS.2011.5762730 Gschwind, M. (2007, August 22). Using data-parallel SIMD architecture in video games and supercomputers. Retrieved from IBM: http://domino.research.ibm.com/comm/research.nsf/pages/r.arch.simd.html Hamidzadeh, B., Lilja, D. J., & Atif, Y. (1995, October). Dynamic Scheduling Techniques for Heterogeneous Computing Systems. Concurrency: Practice and Experience, 7(7), pp. 633-652. Hansen, C., Crockett, T., & Whitman, S. (1994). Guest Editors' Introduction: Parallel Rendering. IEEE Concurrency, 2(2), 7. doi:10.1109/MCC.1994.10014 Hovland, R. J. (2008). Latency and Bandwidth Impact on GPU-systems Projet report. Norwegian University of Science and Technology, Department of Computer and Information Science. Retrieved from http://www.idi.ntnu.no/elster/master-studs/runejoho/ms-proj-gpgpu-latency-bandwidth.pdf Intel Corporation. (2007, April). Intel SSE4 Programming Reference. Retrieved from Intel: http://software.intel.com/sites/default/files/m/9/4/2/d/5/17971-intel_20sse4_20programming_20reference.pdf Johnson, T., Davis, T. A., & Hadfield, S. M. (1996, February). A concurrent dynamic task graph. Parallel Computing, 22(2), 327-333. doi:10.1016/0167-8191(95)00061-5 Joselli, M., Clua, E., Montenegro, A., Conci, A., & Pagliosa, P. (2008). A new physics engine with automatic process distribution between CPU-GPU. Proceedings of the 2008 ACM SIGGRAPH symposium on Video games. Los Angeles, California: ACM. doi:10.1145/1401843.1401871 Joselli, M., Zamith, M., Clua, E. W., Montenegro, A., Leal-Toledo, R. C., Valente, L., & Feijó, B. (2010). An Architecture with Automatic Load Balancing and Distribution for Digital Games. 2010 Brazilian Symposium on Games and Digital Entertainment. Joselli, M., Zamith, M., Clua, E., Montenegro, A., Conci, A., Leal-Toledo, R., . . . Pozzer, C. (2008). Automatic Dynamic Task Distribution between CPU and GPU for Real-Time Systems. 11th IEEE International Conference on Computational Science and Engineering, CSE '08 (pp. 48-55). IEEE. doi:10.1109/CSE.2008.38 Joselli, M., Zamith, M., Clua, E., Montenegro, A., Leal-Toledo, R., Conci, A., . . . Feijó, B. (2009, December). An adaptative game loop architecture with automatic distribution of tasks between CPU and GPU. Computers in Entertainment (CIE), 7(4). doi:10.1145/1658866.1658869 Kahle, J. A., Day, M. N., Hofstee, H. P., Johns, C. R., Maeurer, T. R., & Shippy, D. (2005, July). Introduction to the cell multiprocessor. IBM Journal of Research and Development, 49(4/5), 589-604. Khronos Group. (1997). Retrieved from OpenGL: http://www.opengl.org/ Kilgariff, E., & Fernando, R. (2005). The GeForce 6 Series GPU Architecture. In M. Pharr, & R. Fernando, GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley Professional. Kim, C., & Agrawala, A. K. (1989, February). Analysis of the Fork-Join Queue. IEEE Transactions on Computers, 38(2), pp. 250-255. doi:10.1109/12.16501 Lavender, R. G., & Schmidt, D. C. (1996). Active object: an object behavioral pattern for concurrent programming. In Pattern languages of program design 2. Boston, MA: Addison-Wesley Longman Publishing Co. Inc. Luk, C.-K., Hong, S., & Kim, H. (2009). Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping. MICRO 42 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (pp. 45-55). New York, NY, USA: ACM. doi:10.1145/1669112.1669121 Maheswaran, M., & Siegel, H. J. (1998). A Dynamic Matching and Scheduling Algorithm for Heterogeneous Computing Systems. Proceedings of the Seventh Heterogeneous Computing Workshop, (p. 57). Manolescu, D.-A. (September, 1997). A Data Flow Pattern Language. Proc. 4th Pattern Languages of Programming. Monticello, IL. Mattson, T., Sanders, B., & Massingill, B. (2004). Patterns for Parallel Programming. Addison-Wesley Professional. Microsoft Corp. (2007, November). How to use the QueryPerformanceCounter function to time code in Visual C++. Retrieved 12 2011, from Microsoft: http://support.microsoft.com/kb/815668/en-us/ Microsoft Corp. (2012). C++ AMP (C++ Accelerated Massive Parallelism). Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/hh265137.aspx Microsoft Corp. (2012). Direct3D. Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/windows/desktop/hh309466%28v=vs.85%29.aspx Microsoft Corp. (2012, 6 21). XInput Game Controller APIs. Retrieved from Microsoft Developer Network: http://msdn.microsoft.com/en-us/library/windows/desktop/hh405053%28v=vs.85%29.aspx Miller, A. (2010). The task graph pattern. ParaPLoP '10: Proceedings of the 2010 Workshop on Parallel Programming Patterns. ACM. doi:10.1145/1953611.1953619 NVIDIA Corp. (2007). NVIDIA CUDA: Compute Unified Device Architecture. Rhalibi, A. E., Merabti, M., & Shen, Y. (2006). Improving game processing in multithreading and multiprocessor architecture. Edutainment'06 Proceedings of the First international conference on Technologies for E-Learning and Digital Entertainment (pp. 669-679). Springer-Verlag Berlin, Heidelberg. doi:10.1007/11736639_81 Rodrigues, J. E., & Bezos, J. E. (1969). A GRAPH MODEL FOR PARALLEL COMPUTATIONS. Cambridge, MA: Massachusetts Institute of Technology. Stroustrup, B. (1997). The C++ Programming Language Third Edition. Massachusetts: Addison Wesley, Reading. The OpenCL Specification. (2010). Retrieved from http://www.khronos.org/registry/cl/ Valente, L., Conci, A., & Feijó, B. (2005). Real time game loop models for single-player computer games. Proceedings of the Fourth Brazilian Symposium on Computer Games and Digital Entertainment, (pp. 89-99). Werth, T., Schreier, S., & Philippsen, M. (2011). CellCilk: Extending Cilk for heterogeneous multicore platforms. The 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2011). Fort Collins, Colorado, USA.