Hosseini, Hassan (1998) High performance virtual architecture parallel libraries with data redistribution for multicomputers. PhD thesis, Concordia University.
Sequential programs that rely on library calls for their intensive numerical computations may not deliver satisfactory performance for large problem instances on uniprocessor systems. Replacing the library system with one that performs the computations on a multicomputer can significantly improve the execution time of these programs. Such parallel libraries also encourage programmers who have no knowledge of multicomputer programming to use multicomputers to run their newly developed compute-intensive applications. Multicomputer programs perform computations on distributed data. Data is transferred between processors using communication operations, which are normally costly. The introduction of parallel library systems gives rise to four important issues. The first is designing the library routine without knowing the problem instance size or the physical system size, as is the case with many partitionable and reconfigurable systems. The second is the performance of the library routine, which is sensitive to the granularity of the computation and to the mapping of the computation onto the physical system. The third is maintaining a call interface that resembles those of sequential libraries. Finally, once the library is ported to a new platform, parallel system speedup becomes a major concern. Data distribution at each parallel library call is performed sequentially, which degrades the performance of the library routine. Since distributed data used or produced by one library call is often used in subsequent calls to the same routine or to other library routines, it is beneficial to redistribute the data from the former library call to prepare for the latter. The redistribution operation is a parallel operation and reduces the overall execution time of a parallel library call. This thesis presents the design of a parallel library system which possesses several unique properties.
The design supports dynamic grain adjustment and delayed mapping of virtual to physical processors in order to reduce the communication overhead of the library calls. It also supports transparent distributed data management, which results in a call interface similar to those of sequential libraries. Furthermore, the design supports transparent data redistribution across parallel library calls. Once ported to a new system, the library can easily be tuned with the target system's parameters to deliver the best performance for those parameters. The feasibility, performance, and overhead of our design have been evaluated experimentally using a source-to-source transformer, a compiler, library designs for several virtual architecture parallel algorithms, a mapping module, a virtual communication library, a redistribution library, and a multicomputer simulator. The implementation of the library system on an actual multicomputer is discussed in detail in the thesis.
|Divisions:||Concordia University > Faculty of Engineering and Computer Science > Computer Science and Software Engineering|
|Item Type:||Thesis (PhD)|
|Pagination:||ix, 189 leaves : ill. ; 29 cm.|
|Degree Name:||Theses (Ph.D.)|
|Program:||Computer Science and Software Engineering|
|Thesis Supervisor(s):||Tao, Lixin|
|Deposited By:||Concordia University Libraries|
|Deposited On:||27 Aug 2009 13:14|
|Last Modified:||08 Dec 2010 10:16|