Zhang, Yi (2021) Efficient Asynchronous GCN Training on a GPU Cluster. Master's thesis, Concordia University.
Full text: Zhang_MCompSc_S2021.pdf (PDF, 1 MB, Accepted Version)
Abstract
A common assumption in traditional synchronous parallel training of Graph Convolutional Networks (GCNs) on multiple GPUs is that the load is perfectly balanced among all GPUs. However, this assumption does not hold in real-world scenarios, where workloads can be imbalanced across GPUs for various reasons. In a synchronous parallel implementation, a single straggler can limit the overall speed-up of parallel training. To address this problem, this research investigates approaches to asynchronous, decentralized parallel training of GCNs based on graph clustering and gossiping. Specifically, it adapts the approach of Cluster-GCN, which uses graph partitioning for SGD-based training, and combines it with a novel gossip algorithm, designed specifically for a GPU cluster, that periodically exchanges gradients among randomly chosen partners. In addition, it incorporates a work-pool mechanism for load balancing among GPUs. The gossip algorithm is proven to be deadlock-free. The implementation runs on a GPU cluster with 8 Tesla V100 GPUs per compute node, using PyTorch and DGL as the software platforms. Experiments are conducted on several benchmark datasets. Compared with traditional synchronous training, which uses all-reduce to accumulate parallel training results synchronously, the results show superior performance, at the cost of a minor accuracy loss in some runs.
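To make the contrast between all-reduce and gossip-based averaging concrete, here is a minimal, single-process Python sketch of the general idea. It is not the thesis's algorithm. The random perfect-matching scheme, the helper names (`all_reduce_mean`, `random_pairing`, `gossip_round`, `max_deviation`), and the toy gradient vectors are hypothetical illustrations. The sketch also runs in lock-step rounds, whereas the thesis describes asynchronous exchanges. GPU communication, Cluster-GCN partitioning, the work pool, and the deadlock-freedom proof are not reproduced.

```python
import random
from typing import List, Tuple

Vector = List[float]


def all_reduce_mean(grads: List[Vector]) -> Vector:
    """Synchronous baseline: the exact element-wise average over all workers."""
    n = len(grads)
    return [sum(column) / n for column in zip(*grads)]


def random_pairing(num_workers: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Match workers into disjoint random pairs (hypothetical pairing scheme).

    With an odd worker count, the last worker in the shuffled order sits out
    this round. Each worker appears in at most one pair per round; this is a
    simplified, centrally coordinated stand-in, not the thesis's protocol.
    """
    order = list(range(num_workers))
    rng.shuffle(order)
    return [(order[i], order[i + 1]) for i in range(0, num_workers - 1, 2)]


def gossip_round(grads: List[Vector], rng: random.Random) -> None:
    """One gossip round: every matched pair replaces both vectors with their mean."""
    for a, b in random_pairing(len(grads), rng):
        mean = [(x + y) / 2 for x, y in zip(grads[a], grads[b])]
        grads[a] = mean
        grads[b] = list(mean)


def max_deviation(grads: List[Vector], target: Vector) -> float:
    """Largest absolute gap between any worker's vector and the target."""
    return max(abs(x - t) for g in grads for x, t in zip(g, target))


if __name__ == "__main__":
    rng = random.Random(0)
    # Eight hypothetical workers (e.g. one per GPU on a node), each holding
    # a toy 3-element "gradient" in place of real GCN gradients.
    grads = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
    target = all_reduce_mean(grads)
    for step in range(1, 11):
        gossip_round(grads, rng)
        print(f"round {step}: max deviation from all-reduce mean = "
              f"{max_deviation(grads, target):.2e}")
```

Each pairwise average preserves the sum of the two vectors, so the global mean never changes. Repeated rounds therefore drive every worker toward the value that all-reduce would compute in one synchronous step, without a global barrier in which every worker must wait for the slowest one.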
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Zhang, Yi |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Date: | 26 March 2021 |
Thesis Supervisor(s): | Goswami, Dhrubajyoti |
ID Code: | 988244 |
Deposited By: | YI ZHANG |
Deposited On: | 29 Jun 2021 22:27 |
Last Modified: | 29 Jun 2021 22:27 |