Efficient Asynchronous GCN Training on a GPU Cluster


Zhang, Yi (2021) Efficient Asynchronous GCN Training on a GPU Cluster. Masters thesis, Concordia University.

Text (application/pdf)
Zhang_MCompSc_S2021.pdf - Accepted Version


A common assumption in traditional synchronous parallel training of Graph Convolutional Networks (GCNs) on multiple GPUs is that load is perfectly balanced among all GPUs. This assumption does not hold in real-world scenarios, where workloads among GPUs can be imbalanced for various reasons. In a synchronous parallel implementation, a single straggler can limit the overall speed-up of parallel training. To address these issues, this research investigates approaches for asynchronous decentralized parallel training of GCNs. The techniques investigated are based on graph clustering and gossiping. Specifically, the research adapts the approach of Cluster-GCN, which uses graph partitioning for SGD-based training, and combines it with a novel gossip algorithm, designed for a GPU cluster, that periodically exchanges gradients among randomly chosen partners. In addition, it incorporates a work-pool mechanism for load balancing among GPUs. The gossip algorithm is proven to be deadlock free. The implementation runs on a GPU cluster with 8 Tesla V100 GPUs per compute node, with PyTorch and DGL as the software platforms. Experiments are conducted on different benchmark datasets. The results demonstrate superior performance, at the cost of a minor accuracy loss in some runs, compared to traditional synchronous training, which uses all-reduce to synchronously accumulate parallel training results.
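
The contrast between all-reduce and gossip-style exchange can be illustrated with a toy simulation. The sketch below is not the thesis's algorithm: it assumes each GPU holds a single scalar "gradient", that a gossip step simply averages the values of two randomly chosen partners, and that exchanges happen one pair at a time. The function names (`all_reduce_mean`, `gossip_round`, `simulate_gossip`) are hypothetical and chosen only for illustration; the thesis's actual protocol, its deadlock-freedom argument, and its work-pool mechanism are not reproduced here.

```python
import random


def all_reduce_mean(values):
    """Synchronous baseline: every worker ends up holding the global mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def gossip_round(values, rng):
    """One pairwise exchange: two randomly chosen workers average their values.

    Assumption for illustration only; a real system would exchange tensors
    between GPUs rather than floats in a list.
    """
    i, j = rng.sample(range(len(values)), 2)
    avg = (values[i] + values[j]) / 2.0
    values[i] = avg
    values[j] = avg


def simulate_gossip(values, rounds, seed=0):
    """Run a number of pairwise gossip exchanges on a copy of `values`."""
    rng = random.Random(seed)
    state = list(values)
    for _ in range(rounds):
        gossip_round(state, rng)
    return state


if __name__ == "__main__":
    grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # one toy value per GPU
    print("all-reduce:", all_reduce_mean(grads)[0])
    print("gossip:    ", simulate_gossip(grads, rounds=200))
```

In this toy setting, pairwise averaging preserves the sum of the values, so repeated exchanges pull every worker toward the same mean that all-reduce computes in one synchronous step, without requiring all workers to wait for the slowest one.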

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Zhang, Yi
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 26 March 2021
Thesis Supervisor(s): Goswami, Dhrubajyoti
ID Code: 988244
Deposited By: YI ZHANG
Deposited On: 29 Jun 2021 22:27
Last Modified: 29 Jun 2021 22:27
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
