Islam, Zaedul (2023) Graph Representation Learning for 3D Human Pose Estimation. Masters thesis, Concordia University.
Text (application/pdf), 17MB: Islam_MASc_F2023.pdf - Accepted Version. Available under License Creative Commons Attribution.
Abstract
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs are able to capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this thesis, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss-Seidel iterative method. Motivated by this iterative solution, we design a Gauss-Seidel network architecture, which makes use of weight and adjacency modulation, skip connections, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer’s initial features as the network depth increases. Experimental results demonstrate that our approach outperforms the baseline methods on standard benchmark datasets.
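The graph filtering step named in the abstract can be illustrated with a small sketch. It assumes the standard formulation in which filtered features H minimize ||H - X||² + α·tr(HᵀLH), so that H solves (I + αL)H = X, and it assumes the combinatorial Laplacian L = D - A; the thesis may use a different Laplacian or normalization. The function name, the value of `alpha`, and the toy three-joint chain are hypothetical, and the sketch covers only the iterative solver, not the learned Gauss-Seidel network with weight and adjacency modulation.

```python
def gauss_seidel_graph_filter(adj, x, alpha=0.5, num_iters=200):
    """Approximately solve (I + alpha * L) H = X with L = D - A (assumed form).

    adj: symmetric adjacency matrix as a list of lists (no self-loops assumed).
    x:   per-node input features, one row per node.
    """
    n = len(adj)
    # Node degrees, ignoring any diagonal entries.
    degrees = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    h = [list(row) for row in x]  # initialize with the input features
    for _ in range(num_iters):
        for i in range(n):
            for c in range(len(x[i])):
                # Gauss-Seidel: reuse neighbor values already updated in this sweep.
                neighbor_sum = sum(adj[i][j] * h[j][c] for j in range(n) if j != i)
                h[i][c] = (x[i][c] + alpha * neighbor_sum) / (1.0 + alpha * degrees[i])
    return h


# Hypothetical toy skeleton: three joints connected in a chain, with 2D inputs.
chain_adj = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
joints_2d = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
filtered = gauss_seidel_graph_filter(chain_adj, joints_2d, alpha=0.5)
```

Because I + αL is strictly diagonally dominant for α > 0, the sweep converges, and each joint's value is pulled toward the average of its neighbors. Larger `alpha` gives stronger smoothing.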
This thesis makes another significant contribution by designing a spatio-temporal 3D human pose estimation model. Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. To address these issues, we introduce a novel approach called the Multi-hop Graph Transformer Network, which combines the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of two main blocks: a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with a learnable adjacency matrix, and a multi-hop graph convolutional block composed of multi-hop convolutional and dilated convolutional layers. Extensive experiments demonstrate the effectiveness and generalization ability of our model, achieving state-of-the-art performance on benchmark datasets while maintaining a compact model size.
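One common reading of "multi-hop graph convolution with disentangled neighborhoods" is that each hop k gets its own adjacency matrix linking only joints whose shortest-path distance is exactly k, so that closer joints are not counted again at larger hops. The sketch below builds such matrices under that assumption; the abstract does not confirm that the thesis uses this exact construction, and the function names and the four-joint chain are hypothetical.

```python
from collections import deque


def hop_distances(adj):
    """Shortest-path hop count between every pair of nodes (-1 if unreachable)."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[source][v] == -1:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def k_hop_adjacency(adj, k):
    """Adjacency linking only node pairs that are exactly k hops apart."""
    dist = hop_distances(adj)
    n = len(adj)
    return [[1 if dist[i][j] == k else 0 for j in range(n)] for i in range(n)]


# Hypothetical toy skeleton: four joints in a chain 0-1-2-3.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
hop2 = k_hop_adjacency(chain, 2)
# hop2 == [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
```

In a multi-hop layer, each such matrix would typically be paired with its own weights and the per-hop outputs aggregated; that part is omitted here.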
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Islam, Zaedul |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | 28 August 2023 |
Thesis Supervisor(s): | Ben Hamza, Abdessamad |
ID Code: | 992729 |
Deposited By: | Zaedul Islam |
Deposited On: | 17 Nov 2023 14:52 |
Last Modified: | 17 Nov 2023 14:52 |