Graph Representation Learning for 3D Human Pose Estimation

Islam, Zaedul (2023) Graph Representation Learning for 3D Human Pose Estimation. Masters thesis, Concordia University.

Text (application/pdf)
Islam_MASc_F2023.pdf - Accepted Version
Available under License Creative Commons Attribution.

17MB

Abstract

Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation. By naturally modeling the skeleton structure of the human body as a graph, GCNs can capture the spatial relationships between joints and learn an efficient representation of the underlying pose. However, most GCN-based methods use a shared weight matrix, making it challenging to accurately capture the different and complex relationships between joints. In this thesis, we introduce an iterative graph filtering framework for 3D human pose estimation, which aims to predict the 3D joint positions given a set of 2D joint locations in images. Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization via the Gauss-Seidel iterative method. Motivated by this iterative solution, we design a Gauss-Seidel network architecture, which makes use of weight and adjacency modulation, skip connections, and a pure convolutional block with layer normalization. Adjacency modulation facilitates the learning of edges that go beyond the inherent connections of body joints, resulting in an adjusted graph structure that reflects the human skeleton, while skip connections help maintain crucial information from the input layer’s initial features as the network depth increases. Our experimental results demonstrate that our approach outperforms the baseline methods on standard benchmark datasets.
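
To make the iterative solution mentioned above concrete, the following is a minimal sketch of Gauss-Seidel sweeps applied to a Laplacian-regularized graph filter. It assumes the standard formulation in which the filtered features H solve (I + alpha * L) H = X, with L the combinatorial graph Laplacian; the abstract does not state the exact objective, normalization, or parameters used in the thesis, so the function name `gauss_seidel_graph_filter`, the weight `alpha`, and the toy 5-joint chain are all illustrative assumptions and not the thesis's network.

```python
import numpy as np


def gauss_seidel_graph_filter(adj, x, alpha=1.0, num_iters=50):
    """Approximately solve (I + alpha * L) H = X with Gauss-Seidel sweeps.

    Assumed setup (not taken from the thesis): L = D - A is the
    combinatorial Laplacian of the skeleton graph with adjacency `adj`.
    """
    n = adj.shape[0]
    lap = np.diag(adj.sum(axis=1)) - adj
    a = np.eye(n) + alpha * lap  # system matrix, strictly diagonally dominant
    h = x.astype(float).copy()
    for _ in range(num_iters):
        for i in range(n):
            # Rows j < i were already updated in this sweep; rows j > i
            # still hold the values from the previous sweep.
            residual = x[i] - a[i, :i] @ h[:i] - a[i, i + 1:] @ h[i + 1:]
            h[i] = residual / a[i, i]
    return h, a


# Hypothetical toy skeleton: a chain of 5 joints with 2D input features.
adj = np.zeros((5, 5))
for j in range(4):
    adj[j, j + 1] = adj[j + 1, j] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))

h, a = gauss_seidel_graph_filter(adj, x)
h_direct = np.linalg.solve(a, x)  # closed-form reference solution
print("max abs difference:", np.abs(h - h_direct).max())
```

Because I + alpha * L is strictly diagonally dominant for any alpha > 0, the sweeps converge to the same result as the direct solve. A learned architecture would presumably replace the fixed filter with trainable weights and a modulated adjacency, which this sketch does not attempt.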

As a second contribution, this thesis presents a spatio-temporal 3D human pose estimation model. Accurate 3D human pose estimation is a challenging task due to occlusion and depth ambiguity. To address these issues, we introduce a novel approach called the Multi-hop Graph Transformer Network, which combines the strengths of multi-head self-attention and multi-hop graph convolutional networks with disentangled neighborhoods to capture spatio-temporal dependencies and handle long-range interactions. The proposed network architecture consists of two main blocks: a graph attention block composed of stacked layers of multi-head self-attention and graph convolution with a learnable adjacency matrix, and a multi-hop graph convolutional block made up of multi-hop convolutional and dilated convolutional layers. Extensive experiments demonstrate the effectiveness and generalization ability of our model, which achieves state-of-the-art performance on benchmark datasets while maintaining a compact model size.
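
As an illustration of what disentangled multi-hop neighborhoods can look like, the sketch below builds k-hop adjacency matrices in which an entry is 1 only when two joints are at shortest-path distance exactly k. This is one common definition of disentangling and is assumed here; the abstract does not specify the construction used in the thesis. The helper name `k_hop_adjacency` and the 5-joint chain are hypothetical.

```python
import numpy as np


def k_hop_adjacency(adj, k):
    """Return a 0/1 matrix marking joint pairs at shortest-path distance exactly k.

    Assumed definition: pairs reachable within k hops, minus pairs already
    reachable within k - 1 hops.
    """
    n = adj.shape[0]
    if k == 0:
        return np.eye(n, dtype=int)
    # Adding self-loops makes the k-th power mark everything within k hops.
    a_hat = ((adj + np.eye(n)) > 0).astype(int)
    within_k = (np.linalg.matrix_power(a_hat, k) > 0).astype(int)
    within_k_minus_1 = (np.linalg.matrix_power(a_hat, k - 1) > 0).astype(int)
    return within_k - within_k_minus_1


# Hypothetical toy skeleton: a chain of 5 joints (0-1-2-3-4).
adj = np.zeros((5, 5), dtype=int)
for j in range(4):
    adj[j, j + 1] = adj[j + 1, j] = 1

hop1 = k_hop_adjacency(adj, 1)  # direct bone connections
hop2 = k_hop_adjacency(adj, 2)  # joints exactly two bones apart
print(hop2)
```

Keeping each hop in its own matrix lets a layer weight near and distant joints separately, whereas plain powers of the adjacency matrix would count close neighbors again at every hop.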

Divisions:	Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type:	Thesis (Masters)
Authors:	Islam, Zaedul
Institution:	Concordia University
Degree Name:	M.A.Sc.
Program:	Quality Systems Engineering
Date:	28 August 2023
Thesis Supervisor(s):	Ben Hamza, Abdessamad
ID Code:	992729
Deposited By:	Zaedul Islam
Deposited On:	17 Nov 2023 14:52
Last Modified:	17 Nov 2023 14:52
