Quan, Jianning (2021) Pose Estimation and Object Detection using Deep Convolutional Networks. Masters thesis, Concordia University.
Abstract
Human pose estimation and object detection are fundamental problems in computer vision and autonomous systems, with applications ranging from healthcare and sports to surveillance, autonomous driving and traffic monitoring. The task of 3D human pose estimation is to predict the positions of a person’s joints, while the goal of object detection is to identify the category of every known object within an image or video and to locate it with a bounding box. The contributions of this thesis are two-fold. The first is to tackle the 3D human pose estimation problem in a graph-theoretic setting. More specifically, we introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation. The proposed approach is derived from implicit fairing on graphs using a scale-dependent graph Laplacian filtering scheme. By using multi-hop neighborhoods for node feature aggregation, our model is able to capture long-range dependencies between body joints. Moreover, our approach alleviates the oversmoothing problem caused by repeated graph convolutions, preventing the learned feature representations from converging to similar values, thanks in part to residual connections with the first layer of the network. These residual connections are integrated by design into our network architecture, and help ensure that the learned feature representations retain important information from the initial features of the input layer as the network depth increases. Experiments and ablation studies conducted on a standard benchmark demonstrate the effectiveness of our model, which achieves superior performance over strong baseline methods for 3D human pose estimation.
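As a rough illustration of the two ideas named above (multi-hop aggregation and an initial residual connection), the following sketch builds one generic layer of the form relu(sum_k Â^k H W_k + X W_init) on a toy four-joint chain. It is not the layer defined in the thesis: the update rule, the symmetric normalisation, and all names (`normalized_adjacency`, `multi_hop_layer`, `hop_weights`, `w_init`) are assumptions made for the example, and the weights are fixed by hand instead of learned.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def normalized_adjacency(edges, n):
    """Assumed normalisation: D^-1/2 (A + I) D^-1/2 with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)]
            for i in range(n)]

def multi_hop_layer(a_hat, h, x0, hop_weights, w_init):
    """Hypothetical layer: relu(sum_k A_hat^k H W_k + X0 W_init)."""
    out = matmul(x0, w_init)                    # initial residual term
    propagated = h
    for w_k in hop_weights:                     # k = 1, 2, ...
        propagated = matmul(a_hat, propagated)  # reach one hop further
        out = add(out, matmul(propagated, w_k))
    return [[max(0.0, v) for v in row] for row in out]

# Toy skeleton: a chain of four joints, 0 - 1 - 2 - 3.
a_hat = normalized_adjacency([(0, 1), (1, 2), (2, 3)], 4)
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# Only joint 0 carries a non-zero input feature.
x0 = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

one_hop = multi_hop_layer(a_hat, x0, x0, [identity], zeros)
two_hop = multi_hop_layer(a_hat, x0, x0, [identity, identity], zeros)

# Hidden features collapsed to zero, but the input is still recoverable.
collapsed = [[0.0, 0.0] for _ in range(4)]
with_residual = multi_hop_layer(a_hat, collapsed, x0, [identity], identity)
```

With a single hop, joint 2 receives nothing from joint 0; adding a second hop lets that information arrive within one layer. In the last call the hidden features are all zero, yet the initial residual term still carries the input features of joint 0 through, which is the intuition behind using such connections against oversmoothing.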
The second contribution is a single-stage object detection model for aerial imagery that combines a class-balanced focal loss with a feature pyramid network in an effort to mitigate the data imbalance problem without relying on data augmentation. The key benefit of the class-balanced focal loss is the ability to adjust the contributions of minority classes to the loss function, allowing our model to detect the different classes more evenly. The performance of the proposed object detection model is demonstrated through extensive experiments on a standard aerial image benchmark, where it achieves comparable or better detection results than competing baselines.
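To make the re-weighting idea concrete, here is a minimal sketch of one common formulation of a class-balanced focal loss, in which each class is weighted by the inverse of its "effective number" of samples, (1 - beta) / (1 - beta^n). The abstract does not state which formulation or hyperparameters the thesis uses, so the softmax-style loss, the values `beta=0.999` and `gamma=2.0`, the class counts, and the function names are all assumptions for illustration.

```python
import math

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights (1 - beta) / (1 - beta**n), rescaled to sum to the class count."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def class_balanced_focal_loss(probs, target, weights, gamma=2.0):
    """Focal loss for one example, scaled by the weight of its true class.

    probs  : predicted class probabilities (assumed to come from a softmax)
    target : index of the true class
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical training-set counts: one frequent, one medium, one rare class.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts)

# Same confidence (0.6) in the true class, but different class frequencies.
loss_common = class_balanced_focal_loss([0.6, 0.3, 0.1], 0, weights)
loss_rare = class_balanced_focal_loss([0.1, 0.3, 0.6], 2, weights)
```

Both examples are predicted with the same confidence, yet the example from the rare class contributes a larger loss because its class weight is larger. This is the sense in which the contributions of minority classes can be adjusted without augmenting the data.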
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Quan, Jianning |
Institution: | Concordia University |
Degree Name: | M.A. Sc. |
Program: | Quality Systems Engineering |
Date: | 30 July 2021 |
Thesis Supervisor(s): | Ben Hamza, Abdessamad |
ID Code: | 988613 |
Deposited By: | Jianning Quan |
Deposited On: | 29 Nov 2021 17:03 |
Last Modified: | 29 Nov 2021 17:03 |