Realistic Occlusion Augmentation for Human Pose Estimation


Ansarian, Amin (2021) Realistic Occlusion Augmentation for Human Pose Estimation. Masters thesis, Concordia University.

Text (application/pdf)
Ansarian_MASc_S2021.pdf - Accepted Version


Occlusion occurs naturally in a high percentage of real-world images. Handling occlusion has been a difficult challenge for human pose estimation methods, especially those using CNNs. A main reason is the lack of a proper dataset with an actual focus on realistic occlusion, which has prompted researchers to create datasets with synthetic (bounding-box based) occlusion as a means of data augmentation. In this thesis, we investigate how to improve learning through data preparation (i.e., a data-centric approach). To this end, we introduce a new realistic data augmentation approach, built on top of an original (base) dataset (e.g., Human3.6M and MPI-INF-3DHP), that tackles this issue by creating realistic samples similar to those found in the wild. Arguing that CNN models pay more attention to local than to global features, we define occlusion levels, process a large set of occluder objects from different categories, adapt them to the joint types and to the size of the human subject, and blend those occluders into the original RGB image from the base dataset. We then test top-performing CNN-based 2D and 3D human pose estimation models with and without our occlusion-augmented datasets (RealPose). Our experiments show a significant drop in the accuracy of these CNN models under occlusion. When we then train them on RealPose, we observe a major increase in accuracy under occlusion, without any change to the models themselves. The results indicate that the proposed data augmentation method is effective in tackling the occlusion issue in both 2D and 3D models, with a significantly larger accuracy increase for the 3D models. We have trained and tested the models under different dataset combinations, such as "training on the original dataset but testing under the augmented dataset" or "training and testing with the mixed original and augmented dataset". A significant outcome is that our data-centric approach yields higher accuracy for CNN models trained on occluded samples but tested on the original (non-occluded) samples, indicating that the models gain a better understanding of the dependencies between different joints. The proposed approach is dataset and network independent, i.e., researchers can apply it (using its open-source code) to any dataset and feed the result to any human pose estimator.
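
The augmentation step described in the abstract (scale an occluder to the subject, place it on a joint, blend it into the RGB image) can be illustrated with a minimal sketch. This is not the thesis's released code: the function names, the `LEVEL_TO_FRACTION` table and its values, the nearest-neighbour resize, and the plain alpha blend are all assumptions chosen for illustration, and the joint-type adaptation mentioned in the abstract is not modelled here.

```python
import numpy as np

# Hypothetical mapping from occlusion level to occluder height,
# expressed as a fraction of the subject's height in pixels.
LEVEL_TO_FRACTION = {"low": 0.15, "medium": 0.30, "high": 0.45}


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]


def occlude_joint(image, occluder_rgba, joint_xy, subject_height, level="medium"):
    """Alpha-blend an RGBA occluder onto an RGB image, centred on one joint."""
    out = image.astype(np.float32)  # astype returns a copy; input is untouched

    # Scale the occluder relative to the subject, keeping its aspect ratio.
    target_h = max(1, int(round(LEVEL_TO_FRACTION[level] * subject_height)))
    aspect = occluder_rgba.shape[1] / occluder_rgba.shape[0]
    target_w = max(1, int(round(target_h * aspect)))
    occ = resize_nearest(occluder_rgba, target_h, target_w).astype(np.float32)

    # Top-left corner that centres the occluder on the joint,
    # then clip the paste region to the image bounds.
    x0 = int(joint_xy[0]) - target_w // 2
    y0 = int(joint_xy[1]) - target_h // 2
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1 = min(x0 + target_w, image.shape[1])
    iy1 = min(y0 + target_h, image.shape[0])
    if ix0 >= ix1 or iy0 >= iy1:
        return image.copy()  # occluder lies entirely outside the image

    patch = occ[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0]
    alpha = patch[..., 3:4] / 255.0  # per-pixel opacity in [0, 1]
    region = out[iy0:iy1, ix0:ix1]
    out[iy0:iy1, ix0:ix1] = alpha * patch[..., :3] + (1.0 - alpha) * region
    return out.round().astype(np.uint8)


# Usage: a black 100x100 image, an opaque white occluder, one joint at (50, 50).
image = np.zeros((100, 100, 3), dtype=np.uint8)
occluder = np.full((10, 20, 4), 255, dtype=np.uint8)
augmented = occlude_joint(image, occluder, joint_xy=(50, 50),
                          subject_height=80, level="medium")
```

Because the function only edits pixels and leaves the pose labels unchanged, a sketch like this stays independent of both the dataset and the pose estimator, which is the property the abstract emphasises.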

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Ansarian, Amin
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Electrical and Computer Engineering
Date: 1 July 2021
Thesis Supervisor(s): Amer, Maria
ID Code: 988559
Deposited By: Amin Ansarian
Deposited On: 29 Nov 2021 16:23
Last Modified: 29 Nov 2021 16:23
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
