Ansarian, Amin (2021) Realistic Occlusion Augmentation for Human Pose Estimation. Master's thesis, Concordia University.
Full text: Ansarian_MASc_S2021.pdf (application/pdf, 4MB), Accepted Version
Abstract
Occlusion occurs naturally in a high percentage of real-world images. Handling occlusion has been a difficult challenge for human pose estimation methods, especially those based on CNNs. A main reason is the lack of a proper dataset that focuses on realistic occlusion, which has prompted researchers to create datasets with synthetic (bounding-box-based) occlusion as a means of data augmentation. In this thesis, we investigate how to improve learning through data preparation (i.e., a data-centric approach). To this end, we introduce a new realistic data augmentation approach that is built on top of an original (base) dataset (e.g., Human3.6M or MPI-INF-3DHP) and tackles this issue by creating realistic occluded samples similar to those found in the wild. Arguing that CNN models attend more to local than to global features, we define occlusion levels, process a large set of occluder objects from different categories, augment them adaptively according to the joint type and the size of the human subject, and blend them into the original RGB images of the base dataset. We then test top-performing CNN-based 2D and 3D human pose estimation models with and without our occlusion-augmented datasets (RealPose). Our experiments show a significant drop in the accuracy of these CNN models under occlusion. When we then train them on RealPose, we observe a major increase in accuracy under occlusion, without any change to the models themselves. These results indicate that the proposed data augmentation method is effective in tackling occlusion in both 2D and 3D models, with a considerably larger accuracy gain for the 3D models. We train and test the models under different dataset combinations, such as "training on the original dataset but testing on the augmented dataset" or "training and testing on a mix of the original and augmented datasets". A notable outcome is that our data-centric approach results in higher accuracy for CNN models that are trained on occluded samples but tested on the original (non-occluded) samples, suggesting that the models gain a better understanding of the dependencies between joints. The proposed approach is dataset- and network-independent: researchers can apply it (using its open-source code) to any dataset and feed the result to any human pose estimator.
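The abstract does not specify how occluders are scaled, placed, or blended. As a rough illustration only, the Python/NumPy sketch below shows one generic way to composite an occluder cut-out onto a base-dataset frame: the occluder is resized relative to the subject's height, centred on a joint, and alpha-composited over the RGB image. The function names `paste_occluder` and `resize_nearest`, the `scale` ratio, nearest-neighbour resizing, and plain alpha compositing are all hypothetical choices for this example, not the RealPose implementation.

```python
import numpy as np


def resize_nearest(img: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C array (no extra dependencies)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return img[rows][:, cols]


def paste_occluder(image, occluder_rgba, joint_xy, subject_height, scale=0.3):
    """Alpha-composite an RGBA occluder onto an RGB image, centred on a joint.

    Hypothetical sketch, not the thesis method.
    image:          H x W x 3 uint8 RGB frame from the base dataset
    occluder_rgba:  h x w x 4 uint8 object cut-out (alpha channel = object mask)
    joint_xy:       (x, y) pixel location of the joint to occlude
    subject_height: height in pixels of the person's bounding box
    scale:          assumed ratio of occluder height to subject height
    """
    oh, ow = occluder_rgba.shape[:2]
    new_h = max(1, int(round(scale * subject_height)))
    new_w = max(1, int(round(ow * new_h / oh)))  # preserve the aspect ratio
    occ = resize_nearest(occluder_rgba, new_h, new_w)

    H, W = image.shape[:2]
    # Top-left corner chosen so the occluder is centred on the joint.
    x0 = int(joint_xy[0]) - new_w // 2
    y0 = int(joint_xy[1]) - new_h // 2
    # Clip the paste window to the image bounds.
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1, iy1 = min(x0 + new_w, W), min(y0 + new_h, H)
    if ix0 >= ix1 or iy0 >= iy1:
        return image.copy()  # occluder lies entirely outside the frame

    patch = occ[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0].astype(np.float32)
    alpha = patch[..., 3:4] / 255.0  # per-pixel opacity in [0, 1]
    out = image.astype(np.float32)   # astype copies, so the input is untouched
    region = out[iy0:iy1, ix0:ix1]
    # Standard "over" compositing: alpha * foreground + (1 - alpha) * background.
    out[iy0:iy1, ix0:ix1] = alpha * patch[..., :3] + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

For example, pasting an opaque 10 x 10 white square onto a black 100 x 100 frame with `subject_height=40` and `scale=0.5` yields a 20 x 20 white patch centred on the joint. Tying the occluder size to the subject rather than to the image keeps the occluded fraction of the body roughly constant across subjects at different distances from the camera, which is one plausible reading of "adaptive to the size of the human subject".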
| Field | Value |
|---|---|
| Divisions | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
| Item Type | Thesis (Masters) |
| Authors | Ansarian, Amin |
| Institution | Concordia University |
| Degree Name | M.A. Sc. |
| Program | Electrical and Computer Engineering |
| Date | 1 July 2021 |
| Thesis Supervisor(s) | Amer, Maria |
| ID Code | 988559 |
| Deposited By | Amin Ansarian |
| Deposited On | 29 Nov 2021 16:23 |
| Last Modified | 29 Nov 2021 16:23 |