Salehi Dastjerdi, Niloufar (2025) Improving Visual Processing through Feature Descriptor Enhancement and Depth Upsampling. PhD thesis, Concordia University.
Text (application/pdf), 5 MB: Salehi Dastjerdi_PhD_F2025.pdf - Accepted Version, available under the Spectrum Terms of Access license.
Abstract
Visual processing is a core aspect of modern computer vision, supporting applications such as object recognition, tracking, and 3D reconstruction. However, challenges like occlusions, appearance variations, and inconsistencies between RGB and depth data continue to hinder reliable scene interpretation. Traditional feature descriptors often lack adaptability in complex environments, and depth maps acquired from low-cost sensors typically exhibit low resolution, noise, and missing values. These limitations obstruct the accurate extraction of spatial and contextual information required for robust scene analysis. To address these issues, the overall objective of this thesis is to advance visual processing through adaptive, feature-driven techniques that leverage the spatial and contextual richness of color and depth data, thereby contributing to more reliable interpretation of complex visual scenes. Aligned with this objective, the thesis is structured around two main parts aimed at achieving more accurate and context-aware scene understanding.
The first part of the thesis focuses on the design of two advanced feature descriptors and an object tracking scheme. The first descriptor, the r-spatiogram, captures spatial, color, and texture information within image regions to provide a detail-preserving, context-aware representation. The second descriptor, the adaptive multi-scale (AMS) descriptor, improves adaptability by employing strategies suited to diverse visual environments. Building on these descriptors, a novel object tracking framework is introduced that utilizes color and depth information to address common challenges such as occlusions and target-background similarity. This framework initially operates independently and is then extended by incorporating each proposed descriptor to evaluate its impact on tracking accuracy and robustness in dynamic environments.
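For concreteness, the minimal sketch below (Python with NumPy, not the thesis implementation) illustrates the classic spatiogram of Birchfield and Rangarajan, the representation on which the r-spatiogram builds: a color histogram whose bins are augmented with the spatial mean and covariance of the pixels that fall into them. The r-spatiogram's additional texture and region components, the AMS multi-scale strategy, and the tracking framework itself are not reproduced; the bin count and normalization are illustrative choices only.

```python
# Minimal spatiogram sketch (illustrative helper, not the thesis code): a color
# histogram whose bins also store the spatial mean and covariance of their pixels.
# The r-spatiogram's texture/region components are NOT reproduced here.
import numpy as np

def spatiogram(region, n_bins=8):
    """region: H x W x 3 uint8 color patch -> per-bin (weight, mean, covariance)."""
    h, w, _ = region.shape
    # Normalized pixel coordinates, so the descriptor does not depend on patch size.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel() / h, xs.ravel() / w], axis=1)      # (H*W, 2)

    # Quantize each channel into n_bins levels and form a joint color-bin index.
    q = (region.reshape(-1, 3).astype(np.int64) * n_bins) // 256     # (H*W, 3)
    bin_idx = (q[:, 0] * n_bins + q[:, 1]) * n_bins + q[:, 2]

    weights, means, covs = {}, {}, {}
    for b in np.unique(bin_idx):
        pts = coords[bin_idx == b]
        weights[b] = len(pts) / (h * w)                # normalized bin weight
        means[b] = pts.mean(axis=0)                    # spatial mean of the bin
        covs[b] = (np.cov(pts, rowvar=False)           # 2 x 2 spatial covariance
                   if len(pts) > 1 else np.zeros((2, 2)))
    return weights, means, covs
```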
The second part of the thesis is concerned with a novel depth upsampling scheme that improves the quality of low-resolution depth maps. It employs a joint local–nonlocal framework guided by an adaptive bandwidth mechanism that dynamically adjusts the influence of neighboring pixels. A distance-based patch similarity map is introduced to support this adaptation. Two similarity strategies are explored: one using a standard metric, and the other incorporating the AMS descriptor to capture more complex structural relationships.
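As a rough illustration of the local term only, the sketch below (again Python with NumPy, not the thesis implementation) applies a joint-bilateral-style filter in which the range-kernel bandwidth varies per pixel. In the proposed scheme this bandwidth would be derived from the distance-based patch similarity map and coupled with a nonlocal term; here `bandwidth_map` is simply a precomputed input, so both the adaptation rule and the nonlocal model are omitted.

```python
# Sketch of the *local* term only: a joint-bilateral-style filter whose range
# bandwidth sigma_r varies per pixel. `bandwidth_map` is assumed to be given
# (in the thesis it would be driven by the distance-based patch similarity map);
# the nonlocal model and the actual adaptation rule are not reproduced here.
import numpy as np

def adaptive_joint_bilateral(depth_up, guide, bandwidth_map, radius=5, sigma_s=3.0):
    """depth_up: HxW depth map already resized to the guide resolution (NaN = hole).
    guide: HxW grayscale guidance image; bandwidth_map: HxW per-pixel sigma_r."""
    guide = guide.astype(np.float64)
    h, w = guide.shape
    out = np.zeros((h, w), dtype=np.float64)

    # Fixed spatial kernel shared by every pixel.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2.0 * sigma_s ** 2))

    d = np.pad(depth_up.astype(np.float64), radius, mode='edge')
    g = np.pad(guide, radius, mode='edge')
    for i in range(h):
        for j in range(w):
            dw = d[i:i + 2 * radius + 1, j:j + 2 * radius + 1]       # depth window
            gw = g[i:i + 2 * radius + 1, j:j + 2 * radius + 1]       # guide window
            sigma_r = bandwidth_map[i, j]                            # adaptive bandwidth
            rng = np.exp(-((gw - guide[i, j]) ** 2) / (2.0 * sigma_r ** 2))
            wgt = spatial * rng * np.isfinite(dw)                    # zero weight on holes
            out[i, j] = np.nansum(wgt * dw) / (wgt.sum() + 1e-12)
    return out
```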
Extensive experiments are conducted on multiple benchmark datasets to validate the effectiveness of the proposed schemes.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering |
|---|---|
| Item Type: | Thesis (PhD) |
| Authors: | Salehi Dastjerdi, Niloufar |
| Institution: | Concordia University |
| Degree Name: | Ph.D. |
| Program: | Electrical and Computer Engineering |
| Date: | 10 June 2025 |
| Thesis Supervisor(s): | Ahmad, M. Omair |
| ID Code: | 995896 |
| Deposited By: | NILOUFAR SALEHI DASTJERDI |
| Deposited On: | 04 Nov 2025 16:15 |
| Last Modified: | 04 Nov 2025 16:15 |