
Context Mining for Visual Object Counting


Zgaren, Ahmed ORCID: https://orcid.org/0000-0001-5777-3440 (2025) Context Mining for Visual Object Counting. PhD thesis, Concordia University.

Text (application/pdf)
Zgaren_PhD_F2025.pdf - Accepted Version
Restricted to Registered users only until 1 May 2026.
Available under License Spectrum Terms of Access.
33MB

Abstract

Visual object counting is a fundamental task in computer vision that aims to accurately estimate the number of objects of interest within an image. This task has widespread applications across various domains, including environmental monitoring, surveillance, retail analytics, and medical imaging. Traditional counting methods often face challenges such as object occlusion, variation in scale and appearance, and complex scene backgrounds. Although deep learning has significantly advanced this field, there are still limitations, particularly regarding the accurate capture of contextual information.
This thesis focuses on developing novel approaches to enhance visual object counting, targeting key research problems related to accuracy, efficiency, and robustness in both class-specific and class-agnostic counting scenarios. To address these challenges, this thesis makes several key contributions. First, we propose a novel hybrid counting method that combines local detection with global estimation to accurately count objects in aerial imagery. This approach efficiently exploits both local and global information, enhancing counting accuracy in high-density situations. Second, we introduce a self-attention-based model for class-agnostic counting, which effectively encodes repetitive object patterns, allowing for precise counting even in the presence of object variations and background clutter. This method improves feature representation and matching, leading to enhanced robustness and generalization capabilities. Finally, we present a novel box-free counting model requiring only one annotation point per object, significantly reducing the annotation burden. This method employs contextual transformers and a position-aware attention encoder to achieve accurate object counts with minimal annotation effort. The effectiveness of our proposed methods is rigorously demonstrated through extensive experiments conducted on both public and private datasets. By comparing our results to those of state-of-the-art methods, we demonstrate the superior performance of our approaches in addressing several challenges in visual object counting. These contributions collectively advance the field of visual object counting by providing more accurate, efficient, and robust counting methods, opening new possibilities for automated object counting in various applications.
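To make the class-agnostic counting idea concrete, the sketch below shows a generic exemplar-matching counter in PyTorch: image features attend to pooled exemplar features through cross-attention, and the resulting per-location density is summed into a count. This is a minimal illustrative sketch of the general approach, not the thesis model; the class name ToyClassAgnosticCounter, the layer sizes, and the pooling choices are all hypothetical assumptions.

    # Illustrative sketch only: a generic exemplar-matching counter, not the thesis model.
    # Assumes PyTorch is available; all module names and dimensions are hypothetical.
    import torch
    import torch.nn as nn

    class ToyClassAgnosticCounter(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            # Shared convolutional encoder for the image and the exemplar patches.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Cross-attention: image locations (queries) attend to exemplar tokens (keys/values).
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            # Density head: one non-negative value per spatial location.
            self.head = nn.Sequential(nn.Linear(dim, 1), nn.ReLU())

        def forward(self, image, exemplars):
            # image: (B, 3, H, W); exemplars: (B, K, 3, h, w) -- K exemplar crops per image.
            B, K = exemplars.shape[:2]
            img_feat = self.encoder(image)                       # (B, dim, H', W')
            ex_feat = self.encoder(exemplars.flatten(0, 1))      # (B*K, dim, h', w')
            ex_tokens = ex_feat.mean(dim=(2, 3)).view(B, K, -1)  # (B, K, dim) pooled exemplar tokens

            q = img_feat.flatten(2).transpose(1, 2)              # (B, H'*W', dim) image queries
            matched, _ = self.attn(q, ex_tokens, ex_tokens)      # similarity-weighted exemplar info
            density = self.head(matched).squeeze(-1)             # (B, H'*W') per-location density
            return density.sum(dim=1)                            # predicted count per image

    if __name__ == "__main__":
        model = ToyClassAgnosticCounter()
        img = torch.randn(2, 3, 128, 128)
        ex = torch.randn(2, 3, 3, 32, 32)   # three exemplar crops per image
        print(model(img, ex))               # two scalar count estimates

The same attention machinery extends naturally to the point-supervised, box-free setting described above: exemplar tokens would be gathered at annotated point locations rather than from cropped boxes, which is what makes a single point per object sufficient for supervision.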

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (PhD)
Authors: Zgaren, Ahmed
Institution: Concordia University
Degree Name: Ph.D.
Program: Information and Systems Engineering
Date: 1 April 2025
Thesis Supervisor(s): Bouguila, Nizar and Bouachir, Wassim
ID Code: 995586
Deposited By: Ahmed Zgaren
Deposited On: 04 Nov 2025 16:48
Last Modified: 04 Nov 2025 16:48


