Karimi, Amin (2025) Learning to Segment with Deep Models in Low-Data Regimes. PhD thesis, Concordia University.
Text (application/pdf)
Karimi_PhD_S2026.pdf (23MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
This thesis addresses the challenge of few-shot semantic segmentation (FSS), aiming to achieve accurate image understanding in low-data regimes. Traditional few-shot semantic segmentation methods often struggle to generalize effectively, primarily due to the limited availability of labeled support examples. This scarcity makes it difficult to capture the full variability of object appearances, leading to poor performance in the presence of occlusions, appearance shifts, and viewpoint differences between support and query samples. To overcome these limitations, we first propose a transductive meta-learning framework that leverages an ensemble of features from pretrained classification and semantic segmentation networks. This method enhances discriminative power by capturing both high-level semantic cues and pixel-level spatial information, and introduces a two-pass correlation mechanism to improve intra-class and intra-object similarity modeling while reducing false positives, all with minimal trainable parameters.
However, despite strong performance, this approach remains limited in its ability to reason about object semantics or adapt flexibly to complex query-support discrepancies. Motivated by these challenges, we introduce a second framework that unifies visual features with semantic knowledge derived from multimodal large language models (LLMs). By generating adaptive class-specific semantic prompts using multimodal LLMs and integrating them with dense visual correspondences between support and query samples, our model performs reasoning-driven segmentation and achieves robust generalization even in cross-domain settings. The resulting vision-language system addresses key failure cases of prior work, particularly in scenes with severe appearance variation or ambiguous context.
Extensive experiments on Pascal-5i and COCO-20i demonstrate that our proposed frameworks outperform prior methods, both in standard few-shot settings and under cross-domain evaluation. Together, these contributions represent a significant advancement in learning to segment with limited supervision, offering a path forward for more intelligent and adaptable vision systems.
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (PhD) |
| Authors: | Karimi, Amin |
| Institution: | Concordia University |
| Degree Name: | Ph. D. |
| Program: | Computer Science |
| Date: | 24 November 2025 |
| Thesis Supervisor(s): | Poullis, Charalambos |
| ID Code: | 996500 |
| Deposited By: | Amin Karimi |
| Deposited On: | 29 Jun 2026 15:33 |
| Last Modified: | 30 Jun 2026 00:30 |