Learning to Segment with Deep Models in Low-Data Regimes

Karimi, Amin (2025) Learning to Segment with Deep Models in Low-Data Regimes. PhD thesis, Concordia University.

Text (application/pdf)
Karimi_PhD_S2026.pdf - Accepted Version
Available under License Spectrum Terms of Access.
23MB

Abstract

This thesis addresses the challenge of few-shot semantic segmentation (FSS), aiming to achieve accurate image understanding in low-data regimes. Traditional FSS methods often struggle to generalize effectively, primarily due to the limited availability of labeled support examples. This scarcity makes it difficult to capture the full variability of object appearances, leading to poor performance in the presence of occlusions, appearance shifts, and viewpoint differences between support and query samples. To overcome these limitations, we first propose a transductive meta-learning framework that leverages an ensemble of features from pretrained classification and semantic segmentation networks. This method enhances discriminative power by capturing both high-level semantic cues and pixel-level spatial information, and introduces a two-pass correlation mechanism to improve intra-class and intra-object similarity modeling while reducing false positives, all with minimal trainable parameters.

However, despite strong performance, this approach remains limited in its ability to reason about object semantics or adapt flexibly to complex query-support discrepancies. Motivated by these challenges, we introduce a second framework that unifies visual features with semantic knowledge derived from multimodal large language models (LLMs). By generating adaptive class-specific semantic prompts using multimodal LLMs and integrating them with dense visual correspondences between support and query samples, our model performs reasoning-driven segmentation and achieves robust generalization even in cross-domain settings. The resulting vision-language system addresses key failure cases of prior work, particularly in scenes with severe appearance variation or ambiguous context.

Extensive experiments on Pascal-5i and COCO-20i demonstrate that our proposed frameworks outperform prior methods, both in standard few-shot settings and under cross-domain evaluation. Together, these contributions represent a significant advancement in learning to segment with limited supervision, offering a path forward for more intelligent and adaptable vision systems.
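The abstract refers to correlations and dense correspondences between support and query samples. As a rough illustration of that general idea only, the sketch below scores each query location by its best cosine match among the foreground support locations. It is a generic few-shot segmentation building block, not the thesis's two-pass correlation mechanism or its implementation; the function names, the flat list-of-vectors feature layout, and the 0.5 threshold are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def foreground_scores(query_feats, support_feats, support_mask):
    """Score each query location by its best match among foreground support locations.

    query_feats, support_feats: lists of per-location feature vectors
    (hypothetical layout; real models use dense feature maps).
    support_mask: list of 0/1 labels, one per support location (1 = target class).
    """
    foreground = [f for f, m in zip(support_feats, support_mask) if m == 1]
    if not foreground:
        raise ValueError("support mask contains no foreground locations")
    return [max(cosine(q, s) for s in foreground) for q in query_feats]


def predict_mask(scores, threshold=0.5):
    """Binarize correlation scores; the threshold value is an arbitrary assumption."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy example: three support locations (two foreground) and three query locations.
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_mask = [1, 1, 0]
query_feats = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

scores = foreground_scores(query_feats, support_feats, support_mask)
print(predict_mask(scores))
```

With a single labeled support image, the foreground set is small, which is one way to see why occlusion or viewpoint changes between support and query can leave true object pixels without any strong match.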

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (PhD)
Authors: Karimi, Amin
Institution: Concordia University
Degree Name: Ph. D.
Program: Computer Science
Date: 24 November 2025
Thesis Supervisor(s): Poullis, Charalambos
ID Code: 996500
Deposited By: Amin Karimi
Deposited On: 29 Jun 2026 15:33
Last Modified: 30 Jun 2026 00:30
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
