Datta, Shamita (2025) Fine-Tuning CLIP for Security Object Classification and Detection in X-ray Images. Masters thesis, Concordia University.
Text (application/pdf)
Datta_MSc_F2025.pdf (3MB) - Accepted Version. Restricted to Repository staff only until 1 November 2027. Available under License Spectrum Terms of Access.
Abstract
Security X-ray imaging is a vital tool for detecting threats and ensuring public safety. In modern security systems, computer vision and deep learning have driven major advances, particularly in object classification and detection. However, the diversity of threat objects and the limited availability of high-quality labeled X-ray data pose persistent challenges for conventional detectors. Emerging vision–language models, such as CLIP, provide a promising direction by enabling few-shot classification and detection, reducing dependence on large annotated datasets.
CLIP’s strength lies in its zero-shot capability: it performs classification using textual labels as classifiers, without task-specific fine-tuning. It associates image and text features through shared representations, but it performs suboptimally on security X-rays, whose domain-specific patterns are absent from pre-training. To overcome this limitation, we explore few-shot adaptation techniques that allow CLIP to specialize in the X-ray domain with minimal supervision, leveraging its pre-trained visual–textual foundations while introducing lightweight domain-specific fine-tuning.
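As a rough illustration of "textual labels as classifiers", the sketch below scores one image embedding against one text embedding per label by cosine similarity and applies a softmax. It is not the thesis's code: the three-dimensional vectors, the label names, and the temperature value are hypothetical stand-ins for the outputs of CLIP's image and text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_classify(image_emb, text_embs, labels, temperature=0.01):
    """Pick the label whose text embedding is most similar to the image embedding.

    temperature is an assumed value that sharpens the softmax.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)
    # Numerically stable softmax over the per-label logits.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Hypothetical labels and toy embeddings (a real model would produce these).
labels = ["knife", "gun", "scissors"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.2, 0.9, 0.1]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted)
```

No weights are trained here; the label texts alone define the classifier, which is why performance depends on how well the pre-trained embeddings cover the target domain.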
This thesis investigates CLIP-based adaptation strategies for both classification and detection in X-ray imagery. For detection, CLIP is combined with the region proposal network of Faster R-CNN: the network localizes prohibited items, and the fine-tuned CLIP then assigns a label to each proposed region.
Subsequent chapters present three adaptation strategies in a few-shot setting: adapter-based fine-tuning, full model fine-tuning, and LoRA-based fine-tuning. Through systematic evaluation on benchmark X-ray datasets, we demonstrate that CLIP’s vision–language pre-training can be effectively adapted to specialized security data, achieving strong performance even with limited samples. We compare these methods using classification accuracy and average precision for classification and detection, respectively.
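For readers unfamiliar with LoRA, the sketch below shows the general idea in plain Python: a frozen weight matrix W is augmented with a trainable low-rank product, giving W + (alpha / r) · B · A. The dimensions, rank, and scaling factor are illustrative assumptions and are not taken from the thesis, which may configure LoRA differently.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w, a, b, alpha, rank):
    """Return the effective weight W + (alpha / rank) * (B @ A)."""
    delta = matmul(b, a)  # (d_out x rank) @ (rank x d_in) -> (d_out x d_in)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

random.seed(0)
d_out, d_in, rank, alpha = 8, 8, 2, 4  # assumed toy sizes

# Frozen pre-trained weight (random stand-in).
w = [[random.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
# Trainable factors: A starts small and random, B starts at zero,
# so the adapted layer initially matches the frozen one exactly.
a = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

w_init = lora_weight(w, a, b, alpha, rank)

trainable_params = rank * (d_in + d_out)  # entries in A and B
full_params = d_in * d_out                # entries in W
print(trainable_params, full_params)
```

Only A and B would be updated during few-shot training, so the number of trainable parameters grows with the rank rather than with the full size of the weight matrix.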
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Datta, Shamita |
| Institution: | Concordia University |
| Degree Name: | M. Sc. |
| Program: | Computer Science |
| Date: | 3 November 2025 |
| Thesis Supervisor(s): | Wang, Yang and Zuo, Xinxin |
| ID Code: | 996418 |
| Deposited By: | Shamita Datta |
| Deposited On: | 29 Jun 2026 14:55 |
| Last Modified: | 29 Jun 2026 14:55 |