Sharma, Vasudev (2025) Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design. Masters thesis, Concordia University.
Text (application/pdf)
Sharma_MSc_S2025.pdf (16MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Vision-Language Models (VLMs) have emerged as powerful tools in computational pathology, offering the ability to perform zero-shot diagnostic inference on gigapixel whole slide images (WSIs). However, a core challenge remains: these models are highly sensitive to the linguistic structure and specificity of prompts, which can significantly impact diagnostic accuracy, reproducibility, and clinical interpretability. This thesis systematically investigates the role of prompt engineering in enhancing the diagnostic performance of VLMs in histopathology. We propose a structured prompt engineering framework that modulates four critical dimensions (anatomical precision, information density, instructional framing, and output constraints) to evaluate their effect on model behavior. Using a clinically validated in-house dataset of 3,507 digestive system WSIs spanning multiple tissue types and pathological conditions, we conduct a comprehensive evaluation of four state-of-the-art VLMs: Biomedical Contrastive Language-Image Pre-training (BioMedCLIP), Quilt-Net, Quilt-Large Language and Vision Assistant (Quilt-LLAVA), and Contrastive Learning from Captions for Histopathology (CONCH). Our methodology combines quantitative assessment, using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) analysis, with qualitative analyses to understand how prompt design influences diagnostic inference, interpretability, and generalization across tissues.
Our results demonstrate that prompt formulation significantly affects model performance across the full dataset. In particular, prompts that incorporate high anatomical specificity and clear instructional framing yield consistent improvements in classification accuracy across multiple tissue sites. The study further reveals that domain-aligned prompting strategies are often more effective than increases in architectural complexity, highlighting the centrality of human-AI communication in medical vision-language tasks. In addition to empirical findings, we contribute actionable guidelines for implementing VLMs in clinical computational pathology workflows, emphasizing prompt standardization and interpretability. This work shifts the emphasis from purely architectural innovation to optimizing the language-mediated interface between human expertise and AI systems, thereby enhancing both diagnostic performance and clinical utility in zero-shot medical image analysis.
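As an illustration of the kind of evaluation the abstract describes, the following is a minimal sketch of zero-shot scoring followed by an AUC computation. It is not the thesis's implementation: the function names (`cosine`, `zero_shot_score`, `roc_auc`), the two-prompt setup, the temperature value, and the toy embedding vectors are all assumptions made for the example. In practice the embeddings would come from a VLM's image and text encoders, and the prompt text would vary along the dimensions studied in the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_score(image_emb, pos_prompt_emb, neg_prompt_emb, temperature=0.07):
    """Softmax probability assigned to the positive prompt.

    Hypothetical two-class setup: one prompt describes the condition,
    the other describes its absence. The temperature is an assumed value.
    """
    s_pos = cosine(image_emb, pos_prompt_emb) / temperature
    s_neg = cosine(image_emb, neg_prompt_emb) / temperature
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the fraction of positive/negative pairs in which the positive
    sample has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 2-D embeddings, made up purely for illustration.
    pos_prompt = [1.0, 0.0]
    neg_prompt = [0.0, 1.0]
    images = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
    labels = [1, 1, 0, 0]
    scores = [zero_shot_score(img, pos_prompt, neg_prompt) for img in images]
    print(roc_auc(labels, scores))
```

Under this setup, comparing two prompt formulations amounts to recomputing the prompt embeddings, rescoring the same images, and comparing the resulting AUC values.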
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Sharma, Vasudev |
Institution: | Concordia University |
Degree Name: | M. Sc. |
Program: | Computer Science |
Date: | 28 March 2025 |
Thesis Supervisor(s): | Hosseini, Dr. Mahdi |
ID Code: | 995442 |
Deposited By: | Vasudev Sharma |
Deposited On: | 17 Jun 2025 17:35 |
Last Modified: | 17 Jun 2025 17:35 |