Depth and Segmentation Aware frameworks for Multiple Object Tracking

Khanchi, Milad (2025) Depth and Segmentation Aware frameworks for Multiple Object Tracking. Masters thesis, Concordia University.

Text (application/pdf)
Khanchi_MA_F2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.
11MB

Abstract

Multi-Object Tracking (MOT) remains a challenging problem, particularly in crowded scenes with occlusion, appearance ambiguity, and non-linear motion. Conventional MOT frameworks often rely on appearance-based Re-Identification (Re-ID) and Intersection-over-Union (IoU) of object bounding boxes for object association. However, these cues become unreliable when objects are visually similar or overlapping, and computing pixel-level IoU for segmentation masks can be computationally expensive.

In this thesis, we propose two complementary MOT frameworks that incorporate monocular depth and segmentation cues to make association more robust. The first, a zero-shot depth-aware framework, is training-free and introduces a Hierarchical Alignment Score (HAS), a novel metric that combines coarse bounding box IoU with fine-grained mask-level IoU obtained through promptable segmentation. This hierarchical formulation improves matching precision in cluttered or occluded scenes.
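The abstract does not give the formula for HAS, so the following is only a minimal sketch of the general idea of a coarse-to-fine overlap score, not the thesis's definition. It assumes boxes are `(x1, y1, x2, y2)` tuples and masks are sets of `(row, col)` pixel coordinates; the function name `hierarchical_score` and the parameters `gate` and `weight` are hypothetical and chosen for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1, m2):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(m1 | m2)
    return len(m1 & m2) / union if union else 0.0


def hierarchical_score(box_a, mask_a, box_b, mask_b, gate=0.1, weight=0.5):
    """Hypothetical coarse-to-fine score (an assumption, not the thesis's HAS).

    The cheap box IoU acts as a gate: pairs whose boxes barely overlap are
    rejected without touching the masks. Surviving pairs get a weighted
    blend of box IoU and mask IoU.
    """
    coarse = box_iou(box_a, box_b)
    if coarse < gate:
        return 0.0
    fine = mask_iou(mask_a, mask_b)
    return (1.0 - weight) * coarse + weight * fine


# Usage: two 2x2 objects shifted by one pixel column.
box_a, box_b = (0, 0, 2, 2), (1, 0, 3, 2)
mask_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
mask_b = {(0, 1), (0, 2), (1, 1), (1, 2)}
score = hierarchical_score(box_a, mask_a, box_b, mask_b)
```

Gating on the box IoU first means the more expensive mask comparison runs only for plausible pairs, which is one way a hierarchical score can limit the cost of pixel-level IoU.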

The second framework avoids computing segmentation IoU altogether. Instead, it leverages a self-supervised encoder to fuse and refine depth-segmentation features into temporally stable embeddings, which are then used as an additional similarity signal in the association process. This reduces computational overhead while improving robustness to noise and appearance variation.
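To illustrate how an embedding can serve as an additional similarity signal during association, here is a minimal sketch. It assumes the embeddings are already produced by some encoder (the encoder itself is not modeled), fuses cosine similarity with box IoU through a hypothetical weight `alpha`, and uses a simple greedy matcher in place of whatever assignment method the thesis uses. All names and default values are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_similarity(iou, emb_track, emb_det, alpha=0.5):
    """Blend box IoU with embedding similarity (alpha is a hypothetical weight)."""
    return (1.0 - alpha) * iou + alpha * cosine_similarity(emb_track, emb_det)


def greedy_associate(iou_matrix, track_embs, det_embs, alpha=0.5, min_score=0.3):
    """Greedily match tracks to detections by descending fused similarity.

    Returns a sorted list of (track_index, detection_index) pairs. Greedy
    matching is used here only for brevity; it is not claimed to be the
    thesis's assignment method.
    """
    scored = []
    for t, t_emb in enumerate(track_embs):
        for d, d_emb in enumerate(det_embs):
            s = fused_similarity(iou_matrix[t][d], t_emb, d_emb, alpha)
            if s >= min_score:
                scored.append((s, t, d))
    scored.sort(reverse=True)

    matches, used_tracks, used_dets = [], set(), set()
    for _, t, d in scored:
        if t not in used_tracks and d not in used_dets:
            matches.append((t, d))
            used_tracks.add(t)
            used_dets.add(d)
    return sorted(matches)


# Usage: IoU alone is ambiguous (all 0.5), but the embeddings disambiguate.
iou_matrix = [[0.5, 0.5], [0.5, 0.5]]
track_embs = [[1.0, 0.0], [0.0, 1.0]]
det_embs = [[0.0, 1.0], [1.0, 0.0]]
matches = greedy_associate(iou_matrix, track_embs, det_embs)
```

In this toy case the IoU matrix cannot distinguish the two candidates, so the embedding term decides the match, which is the role the abstract describes for the learned features.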

Both approaches operate under the efficient Tracking-by-Detection (TBD) paradigm and extend conventional 2D association strategies with spatially expressive cues. Evaluations on the DanceTrack and SportsMOT benchmarks, both of which feature non-linear motion, demonstrate competitive performance, highlighting the utility of depth and segmentation as underutilized yet powerful cues for robust MOT under non-linear motion.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Khanchi, Milad
Institution: Concordia University
Degree Name: M.A.Sc.
Program: Electrical and Computer Engineering
Date: 20 June 2025
Thesis Supervisor(s): Amer, Maria and Poullis, Charalambos
ID Code: 995666
Deposited By: Milad Khanchi
Deposited On: 04 Nov 2025 16:09
Last Modified: 04 Nov 2025 16:09
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
