Exploring Convex Optimization and Transformer based Methods for Efficient Visual Object Tracking

Yelluru Gopal, Goutam (2024) Exploring Convex Optimization and Transformer based Methods for Efficient Visual Object Tracking. PhD thesis, Concordia University.

YelluruGopal_PhD_S2024.pdf (Text, application/pdf, 22MB), Accepted Version. Available under License Spectrum Terms of Access.

Abstract

The effectiveness of a visual object tracking algorithm relies heavily on how well it represents the target object through a collection of feature templates, or channels. However, some channels lose their discriminative power under challenging video conditions such as target deformation, occlusion, and motion blur, which leads to tracking failures. Discriminative Correlation Filter (DCF)-based trackers address these challenges by aggregating hand-crafted and deep Convolutional Neural Network (CNN)-based channels. However, this approach increases the computational complexity of the tracker and significantly reduces inference speed, especially on constrained hardware such as a Central Processing Unit (CPU) or edge devices. We observe a parallel trend in end-to-end trainable deep Siamese Network (SN)-based trackers, which deploy parameter-heavy backbones for feature extraction and rely on specialized hardware such as a Graphics Processing Unit (GPU) for fast inference. In this thesis, we propose computationally efficient solutions for both DCF and SN tracking algorithms that also improve their accuracy.
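To make the multi-channel DCF idea concrete, the following is a minimal NumPy sketch, assuming a MOSSE-style closed-form filter trained independently for each channel in the Fourier domain, with channel responses combined by a weighted sum. The helper names (`gaussian_label`, `train_filters`, `detect`) and the regularizer `lam` are illustrative assumptions, not the thesis's formulation; practical DCF trackers also add cosine windows, scale estimation, and online model updates. The sketch does show why aggregating many channels costs speed: every channel needs its own FFT on every frame.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired correlation output: a 2-D Gaussian peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_filters(features, label, lam=1e-2):
    """Closed-form per-channel filters in the Fourier domain (MOSSE-style, one sample).

    features: (C, H, W) array, one feature map per channel (hypothetical input layout).
    Returns the conjugate filters H*_c as a complex (C, H, W) array.
    """
    F = np.fft.fft2(features, axes=(-2, -1))
    G = np.fft.fft2(label)  # broadcasts over the channel axis
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(filters_conj, features, weights=None):
    """Correlate each channel with its filter and fuse responses by a weighted sum."""
    C = features.shape[0]
    if weights is None:
        weights = np.full(C, 1.0 / C)  # uniform channel weights by default
    Z = np.fft.fft2(features, axes=(-2, -1))  # one FFT per channel, per frame
    per_channel = np.real(np.fft.ifft2(Z * filters_conj, axes=(-2, -1)))
    response = np.tensordot(weights, per_channel, axes=1)  # (H, W)
    return response, per_channel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 32, 32))
    filters = train_filters(feats, gaussian_label((32, 32)))
    # Simulate target motion with a circular shift of the feature maps.
    resp, _ = detect(filters, np.roll(feats, shift=(3, -5), axis=(1, 2)))
    print(np.unravel_index(np.argmax(resp), resp.shape))
```

The response peak moves with the target, so the argmax of the fused response gives the new target position. Any non-negative weight vector that sums to one can be passed as `weights`; this is the hook that reliability-based weighting schemes act on.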

For multi-channel DCF tracking, we present three solutions that alleviate the impact of non-discriminative features (or channels). These methods use the concept of reliability to quantify the discriminative power of a feature (or channel) from its filter response. The proposed solutions dynamically reduce the weights of unreliable features (or channels) while encouraging temporal smoothness of the learned weights. We formulate the learning of these adaptive weights as a convex optimization problem and derive efficient solutions that maintain tracking speed.

For lightweight SN tracking, our first algorithm, MVT, employs a cascaded arrangement of CNN and transformer blocks in its backbone. It fuses the template and search regions during feature extraction, producing a stronger feature encoding for target localization. Our second algorithm, the Separable Self and Mixed Attention Transformer-based tracker (SMAT), further improves the efficiency of MVT by replacing standard attention with a computationally efficient separable attention block. The proposed trackers outperform related lightweight trackers on eight challenging benchmarks, with SMAT performing best. Their computationally efficient architecture allows MVT and SMAT to track in real time on a CPU and to reach 150 frames per second on a GPU.
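The adaptive channel weighting described above can be illustrated with a small convex problem that has an exact solution. This is a minimal sketch under stated assumptions, not the thesis's formulation: reliability is measured with a hypothetical peak-to-sidelobe ratio (PSR) of each channel's response, and the weights solve the hypothetical objective min_w -r^T w + (mu/2)||w - w_prev||^2 + (eta/2)||w||^2 subject to w >= 0 and sum(w) = 1. Completing the square gives (mu+eta)/2 ||w - v||^2 + const with v = (r + mu * w_prev) / (mu + eta), so the optimum is the Euclidean projection of v onto the probability simplex, computable exactly in O(C log C) by sorting. The term with `mu` rewards temporal smoothness, and `eta` keeps the weights from collapsing onto a single channel.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """Hypothetical reliability score: (peak - sidelobe mean) / sidelobe std.

    The sidelobe is everything outside a (2*exclude+1)^2 window around the peak.
    """
    py, px = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones(response.shape, dtype=bool)
    mask[max(py - exclude, 0):py + exclude + 1, max(px - exclude, 0):px + exclude + 1] = False
    sidelobe = response[mask]
    return (response[py, px] - sidelobe.mean()) / (sidelobe.std() + 1e-12)

def project_to_simplex(v):
    """Exact Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                       # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]       # largest index that stays positive
    theta = (css[rho - 1] - 1.0) / rho         # common shift applied to every entry
    return np.maximum(v - theta, 0.0)

def update_channel_weights(reliability, w_prev, mu=1.0, eta=1.0):
    """Minimise -r^T w + mu/2 ||w - w_prev||^2 + eta/2 ||w||^2 over the simplex.

    Hypothetical objective: mu trades responsiveness for temporal smoothness.
    """
    v = (reliability + mu * w_prev) / (mu + eta)
    return project_to_simplex(v)

if __name__ == "__main__":
    r = np.array([2.0, 1.0, 0.2])           # made-up reliability scores for 3 channels
    w_prev = np.full(3, 1.0 / 3.0)
    print(update_channel_weights(r, w_prev))            # reacts to reliability
    print(update_channel_weights(r, w_prev, mu=100.0))  # stays near w_prev
```

With a small `mu`, the least reliable channel can be switched off entirely. With a large `mu`, the weights drift only slightly from the previous frame, which is the temporal-smoothness behaviour the abstract describes. Because the solution is a sort plus a threshold, the per-frame cost is negligible next to the FFTs.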

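MVT fuses the template and search regions during feature extraction, and SMAT replaces standard attention with separable attention. The NumPy sketch below illustrates both ideas under two assumptions. First, the separable block follows the separable self-attention of MobileViTv2 (Mehta and Rastegari), which is swapped in here. Second, "mixed" attention is modelled simply as running that block over the concatenation of template and search tokens. The class and function names, weight initialization, and token counts are hypothetical. The thesis's actual blocks (normalization, head layout, and how the CNN and transformer stages are cascaded) are not reproduced.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SeparableAttention:
    """Separable self-attention in the MobileViTv2 style (assumed, single head).

    No k-by-k attention matrix is formed: the cost is linear in the number of
    tokens k, versus quadratic in k for standard scaled dot-product attention.
    """

    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.w_i = rng.standard_normal((d, 1)) * s  # token -> scalar context score
        self.w_k = rng.standard_normal((d, d)) * s  # key projection
        self.w_v = rng.standard_normal((d, d)) * s  # value projection
        self.w_o = rng.standard_normal((d, d)) * s  # output projection

    def __call__(self, x):
        # x: (k, d) token matrix
        scores = softmax(x @ self.w_i, axis=0)           # (k, 1), sums to 1 over tokens
        context = (scores * (x @ self.w_k)).sum(axis=0)  # (d,), one global context vector
        values = np.maximum(x @ self.w_v, 0.0)           # ReLU on values, (k, d)
        return (values * context) @ self.w_o             # context modulates every token

def mixed_separable_attention(attn, template_tokens, search_tokens):
    """Hypothetical 'mixed' step: one attention pass over template + search tokens."""
    x = np.concatenate([template_tokens, search_tokens], axis=0)
    out = attn(x)
    n_t = template_tokens.shape[0]
    return out[:n_t], out[n_t:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = SeparableAttention(16, rng)
    template = rng.standard_normal((64, 16))   # e.g. an 8x8 template token grid
    search = rng.standard_normal((256, 16))    # e.g. a 16x16 search token grid
    z_t, z_s = mixed_separable_attention(attn, template, search)
    print(z_t.shape, z_s.shape)
```

Because the context vector summarizes all tokens, including the template, every search token is conditioned on the template during feature extraction, while the cost grows only linearly with the token count. Standard attention over the same 320 tokens would build a 320-by-320 matrix per head.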
Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (PhD)
Authors: Yelluru Gopal, Goutam
Institution: Concordia University
Degree Name: Ph.D.
Program: Electrical and Computer Engineering
Date: 27 February 2024
Thesis Supervisor(s): Amer, Maria
ID Code: 993729
Deposited By: Goutam Yelluru Gopal
Deposited On: 05 Jun 2024 15:27
Last Modified: 05 Jun 2024 15:27
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
