The effectiveness of a visual object tracking algorithm relies heavily on how well it represents the target object through a collection of feature templates, or channels. However, some channels lose their discriminative power under challenging video conditions such as target deformation, occlusion, and motion blur, leading to tracking failures. Discriminative Correlation Filter-based (DCF) trackers address these challenges by aggregating hand-crafted and deep Convolutional Neural Network-based (CNN) channels. However, this aggregation increases the computational complexity of the tracker and significantly reduces inference speed, especially on constrained hardware such as a Central Processing Unit (CPU) or edge devices. We observe a parallel trend in end-to-end trainable deep Siamese Network-based (SN) trackers, which deploy parameter-heavy backbones for feature extraction and rely on specialized hardware such as a Graphics Processing Unit (GPU) for fast inference. In this thesis, we propose computationally efficient solutions for both DCF and SN tracking algorithms while improving their accuracy.

For multi-channel DCF tracking, we present three solutions that alleviate the impact of non-discriminative features (or channels). These methods use the concept of reliability to quantify the discriminative power of a feature (or channel) based on its filter response. The proposed solutions dynamically down-weight unreliable features (or channels) while enforcing temporal smoothness of the learned weights. We formulate the learning of adaptive weights as a convex optimization problem and derive efficient solutions that maintain tracking speed.

Building on the lightweight SN tracking paradigm, our first algorithm, MVT, employs a cascaded arrangement of CNN and transformer blocks in its backbone. This design fuses the template and search regions during feature extraction, generating a superior feature encoding for target localization.
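The idea of reliability-based channel weighting for DCF tracking can be illustrated with a minimal sketch. The scoring rule, function names, and smoothing scheme below are illustrative assumptions, not the thesis's actual formulation; the thesis solves a convex optimization problem, whereas this sketch uses a simple normalize-and-smooth update to convey the intuition.

```python
import numpy as np

def channel_reliability(response):
    # Illustrative reliability score: a sharp, dominant peak in a channel's
    # filter response suggests a discriminative channel; a flat response
    # suggests an unreliable one. (Hypothetical scoring rule.)
    peak = response.max()
    mean = response.mean()
    return (peak - mean) / (response.std() + 1e-8)

def update_weights(prev_weights, responses, lr=0.2):
    # Score each channel's response map, normalize scores into weights,
    # then blend with the previous frame's weights so they evolve smoothly
    # over time (a stand-in for the temporal-smoothness term).
    scores = np.array([channel_reliability(r) for r in responses])
    scores = np.clip(scores, 0.0, None)
    new_w = scores / (scores.sum() + 1e-8)
    return (1.0 - lr) * prev_weights + lr * new_w

def fuse(responses, weights):
    # Weighted sum of per-channel response maps -> final localization map.
    return np.tensordot(weights, np.stack(responses), axes=1)
```

With a flat (unreliable) channel and a sharply peaked (reliable) channel, the update shifts weight toward the peaked channel while staying close to the previous weights.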
Our second tracking algorithm, the Separable Self and Mixed Attention Transformer-based tracker (SMAT), further increases the efficiency of MVT by replacing standard attention with a computationally efficient separable attention block. The proposed trackers outperform related lightweight trackers on eight challenging benchmarks, with SMAT emerging as the top performer. Their computationally efficient architectures enable MVT and SMAT to run at real-time speed on a CPU while reaching 150 frames per second on a GPU.
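The efficiency gain of separable attention over standard attention can be sketched as follows. Instead of forming the O(N²) token-by-token attention matrix, one scalar context score is computed per token, the keys are pooled into a single context vector, and that vector is broadcast back over the values, giving linear complexity in the number of tokens N. This is a simplified, hypothetical sketch of the general technique, not the exact SMAT block; the projection names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def separable_attention(x, w_i, w_k, w_v):
    # x: (N, d) tokens; w_i: (d, 1); w_k, w_v: (d, d).
    scores = softmax(x @ w_i, axis=0)       # (N, 1) one context score per token
    context = (scores * (x @ w_k)).sum(0)   # (d,)  keys pooled into one vector
    values = np.maximum(x @ w_v, 0.0)       # (N, d) ReLU-activated values
    return values * context                 # (N, d) broadcast token-context mix
```

Every step above is a matrix product or an elementwise operation over N tokens, so the cost grows linearly with N rather than quadratically, which is what makes such blocks attractive for CPU and edge inference.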