A challenge in objective no-reference video quality assessment (VQA) research is incorporating the memory effects and long-term dependencies observed in subjective VQA studies. To address this challenge, we propose a stack of six bi-directional Long Short-Term Memory (LSTM) layers with different numbers of units to model the temporal characteristics of video sequences. We feed this bi-directional LSTM network with spatial features extracted from video frames using a pre-trained convolutional neural network (CNN); we evaluate three pre-trained CNNs, MobileNet, ResNet-50, and Inception-ResNet-V2, as feature extractors and select ResNet-50 because it shows the best performance. In this thesis, we assess the stability of our VQA method and conduct an ablation study to highlight the importance of the bi-directional LSTM layers. Furthermore, we compare the performance of the proposed method with state-of-the-art VQA methods on three publicly available datasets, KoNViD-1k, LIVE-Qualcomm, and CVD2014; these experiments, using the same set of parameters across datasets, demonstrate that our method outperforms these VQA methods by a significant margin in terms of Spearman's Rank-Order Correlation Coefficient (SROCC), Pearson's Linear Correlation Coefficient (PLCC), and Root Mean Square Error (RMSE).
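The sketch below illustrates the kind of pipeline the abstract describes: per-frame spatial features from a pre-trained ResNet-50 feed a stack of six bi-directional LSTM layers, followed by a regression head that predicts a quality score. This is a minimal Keras sketch under assumed settings; the clip length, the per-layer unit counts, the temporal pooling, and the loss are illustrative placeholders, not the configuration used in the thesis.

```python
# Illustrative sketch of a ResNet-50 + six-layer bi-directional LSTM VQA model.
# All hyperparameters below are assumptions for demonstration only.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_FRAMES = 240                             # assumed clip length in frames
LSTM_UNITS = [512, 256, 128, 64, 32, 16]     # hypothetical per-layer unit counts

# Spatial feature extractor: ResNet-50 pre-trained on ImageNet,
# global-average-pooled to a 2048-dim vector per frame, frozen.
cnn = tf.keras.applications.ResNet50(include_top=False,
                                     weights="imagenet",
                                     pooling="avg")
cnn.trainable = False

# Input: a sequence of RGB frames; the CNN is applied to each frame.
frames = layers.Input(shape=(NUM_FRAMES, 224, 224, 3))
x = layers.TimeDistributed(cnn)(frames)      # (NUM_FRAMES, 2048) feature sequence

# Temporal model: six stacked bi-directional LSTM layers over the frame features.
for units in LSTM_UNITS:
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)

# Pool over time and regress a single quality score (e.g., predicted MOS).
x = layers.GlobalAveragePooling1D()(x)
score = layers.Dense(1)(x)

model = Model(frames, score)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

In practice the frame features would typically be pre-computed once with the frozen CNN and cached, so that only the bi-directional LSTM stack is trained on feature sequences; the end-to-end wiring shown here simply keeps the sketch self-contained.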