
Pre-trained CNN and bi-directional LSTM for no-reference video quality assessment


Duoduaah, Doreen (2022) Pre-trained CNN and bi-directional LSTM for no-reference video quality assessment. Masters thesis, Concordia University.

Duoduaah_MASc_F2022.pdf - Accepted Version
Restricted to Repository staff only until 31 August 2023.
Available under License Spectrum Terms of Access.
1MB

Abstract

A challenge in objective no-reference video quality assessment (VQA) research is incorporating the memory effects and long-term dependencies observed in subjective VQA studies. To address this challenge, we propose a stack of six bi-directional Long Short-Term Memory (LSTM) layers with different numbers of units to model the temporal characteristics of video sequences. We feed this bi-directional LSTM network with spatial features extracted from video frames by a pre-trained convolutional neural network (CNN); we assess three pre-trained CNNs, MobileNet, ResNet-50, and Inception-ResNet-V2, as feature extractors and select ResNet-50 because it shows the best performance. In this thesis, we assess the stability of our VQA method and conduct an ablation study to highlight the importance of the bi-directional LSTM layers. Furthermore, we compare the performance of the proposed method with state-of-the-art VQA methods on three publicly available datasets, KoNViD-1k, LIVE-Qualcomm, and CVD2014; these experiments, using the same set of parameters, demonstrate that our method outperforms these VQA methods by a significant margin in terms of Spearman's Rank-Order Correlation Coefficient (SROCC), Pearson's Linear Correlation Coefficient (PLCC), and Root Mean Square Error (RMSE).
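The architecture described in the abstract (pre-trained CNN features fed into stacked bi-directional LSTMs, regressing a single quality score) can be sketched in Keras, which the thesis's tooling references indicate was used. The frame count, the per-layer unit counts, the temporal pooling, and the use of random weights below are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative assumptions: 30 sampled frames per video; ResNet-50
# average-pooled features are 2048-dimensional.
NUM_FRAMES, FEAT_DIM = 30, 2048

# Spatial feature extractor: pre-trained ResNet-50 without its classifier
# head. (weights=None here to avoid a download; the thesis would use
# ImageNet weights, i.e. weights="imagenet".)
cnn = tf.keras.applications.ResNet50(
    include_top=False, weights=None, pooling="avg"
)

# Temporal model: six stacked bi-directional LSTM layers of different
# widths (these unit counts are hypothetical), then a score regressor.
frames_in = layers.Input(shape=(NUM_FRAMES, FEAT_DIM))
x = frames_in
for units in (256, 128, 64, 32, 16, 8):
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
x = layers.GlobalAveragePooling1D()(x)  # pool per-frame states over time
score = layers.Dense(1)(x)              # predicted quality score
model = Model(frames_in, score)

# One video's feature sequence in, one quality score out.
feats = np.random.rand(1, NUM_FRAMES, FEAT_DIM).astype("float32")
print(model.predict(feats, verbose=0).shape)  # (1, 1)
```

The bi-directional wrapper runs each LSTM over the frame sequence forwards and backwards, which is one way to let the model capture the long-term dependencies the abstract emphasizes.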

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Electrical and Computer Engineering
Item Type: Thesis (Masters)
Authors: Duoduaah, Doreen
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Electrical and Computer Engineering
Date: 24 June 2022
Thesis Supervisor(s): Amer, Maria
Keywords: Video quality assessment, pre-trained CNN, transfer learning, bi-directional LSTM, long-term dependencies, deep spatial and temporal features.
ID Code: 990683
Deposited By: Doreen Duoduaah
Deposited On: 27 Oct 2022 14:30
Last Modified: 27 Oct 2022 14:30

References:

[1] M. Agarla, L. Celona, and R. Schettini. “An efficient method for no-reference video quality assessment”. In: Journal of Imaging 7.3 (2021), p. 55.
[2] S. Ahn and S. Lee. “Deep blind video quality assessment based on temporal human perception”. In: 25th IEEE International Conference on Image Processing (ICIP). 2018, pp. 619–623.
[3] L. Ali et al. “Performance evaluation of deep CNN-based crack detection and localization techniques for concrete structures”. In: Sensors 21.5 (2021), p. 1688.
[4] J. Benesty et al. “Pearson correlation coefficient”. In: Noise reduction in speech processing. Springer, 2009, pp. 1–4.
[5] Y. Bengio, P. Simard, and P. Frasconi. “Learning long-term dependencies with gradient descent is difficult”. In: IEEE transactions on neural networks 5.2 (1994), pp. 157–166.
[6] H. Boujut et al. “No-reference video quality assessment of H.264 video streams based on semantic saliency maps”. In: IS&T/SPIE Electronic Imaging. Vol. 8293. 2012, pp. 8293–28.
[7] T. Brandao and M.P. Queluz. “No-reference quality assessment of H.264/AVC encoded video”. In: IEEE Transactions on Circuits and Systems for Video Technology 20.11 (2010), pp. 1437–1447.
[8] R. Cahuantzi, X. Chen, and S. Güttel. “A comparison of LSTM and GRU networks for learning symbolic sequences”. In: arXiv preprint arXiv:2107.02248 (2021).
[9] F. Chollet et al. Keras. https://keras.io. 2015.
[10] Z.L. Chu, T.J. Liu, and K.H. Liu. “No-Reference Video Quality Assessment by A Cascade Combination of Neural Networks and Regression Model”. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2020, pp. 4116–4121.
[11] J. Deng et al. “Imagenet: A large-scale hierarchical image database”. In: IEEE conference on computer vision and pattern recognition. 2009, pp. 248–255.
[12] M. Dimitrievski and Z. Ivanovski. “No-reference quality assessment of highly compressed video sequences”. In: IEEE 15th International Workshop on Multimedia Signal Processing (MMSP). 2013, pp. 266–271.
[13] J. Donahue et al. “Decaf: A deep convolutional activation feature for generic visual recognition”. In: International conference on machine learning. PMLR. 2014, pp. 647–655.
[14] Q. Dou et al. “Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks”. In: IEEE transactions on medical imaging 35.5 (2016), pp. 1182–1195.
[15] J. Duchi, E. Hazan, and Y. Singer. “Adaptive subgradient methods for online learning and stochastic optimization.” In: Journal of machine learning research 12.7 (2011).
[16] V. Frants et al. “Blind visual quality assessment for smart cloud-based video storage”. In: IEEE International Conference on Smart Cloud (SmartCloud). 2018, pp. 171–174.
[17] L. Gatys, A. Ecker, and M. Bethge. “A Neural Algorithm of Artistic Style”. In: Journal of Vision 16.12 (2016), pp. 326–326.
[18] A. Géron. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Inc., 2019.
[19] D. Ghadiyaram et al. “In-capture mobile video distortions: A study of subjective behavior and objective algorithms”. In: IEEE Transactions on Circuits and Systems for Video Technology 28.9 (2017), pp. 2061–2077.
[20] A. Graves et al. “A novel connectionist system for unconstrained handwriting recognition”. In: IEEE transactions on pattern analysis and machine intelligence 31.5 (2008), pp. 855–868.
[21] A. Graves and N. Jaitly. “Towards end-to-end speech recognition with recurrent neural networks”. In: International conference on machine learning. PMLR. 2014, pp. 1764–1772.
[22] Z. Guan et al. “A novel objective quality assessment method for video conferencing coding”. In: China Communications 16.4 (2019), pp. 89–104.
[23] J. Han and C. Moraga. “The influence of the sigmoid function parameters on the speed of backpropagation learning”. In: International workshop on artificial neural networks. Springer. 1995, pp. 195–201.
[24] K. He et al. “Deep residual learning for image recognition”. In: IEEE conference on computer vision and pattern recognition. 2016, pp. 770–778.
[25] S. Hochreiter and J. Schmidhuber. “Long short-term memory”. In: Neural computation 9.8 (1997), pp. 1735–1780.
[26] V. Hosu et al. “The Konstanz natural video database (KoNViD-1k)”. In: IEEE Ninth international conference on quality of multimedia experience (QoMEX). 2017.
[27] A.G. Howard et al. “Mobilenets: Efficient convolutional neural networks for mobile vision applications”. In: arXiv preprint arXiv:1704.04861 (2017).
[28] B. Karlik and A. V. Olgac. “Performance analysis of various activation functions in generalized MLP architectures of neural networks”. In: International Journal of Artificial Intelligence and Expert Systems 1.4 (2011), pp. 111–122.
[29] A. Karpathy et al. “Large-scale video classification with convolutional neural networks”. In: IEEE conference on Computer Vision and Pattern Recognition. 2014, pp. 1725–1732.
[30] D.P Kingma and J. Ba. “Adam: A Method for Stochastic Optimization”. In: International Conference on Learning Representations (Poster). 2015.
[31] J. Korhonen. “Two-level approach for no-reference consumer video quality assessment”. In: IEEE Transactions on Image Processing 28.12 (2019), pp. 5923–5938.
[32] A. Krizhevsky. “Learning Multiple Layers of Features from Tiny Images”. Master’s thesis, University of Toronto (2009).
[33] A. Krizhevsky, I. Sutskever, and G.E. Hinton. “Imagenet classification with deep convolutional neural networks”. In: Advances in neural information processing systems 25 (2012).
[34] Y. LeCun, Y. Bengio, and G. Hinton. “Deep learning”. In: nature 521.7553 (2015), pp. 436–444.
[35] D. Li, T. Jiang, and M. Jiang. “Quality assessment of in-the-wild videos”. In: 27th ACM International Conference on Multimedia. 2019, pp. 2351–2359.
[36] D. Li, T. Jiang, and M. Jiang. “Recent advances and challenges in video quality assessment”. In: ZTE Communications 17.1 (2019), pp. 3–11.
[37] L. Li et al. “Hyperband: A novel bandit-based approach to hyperparameter optimization”. In: The Journal of Machine Learning Research 18.1 (2017), pp. 6765–6816.
[38] R. Li, B. Zeng, and M.L. Liou. “A new three-step search algorithm for block motion estimation”. In: IEEE transactions on circuits and systems for video technology 4.4 (1994), pp. 438–442.
[39] S. Li et al. “Image quality assessment by separately evaluating detail losses and additive impairments”. In: IEEE Transactions on Multimedia 13.5 (2011), pp. 935–949.
[40] Y. Li et al. “Video quality assessment with deep architecture”. In: IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). 2021, pp. 268–271.
[41] Z. Li et al. “Toward a practical perceptual video quality metric”. In: The Netflix Tech Blog 6.2 (2016).
[42] T. Lin et al. “Microsoft COCO: Common objects in context”. In: European conference on computer vision. Springer. 2014, pp. 740–755.
[43] W. Lu et al. “A spatiotemporal model of video quality assessment via 3D gradient differencing”. In: Information Sciences 478 (2019), pp. 141–151.
[44] W.S. McCulloch and W. Pitts. “A logical calculus of the ideas immanent in nervous activity”. In: The bulletin of mathematical biophysics 5.4 (1943), pp. 115–133.
[45] L. Mou, Y. Hua, and X. Zhu. “Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images”. In: IEEE Transactions on Geoscience and Remote Sensing 58.11 (2020), pp. 7557–7569.
[46] V. Nair and G.E. Hinton. “Rectified linear units improve restricted boltzmann machines”. In: Icml. 2010.
[47] Y. Nesterov. “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)”. In: Doklady AN USSR. Vol. 269. 1983, pp. 543–547.
[48] M. Nuutinen et al. “CVD2014 — A database for evaluating no-reference video quality assessment algorithms”. In: IEEE Transactions on Image Processing 25.7 (2016), pp. 3073–3086.
[49] T. O’Malley et al. Keras Tuner. https://github.com/keras-team/keras-tuner. 2019.
[50] I. Oksuz et al. “Deep learning-based detection and correction of cardiac MR motion artefacts during reconstruction for high-quality segmentation”. In: IEEE Transactions on Medical Imaging 39.12 (2020), pp. 4001–4010.
[51] S. Pang et al. “Spineparsenet: spine parsing for volumetric MR image by a two-stage segmentation framework with semantic image representation”. In: IEEE Transactions on Medical Imaging 40.1 (2020), pp. 262–273.
[52] J. Park et al. “Video quality pooling adaptive to perceptual distortion severity”. In: IEEE Transactions on Image Processing 22.2 (2012), pp. 610–620.
[53] R. Pascanu, T. Mikolov, and Y. Bengio. “On the difficulty of training recurrent neural networks”. In: International conference on machine learning. PMLR. 2013, pp. 1310–1318.
[54] A. Paszke et al. “Pytorch: An imperative style, high-performance deep learning library”. In: Advances in neural information processing systems 32 (2019).
[55] M.H. Pinson and S. Wolf. “A new standardized method for objectively measuring video quality”. In: IEEE Transactions on broadcasting 50.3 (2004), pp. 312–322.
[56] S. Rimac-Drlje, M. Vranjes, and D. Zagar. “Influence of temporal pooling method on the objective video quality evaluation”. In: IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. 2009, pp. 1–5.
[57] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. Tech. rep. California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
[58] M.A. Saad, A.C. Bovik, and C. Charrier. “Blind prediction of natural video quality”. In: IEEE Transactions on Image Processing 23.3 (2014), pp. 1352– 1365.
[59] H. Sak, A. Senior, and F. Beaufays. “Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition”. In: arXiv preprint arXiv:1402.1128 (2014).
[60] A.L. Samuel. “Some studies in machine learning using the game of checkers. II—Recent progress”. In: IBM Journal of research and development 11.6 (1967), pp. 601–617.
[61] M.J. Scott et al. “Do personality and culture influence perceived video quality and enjoyment?” In: IEEE Transactions on Multimedia 18.9 (2016), pp. 1796– 1807.
[62] K. Seshadrinathan and A.C. Bovik. “Motion tuned spatio-temporal quality assessment of natural videos”. In: IEEE transactions on image processing 19.2 (2009), pp. 335–350.
[63] K. Seshadrinathan and A.C. Bovik. “Temporal hysteresis model of time varying subjective video quality”. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). 2011, pp. 1153–1156.
[64] H.O. Shahreza, A. Amini, and H. Behroozi. “No-reference video quality assessment using recurrent neural networks”. In: IEEE 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS). 2019, pp. 1–5.
[65] H.R. Sheikh and A.C. Bovik. “Image information and visual quality”. In: IEEE Transactions on image processing 15.2 (2006), pp. 430–444.
[66] D.J. Sheskin. “Spearman’s rank-order correlation coefficient”. In: Handbook of parametric and nonparametric statistical procedures 1353 (2007).
[67] M. Shi, K. Wang, and C. Li. “A C-LSTM with word embedding model for news text classification”. In: IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS). 2019, pp. 253–257.
[68] K. Simonyan, A. Vedaldi, and A. Zisserman. “Deep inside convolutional networks: visualising image classification models and saliency maps”. In: International Conference on Learning Representations. 2014.
[69] K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition”. In: arXiv preprint arXiv:1409.1556 (2014).
[70] R. Soundararajan and A.C. Bovik. “Video quality assessment by reduced reference spatio-temporal entropic differencing”. In: IEEE Transactions on Circuits and Systems for Video Technology 23.4 (2012), pp. 684–694.
[71] C. Szegedy et al. “Going deeper with convolutions”. In: IEEE conference on computer vision and pattern recognition. 2015, pp. 1–9.
[72] C. Szegedy et al. “Inception-v4, inception-resnet and the impact of residual connections on learning”. In: Thirty-first AAAI conference on artificial intelligence. 2017.
[73] C. Szegedy et al. “Rethinking the inception architecture for computer vision”. In: IEEE conference on computer vision and pattern recognition. 2016, pp. 2818– 2826.
[74] B. Thomee et al. “The new data and new challenges in multimedia research”. In: arXiv preprint arXiv:1503.01817 1.8 (2015).
[75] T. Tominaga et al. “Performance comparisons of subjective quality assessment methods for mobile video”. In: IEEE Second international workshop on quality of multimedia experience (QoMEX). 2010, pp. 82–87.
[76] Z. Tu et al. “A comparative evaluation of temporal pooling methods for blind video quality assessment”. In: IEEE International Conference on Image Processing (ICIP). 2020, pp. 141–145.
[77] D. Varga. “No-reference video quality assessment based on the temporal pooling of deep features”. In: Neural Processing Letters 50.3 (2019), pp. 2595–2608.
[78] P.V. Vu, C.T. Vu, and D.M. Chandler. “A spatiotemporal most-apparent-distortion model for video quality assessment”. In: 18th IEEE International Conference on Image Processing. 2011, pp. 2505–2508.
[79] C. Wang, L. Su, and W. Zhang. “COME for no-reference video quality assessment”. In: IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). 2018, pp. 232–237.
[80] Z. Wang et al. “Image quality assessment: from error visibility to structural similarity”. In: IEEE transactions on image processing 13.4 (2004), pp. 600– 612.
[81] J. Xu et al. “No-reference video quality assessment via feature learning”. In: IEEE international conference on image processing (ICIP). 2014, pp. 491–495.
[82] F. Yang et al. “No-reference quality assessment for networked video via primary analysis of bit stream”. In: IEEE Transactions on Circuits and Systems for Video Technology 20.11 (2010), pp. 1544–1554.
[83] F. Yi et al. “Attention Based Network For No-Reference UGC Video Quality Assessment”. In: IEEE International Conference on Image Processing (ICIP). 2021, pp. 1414–1418.
[84] J. You and J. Korhonen. “Deep neural networks for no-reference video quality assessment”. In: IEEE International Conference on Image Processing (ICIP). 2019, pp. 2349–2353.
[85] C. Zhang and J. Kim. “Object detection with location-aware deformable convolution and backward attention filtering”. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pp. 9452–9461.
[86] Y. Zhang et al. “Blind video quality assessment with weakly supervised learning and resampling strategy”. In: IEEE Transactions on Circuits and Systems for Video Technology 29.8 (2018), pp. 2244–2255.
[87] Y. Zhang, J. Lu, and J. Zhou. “Objects are different: Flexible monocular 3D object detection”. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 3289–3298.
[88] W. Zhou and Z. Chen. “Deep local and global spatiotemporal feature aggregation for blind video quality assessment”. In: IEEE International Conference on Visual Communications and Image Processing (VCIP). 2020, pp. 338–341.
[89] K. Zhu et al. “A no-reference video quality assessment based on laplacian pyramids”. In: IEEE International Conference on Image Processing. 2013, pp. 49–53.
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
