[1] M. Agarla, L. Celona, and R. Schettini. "An efficient method for no-reference video quality assessment". In: Journal of Imaging 7.3 (2021), p. 55.
[2] S. Ahn and S. Lee. "Deep blind video quality assessment based on temporal human perception". In: 25th IEEE International Conference on Image Processing (ICIP). 2018, pp. 619–623.
[3] L. Ali et al. "Performance evaluation of deep CNN-based crack detection and localization techniques for concrete structures". In: Sensors 21.5 (2021), p. 1688.
[4] J. Benesty et al. "Pearson correlation coefficient". In: Noise Reduction in Speech Processing. Springer, 2009, pp. 1–4.
[5] Y. Bengio, P. Simard, and P. Frasconi. "Learning long-term dependencies with gradient descent is difficult". In: IEEE Transactions on Neural Networks 5.2 (1994), pp. 157–166.
[6] H. Boujut et al. "No-reference video quality assessment of H.264 video streams based on semantic saliency maps". In: IS&T/SPIE Electronic Imaging. Vol. 8293. 2012, pp. 8293–28.
[7] T. Brandao and M.P. Queluz. "No-reference quality assessment of H.264/AVC encoded video". In: IEEE Transactions on Circuits and Systems for Video Technology 20.11 (2010), pp. 1437–1447.
[8] R. Cahuantzi, X. Chen, and S. Güttel. "A comparison of LSTM and GRU networks for learning symbolic sequences". In: arXiv preprint arXiv:2107.02248 (2021).
[9] F. Chollet et al. Keras. https://keras.io. 2015.
[10] Z.L. Chu, T.J. Liu, and K.H. Liu. "No-Reference Video Quality Assessment by a Cascade Combination of Neural Networks and Regression Model". In: IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2020, pp. 4116–4121.
[11] J. Deng et al. "ImageNet: A large-scale hierarchical image database". In: IEEE Conference on Computer Vision and Pattern Recognition. 2009, pp. 248–255.
[12] M. Dimitrievski and Z. Ivanovski. "No-reference quality assessment of highly compressed video sequences". In: IEEE 15th International Workshop on Multimedia Signal Processing (MMSP). 2013, pp. 266–271.
[13] J. Donahue et al. "DeCAF: A deep convolutional activation feature for generic visual recognition". In: International Conference on Machine Learning. PMLR. 2014, pp. 647–655.
[14] Q. Dou et al. "Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks". In: IEEE Transactions on Medical Imaging 35.5 (2016), pp. 1182–1195.
[15] J. Duchi, E. Hazan, and Y. Singer. "Adaptive subgradient methods for online learning and stochastic optimization". In: Journal of Machine Learning Research 12.7 (2011).
[16] V. Frants et al. "Blind visual quality assessment for smart cloud-based video storage". In: IEEE International Conference on Smart Cloud (SmartCloud). 2018, pp. 171–174.
[17] L. Gatys, A. Ecker, and M. Bethge. "A Neural Algorithm of Artistic Style". In: Journal of Vision 16.12 (2016), pp. 326–326.
[18] A. Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc., 2019.
[19] D. Ghadiyaram et al. "In-capture mobile video distortions: A study of subjective behavior and objective algorithms". In: IEEE Transactions on Circuits and Systems for Video Technology 28.9 (2017), pp. 2061–2077.
[20] A. Graves et al. "A novel connectionist system for unconstrained handwriting recognition". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 31.5 (2008), pp. 855–868.
[21] A. Graves and N. Jaitly. "Towards end-to-end speech recognition with recurrent neural networks". In: International Conference on Machine Learning. PMLR. 2014, pp. 1764–1772.
[22] Z. Guan et al. "A novel objective quality assessment method for video conferencing coding". In: China Communications 16.4 (2019), pp. 89–104.
[23] J. Han and C. Moraga. "The influence of the sigmoid function parameters on the speed of backpropagation learning". In: International Workshop on Artificial Neural Networks. Springer. 1995, pp. 195–201.
[24] K. He et al. "Deep residual learning for image recognition". In: IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 770–778.
[25] S. Hochreiter and J. Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[26] V. Hosu et al. "The Konstanz natural video database (KoNViD-1k)". In: IEEE Ninth International Conference on Quality of Multimedia Experience (QoMEX). 2017.
[27] A.G. Howard et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications". In: arXiv preprint arXiv:1704.04861 (2017).
[28] B. Karlik and A.V. Olgac. "Performance analysis of various activation functions in generalized MLP architectures of neural networks". In: International Journal of Artificial Intelligence and Expert Systems 1.4 (2011), pp. 111–122.
[29] A. Karpathy et al. "Large-scale video classification with convolutional neural networks". In: IEEE Conference on Computer Vision and Pattern Recognition. 2014, pp. 1725–1732.
[30] D.P. Kingma and J. Ba. "Adam: A Method for Stochastic Optimization". In: International Conference on Learning Representations (Poster). 2015.
[31] J. Korhonen. "Two-level approach for no-reference consumer video quality assessment". In: IEEE Transactions on Image Processing 28.12 (2019), pp. 5923–5938.
[32] A. Krizhevsky. "Learning Multiple Layers of Features from Tiny Images". Master's thesis, University of Toronto, 2009.
[33] A. Krizhevsky, I. Sutskever, and G.E. Hinton. "ImageNet classification with deep convolutional neural networks". In: Advances in Neural Information Processing Systems 25 (2012).
[34] Y. LeCun, Y. Bengio, and G. Hinton. "Deep learning". In: Nature 521.7553 (2015), pp. 436–444.
[35] D. Li, T. Jiang, and M. Jiang. "Quality assessment of in-the-wild videos". In: 27th ACM International Conference on Multimedia. 2019, pp. 2351–2359.
[36] D. Li, T. Jiang, and M. Jiang. "Recent advances and challenges in video quality assessment". In: ZTE Communications 17.1 (2019), pp. 3–11.
[37] L. Li et al. "Hyperband: A novel bandit-based approach to hyperparameter optimization". In: The Journal of Machine Learning Research 18.1 (2017), pp. 6765–6816.
[38] R. Li, B. Zeng, and M.L. Liou. "A new three-step search algorithm for block motion estimation". In: IEEE Transactions on Circuits and Systems for Video Technology 4.4 (1994), pp. 438–442.
[39] S. Li et al. "Image quality assessment by separately evaluating detail losses and additive impairments". In: IEEE Transactions on Multimedia 13.5 (2011), pp. 935–949.
[40] Y. Li et al. "Video quality assessment with deep architecture". In: IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA). 2021, pp. 268–271.
[41] Z. Li et al. "Toward a practical perceptual video quality metric". In: The Netflix Tech Blog 6.2 (2016).
[42] T. Lin et al. "Microsoft COCO: Common objects in context". In: European Conference on Computer Vision. Springer. 2014, pp. 740–755.
[43] W. Lu et al. "A spatiotemporal model of video quality assessment via 3D gradient differencing". In: Information Sciences 478 (2019), pp. 141–151.
[44] W.S. McCulloch and W. Pitts. "A logical calculus of the ideas immanent in nervous activity". In: The Bulletin of Mathematical Biophysics 5.4 (1943), pp. 115–133.
[45] L. Mou, Y. Hua, and X. Zhu. "Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images". In: IEEE Transactions on Geoscience and Remote Sensing 58.11 (2020), pp. 7557–7569.
[46] V. Nair and G.E. Hinton. "Rectified linear units improve restricted Boltzmann machines". In: ICML. 2010.
[47] Y. Nesterov. "A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)". In: Doklady AN USSR. Vol. 269. 1983, pp. 543–547.
[48] M. Nuutinen et al. "CVD2014 — A database for evaluating no-reference video quality assessment algorithms". In: IEEE Transactions on Image Processing 25.7 (2016), pp. 3073–3086.
[49] T. O'Malley et al. Keras Tuner. https://github.com/keras-team/keras-tuner. 2019.
[50] I. Oksuz et al. "Deep learning-based detection and correction of cardiac MR motion artefacts during reconstruction for high-quality segmentation". In: IEEE Transactions on Medical Imaging 39.12 (2020), pp. 4001–4010.
[51] S. Pang et al. "SpineParseNet: Spine parsing for volumetric MR image by a two-stage segmentation framework with semantic image representation". In: IEEE Transactions on Medical Imaging 40.1 (2020), pp. 262–273.
[52] J. Park et al. "Video quality pooling adaptive to perceptual distortion severity". In: IEEE Transactions on Image Processing 22.2 (2012), pp. 610–620.
[53] R. Pascanu, T. Mikolov, and Y. Bengio. "On the difficulty of training recurrent neural networks". In: International Conference on Machine Learning. PMLR. 2013, pp. 1310–1318.
[54] A. Paszke et al. "PyTorch: An imperative style, high-performance deep learning library". In: Advances in Neural Information Processing Systems 32 (2019).
[55] M.H. Pinson and S. Wolf. "A new standardized method for objectively measuring video quality". In: IEEE Transactions on Broadcasting 50.3 (2004), pp. 312–322.
[56] S. Rimac-Drlje, M. Vranjes, and D. Zagar. "Influence of temporal pooling method on the objective video quality evaluation". In: IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. 2009, pp. 1–5.
[57] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning Internal Representations by Error Propagation. Tech. rep. California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
[58] M.A. Saad, A.C. Bovik, and C. Charrier. "Blind prediction of natural video quality". In: IEEE Transactions on Image Processing 23.3 (2014), pp. 1352–1365.
[59] H. Sak, A. Senior, and F. Beaufays. "Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition". In: arXiv preprint arXiv:1402.1128 (2014).
[60] A.L. Samuel. "Some studies in machine learning using the game of checkers. II — Recent progress". In: IBM Journal of Research and Development 11.6 (1967), pp. 601–617.
[61] M.J. Scott et al. "Do personality and culture influence perceived video quality and enjoyment?" In: IEEE Transactions on Multimedia 18.9 (2016), pp. 1796–1807.
[62] K. Seshadrinathan and A.C. Bovik. "Motion tuned spatio-temporal quality assessment of natural videos". In: IEEE Transactions on Image Processing 19.2 (2009), pp. 335–350.
[63] K. Seshadrinathan and A.C. Bovik. "Temporal hysteresis model of time varying subjective video quality". In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2011, pp. 1153–1156.
[64] H.O. Shahreza, A. Amini, and H. Behroozi. "No-reference video quality assessment using recurrent neural networks". In: IEEE 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS). 2019, pp. 1–5.
[65] H.R. Sheikh and A.C. Bovik. "Image information and visual quality". In: IEEE Transactions on Image Processing 15.2 (2006), pp. 430–444.
[66] D.J. Sheskin. "Spearman's rank-order correlation coefficient". In: Handbook of Parametric and Nonparametric Statistical Procedures 1353 (2007).
[67] M. Shi, K. Wang, and C. Li. "A C-LSTM with word embedding model for news text classification". In: IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS). 2019, pp. 253–257.
[68] K. Simonyan, A. Vedaldi, and A. Zisserman. "Deep inside convolutional networks: Visualising image classification models and saliency maps". In: International Conference on Learning Representations. 2014.
[69] K. Simonyan and A. Zisserman. "Very deep convolutional networks for large-scale image recognition". In: arXiv preprint arXiv:1409.1556 (2014).
[70] R. Soundararajan and A.C. Bovik. "Video quality assessment by reduced reference spatio-temporal entropic differencing". In: IEEE Transactions on Circuits and Systems for Video Technology 23.4 (2012), pp. 684–694.
[71] C. Szegedy et al. "Going deeper with convolutions". In: IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 1–9.
[72] C. Szegedy et al. "Inception-v4, Inception-ResNet and the impact of residual connections on learning". In: Thirty-First AAAI Conference on Artificial Intelligence. 2017.
[73] C. Szegedy et al. "Rethinking the Inception architecture for computer vision". In: IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 2818–2826.
[74] B. Thomee et al. "The new data and new challenges in multimedia research". In: arXiv preprint arXiv:1503.01817 1.8 (2015).
[75] T. Tominaga et al. "Performance comparisons of subjective quality assessment methods for mobile video". In: IEEE Second International Workshop on Quality of Multimedia Experience (QoMEX). 2010, pp. 82–87.
[76] Z. Tu et al. "A comparative evaluation of temporal pooling methods for blind video quality assessment". In: IEEE International Conference on Image Processing (ICIP). 2020, pp. 141–145.
[77] D. Varga. "No-reference video quality assessment based on the temporal pooling of deep features". In: Neural Processing Letters 50.3 (2019), pp. 2595–2608.
[78] P.V. Vu, C.T. Vu, and D.M. Chandler. "A spatiotemporal most-apparent-distortion model for video quality assessment". In: 18th IEEE International Conference on Image Processing. 2011, pp. 2505–2508.
[79] C. Wang, L. Su, and W. Zhang. "COME for no-reference video quality assessment". In: IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). 2018, pp. 232–237.
[80] Z. Wang et al. "Image quality assessment: From error visibility to structural similarity". In: IEEE Transactions on Image Processing 13.4 (2004), pp. 600–612.
[81] J. Xu et al. "No-reference video quality assessment via feature learning". In: IEEE International Conference on Image Processing (ICIP). 2014, pp. 491–495.
[82] F. Yang et al. "No-reference quality assessment for networked video via primary analysis of bit stream". In: IEEE Transactions on Circuits and Systems for Video Technology 20.11 (2010), pp. 1544–1554.
[83] F. Yi et al. "Attention Based Network for No-Reference UGC Video Quality Assessment". In: IEEE International Conference on Image Processing (ICIP). 2021, pp. 1414–1418.
[84] J. You and J. Korhonen. "Deep neural networks for no-reference video quality assessment". In: IEEE International Conference on Image Processing (ICIP). 2019, pp. 2349–2353.
[85] C. Zhang and J. Kim. "Object detection with location-aware deformable convolution and backward attention filtering". In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pp. 9452–9461.
[86] Y. Zhang et al. "Blind video quality assessment with weakly supervised learning and resampling strategy". In: IEEE Transactions on Circuits and Systems for Video Technology 29.8 (2018), pp. 2244–2255.
[87] Y. Zhang, J. Lu, and J. Zhou. "Objects are different: Flexible monocular 3D object detection". In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021, pp. 3289–3298.
[88] W. Zhou and Z. Chen. "Deep local and global spatiotemporal feature aggregation for blind video quality assessment". In: IEEE International Conference on Visual Communications and Image Processing (VCIP). 2020, pp. 338–341.
[89] K. Zhu et al. "A no-reference video quality assessment based on Laplacian pyramids". In: IEEE International Conference on Image Processing. 2013, pp. 49–53.