Montajabi, Zahra (2022) Deep Learning Methods for Codecs. Masters thesis, Concordia University.
Text (application/pdf): Montajabi_MASc_S2023.pdf (18MB), Accepted Version, available under the Spectrum Terms of Access license.
Abstract
With the recent advent of high-resolution mobile and camera devices, efficient alternatives to traditional compression methods are needed to store the resulting video content. Video compression has recently received enormous attention in media technologies as a computer vision problem. Using state-of-the-art video compression methods, videos can be transmitted at higher quality while requiring less bandwidth and memory. The advent of neural-network-based video compression methods has remarkably improved video coding performance.
In this thesis, two different video compression methods are proposed. The details of the models' architectures and evaluation methods are elaborated, and the results are reported numerically and visually. In the first method, a Recurrent Neural Network (RNN) with long short-term memory (LSTM) units is used to retain valuable information and discard unnecessary information, iteratively reducing the reconstruction error so that videos are encoded with less quality loss.
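The iterative error-reduction idea behind the first method can be illustrated with a toy progressive coder. This is only a minimal numpy sketch of the principle (each pass transmits a small correction and the next pass works on what remains); the thesis's actual model uses RNN/LSTM units rather than the fixed sign-based update assumed here.

```python
import numpy as np

def iterative_residual_code(frame, n_iters=8, step=0.5):
    """Toy progressive coder: each pass transmits a 1-bit sign code per
    pixel; more passes -> smaller residual -> lower reconstruction error."""
    residual = frame.astype(np.float64)
    recon = np.zeros_like(residual)
    for _ in range(n_iters):
        bits = np.sign(residual)   # the 1-bit "code" sent this iteration
        recon += step * bits       # decoder accumulates the corrections
        residual = frame - recon   # next iteration encodes what is left
        step *= 0.5                # shrink the correction each pass
    return recon

# Reconstruction error drops as more iterations (more bits) are spent.
frame = np.random.default_rng(0).random((4, 4))
err = [np.abs(frame - iterative_residual_code(frame, k)).mean()
       for k in (1, 4, 8)]
```

A learned coder replaces the hand-set sign/step rule with LSTM state that decides, per iteration, which information to keep and which to drop.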
In the second method, an Invertible Neural Network (INN) is utilized to mitigate the information loss problem. Unlike classic auto-encoders, which lose some information during encoding, an INN preserves more information and can therefore reconstruct videos with clearer details. The proposed methods are evaluated using the peak signal-to-noise ratio (PSNR), video multimethod assessment fusion (VMAF), and structural similarity index measure (SSIM) quality metrics. They are applied to two public video compression datasets, the Ultra Video Group (UVG) dataset and the YouTube UGC video compression dataset, and the results show that our methods outperform existing standard video coding schemes such as H.264 and H.265.
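Of the three quality metrics above, PSNR is the simplest to state precisely: it is a log-scale ratio of the peak signal power to the mean squared reconstruction error. A minimal sketch (the thesis presumably uses a standard implementation; this one assumes 8-bit pixel values with peak 255):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical frames: no distortion
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a uniform error of 16 gray levels on an 8-bit frame.
ref = np.zeros((2, 2))
distorted = np.full((2, 2), 16.0)
score = psnr(ref, distorted)         # ~24.05 dB
```

VMAF and SSIM are perceptually motivated and considerably more involved; in practice they are computed with reference tools (Netflix's `vmaf` and scikit-image's `structural_similarity`, respectively) rather than re-implemented.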
In the third part of this thesis, a deep learning method is used to find semantic regions of interest (SRoI), one of the most challenging problems in computer vision and image processing. Semantic regions of interest can be used in various image processing tasks such as image and video compression, enhancement, and reformatting. Knowing the semantic regions of interest within an image, we can improve its visual quality by compressing the more important parts with higher quality and the less important parts, such as the background, with lower quality; this can be achieved without changing the overall compression ratio or the PSNR quality metric. Finding the SRoI can also make image enhancement and color correction more accurate by focusing only on the important parts. Moreover, during image reformatting, important parts of the image may be lost; using the SRoI, we can reformat the image so that the most important regions remain in the frame.
For these purposes, a method is proposed that uses OpenAI's CLIP model to find SRoI by performing a semantic search over the objects detected in the image by an object detection model called Generic RoI Extractor (GRoIE). The results of the proposed method are reported.
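The semantic-search step reduces to ranking candidate regions by the cosine similarity between CLIP's embedding of a text query and its embeddings of the detected crops. The sketch below shows only that ranking step, with hand-made 2-D vectors standing in for real CLIP embeddings (which come from `encode_text` / `encode_image` and are typically 512-dimensional); the detector and encoder themselves are not reproduced here.

```python
import numpy as np

def rank_regions(text_emb, region_embs):
    """Rank candidate region crops by cosine similarity to a query embedding,
    mirroring how CLIP scores image crops against a text prompt."""
    t = text_emb / np.linalg.norm(text_emb)
    r = region_embs / np.linalg.norm(region_embs, axis=1, keepdims=True)
    sims = r @ t                     # cosine similarity per region
    order = np.argsort(-sims)        # most similar region first
    return order, sims[order]

# Placeholder embeddings: region 0 matches the query exactly, region 2
# partially, region 1 not at all.
query = np.array([1.0, 0.0])
regions = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
order, scores = rank_regions(query, regions)
```

The top-ranked regions are then taken as the SRoI for the downstream compression, enhancement, or reformatting task.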
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering |
|---|---|
| Item Type: | Thesis (Masters) |
| Authors: | Montajabi, Zahra |
| Institution: | Concordia University |
| Degree Name: | M.A.Sc. |
| Program: | Quality Systems Engineering |
| Date: | 21 December 2022 |
| Thesis Supervisor(s): | Bouguila, Nizar |
| ID Code: | 991468 |
| Deposited By: | Zahra Montajabi |
| Deposited On: | 21 Jun 2023 14:37 |
| Last Modified: | 01 Sep 2023 00:00 |