Patel, Dhyey Devendrakumar (2022) Visual Dubbing Pipeline using Two-Pass Identity Transfer. Masters thesis, Concordia University.
Text (application/pdf)
Patel_MSc_F2022.pdf (17MB) - Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Visual dubbing uses visual computing and deep learning to alter an actor's lip and mouth articulations so that they sync with the dubbed speech. It has the potential to disrupt the dubbing industry, for which the quality of the dubbed result is paramount. An important requirement is that lip-sync changes be localized to the mouth region and not affect the rest of the actor's face or the rest of the video frame. Current methods can create realistic-looking fake faces with expressions. However, many fail to localize lip-sync and suffer from quality problems such as identity loss, low resolution, blur, loss of facial skin features or colour, and temporal jitter. These problems arise mainly because end-to-end trained networks poorly disentangle the different visual dubbing parameters (pose, skin colour, identity, lip movements, etc.). Our main contribution is a new visual dubbing pipeline in which, instead of training end-to-end, we incrementally apply a different disentangling technique for each parameter. Our pipeline is composed of three main steps: pose alignment, identity transfer, and video reassembly. Expert models in each step are fine-tuned for the actor. We propose an identity transfer network with an added style block which, with pre-training, is able to decouple face components, specifically identity and expression, and which also works with short video clips such as TV ads. Our pipeline also includes novel stages for temporal smoothing of the reenacted face, actor-specific super-resolution to retain fine facial details, and a second pass through the identity transfer network to preserve actor identity. Localization of lip-sync is achieved by restricting changes in the original video frame to just the actor's mouth region. The results are convincing, and a user survey also confirms their quality. Relevant quantitative metrics are included.
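The localization requirement described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `composite_mouth_region`, the rectangular `mouth_box` format, and the toy grayscale frames are all assumptions made for illustration. The sketch copies pixels from a generated frame into the original frame only inside the mouth box, so every pixel outside the box is left unchanged.

```python
def composite_mouth_region(original, generated, mouth_box):
    """Return a copy of `original` with pixels inside `mouth_box` taken from `generated`.

    Hypothetical toy format: a frame is a list of rows of pixel values.
    `mouth_box` is (top, left, bottom, right); bottom and right are exclusive.
    """
    if len(original) != len(generated) or any(
        len(o) != len(g) for o, g in zip(original, generated)
    ):
        raise ValueError("frames must have the same shape")

    top, left, bottom, right = mouth_box
    # Copy every row so the input frame is not mutated.
    result = [list(row) for row in original]

    # Overwrite only the pixels inside the box, clamped to the frame bounds.
    for y in range(max(top, 0), min(bottom, len(result))):
        for x in range(max(left, 0), min(right, len(result[y]))):
            result[y][x] = generated[y][x]
    return result


# Toy 4x4 frames: the "original" is all 0s, the "generated" face is all 9s.
original = [[0] * 4 for _ in range(4)]
generated = [[9] * 4 for _ in range(4)]
dubbed = composite_mouth_region(original, generated, (2, 1, 4, 3))
```

A hard rectangular box is used here only to keep the example short. A practical pipeline would more plausibly derive a soft mask from facial landmarks and blend at the boundary, but that detail is an assumption and is not stated in the abstract.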
Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
---|---|
Item Type: | Thesis (Masters) |
Authors: | Patel, Dhyey Devendrakumar |
Institution: | Concordia University |
Degree Name: | M. Comp. Sc. |
Program: | Computer Science |
Date: | August 2022 |
Thesis Supervisor(s): | Popa, Tiberiu and Mudur, Sudhir |
ID Code: | 991098 |
Deposited By: | Dhyey Devendrakumar Patel |
Deposited On: | 27 Oct 2022 14:38 |
Last Modified: | 06 Mar 2023 16:31 |