Saryazdi, Soroush (2021) End-to-end Representation Learning for 3D Reconstruction. Masters thesis, Concordia University.
Text (application/pdf): Saryazdi_MSc_S2021.pdf (30MB), Accepted Version. Available under License Spectrum Terms of Access.
Abstract
Physically based rendering requires the digital representation of a scene to include both the 3D geometry and the material appearance properties of the objects in the scene. Reconstructing such 3D representations from images of real-world environments has been a long-standing goal in computer vision, computer graphics, robotics, and augmented and virtual reality. Recently, representation-learning approaches have transformed the landscape of several domains, such as image recognition and semantic segmentation. However, despite many encouraging advances in other domains, how these learning-based approaches can be leveraged for 3D reconstruction remains an open question. In this thesis, we propose approaches for using neural networks in conjunction with the 3D reconstruction pipeline so that they can be trained end-to-end toward a single objective (e.g., reconstructing an accurate 3D representation). Our main contributions are the following:
- A fully differentiable dense visual SLAM framework, called gradslam, for reconstructing the 3D geometry of a scene from a sequence of RGB-D images (a brief usage sketch follows this list). This work, carried out in collaboration with the Robotics and Embodied AI Lab (REAL) at MILA, resulted in the release of the first open-source library for differentiable SLAM.
- A disentangled rendering loss for training neural networks to estimate material appearance parameters from image(s) of a near-flat surface. The disentangled rendering loss lets the network weigh the importance of each material appearance parameter according to its effect on the final appearance of the material, while retaining desirable mathematical properties for gradient-based training (an illustrative sketch appears at the end of the abstract).
- Work towards an end-to-end trainable model that simultaneously reconstructs the 3D geometry and predicts the material appearance properties of a scene. No public dataset suitable for training such a model currently exists, so we have created a dataset of material appearance properties for complex scenes, which we intend to release publicly.
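As a rough illustration of what the released library enables, the sketch below is adapted from the public gradslam README: an RGB-D sequence is loaded, packed into an `RGBDImages` container, and fused into a map by the differentiable `PointFusion` module. The dataset path is a placeholder, and the exact API may differ between library versions.

```python
import torch
from torch.utils.data import DataLoader

from gradslam import RGBDImages
from gradslam.datasets import ICL
from gradslam.slam import PointFusion

# Load a short RGB-D sequence from the ICL-NUIM dataset
# ("path/to/ICL" is a placeholder for a local copy of the data).
dataset = ICL("path/to/ICL", seqlen=8, height=240, width=320)
loader = DataLoader(dataset=dataset, batch_size=2)
colors, depths, intrinsics, poses, *_ = next(iter(loader))

# Pack the sequence into gradslam's batched RGBDImages container.
rgbdimages = RGBDImages(colors, depths, intrinsics, poses)

# Run differentiable PointFusion SLAM. Because every step of the
# pipeline is differentiable, gradients can flow from the fused map
# back through odometry and fusion to the raw inputs.
slam = PointFusion(device=torch.device("cpu"))
pointclouds, recovered_poses = slam(rgbdimages)
```

It is this end-to-end differentiability, rather than any single component, that distinguishes the framework from classical dense SLAM systems.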
Our approach enjoys many of the benefits of classical 3D reconstruction approaches, such as interpretability (due to its modular nature) and the ability to reuse well-understood components from the reconstruction pipeline. It also enjoys the benefits of representation learning, such as strong performance under end-to-end training and the capability to solve tasks that have resisted explicitly designed algorithms (e.g., material appearance estimation for complex scenes).
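The abstract does not give the loss formulation, so the following is only a hypothetical sketch of the general idea: compare renderings rather than raw parameter maps, substituting one predicted parameter at a time into the ground-truth set so that each parameter receives gradients through its own effect on the final appearance. The simplified `shade` function, the parameter names, and the one-at-a-time substitution scheme are all assumptions made for illustration, not the thesis's definition.

```python
import torch
import torch.nn.functional as F

def shade(albedo, normals, roughness, light):
    """Toy point-light shading model (a stand-in for a full microfacet
    BRDF renderer; assumption for illustration). Maps are (B, C, H, W)."""
    # Clamp away zero so gradients through the exponent stay finite.
    n_dot_l = (normals * light).sum(dim=1, keepdim=True).clamp(min=1e-6)
    diffuse = albedo * n_dot_l
    specular = n_dot_l ** (1.0 / roughness.clamp(min=1e-3))
    return diffuse + specular

def disentangled_rendering_loss(pred, gt, lights):
    """Hypothetical formulation: substitute one predicted parameter into
    the ground-truth set at a time, so each parameter is penalized only
    through its own effect on the rendered appearance."""
    loss = 0.0
    for light in lights:
        reference = shade(**gt, light=light)
        for name in pred:
            params = dict(gt)
            params[name] = pred[name]
            loss = loss + F.l1_loss(shade(**params, light=light), reference)
    return loss / len(lights)

# Example usage with random (batch, channels, height, width) maps:
B, H, W = 2, 64, 64
gt = {"albedo": torch.rand(B, 3, H, W),
      "normals": F.normalize(torch.randn(B, 3, H, W), dim=1),
      "roughness": torch.rand(B, 1, H, W)}
pred = {k: v.clone().requires_grad_(True) for k, v in gt.items()}
light = F.normalize(torch.randn(1, 3, 1, 1), dim=1)
disentangled_rendering_loss(pred, gt, [light]).backward()
```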
| Divisions: | Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering |
| --- | --- |
| Item Type: | Thesis (Masters) |
| Authors: | Saryazdi, Soroush |
| Institution: | Concordia University |
| Degree Name: | M.Sc. |
| Program: | Computer Science |
| Date: | 18 April 2021 |
| Thesis Supervisor(s): | Mudur, Sudhir and Mendhurwar, Kaustubha |
| Keywords: | Representation Learning, Deep Learning, Simultaneous Localization and Mapping (SLAM), 3D Reconstruction, Material Appearance Modeling |
| ID Code: | 988331 |
| Deposited By: | Soroush Saryazdi |
| Deposited On: | 29 Jun 2021 23:07 |
| Last Modified: | 06 Apr 2023 00:00 |