End-to-end Representation Learning for 3D Reconstruction


Saryazdi, Soroush (2021) End-to-end Representation Learning for 3D Reconstruction. Masters thesis, Concordia University.

Text (application/pdf)
Saryazdi_MSc_S2021.pdf - Accepted Version
Available under License Spectrum Terms of Access.


Physically based rendering requires the digital representation of a scene to include both the 3D geometry and the material appearance properties of the objects in it. Reconstructing such 3D representations from images of real-world environments has been a long-standing goal in computer vision, computer graphics, robotics, and augmented and virtual reality. Recently, approaches based on representation learning have transformed the landscape of several domains such as image recognition and semantic segmentation. However, despite many encouraging advances in other domains, how these learning-based approaches can be leveraged for 3D reconstruction is still an open question. In this thesis, we propose approaches for using neural networks in conjunction with the 3D reconstruction pipeline such that they can be trained end-to-end based on a single end objective (e.g., to reconstruct an accurate 3D representation). Our main contributions include the following:

- A fully differentiable dense visual SLAM framework, called gradslam, for reconstructing the 3D geometry of a scene from a sequence of RGB-D images. This work, carried out in collaboration with the Robotics and Embodied AI Lab (REAL) at MILA, resulted in the release of the first open-source library for differentiable SLAM.

- We propose the disentangled rendering loss for training neural networks to estimate material appearance parameters from image(s) of a near-flat surface. The disentangled rendering loss allows the network to weigh the importance of each material appearance parameter based on its effect on the final appearance of the material, while also having desirable mathematical properties for gradient-based training.

- We describe work towards an end-to-end trainable model that can simultaneously reconstruct the 3D geometry and predict the material appearance properties of a scene. No public dataset for training such a model currently exists. Thus, we have created a dataset of material appearance properties for complex scenes, which we intend to release publicly.

Our approach enjoys many of the benefits of classical 3D reconstruction approaches, such as interpretability (due to its modular nature) and the ability to use well-understood components from the reconstruction pipeline. It also enjoys the benefits of representation learning, such as the capability to solve challenging tasks that have been difficult to solve by designing explicit algorithms (e.g., material appearance property estimation for complex scenes) and strong performance on end-to-end training tasks.
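
The abstract says the disentangled rendering loss lets the network weigh each material appearance parameter by its effect on the final appearance. The sketch below illustrates one plausible reading of that idea: each predicted parameter is swapped into the reference parameters one at a time, and the change in the rendered value is measured. Everything here is an assumption made for illustration and is not taken from the thesis: the toy single-pixel shading model `render`, the parameter names `albedo`, `specular`, and `roughness`, the fixed lighting terms, and the function `per_parameter_rendering_loss`.

```python
def render(albedo, specular, roughness, n_dot_l=0.8, n_dot_h=0.9):
    """Toy single-pixel shading model (hypothetical, not the thesis's renderer)."""
    diffuse = albedo * n_dot_l
    # Rougher surfaces get a lower exponent, i.e. a broader, dimmer highlight.
    shininess = 2.0 / max(roughness, 1e-3) ** 2
    highlight = specular * n_dot_h ** shininess
    return diffuse + highlight


def per_parameter_rendering_loss(pred, target):
    """Render with one predicted parameter at a time, keeping the others at
    their reference values, and compare against the reference rendering.

    This is an assumed formulation for illustration only.
    """
    reference = render(**target)
    losses = {}
    for name in target:
        mixed = dict(target)       # start from the reference parameters
        mixed[name] = pred[name]   # swap in a single predicted parameter
        losses[name] = abs(render(**mixed) - reference)
    return losses


# Reference material with no specular component, and an imperfect prediction.
target = {"albedo": 0.5, "specular": 0.0, "roughness": 0.5}
pred = {"albedo": 0.6, "specular": 0.0, "roughness": 0.9}
losses = per_parameter_rendering_loss(pred, target)
```

In this toy setup the roughness error contributes zero loss, because a surface with no specular component looks the same at any roughness, while the albedo error contributes about 0.08. A plain parameter-space loss would instead penalize the roughness error more heavily than the albedo error.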

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Saryazdi, Soroush
Institution: Concordia University
Degree Name: M. Sc.
Program: Computer Science
Date: 18 April 2021
Thesis Supervisor(s): Mudur, Sudhir and Mendhurwar, Kaustubha
Keywords: Representation Learning, Deep Learning, Simultaneous localization and mapping, SLAM, 3D Reconstruction, Material Appearance Modeling
ID Code: 988331
Deposited By: Soroush Saryazdi
Deposited On: 29 Jun 2021 23:07
Last Modified: 06 Apr 2023 00:00
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
