Convolutional neural networks have been shown to achieve very high accuracy on certain visual tasks, in particular semantic segmentation. In this thesis we address the problem of semantic segmentation of buildings from remote sensor imagery. We explore different architectures for semantic segmentation and propose ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, ICT-Net combines the localization accuracy and use of context of the U-Net architecture, the compact internal representations and reduced feature redundancy of Dense blocks, and the dynamic channel-wise feature re-weighting of Squeeze-and-Excitation (SE) blocks. The proposed network has been tested on two benchmark datasets and is shown to outperform all other state-of-the-art methods by more than 1.5% on the Jaccard index on the INRIA dataset and by 1.8% on the Jaccard index on the AIRS dataset. Furthermore, as building classification is typically the first step of the reconstruction process, in the latter part of the work we investigate the relationship between classification accuracy and reconstruction accuracy. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present results that show a consistent and considerable reduction in reconstruction accuracy. The work presented in this thesis has been published at the 16th Conference on Computer and Robot Vision 2019.
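
For reference, the Jaccard index used as the evaluation metric above is the intersection over union of the predicted and reference building masks. The following is a minimal sketch of that standard definition, not the evaluation code used in the thesis. The function name `jaccard_index`, the nested-list mask format, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def jaccard_index(pred, truth):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    same_rows = len(pred) == len(truth)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels labelled building in both masks
    union = 0         # pixels labelled building in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 2x3 masks: 1 marks a building pixel, 0 marks background.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = jaccard_index(predicted, reference)
```

In this toy case the masks agree on two building pixels and together cover four, so the score is 2/4. Benchmark implementations may aggregate over images differently, so results computed this way are not directly comparable to the figures reported above.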