TL;DR: A flexible implicit representation for accurate large-scale 3D reconstruction by combining convolutional encoders with implicit occupancy decoders.
Recently, implicit neural representations have gained popularity for learning-based 3D reconstruction. While demonstrating promising results, most implicit approaches are limited to comparatively simple geometry of single objects and do not scale to more complicated or large-scale scenes. The key limiting factor of implicit methods is their simple fully-connected network architecture, which does not allow for integrating local information in the observations or incorporating inductive biases such as translational equivariance. In this paper, we propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes. By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space. We investigate the effectiveness of the proposed representation by reconstructing complex geometry from noisy point clouds and low-resolution voxel representations. We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
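To make the idea of pairing a convolutional encoder with an implicit occupancy decoder more concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a single 2D feature plane over the xy-plane, uses a fixed 3x3 mean filter as a stand-in for a learned convolutional network, and uses a tiny randomly initialized MLP as the decoder. All function names, shapes, and parameters here are hypothetical and chosen only for illustration.

```python
import numpy as np

def scatter_mean_to_plane(points, feats, res):
    """Average per-point features into a res x res grid over the xy-plane.
    points: (N, 3) in [0, 1); feats: (N, C). Returns (res, res, C)."""
    idx = np.clip((points[:, :2] * res).astype(int), 0, res - 1)
    flat = idx[:, 1] * res + idx[:, 0]            # row-major cell index
    plane = np.zeros((res * res, feats.shape[1]))
    count = np.zeros(res * res)
    np.add.at(plane, flat, feats)                 # unbuffered scatter-add
    np.add.at(count, flat, 1.0)
    plane /= np.maximum(count, 1.0)[:, None]      # empty cells stay zero
    return plane.reshape(res, res, -1)

def box_filter3(plane):
    """Assumed stand-in for a learned convolution: 3x3 mean filter, edge padding."""
    p = np.pad(plane, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = plane.shape[:2]
    out = np.zeros_like(plane)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0

def bilinear_query(plane, q):
    """Bilinearly interpolate plane features at query points q: (M, 3) in [0, 1]."""
    res = plane.shape[0]
    # Cell centers sit at (i + 0.5) / res, so shift by half a cell.
    xy = np.clip(q[:, :2] * res - 0.5, 0.0, res - 1.0)
    x0 = np.floor(xy[:, 0]).astype(int)
    y0 = np.floor(xy[:, 1]).astype(int)
    x1 = np.minimum(x0 + 1, res - 1)
    y1 = np.minimum(y0 + 1, res - 1)
    wx = (xy[:, 0] - x0)[:, None]
    wy = (xy[:, 1] - y0)[:, None]
    top = plane[y0, x0] * (1 - wx) + plane[y0, x1] * wx
    bot = plane[y1, x0] * (1 - wx) + plane[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def decode_occupancy(q, feat, params):
    """Tiny MLP: (query coordinates, local feature) -> occupancy probability."""
    w1, b1, w2, b2 = params
    h = np.maximum(np.concatenate([q, feat], axis=1) @ w1 + b1, 0.0)  # ReLU
    logits = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))    # sigmoid

# Toy usage with random data and untrained (random) decoder weights.
rng = np.random.default_rng(0)
points = rng.random((256, 3))                     # stand-in for a noisy point cloud
feats = rng.standard_normal((256, 8))             # stand-in for per-point features
plane = box_filter3(scatter_mean_to_plane(points, feats, res=16))
queries = rng.random((5, 3))
local = bilinear_query(plane, queries)
params = (rng.standard_normal((3 + 8, 16)) * 0.1, np.zeros(16),
          rng.standard_normal((16, 1)) * 0.1, np.zeros(1))
occ = decode_occupancy(queries, local, params)    # one probability per query point
```

The point of the sketch is that the decoder sees a feature interpolated at the query location rather than one global code for the whole input, which is how local information enters the prediction. The mean filter applies the same kernel at every grid cell, so it is translation equivariant away from the borders, which illustrates the inductive bias that a learned convolution would also provide.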
We thank the Max Planck ETH Center for Learning Systems (CLS) for supporting Songyou Peng and the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Michael Niemeyer. We are also grateful for the kind support and helpful discussions from members in AVG.
Website template from Minyoung Huh, modified from Colorful Colorization. The flythrough video of 3D reconstruction was rendered with mitsuba-visualize.