Research on 3D Reconstruction Technology of Indoor Buildings Based on Depth Prediction

Authors

  • Dashi Qiu

DOI:

https://doi.org/10.54097/8a7mdq55

Keywords:

3D Reconstruction, Depth Prediction, Multi-view, Indoor Scene

Abstract

To address the limitations of traditional 3D indoor scene reconstruction methods in resource-constrained environments, this paper proposes a depth-prediction-based 3D reconstruction method for indoor buildings. The method first employs a pre-trained image encoder to extract multi-scale features from the input images, which are combined with metadata encoding ray direction, depth information, and relative pose distance to construct a feature volume. The feature volume is then processed by a 2D convolutional neural network, and a multi-scale depth prediction strategy progressively refines the depth estimates, producing high-quality depth maps for more detailed 3D reconstruction. Experimental results on the public ScanNet dataset demonstrate that the proposed method significantly outperforms traditional depth estimation approaches, achieving a 21% improvement under the threshold accuracy metric δ < 1.05. In 3D reconstruction tasks, the method reaches near state-of-the-art performance (F-Score = 0.658) while supporting online real-time reconstruction with low memory consumption and a per-frame latency of only 72 ms.
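
To give a concrete picture of the coarse-to-fine depth head and the δ < 1.05 threshold-accuracy metric described in the abstract, the following is a minimal PyTorch sketch. All module names, channel sizes, and the toy feature volume are illustrative assumptions rather than the paper's actual architecture; only the overall pattern (image features plus per-pixel metadata channels fed to a 2D CNN, with multi-scale refinement and evaluation under the δ metric) follows the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleDepthHead(nn.Module):
    """Coarse-to-fine depth prediction from a per-pixel feature volume.

    Hypothetical module: layer counts and channel sizes are illustrative,
    not the architecture used in the paper.
    """

    def __init__(self, in_channels=64):
        super().__init__()
        # Coarse stage predicts depth at half resolution.
        self.coarse = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
        # Refinement stage sees the full-resolution features plus the
        # upsampled coarse depth and predicts a residual correction.
        self.refine = nn.Sequential(
            nn.Conv2d(in_channels + 1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, feat_volume):
        coarse_in = F.avg_pool2d(feat_volume, 2)
        coarse = self.coarse(coarse_in)
        coarse_up = F.interpolate(coarse, scale_factor=2, mode="bilinear",
                                  align_corners=False)
        residual = self.refine(torch.cat([feat_volume, coarse_up], dim=1))
        return coarse_up + residual  # (B, 1, H, W) predicted depth


def threshold_accuracy(pred, gt, thresh=1.05):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < thresh (the δ metric)."""
    valid = gt > 0
    ratio = torch.max(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return (ratio < thresh).float().mean().item()


if __name__ == "__main__":
    # Toy feature volume: encoder features concatenated channel-wise with
    # per-pixel metadata (e.g. ray direction, candidate depth, relative pose
    # distance), as described in the abstract.
    feats = torch.randn(1, 64, 64, 80)
    head = MultiScaleDepthHead(in_channels=64)
    depth = head(feats).abs() + 0.1                  # keep depths positive for the demo
    print(depth.shape)                               # torch.Size([1, 1, 64, 80])
    print(threshold_accuracy(depth, depth * 1.02))   # ~1.0, i.e. within δ < 1.05
```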



Published

30-04-2025

Issue

Vol. 16 No. 3 (2025)

Section

Articles

How to Cite

Qiu, D. (2025). Research on 3D Reconstruction Technology of Indoor Buildings Based on Depth Prediction. Journal of Computing and Electronic Information Management, 16(3), 88-91. https://doi.org/10.54097/8a7mdq55