ORIGINAL ARTICLE
Single-image indoor localization using cross-domain learning from BIM models
Affiliations:
1. Institute of Geodesy and Geoinformatics, Wrocław University of Environmental and Life Sciences, Grunwaldzka 53, 50-357 Wrocław, Poland
2. Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wybrzeże Stanisława Wyspiańskiego 27, 50-370 Wrocław, Poland
A - Research concept and design; B - Collection and/or assembly of data; C - Data analysis and interpretation; D - Writing the article; E - Critical revision of the article; F - Final approval of article
 
 
Submission date: 2025-11-27
Final revision date: 2026-03-18
Acceptance date: 2026-04-02
Publication date: 2026-04-29
Corresponding author
Małgorzata Jarząbek-Rychard
Institute of Geodesy and Geoinformatics, Wrocław University of Environmental and Life Sciences, Grunwaldzka 53, 50-357 Wrocław, Poland
 
Reports on Geodesy and Geoinformatics 2026;121:50-58
 
ABSTRACT
Accurate indoor camera localization is crucial for applications in augmented reality, robotics, and autonomous navigation. While single-image deep learning models for 6-DOF pose regression have shown competitive results on established benchmarks, their development still requires extensive data annotation and hyperparameter tuning. In this work, we investigate the combination of advanced network architectures, transfer learning, and synthetic data to improve single-image indoor pose regression. Our approach employs a ResNet50 backbone pre-trained on the Places365 dataset and further trained and evaluated on established benchmarks. To enhance the training data, synthetic images are generated from 3D BIM models using Unreal Engine, with alignment procedures ensuring accurate correspondence between synthetic and real environments. Real RGB images are preprocessed to resemble synthetic data, enabling effective cross-domain evaluation. Experiments demonstrate that both architectural design and pretraining significantly influence model performance. On the UniMelb dataset (real-to-real scenario), the model achieves 0.21 m and 0.80° errors, surpassing baseline accuracy. We also present cross-validation and synthetic-to-synthetic experiments, providing insights into factors affecting performance and interactions between architecture, pretraining, and dataset characteristics.
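The abstract reports pose accuracy as a translation error in metres and a rotation error in degrees (e.g. 0.21 m and 0.80° on the UniMelb dataset). These two metrics are standard for 6-DOF pose regression and can be sketched in a few lines of pure Python; the function names below are illustrative, and the ground-truth quaternion is assumed to be unit-norm:

```python
import math

def translation_error(t_pred, t_true):
    # Euclidean distance between predicted and ground-truth camera positions (metres).
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(t_pred, t_true)))

def rotation_error_deg(q_pred, q_true):
    # Normalise the predicted quaternion (regressed values are generally not unit-norm),
    # then take the angular distance theta = 2 * arccos(|<q_pred, q_true>|) in degrees.
    norm = math.sqrt(sum(c * c for c in q_pred))
    q_pred = [c / norm for c in q_pred]
    dot = abs(sum(p * t for p, t in zip(q_pred, q_true)))
    dot = min(1.0, dot)  # guard against floating-point overshoot before arccos
    return math.degrees(2.0 * math.acos(dot))
```

For example, a prediction offset by 0.21 m along one axis yields a 0.21 m translation error, and a quaternion representing a 90° rotation about the z-axis, (cos 45°, 0, 0, sin 45°), scores 90° against the identity quaternion (1, 0, 0, 0).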
ACKNOWLEDGEMENTS
The research was supported by the Wrocław University of Environmental and Life Sciences (Poland) as part of the research project no. N060/0002/24.
FUNDING
Wrocław University of Environmental and Life Sciences
DATA AVAILABILITY
The datasets generated and analyzed in this study are available from the corresponding author upon reasonable request.
 
eISSN: 2391-8152
ISSN: 2391-8365