ORIGINAL ARTICLE
Single-image indoor localization using cross-domain learning from BIM models
Affiliations:
1. Institute of Geodesy and Geoinformatics, Wrocław University of Environmental and Life Sciences, Grunwaldzka 53, 50-357 Wrocław, Poland
2. Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wybrzeże Stanisława Wyspiańskiego 27, 50-370 Wrocław, Poland
A - Research concept and design; B - Collection and/or assembly of data; C - Data analysis and interpretation; D - Writing the article; E - Critical revision of the article; F - Final approval of article
 
 
Submission date: 2025-11-27
Final revision date: 2026-03-18
Acceptance date: 2026-04-02
Publication date: 2026-04-29
Corresponding author
Małgorzata Jarząbek-Rychard
Institute of Geodesy and Geoinformatics, Wrocław University of Environmental and Life Sciences, Grunwaldzka 53, 50-357 Wrocław, Poland
 
Reports on Geodesy and Geoinformatics 2026;121:50-58
 
ABSTRACT
Accurate indoor camera localization is crucial for applications in augmented reality, robotics, and autonomous navigation. While single-image deep learning models for 6-DOF pose regression have shown competitive results on established benchmarks, their development still requires extensive data annotation and hyperparameter tuning. In this work, we investigate the combination of advanced network architectures, transfer learning, and synthetic data to improve single-image indoor pose regression. Our approach employs a ResNet50 backbone pre-trained on the Places365 dataset and further trained and evaluated on established benchmarks. To enhance the training data, synthetic images are generated from 3D BIM models using Unreal Engine, with alignment procedures ensuring accurate correspondence between synthetic and real environments. Real RGB images are preprocessed to resemble synthetic data, enabling effective cross-domain evaluation. Experiments demonstrate that both architectural design and pretraining significantly influence model performance. On the UniMelb dataset (real-to-real scenario), the model achieves 0.21 m and 0.80° errors, surpassing baseline accuracy. We also present cross-validation and synthetic-to-synthetic experiments, providing insights into factors affecting performance and interactions between architecture, pretraining, and dataset characteristics.
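The abstract reports pose accuracy as a translation error in metres and a rotation error in degrees (e.g. 0.21 m and 0.80° on the UniMelb dataset). These two metrics are standard for 6-DOF pose regression and can be sketched in a few lines of pure Python; the function names below are illustrative, and the ground-truth quaternion is assumed to be unit-norm:

```python
import math

def translation_error(t_pred, t_true):
    # Euclidean distance between predicted and ground-truth camera positions (metres).
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(t_pred, t_true)))

def rotation_error_deg(q_pred, q_true):
    # Normalise the predicted quaternion (regressed values are generally not unit-norm),
    # then take the angular distance theta = 2 * arccos(|<q_pred, q_true>|) in degrees.
    norm = math.sqrt(sum(c * c for c in q_pred))
    q_pred = [c / norm for c in q_pred]
    dot = abs(sum(p * t for p, t in zip(q_pred, q_true)))
    dot = min(1.0, dot)  # guard against floating-point overshoot before arccos
    return math.degrees(2.0 * math.acos(dot))
```

For example, a prediction offset by 0.21 m along one axis yields a 0.21 m translation error, and a quaternion representing a 90° rotation about the z-axis, (cos 45°, 0, 0, sin 45°), scores 90° against the identity quaternion (1, 0, 0, 0).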
ACKNOWLEDGEMENTS
The research was supported by the Wrocław University of Environmental and Life Sciences (Poland) as part of the research project no. N060/0002/24.
FUNDING
Wrocław University of Environmental and Life Sciences
DATA AVAILABILITY
The datasets generated and analyzed in this study are available from the corresponding author upon reasonable request.
 
eISSN: 2391-8152
ISSN: 2391-8365