Generalizable Thermal-based Depth Estimation via Pre-trained Visual Foundation Model

Published in ICRA, 2024

Abstract: Depth estimation is a crucial task in computer vision, applicable to various domains such as 3D reconstruction, robotics, and autonomous driving. In particular, thermal-based depth estimation has unique advantages, including night-time vision. However, existing thermal-based depth estimation methods still struggle to generalize robustly, owing to limited data resources and the spectral differences between thermal and RGB images. In this paper, we present a self-supervised approach that enhances thermal-based depth estimation by leveraging pre-trained visual models originally designed for RGB data. Specifically, we design a novel two-stage training strategy incorporating Low-rank Adapters and Convolutional Adapters, which not only significantly improves accuracy and robustness but also enables impressive zero-shot generalization. Our method outperforms existing thermal-based depth estimation models, opening new possibilities for cross-modal applications in computer vision and robotics research.

Download paper here

Recommended citation: Ruoyu Fan, Wang Zhao, Matthieu Lin, Qi Wang, Yong-Jin Liu*, and Wenping Wang. Generalizable Thermal-based Depth Estimation via Pre-trained Visual Foundation Model. ICRA 2024.