HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information (2310.04662v2)

Published 7 Oct 2023 in cs.CV and cs.AI

Abstract: A powerful way to adapt a visual recognition model to a new domain is through image translation. However, common image translation approaches only focus on generating data from the same distribution as the target domain. Given a cross-modal application, such as pedestrian detection from aerial images, with a considerable shift in data distribution between infrared (IR) and visible (RGB) images, a translation focused on generation might lead to poor performance as the loss focuses on details irrelevant to the task. In this paper, we propose HalluciDet, an IR-RGB image translation model for object detection. Instead of focusing on reconstructing the original image on the IR modality, it seeks to reduce the detection loss of an RGB detector, and therefore avoids the need to access RGB data. This model produces a new image representation that enhances objects of interest in the scene and greatly improves detection performance. We empirically compare our approach against state-of-the-art methods for image translation and for fine-tuning on IR, and show that our HalluciDet improves detection accuracy in most cases by exploiting the privileged information encoded in a pre-trained RGB detector. Code: https://github.com/heitorrapela/HalluciDet

Authors (6)

Heitor Rapela Medeiros
Masih Aminbeidokhti
Thomas Dubail
Eric Granger
Marco Pedersoli
Fidel A. Guerrero Pena

Summary

The paper introduces a novel cross-modal method that leverages a pre-trained RGB detector to guide IR-to-RGB image translation.
It employs a U-Net-based network with attention blocks and a custom detection-specific hallucination loss to optimize detection accuracy.
Experimental results on LLVIP and FLIR ADAS datasets show significant AP improvements, notably with the Faster R-CNN architecture.

Overview of "HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information"

This paper introduces HalluciDet, an approach that improves object detection through cross-modal image translation from infrared (IR) to visible (RGB) images. The primary objective is to improve person detection when only IR imagery is available at test time, a common situation in low-light conditions and surveillance applications. The method is grounded in the learning using privileged information (LUPI) framework: a pre-trained RGB detector serves as the privileged source that guides the translation. Rather than reconstructing images faithfully, the translation is optimized directly for detection performance, so details irrelevant to detection need not be reproduced.

Methodology

HalluciDet performs IR-to-RGB translation with a U-Net-based hallucination network augmented with attention blocks. The goal is not a photorealistic RGB image but a representation tailored to detection. The network is trained with a detection-specific objective, the hallucination loss, which combines classification and box-regression terms to improve detection accuracy on IR inputs.
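
The exact form of the hallucination loss is not given in this summary. As a rough sketch, assuming detector outputs of per-proposal class logits and box offsets, a classification term (softmax cross-entropy) plus a smooth-L1 regression term weighted by a hypothetical factor `lam` could look like the following. Real detectors use their own built-in losses (RetinaNet, for instance, uses a focal loss), so this is a generic stand-in, not the authors' code.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample; `target` is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def hallucination_loss(cls_logits, cls_targets, box_preds, box_targets, lam=1.0):
    """Hypothetical detection loss: mean classification term over all samples
    plus lam * mean regression term over foreground samples (class != 0)."""
    l_cls = sum(cross_entropy(z, y) for z, y in zip(cls_logits, cls_targets)) / len(cls_logits)
    fg = [i for i, y in enumerate(cls_targets) if y != 0]
    l_reg = sum(smooth_l1(box_preds[i], box_targets[i]) for i in fg) / len(fg) if fg else 0.0
    return l_cls + lam * l_reg


# Two proposals; class 0 = background, class 1 = person.
loss = hallucination_loss(
    cls_logits=[[2.0, 0.5], [0.1, 3.0]],
    cls_targets=[0, 1],
    box_preds=[[0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.0, -0.1]],
    box_targets=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
)  # loss ≈ 0.157
```

Only foreground proposals contribute to the regression term, since background proposals have no box to regress to.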

Instead of replicating real RGB images, HalluciDet shapes the translated representation so that a pre-trained RGB detector can process it easily, exploiting the privileged information encoded in that detector. The hallucinated output emphasizes the features most useful for detection, suppressing noise and making objects stand out more clearly from the background in low-light conditions.
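
The central idea, that the translator is trained only through the loss of a fixed detector and never against an RGB target, can be illustrated with a deliberately tiny toy. Everything here is hypothetical: the "image" is a flat list of IR intensities, the "hallucination network" is a per-pixel affine map with parameters `a` and `c`, the "RGB detector" is a fixed logistic scorer of foreground/background contrast (`DETECTOR_W`, `DETECTOR_B`), and finite differences stand in for backpropagation.

```python
import math

# Hypothetical frozen "RGB detector": its parameters are never updated.
DETECTOR_W, DETECTOR_B = 2.0, -1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def detector_score(image, mask):
    """Person score from how much the masked region stands out from the background."""
    fg = [p for p, m in zip(image, mask) if m]
    bg = [p for p, m in zip(image, mask) if not m]
    contrast = sum(fg) / len(fg) - sum(bg) / len(bg)
    return sigmoid(DETECTOR_W * contrast + DETECTOR_B)


def translate(ir_image, params):
    """Toy 'hallucination network': per-pixel affine map a * x + c."""
    a, c = params
    return [a * x + c for x in ir_image]


def detection_loss(params, ir_image, mask):
    # Binary cross-entropy with target "person present"; no RGB image is used.
    return -math.log(detector_score(translate(ir_image, params), mask))


def train(ir_image, mask, params, lr=0.5, steps=200, eps=1e-6):
    """Gradient descent on the translator parameters only (finite differences)."""
    params = list(params)
    for _ in range(steps):
        base = detection_loss(params, ir_image, mask)
        grads = []
        for i in range(len(params)):
            bumped = params.copy()
            bumped[i] += eps
            grads.append((detection_loss(bumped, ir_image, mask) - base) / eps)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


ir = [0.40, 0.45, 0.42, 0.30, 0.32, 0.31]  # toy IR intensities
mask = [1, 1, 1, 0, 0, 0]                  # 1 = person pixel
before = detection_loss([1.0, 0.0], ir, mask)
params = train(ir, mask, [1.0, 0.0])
after = detection_loss(params, ir, mask)
```

With this contrast-based detector the offset `c` cancels out and receives essentially zero gradient, so training increases the gain `a` and the person region is amplified relative to the background. In this toy, only the translator changes while the detector constants stay fixed.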

Experimental Evaluation

HalluciDet was evaluated on two standard IR-RGB datasets, LLVIP and FLIR ADAS, with three detector architectures (FCOS, RetinaNet, and Faster R-CNN). In most settings it outperformed image translation methods such as CycleGAN and FastCUT as well as simple pixel-manipulation baselines. The largest gains in Average Precision (AP) were observed with Faster R-CNN.
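
The comparisons above are reported in Average Precision. The summary does not specify the exact protocol (for instance, whether AP is averaged over several IoU thresholds, COCO-style), so the sketch below assumes the simplest variant: single-class AP at one IoU threshold of 0.5, with greedy VOC-style matching and all-point interpolation. Function names and the data layout are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """Single-class AP at one IoU threshold.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp, fp, precisions, recalls = 0, 0, [], []
    # Greedy matching in descending score order: a detection is a true positive
    # if its best-overlapping ground truth is still unmatched and IoU >= iou_thr.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt if n_gt else 0.0)
    # All-point interpolation: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gt = {"img0": [(0, 0, 10, 10)], "img1": [(0, 0, 10, 10)]}
dets = [
    ("img0", 0.9, (0, 0, 10, 10)),    # exact match: true positive
    ("img1", 0.8, (20, 20, 30, 30)),  # no overlap: false positive
    ("img1", 0.7, (1, 1, 10, 10)),    # IoU 0.81: true positive
]
ap = average_precision(dets, gt)  # 0.5 * 1.0 + 0.5 * (2/3) = 5/6
```

A duplicate detection of an already matched person counts as a false positive, which is why suppressing spurious responses matters as much as finding faint targets.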

HalluciDet also matched or exceeded detectors fine-tuned directly on IR data, while retaining performance on the RGB task: because only the hallucination network is trained, the RGB detector itself is left unchanged. This makes HalluciDet attractive for applications that must support both modalities without degrading the original RGB model.

Implications and Future Work

This research is relevant to applications where cross-modal detection is critical, such as autonomous vehicles and nighttime surveillance, where lighting is often poor. Because the privileged information is exploited only during training, HalluciDet offers a practical way to improve IR detection without retraining the detector on IR data.

Future work could explore integrating HalluciDet with other modalities and enhancing the hallucination network's representation capacity through advanced architectures or additional contextual features. Further research may also evaluate the scalability of HalluciDet across larger, more diverse datasets and its applicability in real-time scenarios where processing efficiency is paramount.

In summary, HalluciDet offers a method for adapting pre-trained RGB detectors to IR data: it uses privileged information to bridge the modality gap and improves detection accuracy in practical cross-modal settings.

GitHub

GitHub - heitorrapela/HalluciDet: HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information (Accepted at WACV 2024)
