RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing (2306.11300v5)

Published 20 Jun 2023 in cs.CV, cs.AI, cs.CL, and cs.MM

Abstract: Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make use of existing large-scale pre-trained VLMs, which are trained on common objects, to perform the domain-specific transfer for accomplishing domain-related downstream tasks. In this paper, we propose a new framework that includes the Domain pre-trained Vision-Language Model (DVLM), bridging the gap between the General Vision-Language Model (GVLM) and domain-specific downstream tasks. Moreover, we present an image-text paired dataset in the field of remote sensing (RS), RS5M, which has 5 million RS images with English descriptions. The dataset is obtained by filtering publicly available image-text paired datasets and captioning label-only RS datasets with a pre-trained VLM. These constitute the first large-scale RS image-text paired dataset. Additionally, we fine-tuned the CLIP model and tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DVLM. Experimental results show that our proposed dataset is highly effective for various tasks, and our model GeoRSCLIP improves upon the baseline or previous state-of-the-art model by $3\%\sim20\%$ in Zero-shot Classification (ZSC), $3\%\sim6\%$ in Remote Sensing Cross-Modal Text-Image Retrieval (RSCTIR) and $4\%\sim5\%$ in Semantic Localization (SeLo) tasks. Dataset and models have been released at https://github.com/om-ai-lab/RS5M.

Summary

The paper presents RS5M and GeoRSCLIP, a comprehensive vision-language dataset and model specifically designed for remote sensing applications.
The methodology integrates satellite imagery with textual data using advanced multimodal techniques to enhance interpretation and analysis.
The paper demonstrates significant improvements in remote sensing tasks, offering practical benefits for environmental monitoring and urban planning.

Overview of the IEEEtran LaTeX Templates Usage Guide

The paper, "How to Use the IEEEtran LaTeX Templates," provides a detailed guide for authors preparing submissions for the Institute of Electrical and Electronics Engineers (IEEE) using the IEEEtran class in LaTeX. This document is intended for users with a basic understanding of LaTeX and focuses on keeping the document structure compatible with IEEE's publication process.

Template Features and Design

The IEEEtran class file is designed to approximate the appearance and page length of the final published version of a document; its output is not the final typeset version. The template structure allows LaTeX documents to be converted into XML, which is subsequently converted into both PDF and HTML formats for IEEE Xplore. This structure supports the efficient handling and processing of manuscripts by IEEE's outsourcing vendors.

Key Documentation Aspects

The documentation thoroughly covers the elements of a typical journal or conference paper. It provides explicit details on setting up a paper title, author affiliations, abstract, and index terms, utilizing standard \LaTeX commands for these components. For less common elements, such as special structures required by certain IEEE societies or conferences, the paper references the more comprehensive "IEEEtran_HOWTO.pdf."
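
The front-matter elements named above can be sketched as follows. This is a minimal illustration, not text taken from the guide: the title, author names, and keyword list are placeholders, and the `journal` option is assumed only for the sake of the example.

```latex
\documentclass[journal]{IEEEtran}

\begin{document}

% Paper title (placeholder text)
\title{A Placeholder Title for an IEEE Journal Paper}

% Authors and membership; \thanks carries the affiliation footnote
\author{First Author,~\IEEEmembership{Member,~IEEE,}
        and Second Author%
\thanks{The authors are with a placeholder institution.}}

% Typesets the title block; it must come after \title and \author
\maketitle

\begin{abstract}
A short placeholder abstract describing the contribution.
\end{abstract}

% Index terms use the IEEEkeywords environment
\begin{IEEEkeywords}
Placeholder term one, placeholder term two.
\end{IEEEkeywords}

\section{Introduction}
Body text starts here.

\end{document}
```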

Document Class Options

The IEEEtran class provides several document class options tailored for different types of publications, including journal articles, conference papers, and technical notes, among others. The paper advises beginning each document with the appropriate class declaration to ensure conformity with the intended publication's style requirements.
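
As a hedged illustration of the class declaration, the lines below show three commonly used IEEEtran options matching the publication types mentioned above. Only one declaration may be active per document; which option a given venue requires is an assumption the author has to check against that venue's instructions.

```latex
% Journal article (active declaration in this sketch)
\documentclass[journal]{IEEEtran}

% Alternatives, commented out because a document takes a single class line:
% \documentclass[conference]{IEEEtran}  % conference paper
% \documentclass[technote]{IEEEtran}    % technical note

\begin{document}
Placeholder body text.
\end{document}
```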

Creating Common Elements

The guide covers detailed instructions on creating common front and body matter elements. It includes coding for running heads, section headings, citations, figures, tables, lists, and mathematical elements, emphasizing the importance of adhering to IEEE formatting standards. Special attention is given to ensuring proper equation formatting and the use of LaTeX environments such as equation, align, and cases.
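
The three math environments named above might be used as in the sketch below. The formulas and label names are illustrative placeholders; `align` and `cases` come from the `amsmath` package, which is loaded explicitly here.

```latex
\documentclass[journal]{IEEEtran}
\usepackage{amsmath} % provides align and cases

\begin{document}

% A single numbered equation
\begin{equation}
\label{eq:energy}
E = mc^2
\end{equation}

% Several equations aligned at the equals sign, each numbered
\begin{align}
a &= b + c \\
d &= e - f
\end{align}

% A piecewise definition inside one numbered equation
\begin{equation}
\label{eq:abs}
|x| =
\begin{cases}
x,  & x \geq 0 \\
-x, & x < 0
\end{cases}
\end{equation}

\end{document}
```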

Additional Resources and Support

The paper provides resources for obtaining LaTeX distributions, the IEEEtran templates, and user support groups. It suggests the TeX Users Group (TUG) as a recommended source for LaTeX distributions across different operating systems and directs readers to the IEEE Template Selector for the latest template versions. For new users, the guide advocates reviewing Tobias Oetiker's "The Not So Short Introduction to LaTeX."

Considerations and Best Practices

The paper highlights best practices to avoid common pitfalls in LaTeX document preparation. It advises on the correct use of cross-references and labeling conventions, and warns against outdated coding practices such as using eqnarray or $$ math delimiters. It underscores the importance of checking that all elements, equations, and references are included and accurately formatted before submission.
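
A minimal sketch of these practices follows, assuming `amsmath` is available. The label prefixes (`sec:`, `eq:`, `fig:`) are a common convention rather than something the summary specifies, and the black rule merely stands in for a real figure file.

```latex
\documentclass[journal]{IEEEtran}
\usepackage{amsmath}

\begin{document}

\section{Method}
\label{sec:method}

% Use align in place of the outdated eqnarray environment
\begin{align}
y &= ax + b \label{eq:line} \\
z &= y^2    \label{eq:square}
\end{align}

% Use \[ ... \] in place of $$ ... $$ for unnumbered display math
\[
  z = (ax + b)^2
\]

\begin{figure}[!t]
\centering
\rule{2cm}{1cm} % placeholder box instead of an image file
\caption{Placeholder figure.}
\label{fig:placeholder} % the label goes after \caption so it picks up the figure number
\end{figure}

% Refer to labels instead of hard-coding numbers
Section~\ref{sec:method} defines \eqref{eq:line} and \eqref{eq:square};
see also Fig.~\ref{fig:placeholder}.

\end{document}
```

Compiling twice lets LaTeX resolve the references; any remaining "??" in the output points to a missing or misspelled label.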

Conclusion and Practical Implications

This guide serves as an essential resource for authors intending to publish with IEEE. By following the structured approach outlined, authors can ensure conformity with IEEE's rigorous formatting standards, ultimately facilitating a smoother submission process. The template's design and extensive options exemplify a robust tool for academic and professional authors working across various IEEE publications.

Future enhancements to the template could potentially integrate more automated processes or expand compatibility with other editorial software, further simplifying the authoring experience.

GitHub

GitHub - om-ai-lab/RS5M: RS5M: a large-scale vision language dataset for remote sensing (265 stars)