Emergent Mind

RE-Adapt: Reverse Engineered Adaptation of Large Language Models

(arXiv:2405.15007)
Published May 23, 2024 in cs.CL, cs.AI, and cs.LG

Abstract

We introduce RE-Adapt, an approach to fine-tuning LLMs on new domains without degrading any pre-existing instruction-tuning. We reverse engineer an adapter which isolates what an instruction-tuned model has learned beyond its corresponding pretrained base model. Importantly, this requires no additional data or training. We can then fine-tune the base model on a new domain and readapt it to instruction following with the reverse engineered adapter. RE-Adapt and our low-rank variant LoRE-Adapt both outperform other methods of fine-tuning, across multiple popular LLMs and datasets, even when the models are used in conjunction with retrieval-augmented generation.

RE-Adapt allows adding new knowledge to an instruction-tuned model without degrading pretraining knowledge.

Overview

  • The paper introduces RE-Adapt, a method for fine-tuning LLMs on new domains while preserving their instruction-following capabilities; it reverse engineers an adapter from existing model weights, requiring no additional labeled data or training.

  • RE-Adapt employs adapter isolation, partial adaptation, and low-rank representation techniques to balance new domain knowledge and original instruction-tuning, achieving superior performance in experimental validation across multiple LLMs including Llama-3, Gemma, and Mistral.

  • The approach has significant practical and theoretical implications, allowing for resource-efficient fine-tuning of LLMs in new domains, with the potential for future research into mixed-domain adapters and multi-domain adaptation techniques.

Overview of RE-Adapt: Domain Fine-Tuning for Instruction-Tuned LLMs

The paper introduces RE-Adapt, a novel approach for fine-tuning LLMs on new domains while preserving pre-existing instruction-tuning. The primary innovation in RE-Adapt lies in reverse engineering an adapter that isolates the capabilities obtained from instruction-tuning, without requiring additional labeled data or further training. This allows the model to be fine-tuned on new, unlabeled domains and then readapted to its original instruction-following behavior using the reverse-engineered adapter.

Key Contributions

  1. Adapter Isolation: The paper proposes isolating instruction-tuning into an adapter by taking the difference in weights between a pretrained model and its instruction-tuned counterpart.
  2. Partial Adaptation: The authors introduce a technique for scaling the strength of adapters, allowing for fine-grained control over the influence of newly-added knowledge versus the original instruction-tuning.
  3. Low-Rank Representation: The paper demonstrates that these RE-Adapters can be effectively approximated using low-rank matrices, resulting in significant parameter reduction without loss of performance.
  4. Experimental Validation: The methods were validated across multiple LLMs including Llama-3, Gemma, and Mistral, showing superior performance in both closed-book and retrieval-augmented QA settings.
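The weight-difference idea in contribution 1 and the low-rank approximation in contribution 3 can be sketched per weight matrix. This is a minimal NumPy illustration, not the paper's implementation; the function names are ours:

```python
import numpy as np

def reverse_engineer_adapter(w_pretrained: np.ndarray, w_instruct: np.ndarray) -> np.ndarray:
    """Isolate instruction-tuning as the weight difference between
    an instruction-tuned matrix and its pretrained counterpart."""
    return w_instruct - w_pretrained

def low_rank_approx(delta_w: np.ndarray, rank: int) -> np.ndarray:
    """LoRE-Adapt-style compression: approximate the adapter with a
    truncated SVD, keeping only the top `rank` singular directions."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]
```

If the true update happens to be low-rank, the truncated SVD recovers it exactly; otherwise it gives the best rank-`rank` approximation in the Frobenius norm, which is the intuition behind LoRE-Adapt's parameter reduction.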

Detailed Methodology

The core idea behind RE-Adapt involves isolating the instruction-following capabilities of a model into what the authors term a Reverse-Engineered Adapter (RE-Adapter). This is achieved by computing the difference in weights between an instruction-tuned model and its respective pretrained version. Once this adapter is isolated, the pretrained model can be fine-tuned on a new domain using a knowledge adapter (implemented with DoRA in the experiments). The final model integrates both the knowledge and instruction adapters, with the strength of each controlled via scaling factors.
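The recombination step above can be sketched per weight matrix as a scaled sum. This is a simplified sketch assuming the adapters are plain weight-difference matrices; the scaling-factor names `alpha` and `beta` are illustrative, not the paper's notation:

```python
import numpy as np

def re_adapt(w_pretrained: np.ndarray,
             delta_knowledge: np.ndarray,
             delta_instruct: np.ndarray,
             alpha: float = 1.0,
             beta: float = 1.0) -> np.ndarray:
    """Combine base weights with a scaled knowledge adapter and a scaled
    instruction adapter (partial adaptation: alpha/beta < 1 weaken an adapter)."""
    return w_pretrained + alpha * delta_knowledge + beta * delta_instruct
```

Setting `alpha = 0` recovers the original instruction-tuned weights, while `beta = 0` leaves only the domain fine-tune; intermediate values trade new knowledge against instruction-following strength.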

Experimental Findings

Closed-Book QA

The closed-book QA results show that both RE-Adapt and its low-rank variant LoRE-Adapt outperform pretrained and instruction-tuned models on new domain-specific datasets. For instance, with the Llama-3 model, RE-Adapt achieved a Rouge-L score of 46 on StreamingQA, well above the pretrained model's score of 9 and the instruction-tuned model's 33.

Retrieval-Augmented QA (RAG)

The benefits of RE-Adapt persisted even when retrieval-augmented generation (RAG) was used. Notably, RE-Adapt improved the performance of models using both BM25 and oracle retrievers. These improvements suggest that fine-tuning the model itself, in conjunction with RAG, leads to better interpretation of retrieved context.

Model Generalization

Interestingly, the RE-Adapters also improved performance on out-of-domain tasks, such as the Natural Questions dataset. This finding indicates that RE-Adapt can not only incorporate new domain knowledge but can also recover and retain pretraining knowledge that might be suppressed by instruction-tuning.

Implications and Future Work

Practical Implications

The ability to fine-tune LLMs on new domains without compromising existing instruction-following capabilities has broad implications for both academia and industry. Resource-constrained organizations can now leverage state-of-the-art instruction-tuned models and adapt them to specific tasks or domains without the need for extensive annotated datasets or prohibitive computational resources.

Theoretical Implications

From a theoretical standpoint, the isolation of instruction-tuning into parameter-efficient adapters and the introduction of partial adaptation provide new avenues for understanding and controlling the balance between task-specific knowledge and general problem-solving capabilities. This opens up further research into the nature of these learned capabilities and optimal ways to integrate them.

Future Developments

Future work could extend the RE-Adapt methodology to a wider range of tasks beyond question answering, and further investigate the potential of mixed-domain adapters. Additionally, exploring the interplay between different adapter types and their scaling factors in more depth could yield more sophisticated multi-domain adaptation techniques.

Conclusion

The RE-Adapt paper presents a robust and efficient methodology for fine-tuning instruction-tuned LLMs on new domains, significantly enhancing their versatility and utility. Through isolating and preserving instruction-following capabilities, RE-Adapt not only addresses a critical limitation in current fine-tuning practices but also sets a strong foundation for future enhancements in adaptive AI systems.
