Emergent Mind

RE-Adapt: Reverse Engineered Adaptation of Large Language Models

(arXiv:2405.15007)
Published May 23, 2024 in cs.CL, cs.AI, and cs.LG

Abstract

We introduce RE-Adapt, an approach to fine-tuning LLMs on new domains without degrading any pre-existing instruction-tuning. We reverse engineer an adapter which isolates what an instruction-tuned model has learned beyond its corresponding pretrained base model. Importantly, this requires no additional data or training. We can then fine-tune the base model on a new domain and readapt it to instruction following with the reverse engineered adapter. RE-Adapt and our low-rank variant LoRE-Adapt both outperform other methods of fine-tuning, across multiple popular LLMs and datasets, even when the models are used in conjunction with retrieval-augmented generation.

RE-Adapt allows adding new knowledge to an instruction-tuned model without degrading pretraining knowledge.

Overview

  • The paper introduces RE-Adapt, a method for fine-tuning LLMs on new domains while preserving their instruction-following capabilities; it reverse engineers an adapter from existing model weights, requiring no additional labeled data or training.

  • RE-Adapt employs adapter isolation, partial adaptation, and low-rank representation techniques to balance new domain knowledge and original instruction-tuning, achieving superior performance in experimental validation across multiple LLMs including Llama-3, Gemma, and Mistral.

  • The approach has significant practical and theoretical implications, allowing for resource-efficient fine-tuning of LLMs in new domains, with the potential for future research into mixed-domain adapters and multi-domain adaptation techniques.

Overview of RE-Adapt: Domain Fine-Tuning for Instruction-Tuned LLMs

The paper introduces RE-Adapt, a novel approach for fine-tuning LLMs on new domains while preserving pre-existing instruction-tuning. The primary innovation in RE-Adapt lies in reverse engineering an adapter that isolates the capabilities obtained from instruction-tuning, without requiring additional labeled data or further training. This allows the model to be fine-tuned on new, unlabeled domains and then readapted to its original instruction-following behavior using the reverse-engineered adapter.

Key Contributions

  1. Adapter Isolation: The paper proposes isolating instruction-tuning into an adapter by taking the difference in weights between a pretrained model and its instruction-tuned counterpart.
  2. Partial Adaptation: The authors introduce a technique for scaling the strength of adapters, allowing for fine-grained control over the influence of newly-added knowledge versus the original instruction-tuning.
  3. Low-Rank Representation: The paper demonstrates that these RE-Adapters can be effectively approximated using low-rank matrices, resulting in significant parameter reduction without loss of performance.
  4. Experimental Validation: The methods were validated across multiple LLMs including Llama-3, Gemma, and Mistral, showing superior performance in both closed-book and retrieval-augmented QA settings.
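The weight-difference idea in contribution 1 and the low-rank approximation in contribution 3 can be sketched per weight matrix. This is a minimal NumPy illustration, not the paper's implementation; the function names are ours:

```python
import numpy as np

def reverse_engineer_adapter(w_pretrained: np.ndarray, w_instruct: np.ndarray) -> np.ndarray:
    """Isolate instruction-tuning as the weight difference between
    an instruction-tuned matrix and its pretrained counterpart."""
    return w_instruct - w_pretrained

def low_rank_approx(delta_w: np.ndarray, rank: int) -> np.ndarray:
    """LoRE-Adapt-style compression: approximate the adapter with a
    truncated SVD, keeping only the top `rank` singular directions."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]
```

If the true update happens to be low-rank, the truncated SVD recovers it exactly; otherwise it gives the best rank-`rank` approximation in the Frobenius norm, which is the intuition behind LoRE-Adapt's parameter reduction.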

Detailed Methodology

The core idea behind RE-Adapt involves isolating the instruction-following capabilities of a model into what the authors term a Reverse-Engineered Adapter (RE-Adapter). This is achieved by computing the difference in weights between an instruction-tuned model and its respective pretrained version. Once this adapter is isolated, the pretrained model can be fine-tuned on a new domain using a knowledge adapter (implemented with DoRA in the experiments). The final model integrates both the knowledge and instruction adapters, with the strength of each controlled via scaling factors.
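The recombination step above can be sketched per weight matrix as a scaled sum. This is a simplified sketch assuming the adapters are plain weight-difference matrices; the scaling-factor names `alpha` and `beta` are illustrative, not the paper's notation:

```python
import numpy as np

def re_adapt(w_pretrained: np.ndarray,
             delta_knowledge: np.ndarray,
             delta_instruct: np.ndarray,
             alpha: float = 1.0,
             beta: float = 1.0) -> np.ndarray:
    """Combine base weights with a scaled knowledge adapter and a scaled
    instruction adapter (partial adaptation: alpha/beta < 1 weaken an adapter)."""
    return w_pretrained + alpha * delta_knowledge + beta * delta_instruct
```

Setting `alpha = 0` recovers the original instruction-tuned weights, while `beta = 0` leaves only the domain fine-tune; intermediate values trade new knowledge against instruction-following strength.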

Experimental Findings

Closed-Book QA

The closed-book QA results show that both RE-Adapt and its low-rank variant LoRE-Adapt outperform pretrained and instruction-tuned models on new domain-specific datasets. For instance, with the Llama-3 model, RE-Adapt achieved a Rouge-L score of 46 on StreamingQA, well above the pretrained model's score of 9 and the instruction-tuned model's 33.

Retrieval-Augmented QA (RAG)

The benefits of RE-Adapt persisted even when retrieval-augmented generation (RAG) was used. Notably, RE-Adapt improved the performance of models using both BM25 and oracle retrievers. These improvements suggest that fine-tuning the model itself, in conjunction with RAG, leads to better interpretation of retrieved context.

Model Generalization

Interestingly, the RE-Adapters also improved performance on out-of-domain tasks, such as the Natural Questions dataset. This finding indicates that RE-Adapt can not only incorporate new domain knowledge but can also recover and retain pretraining knowledge that might be suppressed by instruction-tuning.

Implications and Future Work

Practical Implications

The ability to fine-tune LLMs on new domains without compromising existing instruction-following capabilities has broad implications for both academia and industry. Resource-constrained organizations can now leverage state-of-the-art instruction-tuned models and adapt them to specific tasks or domains without the need for extensive annotated datasets or prohibitive computational resources.

Theoretical Implications

From a theoretical standpoint, the isolation of instruction-tuning into parameter-efficient adapters and the introduction of partial adaptation provide new avenues for understanding and controlling the balance between task-specific knowledge and general problem-solving capabilities. This opens up further research into the nature of these learned capabilities and optimal ways to integrate them.

Future Developments

Future work could extend the RE-Adapt methodology to a wider range of tasks beyond question answering, and further investigate the potential of mixed-domain adapters. Additionally, exploring the interplay between different adapter types and their scaling factors in more depth could yield more sophisticated multi-domain adaptation techniques.

Conclusion

The RE-Adapt paper presents a robust and efficient methodology for fine-tuning instruction-tuned LLMs on new domains, significantly enhancing their versatility and utility. Through isolating and preserving instruction-following capabilities, RE-Adapt not only addresses a critical limitation in current fine-tuning practices but also sets a strong foundation for future enhancements in adaptive AI systems.
