Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning (2307.02053v1)

Published 5 Jul 2023 in cs.CL

Abstract: Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of LLMs that utilize encoder-decoder or decoder-only architectures. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skills. This performance discrepancy can be attributed to three key factors: (1) pre-training data, (2) backbone architecture, and (3) instruction dataset. In this technical report, our main focus is on investigating the impact of the third factor by leveraging VICUNA, an LLM based on LLAMA that has been fine-tuned on ChatGPT conversations. To achieve this objective, we fine-tuned VICUNA using a customized instruction dataset collection called FLAN-MINI. This collection includes a subset of the large-scale instruction dataset known as FLAN, as well as various code-related and conversational datasets derived from ChatGPT/GPT-4. The dataset comprises a large number of tasks that demand problem-solving skills. Our experimental findings strongly indicate that the enhanced problem-solving abilities of our model, FLACUNA, are obtained by fine-tuning VICUNA on the FLAN dataset, leading to significant improvements across numerous benchmark datasets in INSTRUCTEVAL. FLACUNA is publicly available at https://huggingface.co/declare-lab/flacuna-13b-v1.0.
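
Since the FLACUNA checkpoint is public, a minimal usage sketch follows (not the authors' code). It assumes the Hugging Face repository linked in the abstract hosts a merged causal-LM checkpoint loadable with transformers, and that the model expects a Vicuna-style chat prompt; both assumptions should be verified against the model card. If only LoRA adapters (see reference 6) were published, one would instead load the VICUNA base weights and attach the adapter via peft.PeftModel.from_pretrained.

```python
# Minimal sketch for querying FLACUNA with Hugging Face transformers.
# Assumption: the repo hosts merged causal-LM weights; device_map="auto"
# additionally requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "declare-lab/flacuna-13b-v1.0"  # repo named in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # 13B parameters need roughly 26 GB in fp16
    device_map="auto",          # shard across available GPUs
)

# Vicuna-style prompt template; the exact format FLACUNA was trained
# with is an assumption here, so verify it against the model card.
prompt = (
    "A chat between a curious user and an artificial intelligence "
    "assistant. The assistant gives helpful, detailed, and polite "
    "answers to the user's questions.\n\n"
    "USER: Explain, step by step, why 17 * 24 = 408.\nASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens and decode only the newly generated answer.
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```

Greedy decoding (do_sample=False) is used here because problem-solving benchmarks such as those in INSTRUCTEVAL typically favor deterministic outputs; sampling parameters can be swapped in through generate if desired.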

References (17)
  1. InstructEval: Towards holistic evaluation of instruction-tuned large language models, 2023.
  2. Stanford Alpaca: An instruction-following LLaMA model, 2023. URL https://github.com/tatsu-lab/stanford_alpaca.
  3. Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, March 2023. URL https://vicuna.lmsys.org.
  4. LLaMA: Open and efficient foundation language models. arXiv, abs/2302.13971, 2023.
  5. The Flan Collection: Designing data and methods for effective instruction tuning. arXiv preprint arXiv:2301.13688, 2023.
  6. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
  7. Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, December 2022. doi: 10.1126/science.abq1158. URL https://doi.org/10.1126/science.abq1158.
  8. Measuring coding challenge competence with APPS. NeurIPS, 2021.
  9. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019.
  10. Sahil Chaudhary. Code Alpaca: An instruction-following LLaMA model for code generation. https://github.com/sahil280114/codealpaca, 2023.
  11. Measuring massive multitask language understanding. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=d7KBjmI3GmQ.
  12. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2022.
  13. Challenging BIG-Bench tasks and whether chain-of-thought can solve them. arXiv, abs/2210.09261, 2022.
  14. Evaluating large language models trained on code. arXiv, abs/2107.03374, 2021.
  15. A general language assistant as a laboratory for alignment, 2021.
  16. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena, 2023.
  17. Full parameter fine-tuning for large language models with limited resources, 2023.