
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models (2403.12881v1)

Published 19 Mar 2024 in cs.CL

Abstract: Open-sourced LLMs have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents. How to integrate agent ability into general LLMs becomes a crucial and urgent problem. This paper first delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations. Based on the above findings, we propose Agent-FLAN to effectively Fine-tune LLMs for Agents. Through careful decomposition and redesign of the training corpus, Agent-FLAN enables Llama2-7B to outperform prior best works by 3.5% across various agent evaluation datasets. With comprehensively constructed negative samples, Agent-FLAN greatly alleviates the hallucination issues based on our established evaluation benchmark. Besides, it consistently improves the agent capability of LLMs when scaling model sizes while slightly enhancing the general capability of LLMs. The code will be available at https://github.com/InternLM/Agent-FLAN.


Summary

  • The paper introduces Agent-FLAN, a fine-tuning methodology that restructures agent training data and decomposes tasks to improve LLM agent capabilities.
  • It balances training according to the different learning speeds of agent capabilities and uses negative samples to mitigate hallucinations, achieving a 3.5% performance gain over prior work.
  • The findings bridge the gap between open-sourced and API-based LLMs, paving the way for more adaptable and intelligent AI agents.

Enhancing Agent Abilities in LLMs with Agent-FLAN

Introduction to Agent-FLAN

The quest to imbue LLMs with robust agent capabilities has led to the development of Agent-FLAN, a fine-tuning methodology designed to effectively enhance LLMs' performance in agent tasks. The research stems from the observation that while open-sourced LLMs demonstrate exceptional proficiency in natural language understanding and generation, their ability to act as agents—making decisions based on environmental inputs and executing tasks—lags behind that of their API-based counterparts. Agent-FLAN (Fine-tuning LLMs for Agents) addresses this gap by refining the agent training corpus and introducing novel fine-tuning techniques tailored for agent tasks.

Key Observations and Methodology

The development of Agent-FLAN was guided by three pivotal observations, each highlighting specific challenges and opportunities in agent tuning:

  1. Entanglement of Agent Training Data: The paper found that most agent training data mixes format adherence with agent reasoning, diverging significantly from the pre-training data distribution. This misalignment complicates the learning process for LLMs, constraining their ability to acquire agent-specific skills effectively.
  2. Variable Learning Speeds: LLMs exhibit different learning velocities across various agent-related capabilities. This discrepancy suggests a need for tailored training approaches that account for the unique learning dynamics of each capability.
  3. Side-Effects of Existing Approaches: Current strategies to enhance agent abilities in LLMs often lead to unintended consequences, such as the introduction of hallucinations—misleading, inaccurate, or irrelevant outputs.
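The third observation motivates training on explicitly negative examples. As a minimal sketch of the idea (the field names and schema here are illustrative assumptions, not the paper's actual data format), a negative sample pairs a query that none of the provided tools can serve with a gold response that refuses, so the model learns not to hallucinate a tool call:

```python
def make_negative_sample(query, tools):
    """Build a training example in which no provided tool matches the query,
    so the target response is an explicit refusal rather than a tool call.
    (Illustrative schema; the paper's actual format may differ.)"""
    return {
        "conversation": [
            {"role": "system",
             "content": "You may call only these tools: " + ", ".join(tools)},
            {"role": "user", "content": query},
            # The gold response declines instead of inventing a tool call.
            {"role": "assistant",
             "content": "None of the available tools can answer this request."},
        ]
    }

sample = make_negative_sample(
    "Book me a flight to Paris",
    tools=["get_weather", "search_wikipedia"],
)
```

Training on such refusal targets, alongside ordinary positive trajectories, is one plausible way to operationalize the paper's "comprehensively constructed negative samples".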

To navigate these challenges, Agent-FLAN employs a multi-faceted approach:

  • Alignment with Natural Language Domain: By restructuring agent training data to resemble natural conversations, Agent-FLAN mitigates the issue of data entanglement, facilitating more effective learning of agent abilities.
  • Decomposition and Balanced Training: The methodology breaks down agent tasks into fundamental capabilities and adjusts the training focus according to the distinct learning rates of these capabilities.
  • Mitigation of Hallucinations: Through the creation of an evaluation benchmark for hallucination and the incorporation of negative samples, Agent-FLAN significantly reduces the occurrence of hallucination in LLM outputs.
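The first bullet, recasting rigid agent trajectories as ordinary multi-turn chat, can be illustrated with a small transformation. This is a sketch under the assumption of a ReAct-style Thought/Action/Observation trajectory, not the authors' actual preprocessing code:

```python
def trajectory_to_chat(steps):
    """Convert a ReAct-style trajectory into multi-turn chat messages,
    moving rigid format markup out of the target text so the data lies
    closer to the natural-conversation distribution of pre-training.
    `steps` is a list of (thought, action, observation) tuples."""
    messages = []
    for thought, action, observation in steps:
        # Thought and action become one assistant turn in plain prose.
        messages.append({"role": "assistant",
                         "content": f"{thought} I will call {action}."})
        # The environment's observation is fed back as a user turn.
        messages.append({"role": "user", "content": observation})
    return messages

chat = trajectory_to_chat([
    ("The user asks for today's weather, so I need the weather tool.",
     "get_weather(city='Paris')",
     "It is 18°C and sunny in Paris."),
])
```

The design point is that the loss is then computed over conversational assistant turns rather than over a bespoke Thought/Action template, which is one way to reduce the distribution shift the paper identifies.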

Empirical Validation and Results

Agent-FLAN's efficacy is demonstrated through a series of comprehensive experiments using the Llama2-7B model across various agent evaluation benchmarks. The approach achieved a 3.5% improvement over the prior best work, showcasing its potential to significantly enhance the agent capabilities of LLMs. Additionally, Agent-FLAN was found to not only boost agent-specific abilities but also slightly improve the general capabilities of LLMs, underscoring the versatile benefits of the proposed fine-tuning methodology.

Implications and Future Directions

The success of Agent-FLAN in enhancing agent abilities of LLMs has several important implications:

  • Bridging the Gap: The methodology represents a significant step toward narrowing the performance gap between open-sourced LLMs and API-based models in agent tasks.
  • Flexible Learning: The differentiated learning strategies for various agent capabilities highlight the importance of adaptable training methods in maximizing LLMs' potential.
  • Holistic Model Improvement: The positive impact of Agent-FLAN on both agent and general capabilities of LLMs suggests a pathway for developing more universally competent models.

Looking ahead, the insights gained from Agent-FLAN pave the way for further exploration in integrating effective agent functions into LLMs. Future research may delve into more granular training data decomposition, examine the scalability of Agent-FLAN across larger model sizes, and explore its applicability to a broader range of agent tasks.

In conclusion, Agent-FLAN offers a promising avenue for fortifying the agent capabilities of LLMs, marking an important advancement in the pursuit of more intelligent and versatile AI agents.