
TART: A plug-and-play Transformer module for task-agnostic reasoning

(arXiv:2306.07536)
Published Jun 13, 2023 in cs.LG, cs.AI, and cs.CL

Abstract

LLMs exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same examples. While most existing approaches (e.g., prompt engineering) focus on the LLM's learned representations to patch this performance gap, our analysis actually reveals that LLM representations contain sufficient information to make good predictions. As such, we focus on the LLM's reasoning abilities and demonstrate that this performance gap exists due to their inability to perform simple probabilistic reasoning tasks. This raises an intriguing question: Are LLMs actually capable of learning how to reason in a task-agnostic manner? We answer this in the affirmative and propose TART, which generically improves an LLM's reasoning abilities using a synthetically trained Transformer-based reasoning module. TART trains this reasoning module in a task-agnostic manner using only synthetic logistic regression tasks and composes it with an arbitrary real-world pre-trained model without any additional training. With a single inference module, TART improves performance across different model families (GPT-Neo, Pythia, BLOOM), model sizes (100M-6B), tasks (14 NLP binary classification tasks), and even across different modalities (audio and vision). Additionally, on the RAFT Benchmark, TART improves GPT-Neo (125M)'s performance such that it outperforms BLOOM (176B), and is within 4% of GPT-3 (175B). Our code and models are available at https://github.com/HazyResearch/TART.
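
The abstract compresses the method into two steps: train a Transformer "reasoning module" on nothing but synthetic logistic-regression tasks, then compose it at inference time with the frozen embeddings of any pre-trained model. The sketch below is a minimal illustration of that recipe, not the authors' implementation: the hyperparameters, the ReasoningModule architecture, and the random projection standing in for the paper's dimensionality reduction are all assumptions; see the linked repository for the actual code.

```python
# Illustrative sketch of TART's two-stage recipe (hypothetical names and
# hyperparameters; the real implementation is at
# https://github.com/HazyResearch/TART).
import torch
import torch.nn as nn

DIM = 16       # dimension of the synthetic logistic-regression problems (assumed)
SEQ_LEN = 32   # number of labeled in-context examples per task (assumed)


def sample_logistic_task(batch, seq_len=SEQ_LEN, dim=DIM):
    """Draw a random weight vector per task and label Gaussian inputs noisily."""
    w = torch.randn(batch, dim, 1)
    x = torch.randn(batch, seq_len, dim)
    y = torch.bernoulli(torch.sigmoid(x @ w)).squeeze(-1)  # noisy 0/1 labels
    return x, y


class ReasoningModule(nn.Module):
    """Causal Transformer mapping (input, previous-label) tokens to label logits."""

    def __init__(self, dim=DIM, width=64, depth=2, heads=4):
        super().__init__()
        self.inp = nn.Linear(dim + 1, width)
        layer = nn.TransformerEncoderLayer(width, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)
        self.out = nn.Linear(width, 1)

    def forward(self, x, y):
        # Token i carries x_i and the label of example i-1, so position i must
        # infer y_i from the examples seen so far (causal mask, no label leak).
        prev_y = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
        seq = torch.cat([x, prev_y.unsqueeze(-1)], dim=-1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(self.inp(seq), mask=mask)
        return self.out(h).squeeze(-1)


# Stage 1: train the reasoning module task-agnostically on synthetic tasks only.
module = ReasoningModule()
opt = torch.optim.Adam(module.parameters(), lr=1e-3)
for _ in range(200):  # toy number of steps
    x, y = sample_logistic_task(batch=64)
    loss = nn.functional.binary_cross_entropy_with_logits(module(x, y), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: compose with a frozen pre-trained model, with no further training.
# Random tensors stand in here for the LLM's embeddings of labeled examples;
# the paper reduces these embeddings to the module's input dimension, which is
# approximated below by a fixed random projection.
llm_dim = 768
demo_emb = torch.randn(1, SEQ_LEN, llm_dim)            # frozen LLM embeddings
demo_lab = torch.randint(0, 2, (1, SEQ_LEN)).float()   # their binary labels
proj = torch.randn(llm_dim, DIM) / llm_dim ** 0.5      # stand-in projection
preds = torch.sigmoid(module(demo_emb @ proj, demo_lab))
```

In a setup like this, classifying a new example amounts to appending its projected embedding to the demonstration sequence and reading the logit at the final position; because the module only ever sees synthetic problems during training, the paper reports that the same weights can be reused across model families, sizes, tasks, and modalities.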


References
  1. Large Language Models are Few-Shot Clinical Information Extractors
  2. “RAFT: A Real-World Few-Shot Text Classification Benchmark” In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021
  3. Tiago A. Almeida, Jose Maria Gomez Hidalgo and Akebo Yamakami “Contributions to the Study of SMS Spam Filtering: New Collection and Results” In Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG’11), 2011
  4. “Ask Me Anything: A simple strategy for prompting language models” In ICLR 2023, 2023
  5. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
  6. “GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow” Zenodo, 2021
  7. On the Opportunities and Risks of Foundation Models
  8. Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
  9. “Language models are few-shot learners” In Advances in Neural Information Processing Systems 33, 2020, pp. 1877–1901
  10. Active Prompting with Chain-of-Thought for Large Language Models
  11. “What can transformers learn in-context? a case study of simple function classes” In Advances in Neural Information Processing Systems 35, 2022, pp. 30583–30598
  12. “Parameter-efficient transfer learning for NLP” In International Conference on Machine Learning, 2019, pp. 2790–2799 PMLR
  13. “LoRA: Low-Rank Adaptation of Large Language Models” In International Conference on Learning Representations, 2022
  14. Chip Huyen “Prompting vs. Finetuning vs. Alternatives”, 2023
  15. ChatGPT: Jack of all trades, master of none
  16. “Large Language Models are Zero-Shot Reasoners” In ICML 2022 Workshop on Knowledge Retrieval and Language Models, 2022
  17. Alex Krizhevsky “Learning multiple layers of features from tiny images”, 2009
  18. Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution
  19. Yann LeCun, Corinna Cortes and CJ Burges “MNIST handwritten digit database” In ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010

  20. Brian Lester, Rami Al-Rfou and Noah Constant “The Power of Scale for Parameter-Efficient Prompt Tuning” In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 3045–3059
  21. Xiang Lisa Li and Percy Liang “Prefix-Tuning: Optimizing Continuous Prompts for Generation” In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 4582–4597
  22. Holistic Evaluation of Language Models
  23. “Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning” In Advances in Neural Information Processing Systems 35, 2022, pp. 1950–1965
  24. “What Makes Good In-Context Examples for GPT-3?” In Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, 2022, pp. 100–114
  25. “P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland: Association for Computational Linguistics, 2022
  26. “Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity” In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 8086–8098
  27. “Learning Word Vectors for Sentiment Analysis” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA: Association for Computational Linguistics, 2011, pp. 142–150
  28. “Can Foundation Models Wrangle Your Data?” In Proc. VLDB Endow. 16.4 VLDB Endowment, 2022
  29. Transformers learn in-context by gradient descent
  30. Bo Pang, Lillian Lee and Shivakumar Vaithyanathan “Thumbs Up? Sentiment Classification Using Machine Learning Techniques” In Proceedings of EMNLP, 2002, pp. 79–86
  31. Hyena Hierarchy: Towards Larger Convolutional Language Models
  32. “Probability theory: The logic of science” Cambridge University Press, 2003
  33. “Robust Speech Recognition via Large-Scale Weak Supervision” arXiv, 2022
  34. “Improving language understanding by generative pre-training”, 2018
  35. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  36. “Understanding machine learning: From theory to algorithms” Cambridge University Press, 2014
  37. “Recursive deep models for semantic compositionality over a sentiment treebank” In Proceedings of the 2013 conference on empirical methods in natural language processing, 2013, pp. 1631–1642
  38. “GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model”, https://github.com/kingoflolz/mesh-transformer-jax, 2021

  39. Rationale-Augmented Ensembles in Language Models
  40. Self-Consistency Improves Chain of Thought Reasoning in Language Models
  41. P. Warden “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition” In ArXiv e-prints, 2018
  42. “Emergent Abilities of Large Language Models” Survey Certification In Transactions on Machine Learning Research, 2022
  43. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  44. Larger language models do in-context learning differently
  45. “Visual Transformers: Token-based Image Representation and Processing for Computer Vision”, 2020
  46. An Explanation of In-context Learning as Implicit Bayesian Inference
  47. “STaR: Bootstrapping Reasoning With Reasoning” In Advances in Neural Information Processing Systems, 2022
  48. “WRENCH: A Comprehensive Benchmark for Weak Supervision” In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021
  49. LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
  50. Xiang Zhang, Junbo Zhao and Yann LeCun “Character-level convolutional networks for text classification” In Advances in Neural Information Processing Systems 28, 2015
