Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI (2306.12205v1)
Abstract: Pre-trained LLMs have recently emerged as powerful tools that can be fine-tuned for a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we investigate the ability of pre-trained LLMs to generalize to different non-language tasks. In particular, we test them on tasks from different domains such as computer vision, reasoning on hierarchical data, and protein fold prediction. The four pre-trained models that we use, T5, BART, BERT, and GPT-2, achieve outstanding results. They all perform similarly and outperform transformers trained from scratch by a large margin. For instance, pre-trained LLMs reach an average accuracy of 58.7% on the ListOps dataset, compared to 29.0% for transformers trained from scratch. The significant improvement demonstrated across three types of datasets suggests that pre-training on language helps the models acquire general knowledge, bringing us a step closer to general AI. We also show that reducing the number of parameters in pre-trained LLMs has only a minor impact: performance drops slightly when using T5-Small instead of T5-Base, and even with only 2% of the parameters we still see a large improvement over training from scratch. Finally, in contrast to prior work, we find that using pre-trained embeddings for the input layer is necessary to achieve the desired results.
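The setup described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: it fine-tunes a pre-trained BERT classifier on ListOps expressions rendered as plain text, keeping the pre-trained input embeddings (which the abstract reports are necessary). The model name (`bert-base-uncased`), the tokenization of ListOps expressions as ordinary text, and all hyperparameters are assumptions for illustration.

```python
# Hedged sketch: fine-tuning a pre-trained language model on a non-language task
# (ListOps). Assumes HuggingFace `transformers` and PyTorch; not the paper's code.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

class ListOpsDataset(Dataset):
    """Wraps ListOps expressions (e.g. '[MAX 4 [MIN 2 7] 0]') as text-classification
    examples whose label is the expression's single-digit value (0-9)."""
    def __init__(self, expressions, labels, tokenizer, max_len=64):
        self.enc = tokenizer(expressions, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return {"input_ids": self.enc["input_ids"][i],
                "attention_mask": self.enc["attention_mask"][i],
                "labels": self.labels[i]}

def finetune(expressions, labels, epochs=3, lr=2e-5):
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    # Pre-trained weights, including the input embeddings, are loaded and reused;
    # only the new 10-way classification head is randomly initialized.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                          num_labels=10)
    loader = DataLoader(ListOpsDataset(expressions, labels, tokenizer),
                        batch_size=8, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optim.zero_grad()
            loss = model(**batch).loss
            loss.backward()
            optim.step()
    return model

# Toy usage with two hand-written ListOps expressions (labels are their values).
model = finetune(["[MAX 4 [MIN 2 7] 0]", "[MIN 9 [MAX 1 3] 5]"], [4, 1], epochs=1)
```

The same recipe transfers to the other domains mentioned in the abstract (e.g. image or protein-sequence classification) by changing only how the raw inputs are serialized into token sequences; the pre-trained transformer body stays the same.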