Adaptable and Reliable Text Classification using Large Language Models (2405.10523v3)

Published 17 May 2024 in cs.CL

Abstract: Text classification is fundamental in NLP, and the advent of LLMs has revolutionized the field. This paper introduces an adaptable and reliable text classification paradigm, which leverages LLMs as the core component to address text classification tasks. Our system simplifies the traditional text classification workflows, reducing the need for extensive preprocessing and domain-specific expertise to deliver adaptable and reliable text classification results. We evaluated the performance of several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets. Results demonstrate that certain LLMs surpass traditional methods in sentiment analysis, spam SMS detection, and multi-label classification. Furthermore, it is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies, making the fine-tuned model the top performer across all datasets. Source code and datasets are available in this GitHub repository: https://github.com/yeyimilk/LLM-zero-shot-classifiers.


Summary

  • The paper introduces a Smart Expert System that integrates LLMs to achieve adaptable and reliable text classification.
  • It employs zero-shot and few-shot learning techniques combined with fine-tuning for superior model performance across diverse datasets.
  • Experimental evaluations demonstrate significant improvements in accuracy and F1 scores compared to traditional ML and DL methods.

Adaptable and Reliable Text Classification using LLMs

The paper "Adaptable and Reliable Text Classification using LLMs" introduces a novel text classification approach leveraging LLMs. LLMs have significantly reshaped the field of NLP through their advanced capabilities in language comprehension and generation. This paper encapsulates the implementation of the Smart Expert System that utilizes LLMs to revamp traditional text classification workflows, presenting performance analyses across multiple datasets.

Introduction to LLMs and Text Classification

Text classification has always been integral to NLP applications. Traditional methods, which rely heavily on machine learning (ML) and deep learning (DL), are often resource-intensive, demanding substantial labeled data and careful configuration. These pipelines involve elaborate preprocessing, feature extraction, and dimensionality reduction steps that require significant domain expertise.

In contrast, LLMs such as GPT and LLaMA are Transformer-based models with up to hundreds of billions of parameters, pre-trained on extensive textual corpora. They can handle text classification tasks directly via zero-shot or few-shot prompting, alleviating the preprocessing burden typical of traditional methodologies. The Smart Expert System proposed in the paper simplifies the traditional workflow (Figure 1), incorporating LLMs for robust and efficient text classification.

Figure 1: Traditional text classification flow, illustrating the complexity of preprocessing and feature selection.
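
To make the contrast concrete, here is a minimal sketch of a conventional pipeline of the kind Figure 1 depicts, assuming TF-IDF features and a logistic regression classifier as one representative baseline (the specific feature extractor and classifier are illustrative choices, not the paper's exact configuration):

```python
# Representative traditional pipeline: TF-IDF features + logistic regression.
# The preprocessing and classifier choices here are illustrative assumptions,
# not the paper's exact baseline configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = ["Cheap meds, click now!", "Are we still meeting at noon?"]
train_labels = ["spam", "ham"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(train_texts, train_labels)   # needs labeled training data up front
print(pipeline.predict(["Win a free prize today"]))
```

Even this small sketch shows the dependencies the paper is trying to remove: labeled training data, vocabulary-sensitive feature extraction, and per-dataset classifier tuning.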

Zero-shot text classification lets practitioners bypass most of this preprocessing: the raw text is passed to the LLM, and classification results are obtained directly. This straightforward approach is particularly beneficial for smaller enterprises that lack extensive ML capabilities (Figure 2).

Figure 2: Zero-shot text classification flow with LLMs, emphasizing the reduced complexity compared to traditional methods.
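
A minimal sketch of this zero-shot flow, assuming the OpenAI Python SDK and a placeholder model name (the paper's own prompts and model choices may differ):

```python
# Zero-shot classification: no training data or feature engineering, just a prompt.
# Assumes the OpenAI Python SDK with an OPENAI_API_KEY set in the environment;
# the prompt wording and model name are placeholders, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()
LABELS = ["positive", "negative", "neutral"]

def classify_zero_shot(text: str) -> str:
    prompt = (
        f"Classify the following text as exactly one of {LABELS}. "
        f"Reply with the label only.\n\nText: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

print(classify_zero_shot("The new update made the app noticeably faster."))
```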

Methodology: The Smart Expert System Framework

The Smart Expert System outlined in the paper employs LLMs within a structured framework. This system comprises several pivotal components:

  1. Data Aggregation: Collection of domain-specific data to formulate a comprehensive database.
  2. LLM Integration: Utilizing pre-trained LLMs, such as GPT-4 or LLaMA, followed by fine-tuning processes or few-shot learning with minimal domain-specific data.
  3. Prompt Optimization: Optionally involving domain experts to refine LLM prompts, enhancing model performance for nuanced tasks.
  4. Performance Evaluation: Continuously monitoring model accuracy together with the newly introduced Uncertainty/Error (U/E) rate, which offers insight into model behavior under uncertain conditions (Figure 3); an illustrative sketch of such a metric appears after the framework description below.

Figure 3: Framework of the Expert System, detailing each stage from data collection to user queries and interactions.

This framework significantly reduces the need for expert-driven preprocessing, offering a more adaptable text classification method that can swiftly respond to user queries through an integrated interface.
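
The paper defines the U/E rate precisely; the sketch below illustrates one plausible reading of such a metric, in which responses that cannot be mapped to a valid label count as "uncertain" and valid-but-wrong predictions count as "error" (this reading, the label set, and the helper names are assumptions for illustration):

```python
# Illustrative Uncertainty/Error (U/E) style rate over a batch of raw LLM outputs.
# The paper's precise definition may differ; here, unrecognizable or refused
# outputs count as "uncertain" and wrong labels count as "error".
from typing import Optional, Sequence

LABELS = {"spam", "ham"}

def normalize(raw_output: str) -> Optional[str]:
    """Map a raw LLM response to a known label, or None if it cannot be mapped."""
    cleaned = raw_output.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else None

def ue_rate(raw_outputs: Sequence[str], gold: Sequence[str]) -> float:
    uncertain_or_error = 0
    for raw, truth in zip(raw_outputs, gold):
        pred = normalize(raw)
        if pred is None or pred != truth:
            uncertain_or_error += 1
    return uncertain_or_error / len(gold)

# One refusal (uncertain) out of three responses; the other two are correct.
print(ue_rate(["Spam.", "I cannot classify this message.", "ham"], ["spam", "spam", "ham"]))
```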

Experimental Evaluation

The paper evaluates various LLMs and traditional models across multiple datasets, including COVID-19-related tweets, e-commerce product texts, economic sentiments, and SMS spam detection. Experimental results indicate that fine-tuned LLMs exhibit superior performance compared to traditional ML and DL methods, achieving higher accuracy and F1 scores.

  • COVID-19-related Tweets: The fine-tuned Qwen-7B model attained substantial performance improvements over both NN architectures and conventional ML models.
  • E-commerce Product Texts: Models showed exceptionally high accuracy, with the fine-tuned Qwen-7B leading.
  • Economic Texts: LLMs demonstrated robustness in handling complex financial language, with fine-tuning strategies enhancing accuracy.
  • SMS Spam Collection: Qwen-7B showcased near-perfect classification after fine-tuning, surpassing established NN models.

Across these diverse datasets, LLMs demonstrated strong capabilities in zero-shot and few-shot settings, and fine-tuned variants performed best overall.
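
Accuracy and F1 can be computed as follows; the macro averaging shown here is an assumption, not necessarily the paper's exact evaluation setting:

```python
# Scoring predicted labels against gold labels with standard metrics.
# The "macro" averaging choice is an assumption, not necessarily the paper's setting.
from sklearn.metrics import accuracy_score, f1_score

gold = ["spam", "ham", "ham", "spam", "ham"]
pred = ["spam", "ham", "spam", "spam", "ham"]

print("accuracy:", accuracy_score(gold, pred))                 # 0.8
print("macro F1:", f1_score(gold, pred, average="macro"))      # 0.8
```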

Discussion

Few-shot Learning and Fine-tuning

Few-shot strategies give mixed results depending on the model and dataset, sometimes yielding only marginal improvements. Fine-tuning, however, consistently improves performance regardless of dataset complexity, adapting LLMs to specific domains.
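
A minimal sketch of few-shot prompting, in which a handful of labeled examples is prepended to the query (the demonstration texts and template are illustrative assumptions); fine-tuning, by contrast, updates the model's weights on domain-specific labeled data rather than relying on the prompt:

```python
# Few-shot prompting: prepend a handful of labeled examples to the query.
# The demonstration texts and prompt template are illustrative assumptions.
FEW_SHOT_EXAMPLES = [
    ("Congratulations! You won a free cruise, reply YES to claim.", "spam"),
    ("Can you pick up milk on the way home?", "ham"),
]

def build_few_shot_prompt(text: str) -> str:
    demos = "\n".join(f"Text: {t}\nLabel: {label}" for t, label in FEW_SHOT_EXAMPLES)
    return (
        "Classify each text as 'spam' or 'ham'. Reply with the label only.\n\n"
        f"{demos}\n\nText: {text}\nLabel:"
    )

print(build_few_shot_prompt("URGENT: your account has been suspended, click the link"))
```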

Limitations

The paper identifies several limitations: inconsistent output formats from LLMs, refusals by some models to classify sensitive or complex content, and the hardware requirements and operating costs of deploying LLMs at scale. Addressing these would further increase the practical utility of LLM-based classification.
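
One way to mitigate the inconsistent-output problem is defensive parsing of model responses; the sketch below, with an assumed label set and parsing rules, maps raw output to a known label and signals "uncertain" when no label can be recovered:

```python
# Defensive parsing of inconsistent LLM responses (JSON, verbose prose, refusals).
# The label set and parsing rules here are illustrative assumptions.
import json
import re
from typing import Optional

LABELS = ["positive", "negative", "neutral"]

def parse_label(raw: str) -> Optional[str]:
    """Try structured JSON first, then scan free text; None means 'uncertain'."""
    # Case 1: the model returned JSON such as {"label": "Positive"}.
    try:
        data = json.loads(raw)
        if isinstance(data, dict) and str(data.get("label", "")).lower() in LABELS:
            return str(data["label"]).lower()
    except json.JSONDecodeError:
        pass
    # Case 2: verbose prose such as "I would classify this as negative."
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", raw, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None

print(parse_label('{"label": "Positive"}'))               # -> positive
print(parse_label("I would classify this as negative."))  # -> negative
print(parse_label("I'm sorry, I can't help with that."))  # -> None
```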

Conclusion and Future Directions

The adoption of LLMs for text classification represents a significant advancement, simplifying workflows while providing scalable, accurate, and resource-efficient solutions. Future work aims to enhance the adaptability of LLMs by refining them through enriched background context and streamlined prompts. Addressing existing constraints around standardized output and classification barriers will be paramount in broadening LLM usability across diverse sectors. Integrating these improvements could foster broader acceptance of LLMs as ubiquitous tools in text classification endeavors, democratizing advanced NLP technologies.
