
Towards Probing Contact Center Large Language Models (2312.15922v1)

Published 26 Dec 2023 in cs.CL

Abstract: Fine-tuning LLMs with domain-specific instructions has emerged as an effective method to enhance their domain-specific understanding. Yet, there is limited work that examines the core characteristics acquired during this process. In this study, we benchmark the fundamental characteristics learned by contact-center (CC) specific instruction fine-tuned LLMs with out-of-the-box (OOB) LLMs via probing tasks encompassing conversational, channel, and automatic speech recognition (ASR) properties. We explore different LLM architectures (Flan-T5 and Llama), sizes (3B, 7B, 11B, 13B), and fine-tuning paradigms (full fine-tuning vs PEFT). Our findings reveal remarkable effectiveness of CC-LLMs on the in-domain downstream tasks, with improvement in response acceptability by over 48% compared to OOB-LLMs. Additionally, we compare the performance of OOB-LLMs and CC-LLMs on the widely used SentEval dataset, and assess their capabilities in terms of surface, syntactic, and semantic information through probing tasks. Intriguingly, we note a relatively consistent performance of probing classifiers on the set of probing tasks. Our observations indicate that CC-LLMs, while outperforming their out-of-the-box counterparts, exhibit a tendency to rely less on encoding surface, syntactic, and semantic properties, highlighting the intricate interplay between domain-specific adaptation and probing task performance opening up opportunities to explore behavior of fine-tuned LLMs in specialized contexts.


Summary

  • The paper demonstrates that instruction fine-tuning LLMs on contact-center-specific data, whether via full fine-tuning or LoRA, significantly enhances in-domain task performance.
  • Experiments show an improvement of over 48% in response acceptability, along with evidence that core linguistic abilities are largely preserved after fine-tuning.
  • A comparative analysis across model architectures, sizes, and fine-tuning paradigms highlights shifts in how surface, syntactic, and semantic properties are encoded, guiding future domain adaptation.

Towards Probing Contact Center LLMs

Introduction

The evolution of LLMs has reached a stage where fine-tuning on domain-specific data has become a focal point of investigation. The paper "Towards Probing Contact Center LLMs" addresses the unique challenges and properties associated with fine-tuning LLMs for the contact center (CC) industry. This industry, characterized by domain-specific jargon, conversational dynamics, and etiquette, presents a fertile ground for deploying specialized LLMs to enhance customer interactions.

Methodology

The authors evaluate the impact of instruction fine-tuning LLMs on data specific to contact center interactions. Models such as Flan-T5 and Llama, with parameter counts ranging from 3B to 13B, undergo fine-tuning via both traditional full fine-tuning and parameter-efficient techniques such as Low-Rank Adaptation (LoRA). The paper then applies a set of probing tasks to assess the LLMs' grasp of conversational, channel, and automatic speech recognition (ASR) properties.
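
The paper does not release training code, so the following is only a minimal sketch of the two fine-tuning paradigms it compares, full fine-tuning and LoRA, using the Hugging Face `peft` library; the base model name and LoRA hyperparameters below are assumptions for illustration.

```python
# Hypothetical sketch of the two fine-tuning paradigms compared in the paper:
# full fine-tuning vs. parameter-efficient fine-tuning (PEFT) with LoRA via the
# Hugging Face `peft` library. Model name and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "meta-llama/Llama-2-7b-hf"  # assumption: any 3B-13B base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.bfloat16)

# PEFT: wrap the base model with low-rank adapters so that only a small
# fraction of weights is updated on the contact-center instruction data.
lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for adapter outputs
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()   # typically well under 1% of all parameters

# Full fine-tuning would instead update every parameter of `model` in a
# standard supervised instruction-tuning loop over the CC transcripts.
```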

Key Findings

The results underscore a substantial improvement in task performance of CC-specific LLMs (CC-LLMs) over their out-of-the-box (OOB) counterparts on downstream CC tasks. This enhancement is quantitatively reflected in an improvement of over 48% in response acceptability (Figure 1).

Figure 1: Benchmarking quality of responses generated by CC LLMs versus OOB LLMs on downstream tasks in the contact-center domain.

Probing assessments using the SentEval suite reveal a persistent encoding of linguistic properties in CC-LLMs, dispelling concerns that domain fine-tuning might dilute a model's linguistic abilities. However, CC-LLMs tend to rely less on encoding surface, syntactic, and semantic attributes, suggesting a shift in the learned properties due to domain specificity.
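
For readers unfamiliar with SentEval, the sketch below shows how its probing tasks are typically run against frozen model representations; the `embed_sentences` function is a placeholder for pooling hidden states from an OOB or CC fine-tuned model, and the layer and pooling choices used in the paper may differ.

```python
# Minimal sketch of running SentEval probing tasks against frozen LLM
# representations. `embed_sentences` is a placeholder; the paper's actual
# layer and pooling choices may differ.
import numpy as np
import senteval

def embed_sentences(sentences):
    # Placeholder: return one fixed-size vector per sentence, e.g. the mean
    # of a chosen hidden layer of the (frozen) OOB or CC fine-tuned model.
    return np.random.randn(len(sentences), 4096).astype(np.float32)

def prepare(params, samples):
    return  # no task-specific preprocessing needed in this sketch

def batcher(params, batch):
    sentences = [" ".join(tokens) for tokens in batch]
    return embed_sentences(sentences)

params = {
    "task_path": "SentEval/data",  # assumption: local path to the SentEval data
    "usepytorch": True,
    "kfold": 5,
    "classifier": {"nhid": 50, "optim": "adam", "batch_size": 64,
                   "tenacity": 3, "epoch_size": 2},
}

# The ten SentEval probing tasks span surface, syntactic, and semantic properties.
probing_tasks = ["Length", "WordContent", "Depth", "TopConstituents",
                 "BigramShift", "Tense", "SubjNumber", "ObjNumber",
                 "OddManOut", "CoordinationInversion"]
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(probing_tasks)  # per-task classifier accuracy
```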

Experiment Design

The paper designs experiments to dissect the contributions of model architectures, sizes, and fine-tuning paradigms to the performance of the LLMs. The authors juxtapose OOB LLMs and fine-tuned CC-LLMs across a variety of probing tasks, ensuring a comprehensive performance evaluation using metrics such as the Macro F1 score.
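
Macro F1 averages per-class F1 scores with equal weight, which matters when probing labels are imbalanced; a quick illustrative computation (with invented channel-style labels) looks like this:

```python
# Illustrative computation of the Macro F1 score used to compare probing
# classifiers; the labels below are invented for the example. Macro averaging
# weights every class equally, which matters when probing labels are imbalanced.
from sklearn.metrics import f1_score

y_true = ["agent", "customer", "agent", "agent", "customer", "customer"]
y_pred = ["agent", "customer", "agent", "customer", "customer", "customer"]
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1
```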

Implementation Details

The implementation leverages extensive datasets of ASR transcripts from diverse sectors, paired with natural language instructions. Multi-Layer Perceptron (MLP) probing classifiers are trained on hidden-state representations extracted from each model to evaluate what linguistic knowledge is embedded. The resource-intensive model training and probing are run on AWS infrastructure.
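
As a rough sketch of this probing pipeline, the snippet below extracts mean-pooled hidden states from a frozen model and trains a small MLP probe on a channel-style label; the model checkpoint, layer choice, pooling, and toy data are assumptions rather than the authors' exact setup.

```python
# Rough sketch of the probing pipeline: mean-pool hidden states from a frozen
# model and train a small MLP probe. Checkpoint, layer, pooling, and the toy
# channel-labelled data are assumptions, not the authors' exact setup.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

model_name = "google/flan-t5-xl"  # assumption: one of the studied model families
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def sentence_representation(text, layer=-1):
    """Mean-pool the hidden states of a chosen encoder layer for one utterance."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model.encoder(**inputs, output_hidden_states=True)
    hidden = outputs.hidden_states[layer]          # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()   # (dim,)

# Toy probing data labelled by channel (agent vs. customer), for illustration only.
train_texts = ["thank you for calling, how may i help you", "my internet keeps dropping"]
train_labels = ["agent", "customer"]
test_texts = ["let me pull up your account", "it started yesterday evening"]
test_labels = ["agent", "customer"]

X_train = [sentence_representation(t) for t in train_texts]
X_test = [sentence_representation(t) for t in test_texts]

probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
probe.fit(X_train, train_labels)
print("Macro F1:", f1_score(test_labels, probe.predict(X_test), average="macro"))
```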

Discussion

The paper opens up perspectives on the strategic design of domain-specific LLMs for the CC sector. Despite the linguistic capacity retained after fine-tuning, the inclination of CC-LLMs towards encoding unique CC properties points to avenues for further refining domain-centric adaptation. The comparative analysis between encoder-decoder (Flan-T5) and decoder-only (Llama) architectures provides foundational insights into architecture-specific performance on conversational tasks.

Conclusion

Probing contact center LLMs uncovers not only improvements in task-aligned performance but also insight into the linguistic representations embedded within such tailored models. The research shows that while fine-tuning enhances CC-LLM performance, understanding the subtleties of the learned properties remains crucial for future advances. As AI continues to integrate more deeply with specialized industries, these insights will guide future explorations into the scalable application of LLMs in specialized domains, including but not limited to the contact center domain.
