
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models (2312.08303v1)

Published 13 Dec 2023 in cs.CL and cs.AI

Abstract: Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed a variety of ML approaches to train language models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, LLMs have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployment in production. To address these challenges, in this work, we propose BD-LLM, a novel and efficient approach to Bootstrapping and Distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.
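Based on the abstract's description, DToT works by re-prompting the LLM with progressively more fine-grained context whenever the model's response lacks confidence. The sketch below illustrates that control flow only; the `query_llm` stub, the context-tree structure, and the confidence threshold are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the Decision-Tree-of-Thought (DToT) loop described in the
# abstract: keep adding finer-grained context and re-prompting until the
# model's confidence clears a threshold. All names here are hypothetical.

def query_llm(text, context):
    """Stub standing in for a real LLM call.

    Returns (label, confidence, rationale). A real system would parse these
    fields out of the model's response.
    """
    # Toy heuristic: more specific context yields higher confidence.
    confidence = min(0.4 + 0.2 * len(context), 1.0)
    label = "toxic" if "hate" in text.lower() else "non-toxic"
    rationale = f"Judged {label!r} using context {context!r}"
    return label, confidence, rationale

def dtot_classify(text, context_tree, threshold=0.8):
    """Walk a list of increasingly fine-grained context levels
    (e.g. general policy -> category definition -> examples),
    stopping as soon as the model is confident."""
    context = []
    label, conf, rationale = None, 0.0, ""
    for level in context_tree:
        context.append(level)
        label, conf, rationale = query_llm(text, context)
        if conf >= threshold:
            break  # confident enough; stop refining context
    return label, conf, rationale

label, conf, rationale = dtot_classify(
    "I hate this group of people",
    context_tree=[
        "toxicity policy",
        "hate speech definition",
        "targeted-group examples",
    ],
)
print(label, round(conf, 1))  # → toxic 0.8
```

The rationales collected this way are what the paper then uses as supervision when fine-tuning the smaller student LMs.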

