Improve Cost Efficiency of Active Learning over Noisy Dataset (2403.01346v1)

Published 2 Mar 2024 in cs.LG

Abstract: Active learning is a strategy in which the machine learning algorithm actively identifies and labels the data points from which it learns. It is particularly effective in domains where unlabeled data is abundant but labeling is prohibitively expensive. In this paper, we consider binary classification problems in which acquiring a positive instance costs significantly more than acquiring a negative one. In money-lending businesses in the financial industry, for example, a defaulted loan constitutes a positive event that leads to substantial financial loss. To address this issue, we propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling. Our simulations show that the proposed sampling function limits the selection of both noisy and positive labels, delivering a 20% to 32% improvement in cost efficiency across different test datasets.
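
The abstract's "shifted normal distribution sampling function" can be made concrete with a short sketch. What follows is a minimal Python sketch of one plausible reading, not the paper's actual implementation: it assumes the shift is expressed as a normal density over predicted positive-class probabilities, with the mean moved below the 0.5 decision boundary and the standard deviation controlling the widened sampling range. The function name and the default values mu=0.35 and sigma=0.15 are illustrative assumptions, not taken from the paper.

    import numpy as np
    from scipy.stats import norm

    def shifted_normal_sampling(probs, batch_size, mu=0.35, sigma=0.15, rng=None):
        # probs: predicted positive-class probabilities for the unlabeled pool.
        # Classic uncertainty sampling would deterministically pick the
        # batch_size points with probs closest to 0.5. Here each candidate is
        # instead weighted by a normal density whose mean is shifted below 0.5
        # (toward the negative class, away from costly positives) and whose
        # standard deviation widens the band of probabilities considered.
        # Note: mu and sigma are hypothetical defaults, not the paper's values.
        rng = np.random.default_rng() if rng is None else rng
        weights = norm.pdf(np.asarray(probs), loc=mu, scale=sigma)
        weights = weights / weights.sum()
        return rng.choice(len(probs), size=batch_size, replace=False, p=weights)

    # Toy usage with stand-in scores; in practice probs would come from
    # clf.predict_proba(X_pool)[:, 1] of any probabilistic binary classifier.
    probs = np.random.default_rng(0).uniform(size=1000)
    query_idx = shifted_normal_sampling(probs, batch_size=10)

Under this reading, drawing stochastically from a wider band, rather than always taking the points nearest the decision boundary, is what would limit repeated queries of noisy or likely-positive examples.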
