
Issue Report Validation in an Industrial Context (2311.17662v1)

Published 29 Nov 2023 in cs.SE

Abstract: Effective issue triaging is crucial for software development teams to improve software quality and, in turn, customer satisfaction. Validating issue reports manually can be time-consuming, which hinders the overall efficiency of the triaging process. This paper presents an approach to automating the validation of issue reports in order to accelerate issue triaging in an industrial setting. We work on 1,200 randomly selected issue reports from the banking domain, written in Turkish, an agglutinative language in which new words can be formed by linearly concatenating suffixes, so that a single word can express an entire sentence. We manually label these reports for validity and extract the patterns that indicate a report is invalid. Because the reports are written in an agglutinative language, we use morphological analysis to extract the features. Using the proposed feature extractors, we apply a machine-learning-based approach to predict the validity of issue reports, achieving an F1-score of 0.77.
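
The two technical ideas in the abstract, splitting an agglutinative word into a stem plus suffixes and scoring a validity classifier with F1, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the suffix list `TOY_SUFFIXES`, the greedy stripping in `split_suffixes`, and the labels in the usage lines are hypothetical stand-ins chosen for illustration, and a real system would rely on a full Turkish morphological analyzer rather than a hand-written list.

```python
from typing import List, Sequence, Tuple

# Hypothetical, tiny suffix inventory (assumption for illustration only).
TOY_SUFFIXES = ("ler", "lar", "iniz", "ınız", "den", "dan", "de", "da")


def split_suffixes(word: str, min_stem: int = 2) -> Tuple[str, List[str]]:
    """Greedily strip known suffixes from the end of a word.

    Returns the remaining stem and the suffixes in left-to-right order.
    This is a toy approximation of morphological analysis.
    """
    by_length = sorted(TOY_SUFFIXES, key=len, reverse=True)
    stem, suffixes = word, []
    stripped = True
    while stripped:
        stripped = False
        for suffix in by_length:
            # Only strip if at least `min_stem` characters remain as the stem.
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                stem = stem[: -len(suffix)]
                suffixes.insert(0, suffix)
                stripped = True
                break
    return stem, suffixes


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# "evlerinizden" ("from your houses") = ev + ler + iniz + den
stem, suffixes = split_suffixes("evlerinizden")

# Made-up labels: 1 = invalid report, 0 = valid report.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In this sketch the stem and suffix sequence would serve as candidate features for a classifier; which features the paper actually extracts is not specified in the abstract.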

Authors (4)
  1. Ethem Utku Aktas (5 papers)
  2. Ebru Cakmak (1 paper)
  3. Mete Cihad Inan (1 paper)
  4. Cemal Yilmaz (6 papers)
Citations (2)
