Probabilistic Truly Unordered Rule Sets (2401.09918v1)
Abstract: Rule set learning has recently been frequently revisited because of its interpretability. Existing methods have several shortcomings, though. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class targets is understudied, as most existing methods focus on binary classification or on multi-class classification via the "one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the minimum description length (MDL) principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we empirically show that rules in our model are "independent" and hence truly unordered.
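
The overlap intuition in the abstract can be illustrated with a small sketch. It is not the TURS model or its MDL-based criterion, which the abstract does not spell out. As an assumption made only for illustration, each rule is a boolean cover over a toy dataset, its probabilistic output is the empirical class distribution of the instances it covers, and the Jensen-Shannon divergence stands in as the similarity measure. The names `class_probs`, `js_divergence`, `rule_a`, and `rule_b` are hypothetical.

```python
import numpy as np

def class_probs(y, mask, n_classes):
    """Empirical class distribution of the instances selected by a boolean mask."""
    counts = np.bincount(y[mask], minlength=n_classes)
    return counts / counts.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support).

    Used here only as a stand-in for "similar probabilistic outputs";
    the paper formalizes the trade-off through MDL instead.
    """
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy data: one feature, three classes (values are made up for illustration).
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
y = np.array([0, 0, 0, 1, 0, 1, 1, 2])

# Two hypothetical rules, expressed as boolean covers of the data.
rule_a = x < 0.55
rule_b = x > 0.25
overlap = rule_a & rule_b  # instances covered by both rules

p_a = class_probs(y, rule_a, n_classes=3)
p_b = class_probs(y, rule_b, n_classes=3)

print("instances in overlap:", int(overlap.sum()))
print("P(class | rule_a):", p_a)
print("P(class | rule_b):", p_b)
print("JS divergence between the two rules:", js_divergence(p_a, p_b))
```

A small divergence means the two rules make nearly the same probabilistic statement about the instances they share, so the overlap causes little conflict. A large divergence signals the kind of conflict that the abstract says the model is designed to avoid.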