Learning Interpretable Rules for Scalable Data Representation and Classification (2310.14336v3)
Abstract: Rule-based models, such as decision trees, are widely used in scenarios that demand high interpretability, owing to their transparent inner structure and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, because of their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize continuous features end-to-end. Extensive experiments on ten small and four large data sets show that RRL outperforms competitive interpretable approaches and can be easily adjusted to trade off classification accuracy against model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl.
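The core idea of training a discrete model through a continuous projection can be illustrated with a minimal straight-through-style sketch: continuous latent parameters are binarized for the forward pass, and the gradient of the loss on the discrete output is applied directly to the latent parameters. This is only an illustrative toy in the spirit of Gradient Grafting, not the authors' implementation; `binarize`, `train`, and all values below are made up for the example.

```python
def binarize(w):
    """Discretize continuous latent weights to {-1, +1} by sign."""
    return [1.0 if x >= 0 else -1.0 for x in w]

def train(targets, steps=200, lr=0.1):
    """Fit binary weights to targets via a straight-through-style update."""
    # Continuous "latent" parameters; the discrete model uses their signs.
    w = [-0.01, 0.02, -0.03]
    for _ in range(steps):
        b = binarize(w)  # discrete forward pass
        # Gradient of squared error, computed on the *discrete* output.
        grads = [2 * (bi - ti) for bi, ti in zip(b, targets)]
        # Graft that gradient onto the continuous latent parameters,
        # skipping the non-differentiable binarization step.
        w = [wi - lr * g for wi, g in zip(w, grads)]
    return binarize(w)

print(train([1.0, -1.0, 1.0]))  # → [1.0, -1.0, 1.0]
```

The design point the toy captures is that gradient descent never needs a derivative of the discretization itself: losses are evaluated on the discrete model, while updates land on its continuous counterpart.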