GradTree: Learning Axis-Aligned Decision Trees with Gradient Descent (2305.03515v7)
Abstract: Decision Trees (DTs) are commonly used for many machine learning tasks due to their high degree of interpretability. However, learning a DT from data is a difficult optimization problem, as it is non-convex and non-differentiable. Therefore, common approaches learn DTs using a greedy growth algorithm that minimizes the impurity locally at each internal node. Unfortunately, this greedy procedure can lead to inaccurate trees. In this paper, we present a novel approach for learning hard, axis-aligned DTs with gradient descent. The proposed method uses backpropagation with a straight-through operator on a dense DT representation to jointly optimize all tree parameters. Our approach outperforms existing methods on binary classification benchmarks and achieves competitive results on multi-class tasks. The method is available at: https://github.com/s-marton/GradTree
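The core idea from the abstract, using a hard routing decision in the forward pass while backpropagating through a smooth surrogate, can be illustrated with a straight-through estimator for a single axis-aligned split. This is only a sketch of the general technique; the temperature parameter and the sigmoid surrogate are illustrative assumptions, and GradTree's actual dense-tree formulation is more involved (see the paper and repository).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def straight_through_split(x, threshold, temperature=1.0):
    """Hard axis-aligned split with a straight-through gradient.

    Forward: route samples with a hard step (right if x > threshold).
    Backward: pretend the split was the sigmoid surrogate, so the
    threshold parameter still receives a useful gradient.
    Note: the sigmoid surrogate and temperature are illustrative
    choices, not GradTree's exact formulation.
    """
    z = (x - threshold) / temperature
    soft = sigmoid(z)              # differentiable surrogate
    hard = (z > 0).astype(float)   # hard routing decision in {0, 1}
    # In an autograd framework this would be written as
    #   hard.detach() + soft - soft.detach()
    # so the forward value equals `hard` while gradients flow through `soft`.
    surrogate_grad = soft * (1.0 - soft) / temperature  # d(soft)/dx
    return hard, surrogate_grad

x = np.array([-1.0, 0.2, 3.0])
routing, grad = straight_through_split(x, threshold=0.5)
# routing is a hard {0, 1} decision; grad is the surrogate gradient
# that backpropagation would use to update the threshold.
```

With a full tree, every internal node carries such a split, and all thresholds and feature weights are optimized jointly by gradient descent rather than greedily node by node.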
Authors: Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt