Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 56 tok/s
Gemini 2.5 Pro 39 tok/s Pro
GPT-5 Medium 15 tok/s Pro
GPT-5 High 16 tok/s Pro
GPT-4o 99 tok/s Pro
Kimi K2 155 tok/s Pro
GPT OSS 120B 476 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Analyzing CART (1906.10086v7)

Published 24 Jun 2019 in stat.ML and cs.LG

Abstract: Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For binary classification and regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the bias and adaptive properties of regression trees constructed with CART. In doing so, we derive an interesting connection between the bias and the mean decrease in impurity (MDI) measure of variable importance---a tool widely used for model interpretability---defined as the sum of impurity reductions over all non-terminal nodes in the tree. In particular, we show that the probability content of a terminal subnode for a variable is small when the MDI for that variable is large and that this relationship is exponential---confirming theoretically that decision trees with CART have small bias and are adaptive to signal strength and direction. Finally, we apply these individual tree bounds to tree ensembles and show consistency of Breiman's random forests. The context is surprisingly general and applies to a wide variety of multivariable data generating distributions and regression functions. The main technical tool is an exact characterization of the conditional probability content of the daughter nodes arising from an optimal split, in terms of the partial dependence function and reduction in impurity.

Citations (6)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Authors (1)