Holistic Robust Data-Driven Decisions (2207.09560v4)
Abstract: The design of data-driven formulations for machine learning and decision-making with good out-of-sample performance is a key challenge. The observation that good in-sample performance does not guarantee good out-of-sample performance is generally known as overfitting. Practical overfitting can typically not be attributed to a single cause but is caused by several factors simultaneously. We consider here three overfitting sources: (i) statistical error as a result of working with finite sample data, (ii) data noise, which occurs when the data points are measured only with finite precision, and finally, (iii) data misspecification in which a small fraction of all data may be wholly corrupted. Although existing data-driven formulations may be robust against one of these three sources in isolation, they do not provide holistic protection against all overfitting sources simultaneously. We design a novel data-driven formulation that guarantees such holistic protection and is computationally viable. Our distributionally robust optimization formulation can be interpreted as a novel combination of a Kullback-Leibler and L\'evy-Prokhorov robust optimization formulation. In the context of classification and regression problems, we show that several popular regularized and robust formulations naturally reduce to a particular case of our proposed novel formulation. Finally, we apply the proposed HR formulation to two real-life applications and study it alongside several benchmarks: (1) training neural networks on healthcare data, where we analyze various robustness and generalization properties in the presence of noise, labeling errors, and scarce data, (2) a portfolio selection problem with real stock data, and analyze the risk/return tradeoff under the natural severe distribution shift of the application.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.