An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models (2404.15518v3)
Abstract: In traditional statistical learning, data points are usually assumed to be independent and identically distributed (i.i.d.), drawn from an unknown probability distribution. This paper presents a contrasting viewpoint: it treats data points as interconnected and models the data with a Markov reward process (MRP). We reformulate typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL) and introduce a generalized temporal difference (TD) learning algorithm to solve it. Theoretically, our analysis draws connections between the solutions of linear TD learning and ordinary least squares (OLS). We also show that under specific conditions, particularly when the noise is correlated, the TD solution is a more effective estimator than OLS. Furthermore, we establish the convergence of our generalized TD algorithms under linear function approximation. Empirical studies verify our theoretical results, examine the key design choices of our TD algorithm, and show practical utility across various datasets, covering tasks such as regression and image classification with deep learning.
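To make the link between linear TD and OLS concrete, here is a minimal sketch of one way the reformulation could look. The abstract does not spell out the construction, so everything below is an illustrative assumption: data points are treated as states of a hypothetical MRP, the transition matrix `P` is uniform, the state weighting is uniform, and the reward is set to `r = (I - gamma * P) y` so that the value of each point equals its label. The function names `ols_solution` and `linear_td_solution` are hypothetical.

```python
import numpy as np


def ols_solution(X, y):
    """Ordinary least squares weights via a least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def linear_td_solution(X, y, P, gamma):
    """Closed-form linear TD fixed point on an assumed MRP over data points.

    Assumptions (not taken from the abstract):
      - state i corresponds to data point (X[i], y[i]);
      - rewards are r = (I - gamma * P) y, so the true value of state i is y[i];
      - states are weighted uniformly (matches the stationary distribution
        of the uniform P used below).
    """
    n = X.shape[0]
    D = np.eye(n) / n                  # uniform state weighting
    M = np.eye(n) - gamma * P          # I - gamma * P
    A = X.T @ D @ M @ X                # TD "A" matrix
    b = X.T @ D @ (M @ y)              # X^T D r with r = (I - gamma P) y
    return np.linalg.solve(A, b)


# Small synthetic regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

P = np.full((n, n), 1.0 / n)           # assumed uniform transitions

w_ols = ols_solution(X, y)
w_td_gamma0 = linear_td_solution(X, y, P, gamma=0.0)
w_td = linear_td_solution(X, y, P, gamma=0.9)
w_td_noiseless = linear_td_solution(X, X @ w_true, P, gamma=0.9)
```

Under these assumptions, setting `gamma = 0` removes the bootstrapping term and the TD fixed point reduces to the OLS normal equations, while with noiseless labels any `gamma < 1` recovers the generating weights. For `gamma > 0` and noisy labels the two estimators generally differ, which is the regime the abstract's comparison concerns.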