Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression (2305.00608v3)
Abstract: We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We show that the partial derivatives of RePU neural networks can be represented by networks with mixed RePU activations, and we derive upper bounds on the complexity of the function class of derivatives of RePU networks. We establish error bounds for simultaneously approximating $C^s$ smooth functions and their derivatives using RePU-activated deep neural networks. Furthermore, we derive improved approximation error bounds when the data have approximate low-dimensional support, demonstrating the ability of RePU networks to mitigate the curse of dimensionality. To illustrate the usefulness of our results, we consider a deep score matching estimator (DSME) and propose a penalized deep isotonic regression (PDIR) using RePU networks. We establish non-asymptotic excess risk bounds for DSME and PDIR under the assumption that the target functions belong to a class of $C^s$ smooth functions. We also show that PDIR achieves the minimax optimal convergence rate and has a robustness property in the sense that it remains consistent, with vanishing penalty parameters, even when the monotonicity assumption is not satisfied. Furthermore, if the data distribution is supported on an approximate low-dimensional manifold, we show that DSME and PDIR can mitigate the curse of dimensionality.
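The derivative representation highlighted in the abstract rests on a simple identity: the derivative of a RePU of power $p$ is a scaled RePU of power $p-1$, so differentiating a RePU network keeps it inside the (mixed) RePU family. The following is a minimal numerical sketch of that identity, not code from the paper; the function names are illustrative.

```python
import numpy as np

def repu(x, p=2):
    """Rectified power unit: RePU_p(x) = max(0, x)**p.

    For p >= 2 this activation is C^{p-1} smooth, unlike the ReLU (p = 1).
    """
    return np.maximum(0.0, x) ** p

def repu_derivative(x, p=2):
    """Exact derivative via the identity d/dx RePU_p(x) = p * RePU_{p-1}(x).

    The derivative is itself a (scaled) lower-order RePU, which is why
    derivatives of RePU networks are representable by mixed-RePU networks.
    """
    return p * repu(x, p - 1)
```

For $p = 2$ (the squared ReLU), the derivative $2\max(0,x)$ is a scaled ReLU, the $p = 1$ member of the family.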
- Approximations with deep neural networks in Sobolev time-space. Analysis and Applications, 20(03):499–541, 2022.
- Approximation of smoothness classes by deep rectifier networks. SIAM Journal on Numerical Analysis, 59(6):3032–3051, 2021.
- Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge, 1999. ISBN 0-521-57353-X. doi: 10.1017/CBO9780511624216. URL https://doi.org/10.1017/CBO9780511624216.
- Multivariate simultaneous approximation. Constructive approximation, 18(4):569–577, 2002.
- Random projections of smooth manifolds. Found. Comput. Math., 9(1):51–77, 2009. ISSN 1615-3375. doi: 10.1007/s10208-007-9011-z. URL https://doi.org/10.1007/s10208-007-9011-z.
- Statistical Inference under Order Restrictions; the Theory and Application of Isotonic Regression. New York: Wiley, 1972.
- Almost linear VC dimension bounds for piecewise polynomial networks. Advances in Neural Information Processing Systems, 11, 1998.
- Spectrally-normalized margin bounds for neural networks. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/b22b257ad0519d4500539da3c8bcf4dd-Paper.pdf.
- Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Journal of Machine Learning Research, 20:Paper No. 63, 17, 2019. ISSN 1532-4435.
- On deep learning as a remedy for the curse of dimensionality in nonparametric regression. Ann. Statist., 47(4):2261–2285, 2019. ISSN 0090-5364. doi: 10.1214/18-AOS1747. URL https://doi.org/10.1214/18-AOS1747.
- Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput., 15(6):1373–1396, 2003.
- Pierre C Bellec. Sharp oracle inequalities for least squares estimators in shape restricted regression. The Annals of Statistics, 46(2):745–780, 2018.
- Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations. arXiv:2206.09527, 2022.
- Generative modeling with denoising auto-encoders and langevin sampling. arXiv:2002.00107, 2020.
- Adaptive risk bounds in unimodal regression. Bernoulli, 25(1):1–25, 2019.
- On risk bounds in isotonic and other shape restricted regression problems. The Annals of Statistics, 43(4):1774–1800, 2015.
- On matrix estimation under monotonicity constraints. Bernoulli, 24(2):1072–1100, 2018.
- Efficient approximation of deep ReLU networks for functions on low dimensional manifolds. Advances in Neural Information Processing Systems, 2019.
- Nonparametric regression on low-dimensional manifolds using deep ReLU networks: Function approximation and statistical recovery. Information and Inference: A Journal of the IMA, 11(4):1203–1253, 2022.
- Wavegrad: Estimating gradients for waveform generation. arXiv:2009.00713, 2020.
- Realization of neural networks with one hidden layer. In Multivariate approximation: From CAGD to wavelets, pages 77–89. World Scientific, 1993.
- Neural networks for localized approximation. Mathematics of Computation, 63(208):607–623, 1994.
- Isotonic regression in multi-dimensional spaces and graphs. The Annals of Statistics, 48(6):3672–3698, 2020.
- Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
- Case-control isotonic regression for investigation of elevation in risk around a point source. Statistics in medicine, 18(13):1605–1613, 1999.
- Convergence rate analysis for deep ritz method. arXiv preprint arXiv:2103.13330, 2021.
- Cécile Durot. Sharp asymptotics for isotonic regression. Probability Theory and Related Fields, 122(2):222–240, 2002.
- Cécile Durot. On the $l_p$-error of monotonicity constrained estimators. The Annals of Statistics, 35(3):1080–1104, 2007.
- Cécile Durot. Monotone nonparametric regression with random design. Mathematical Methods of Statistics, 17(4):327–341, 2008.
- Richard L Dykstra. An algorithm for restricted least squares regression. Journal of the American Statistical Association, 78(384):837–842, 1983.
- Charles Fefferman. Whitney’s extension problem for $C^m$. Annals of Mathematics, 164(1):313–359, 2006. ISSN 0003486X. URL http://www.jstor.org/stable/20159991.
- Testing the manifold hypothesis. Journal of the American Mathematical Society, 29(4):983–1049, 2016.
- On integrated $l_1$ convergence rate of an isotonic regression estimator for multivariate observations. IEEE Transactions on Information Theory, 66(10):6389–6402, 2020.
- Minimax risk bounds for piecewise constant models. arXiv preprint arXiv:1705.06386, 2017.
- Deep generative learning via variational gradient flow. In International Conference on Machine Learning, pages 2093–2101. PMLR, 2019.
- Deep generative learning via euler particle transport. In Mathematical and Scientific Machine Learning, pages 336–368. PMLR, 2022.
- Nonparametric estimation under shape constraints, volume 38. Cambridge University Press, 2014.
- Approximation rates for neural networks with encodable weights in smoothness spaces. Neural Networks, 134:107–130, 2021.
- Isotonic regression in general dimensions. The Annals of Statistics, 47(5):2440–2471, 2019.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res., 23:47–1, 2022.
- Local dimensionality reduction for non-parametric regression. Neural Processing Letters, 29(2):109, 2009.
- Simultaneous neural network approximation for smooth functions. Neural Networks, 154:152–164, 2022.
- Lars Hörmander. The analysis of linear partial differential operators I: Distribution theory and Fourier analysis. Springer, 2015.
- William George Horner. A new method of solving numerical equations of all orders, by continuous approximation. Philosophical Transactions of the Royal Society of London, (109):308–335, 1819.
- Nonparametric estimation and inference under shape restrictions. Journal of Econometrics, 201(1):108–126, 2017.
- Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005.
- Robust compressed sensing mri with deep generative priors. Advances in Neural Information Processing Systems, 34:14938–14954, 2021.
- Smooth isotonic regression: A new method to calibrate predictive models. AMIA Summits on Translational Science Proceedings, 2011:16, 2011.
- Deep nonparametric regression on approximate manifolds: Nonasymptotic error bounds with polynomial prefactors. The Annals of Statistics, 51(2):691–716, 2023.
- Adaptation in log-concave density estimation. The Annals of Statistics, 46(5):2279–2306, 2018.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Approximation by combinations of ReLU and squared ReLU ridge functions with $\ell^1$ and $\ell^0$ controls. IEEE Transactions on Information Theory, 64(12):7649–7656, 2018.
- Diffwave: A versatile diffusion model for audio synthesis. arXiv:2009.09761, 2020.
- Fast, provable algorithms for isotonic regression in all $\ell_p$-norms. Advances in Neural Information Processing Systems, 28, 2015.
- Convergence for score-based generative modeling with polynomial complexity. arXiv:2206.06227, 2022.
- Better approximations of high dimensional smooth functions by deep neural networks with rectified power units. arXiv preprint arXiv:1903.05858, 2019.
- Powernet: Efficient representations of polynomials and smooth functions by deep neural networks with rectified power units. J. Math. Study, 53(2):159–191, 2020.
- Gradient estimators for implicit models. arXiv:1705.07107, 2017.
- A kernelized stein discrepancy for goodness-of-fit tests. In International conference on machine learning, pages 276–284. PMLR, 2016.
- Deep network approximation for smooth functions. SIAM Journal on Mathematical Analysis, 53(5):5465–5506, 2021.
- Efficient regularized isotonic regression with application to gene–gene interaction search. The Annals of Applied Statistics, 6(1):253–283, 2012.
- Hrushikesh Narhar Mhaskar. Approximation properties of a multilayered feedforward artificial neural network. Advances in Computational Mathematics, 1(1):61–80, 1993.
- Symbolic music generation with diffusion models. arXiv:2103.16091, 2021.
- Foundations of machine learning. MIT press, 2018.
- Additive isotonic regression models in epidemiology. Statistics in medicine, 19(6):849–859, 2000.
- Deterministic pac-bayesian generalization bounds for deep networks via generalizing noise-resilience. arXiv preprint arXiv:1905.13344, 2019.
- Norm-based capacity control in neural networks. In Conference on learning theory, pages 1376–1401. PMLR, 2015.
- Optimal approximation of piecewise smooth functions using deep relu neural networks. Neural Networks, 108:296–330, 2018.
- Jean-Claude Picard. Maximal closure of a graph and applications to combinatorial problems. Management science, 22(11):1268–1272, 1976.
- Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pages 8599–8608. PMLR, 2021.
- Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. The Annals of Applied Statistics, 8(2):1182, 2014.
- Order Restricted Statistical Inference. New York: Wiley, 1988.
- Estimation of parameters subject to order restrictions on a circle with application to estimation of phase angles of cell cycle genes. Journal of the American Statistical Association, 104(485):338–347, 2009.
- Clustering via mode seeking by direct estimation of the gradient of a log-density. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 19–34. Springer, 2014.
- Johannes Schmidt-Hieber. Deep ReLU network approximation of functions on a manifold. arXiv:1908.00695, 2019.
- Johannes Schmidt-Hieber. Nonparametric regression using deep neural networks with ReLU activation function. Annals of Statistics, 48(4):1875–1897, 2020.
- Estimation of non-crossing quantile regression process with deep ReQU neural networks. arXiv:2207.10442, 2022.
- Deep network approximation characterized by number of neurons. Commun. Comput. Phys., 28(5):1768–1811, 2020. ISSN 1815-2406. doi: 10.4208/cicp.oa-2020-0149. URL https://doi.org/10.4208/cicp.oa-2020-0149.
- A spectral approach to gradient estimation for implicit distributions. In International Conference on Machine Learning, pages 4644–4653. PMLR, 2018.
- High-order approximation rates for shallow neural networks with cosine and ReLU$^k$ activation functions. Applied and Computational Harmonic Analysis, 58:1–26, 2022.
- Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
- Improved techniques for training score-based generative models. Advances in neural information processing systems, 33:12438–12448, 2020.
- Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pages 574–584. PMLR, 2020.
- Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=PxTIG12RRHS.
- Least squares isotonic regression in two dimensions. Journal of Optimization Theory and Applications, 117(3):585–605, 2003.
- Density estimation in infinite dimensional exponential families. Journal of Machine Learning Research, 2017.
- Charles J Stone. Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, pages 1040–1053, 1982.
- Quentin F Stout. Isotonic regression for multiple independent variables. Algorithmica, 71(2):450–470, 2015.
- Gradient-free hamiltonian monte carlo with efficient kernel exponential families. Advances in Neural Information Processing Systems, 28, 2015.
- Efficient and principled score estimation with nyström kernel exponential families. In International Conference on Artificial Intelligence and Statistics, pages 652–660. PMLR, 2018.
- Pascal Vincent. A connection between score matching and denoising autoencoders. Neural computation, 23(7):1661–1674, 2011.
- Improving generative adversarial networks with denoising feature matching. 2016.
- Data-dependent sample complexity of deep neural networks via lipschitz augmentation. Advances in Neural Information Processing Systems, 32, 2019.
- Simultaneous lp-approximation order for neural networks. Neural Networks, 18(7):914–923, 2005.
- Contraction and uniform convergence of isotonic regression. Electronic Journal of Statistics, 13(1):646–677, 2019.
- Dmitry Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.
- Dmitry Yarotsky. Optimal approximation of continuous functions by very deep ReLU networks. In Conference on Learning Theory, pages 639–649. PMLR, 2018.
- Cun-Hui Zhang. Risk bounds in isotonic regression. The Annals of Statistics, 30(2):528–555, 2002.
- Nonparametric score estimators. In International Conference on Machine Learning, pages 11513–11522. PMLR, 2020.