Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 27 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 23 tok/s Pro
GPT-5 High 29 tok/s Pro
GPT-4o 70 tok/s Pro
Kimi K2 117 tok/s Pro
GPT OSS 120B 459 tok/s Pro
Claude Sonnet 4 34 tok/s Pro
2000 character limit reached

An Optimal Transport View on Generalization (1811.03270v1)

Published 8 Nov 2018 in stat.ML and cs.LG

Abstract: We derive upper bounds on the generalization error of learning algorithms based on their \emph{algorithmic transport cost}: the expected Wasserstein distance between the output hypothesis and the output hypothesis conditioned on an input example. The bounds provide a novel approach to study the generalization of learning algorithms from an optimal transport view and impose less constraints on the loss function, such as sub-gaussian or bounded. We further provide several upper bounds on the algorithmic transport cost in terms of total variation distance, relative entropy (or KL-divergence), and VC dimension, thus further bridging optimal transport theory and information theory with statistical learning theory. Moreover, we also study different conditions for loss functions under which the generalization error of a learning algorithm can be upper bounded by different probability metrics between distributions relating to the output hypothesis and/or the input data. Finally, under our established framework, we analyze the generalization in deep learning and conclude that the generalization error in deep neural networks (DNNs) decreases exponentially to zero as the number of layers increases. Our analyses of generalization error in deep learning mainly exploit the hierarchical structure in DNNs and the contraction property of $f$-divergence, which may be of independent interest in analyzing other learning models with hierarchical structure.

Citations (7)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.