Probability Distribution Learning and Its Application in Deep Learning (2406.05666v11)

Published 9 Jun 2024 in cs.LG, cs.IR, and stat.ML

Abstract: This paper aims to elucidate the theoretical mechanisms underlying deep learning from a probability distribution estimation perspective, with Fenchel-Young Loss serving as the loss function. In our approach, the learning error, which measures the discrepancy between the model's predicted distribution and the posterior expectation of the true unknown distribution given the observed samples, is formulated as the primary optimization objective; the learning error can thus be regarded as the posterior expectation of the expected risk. Since many important loss functions, such as Softmax Cross-Entropy Loss and Mean Squared Error Loss, are specific instances of Fenchel-Young Losses, the paper further demonstrates theoretically that Fenchel-Young Loss is a natural choice for machine learning tasks, ensuring the broad applicability of the conclusions drawn in this work. When Fenchel-Young Loss is used, the paper proves that the model's fitting error is controlled by the gradient norm and the structural error, providing new insight into the mechanisms of non-convex optimization and into techniques employed in model training, such as over-parameterization and skip connections. Furthermore, it establishes model-independent bounds on the learning error, showing that the correlation between features and labels (equivalently, their mutual information) controls the upper bound of the model's generalization error. Finally, the paper validates its key conclusions through empirical results, demonstrating their practical effectiveness.
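
For background (not stated in the abstract itself): the Fenchel-Young loss family referenced above is standardly defined, following Blondel, Martins, and Niculae ("Learning with Fenchel-Young Losses", JMLR 2020), by a regularizer \Omega and its convex conjugate \Omega^*:

\[ L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle \]

The two instances named in the abstract arise from specific choices of \Omega. The derivations below are a sketch from the general literature; the paper's own notation may differ:

\[ \Omega(y) = \tfrac{1}{2}\lVert y \rVert^2 \;\Rightarrow\; \Omega^*(\theta) = \tfrac{1}{2}\lVert \theta \rVert^2 \;\Rightarrow\; L_\Omega(\theta; y) = \tfrac{1}{2}\lVert \theta - y \rVert^2 \quad (\text{Mean Squared Error}) \]

\[ \Omega(p) = \textstyle\sum_i p_i \log p_i \ \text{on the simplex} \;\Rightarrow\; \Omega^*(\theta) = \log \textstyle\sum_i e^{\theta_i} \;\Rightarrow\; L_\Omega(\theta; y) = \log \textstyle\sum_i e^{\theta_i} - \langle \theta, y \rangle \quad (\text{Softmax Cross-Entropy, one-hot } y) \]

In both cases the Fenchel-Young inequality gives L_\Omega(\theta; y) \geq 0, with equality exactly when the prediction \nabla\Omega^*(\theta) matches y, which is what makes this family a natural objective for distribution estimation.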

Citations (1)

Authors (3)

Tweets

This paper has been mentioned in 5 tweets and received 25 likes.
