Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 77 tok/s
Gemini 2.5 Pro 33 tok/s Pro
GPT-5 Medium 25 tok/s Pro
GPT-5 High 27 tok/s Pro
GPT-4o 75 tok/s Pro
Kimi K2 220 tok/s Pro
GPT OSS 120B 465 tok/s Pro
Claude Sonnet 4 36 tok/s Pro
2000 character limit reached

Alpha-divergence loss function for neural density ratio estimation (2402.02041v4)

Published 3 Feb 2024 in stat.ML and cs.LG

Abstract: Density ratio estimation (DRE) is a fundamental machine learning technique for capturing relationships between two probability distributions. State-of-the-art DRE methods estimate the density ratio using neural networks trained with loss functions derived from variational representations of $f$-divergences. However, existing methods face optimization challenges, such as overfitting due to lower-unbounded loss functions, biased mini-batch gradients, vanishing training loss gradients, and high sample requirements for Kullback--Leibler (KL) divergence loss functions. To address these issues, we focus on $\alpha$-divergence, which provides a suitable variational representation of $f$-divergence. Subsequently, a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div), is derived. $\alpha$-Div is concise but offers stable and effective optimization for DRE. The boundedness of $\alpha$-divergence provides the potential for successful DRE with data exhibiting high KL-divergence. Our numerical experiments demonstrate the effectiveness of $\alpha$-Div in optimization. However, the experiments also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of RMSE for DRE. This indicates that the accuracy of DRE is primarily determined by the amount of KL-divergence in the data and is less dependent on $\alpha$-divergence.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Authors (1)

X Twitter Logo Streamline Icon: https://streamlinehq.com