Papers
Topics
Authors
Recent
2000 character limit reached

Guaranteed Model Order Estimation and Sample Complexity Bounds for LDA (1312.2646v4)

Published 10 Dec 2013 in stat.ML

Abstract: The question of how to determine the number of independent latent factors (topics) in mixture models such as Latent Dirichlet Allocation (LDA) is of great practical importance. In most applications, the exact number of topics is unknown, and depends on the application and the size of the data set. Bayesian nonparametric methods can avoid the problem of topic number selection, but they can be impracticably slow for large sample sizes and are subject to local optima. We develop a guaranteed procedure for topic number recovery that does not necessitate learning the model's latent parameters beforehand. Our procedure relies on adapting results from random matrix theory. Performance of our topic number recovery procedure is superior to hLDA, a nonparametric method. We also discuss some implications of our results on the sample complexity and accuracy of popular spectral learning algorithms for LDA. Our results and procedure can be extended to spectral learning algorithms for other exchangeable mixture models as well as Hidden Markov Models.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.