"Look Ma, No Hands!" A Parameter-Free Topic Model (1409.2993v1)

Published 10 Sep 2014 in cs.LG, cs.CL, and cs.IR

Abstract: Predetermining the right number of topics, a key parameter of most topic models, has always been a burden for users of statistical topic models. Conventionally, this parameter is selected automatically either through statistical model selection (e.g., cross-validation, AIC, or BIC) or through Bayesian nonparametric models (e.g., the hierarchical Dirichlet process). These methods either rely on repeated runs of the inference algorithm to search over a large range of parameter values, which is ill-suited to mining big data, or replace the parameter with alternative parameters that are less intuitive and still hard to determine. In this paper, we explore how to "eliminate" this parameter from a new perspective. We first present a nonparametric treatment of the PLSA model, named nonparametric probabilistic latent semantic analysis (nPLSA). The inference procedure of nPLSA allows different numbers of topics to be explored and compared within a single execution, yet remains as simple as that of PLSA. This is achieved by replacing the number-of-topics parameter with an alternative parameter: the minimal goodness of fit of a document. We show that this new parameter can itself be eliminated by two parameter-free treatments: either by monitoring the diversity among the discovered topics, or by weak supervision from users in the form of an exemplar topic. The parameter-free topic model finds the appropriate number of topics when the diversity among the discovered topics is maximized, or when the granularity of the discovered topics matches the exemplar topic. Experiments on both synthetic and real data show that the parameter-free topic model extracts topics of comparable quality to classical topic models with "manual transmission", and of higher quality than topics extracted by classical Bayesian nonparametric models.
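
The abstract does not spell out the inference details, but the diversity-based selection criterion can be illustrated with a short sketch. The Python example below is hypothetical: it uses scikit-learn's LatentDirichletAllocation as a stand-in for a PLSA-style model, and measures topic diversity as the average pairwise Jensen-Shannon distance between topic-word distributions; both choices are assumptions, since the abstract only states that the appropriate number of topics maximizes topic diversity. Note also that the paper's nPLSA compares topic counts within a single inference run, whereas this sketch refits a model per candidate count purely to illustrate the criterion, not the paper's single-pass procedure.

```python
# Hypothetical sketch of the diversity-maximizing selection criterion
# from the abstract. NOT the paper's single-pass nPLSA inference: this
# refits a PLSA-like model (sklearn LDA as a stand-in) per candidate K.
# The Jensen-Shannon diversity measure is an assumption.

import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation


def topic_diversity(topic_word: np.ndarray) -> float:
    """Average pairwise Jensen-Shannon distance among topic-word distributions."""
    k = topic_word.shape[0]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return float(np.mean([jensenshannon(topic_word[i], topic_word[j])
                          for i, j in pairs]))


def select_num_topics(doc_term: np.ndarray, candidates=range(2, 21)) -> int:
    """Pick the candidate K whose discovered topics are most diverse."""
    best_k, best_div = None, -np.inf
    for k in candidates:
        model = LatentDirichletAllocation(n_components=k, random_state=0)
        model.fit(doc_term)
        # Normalize rows of the topic-word matrix into probability distributions.
        topic_word = model.components_ / model.components_.sum(axis=1, keepdims=True)
        div = topic_diversity(topic_word)
        if div > best_div:
            best_k, best_div = k, div
    return best_k
```

In this reading, diversity rises while additional topics capture genuinely distinct themes and falls once topics begin to split into near-duplicates, so the maximizer serves as a proxy for the "right" number of topics.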

Citations (6)
