
TopicAdapt: An Inter-Corpora Topics Adaptation Approach (2310.04978v1)

Published 8 Oct 2023 in cs.CL and cs.LG

Abstract: Topic models are popular statistical tools for detecting latent semantic topics in a text corpus. They have been utilized in various applications across different fields. However, traditional topic models have some limitations, including insensitivity to user guidance, sensitivity to the amount and quality of data, and the inability to adapt learned topics from one corpus to another. To address these challenges, this paper proposes a neural topic model, TopicAdapt, that can adapt relevant topics from a related source corpus and also discover new topics in a target corpus that are absent in the source corpus. The proposed model offers a promising approach to improve topic modeling performance in practical scenarios. Experiments over multiple datasets from diverse domains show the superiority of the proposed model against the state-of-the-art topic models.
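The core idea the abstract describes — seeding a target-corpus topic model with topics from a related source corpus while leaving room to discover new ones — can be illustrated with a minimal sketch. This is not the paper's architecture (TopicAdapt is a neural topic model); it is a toy NMF-based analogue, with a hypothetical `adapt_topics` helper, a hand-made six-word target corpus, and source topics chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def adapt_topics(X, source_topics, n_new_topics, n_iter=200):
    """Sketch of inter-corpora adaptation: seed some topics from a
    source corpus, add fresh randomly initialized topics, then refine
    all of them on the target corpus X via multiplicative NMF updates.
    X: (docs, vocab) count matrix; source_topics: (k_src, vocab)."""
    n_docs, vocab = X.shape
    k = source_topics.shape[0] + n_new_topics
    # topic-word matrix: source topics (kept positive) + random new topics
    H = np.vstack([source_topics + 1e-3,
                   rng.random((n_new_topics, vocab))])
    W = rng.random((n_docs, k))          # document-topic weights
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    H /= H.sum(axis=1, keepdims=True)    # rows become word distributions
    return W, H

# Toy target corpus over a 6-word vocabulary; documents 4-5 use a
# theme (words 4-5) that the source corpus never covered.
X = np.array([[5, 4, 0, 0, 1, 0],
              [4, 5, 1, 0, 0, 0],
              [0, 0, 5, 4, 0, 1],
              [0, 1, 4, 5, 0, 0],
              [0, 0, 0, 1, 5, 4],
              [1, 0, 0, 0, 4, 5]], dtype=float)
# Two source topics cover words 0-1 and 2-3; one extra slot is left
# free so the model can discover the unseen theme.
source = np.array([[.45, .45, .025, .025, .025, .025],
                   [.025, .025, .45, .45, .025, .025]])
W, H = adapt_topics(X, source, n_new_topics=1)
print(H.argmax(axis=1))  # most probable word per topic
```

Under these assumptions the two seeded topics stay anchored to the source themes, while the third, freely initialized topic absorbs the words the source never explained, mirroring the adapt-plus-discover behavior the paper claims.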

