Domain-topic models with chained dimensions: charting an emergent domain of a major oncology conference (1912.13349v3)

Published 31 Dec 2019 in cs.DL, physics.data-an, and physics.soc-ph

Abstract: This paper presents a contribution to the study of bibliographic corpora in the context of science mapping. Starting from a graph representation of documents and their textual dimension, we observe that stochastic block models (SBMs) can provide a simultaneous clustering of documents and words that we call a domain-topic model. Previous work by Gerlach et al. (2018) investigated the resulting topics, or word clusters, while ours focuses on the study of the document clusters, which we call domains. To enable the synthetic description and interactive navigation of domains, we introduce measures and interfaces relating both types of clusters, which reflect the structure of the graph and the model. We then present a procedure that, starting from the document clusters, extends the block model to also cluster arbitrary metadata attributes of the documents. We call this procedure a domain-chained model, and our previous measures and interfaces can be directly transposed to read the metadata clusters. We provide an example application to a corpus that is relevant to current STS research, and an interesting case for our approach: the 1995-2017 collection of abstracts presented at ASCO, the main annual oncology research conference. Through a sequence of domain-topic and domain-chained models, we identify and describe a particular group of domains in ASCO that have notably grown over the last decades, and which we relate to the establishment of "oncopolicy" as a major concern in oncology.
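
To make the graph representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the graph is a bipartite document-word network whose edge weights are token counts, and it aggregates those edges into counts between document blocks (domains) and word blocks (topics). The partition here is hand-assigned and hypothetical; in the paper it would be inferred by fitting an SBM, which this sketch does not attempt. All function names, document ids, and labels are illustrative assumptions.

```python
from collections import Counter


def build_bipartite_graph(docs):
    """Return weighted document-word edges as {(doc_id, word): count}."""
    edges = Counter()
    for doc_id, text in docs.items():
        for token in text.lower().split():
            edges[(doc_id, token)] += 1
    return edges


def block_edge_counts(edges, doc_blocks, word_blocks):
    """Sum edge weights between each (domain, topic) pair of blocks."""
    counts = Counter()
    for (doc_id, word), weight in edges.items():
        counts[(doc_blocks[doc_id], word_blocks[word])] += weight
    return counts


# Toy corpus (invented for illustration only).
docs = {
    "d1": "tumor response chemotherapy trial",
    "d2": "chemotherapy trial survival",
    "d3": "cost access insurance policy",
}
edges = build_bipartite_graph(docs)

# Hypothetical, hand-assigned partition; an SBM fit would infer this instead.
policy_words = {"cost", "access", "insurance", "policy"}
doc_blocks = {"d1": "domain_A", "d2": "domain_A", "d3": "domain_B"}
word_blocks = {
    word: "topic_policy" if word in policy_words else "topic_clinical"
    for (_, word) in edges
}

counts = block_edge_counts(edges, doc_blocks, word_blocks)
for (domain, topic), weight in sorted(counts.items()):
    print(domain, topic, weight)
```

The resulting domain-by-topic count table is the kind of block-level summary that measures relating the two types of clusters could be built on, under the assumptions stated above.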
