Domain-topic models with chained dimensions: charting an emergent domain of a major oncology conference (1912.13349v3)
Abstract: This paper presents a contribution to the study of bibliographic corpora in the context of science mapping. Starting from a graph representation of documents and their textual dimension, we observe that stochastic block models (SBMs) can provide a simultaneous clustering of documents and words, which we call a domain-topic model. Previous work by Gerlach et al. (2018) investigated the resulting topics, or word clusters, while ours focuses on the document clusters, which we call domains. To enable the synthetic description and interactive navigation of domains, we introduce measures and interfaces relating both types of clusters, which reflect the structure of the graph and the model. We then present a procedure that, starting from the document clusters, extends the block model to also cluster arbitrary metadata attributes of the documents. We call this procedure a domain-chained model; the measures and interfaces introduced for domains and topics can be transposed directly to read the metadata clusters. We provide an example application to a corpus that is relevant to current STS research and an interesting case for our approach: the 1995-2017 collection of abstracts presented at ASCO, the main annual oncology research conference. Through a sequence of domain-topic and domain-chained models, we identify and describe a particular group of domains in ASCO that have grown notably over the last decades, and which we relate to the establishment of "oncopolicy" as a major concern in oncology.
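The following is a minimal sketch, not the authors' implementation, of the graph representation the abstract starts from and of one simple quantity relating the two kinds of clusters. It assumes each document's text is whitespace-tokenized into a bipartite document-word multigraph (edge weight = word count), and that document partitions (domains) and word partitions (topics) have already been inferred by an SBM fit, which is not shown here (e.g. with a library such as graph-tool). The function names `bipartite_edges`, `block_edge_counts` and `topic_shares`, and the toy corpus, are hypothetical; `topic_shares` is an illustrative measure only, not one of the measures defined in the paper.

```python
from collections import Counter, defaultdict


def bipartite_edges(docs):
    """Build a document-word multigraph as weighted edges.

    docs: mapping doc_id -> text. Returns Counter of (doc_id, word) -> count.
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    edges = Counter()
    for doc_id, text in docs.items():
        for word in text.lower().split():
            edges[(doc_id, word)] += 1
    return edges


def block_edge_counts(edges, doc_block, word_block):
    """Sum edge weights between every (domain, topic) pair of blocks.

    doc_block / word_block: hypothetical partitions mapping each node to a
    block label, assumed to come from a previously fitted SBM.
    """
    counts = defaultdict(int)
    for (doc_id, word), weight in edges.items():
        counts[(doc_block[doc_id], word_block[word])] += weight
    return dict(counts)


def topic_shares(counts):
    """Illustrative measure: fraction of each domain's word occurrences
    that fall in each topic (rows sum to 1 per domain)."""
    totals = defaultdict(int)
    for (domain, _topic), weight in counts.items():
        totals[domain] += weight
    return {(d, t): w / totals[d] for (d, t), w in counts.items()}


# Toy usage with a hand-made (hypothetical) partition.
docs = {"d1": "tumor drug trial", "d2": "drug cost policy", "d3": "cost policy access"}
doc_block = {"d1": 0, "d2": 1, "d3": 1}
word_block = {"tumor": "A", "drug": "A", "trial": "A",
              "cost": "B", "policy": "B", "access": "B"}

counts = block_edge_counts(bipartite_edges(docs), doc_block, word_block)
print(counts)  # {(0, 'A'): 3, (1, 'A'): 1, (1, 'B'): 5}
print(topic_shares(counts))
```

Because `block_edge_counts` only needs an edge list and two partitions, the same aggregation would apply unchanged to a document-metadata graph whose document partition is held fixed, which is the intuition behind reading metadata clusters in a domain-chained model.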