Clustering and Community Detection in Directed Networks: A Survey (1308.0971v1)

Published 5 Aug 2013 in cs.SI, cs.IR, physics.bio-ph, physics.comp-ph, and physics.soc-ph

Abstract: Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed - in the sense that there is directionality on the edges, making the semantics of the edges non-symmetric. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar while on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Therefore, naturally there is a recent wealth of research production in the area of mining directed graphs - with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint regarding the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions.

Summary

The paper introduces a taxonomy that categorizes methods for clustering directed networks according to how they handle edge directionality.
It systematically reviews four methodological approaches: naive graph transformation, transformations that maintain directionality, extended objective functions, and alternative approaches such as probabilistic models.
The paper highlights broad applications across social, biological, and neuroscience networks while outlining future directions for scalability and unified frameworks.

Overview

The paper offers a comprehensive survey on the methods and approaches for clustering and community detection within directed networks. Directed networks, characterized by asymmetrical relationships between nodes, are prevalent across many fields such as sociology, biology, neuroscience, and computer science. The survey systematically reviews the literature, addressing the methodological basis for directed graph clustering and the various applications associated with it.

Methodological Classifications

The authors propose a taxonomy of clustering methods for directed networks:

Naive Graph Transformation: This approach involves converting directed graphs to undirected ones by ignoring edge directionality, often leading to the loss of crucial semantic information.
Transformations Maintaining Directionality: Algorithms here also convert the directed graph into an undirected one, but encode the directional information in the result, for example through edge weights or a bipartite representation. Clustering techniques designed for undirected networks can then be applied unchanged.
Extending Objective Functions and Methodologies: This category extends objective functions and methods developed for undirected graphs, such as modularity and spectral clustering, so that they account for directed edges. Techniques include adapting Laplacian matrices and leveraging spectral properties to improve clustering accuracy in directed scenarios.
Alternative Approaches: These include information-theoretic approaches, probabilistic models, and blockmodeling, which rely on statistical inference and probabilistic modeling to derive community structures in directed networks.
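
The contrast between the first two categories can be sketched in a few lines of Python. This is a minimal illustration, not the survey's own code. The naive variant adds the adjacency matrix A to its transpose. The direction-aware variant shown here is bibliometric symmetrization, A A^T + A^T A, which is one known transformation of this kind; the summary above does not say which transformations the survey covers, so that choice is an assumption. The helper names and the toy graph are hypothetical.

```python
# Illustrative sketch only: plain nested lists stand in for adjacency matrices.

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matadd(a, b):
    """Add two matrices element-wise."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def naive_symmetrize(a):
    """Drop edge direction: U = A + A^T."""
    return matadd(a, transpose(a))


def bibliometric_symmetrize(a):
    """Weight node pairs by shared targets and shared sources: U = A A^T + A^T A."""
    at = transpose(a)
    return matadd(matmul(a, at), matmul(at, a))


# Hypothetical toy graph: nodes 0 and 1 both point to nodes 2 and 3.
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

naive = naive_symmetrize(A)
biblio = bibliometric_symmetrize(A)
```

In the naive result, nodes 0 and 1 remain unconnected even though they point to exactly the same targets. In the bibliometric result they are linked with weight 2 (two shared targets), as are nodes 2 and 3 (two shared sources). The diagonal entries are self-weights that an undirected clustering step would typically ignore.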

Clustering Definitions

The paper distinguishes between two primary cluster definitions within directed networks:

Density-based Clusters: Traditional clusters defined by high intra-cluster edge density relative to inter-cluster connections.
Pattern-based Clusters: Nodes are grouped by criteria other than edge density. Clusters are defined by shared interaction patterns, such as similar citation behavior or the way information flows through the network.
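
The density-based notion can be made concrete with directed modularity, a widely used extension of modularity whose null model multiplies the out-degree of the source by the in-degree of the target: Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j). The sketch below is a minimal illustration under that assumption; whether it matches the survey's exact formulation is not confirmed by the summary, and the function name and example graph are hypothetical.

```python
from collections import defaultdict


def directed_modularity(edges, community):
    """Directed modularity of a partition.

    edges: list of (source, target) pairs, treated as unweighted.
    community: dict mapping each node to a community label.
    Evaluates Q = (1/m) * sum_ij [A_ij - k_i^out * k_j^in / m] * delta(c_i, c_j),
    with the double sum grouped per community.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = 0                    # edges whose endpoints share a community
    out_sum = defaultdict(int)   # total out-degree per community
    in_sum = defaultdict(int)    # total in-degree per community

    for u, v in edges:
        cu, cv = community[u], community[v]
        out_sum[cu] += 1
        in_sum[cv] += 1
        if cu == cv:
            intra += 1

    # Expected number of intra-community edges under the degree-preserving null model.
    expected = sum(out_sum[c] * in_sum[c] for c in out_sum) / m
    return (intra - expected) / m


# Hypothetical example: two directed 3-cycles joined by the single edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "a" for node in range(6)}

q_split = directed_modularity(edges, split)
q_single = directed_modularity(edges, single)
```

Here the two-cycle partition scores 18/49 (about 0.367), while placing every node in one community scores 0, reflecting that the split concentrates edges inside clusters more than the null model expects.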

Experimental Comparisons

The survey outlines the diverse clustering methodologies, emphasizing how they differ in handling edge directionality and in their applicability across domains:

Density-based Methods: Preferred when edges reflect pairwise relationships.
Pattern-based Methods: Suitable for understanding thematic coherence or information flow within a network.

The authors do not suggest a single preferable method but emphasize selecting one that fits the specific characteristics of the dataset and the application context.

Applications Across Domains

Clustering in directed networks finds applicability in:

Social and Information Networks: Identifying communities or thematic groups within social media, citation networks, and the web graph.
Biological Networks: Analyzing metabolic, gene regulatory, and neural networks where directional interactions are natural.
Neuroscience: Understanding brain structures by analyzing directed interactions within neuronal networks.

Future Directions

The survey highlights significant areas for future research:

Theoretical Development: Establishing a formal and unified framework for clustering in directed networks to standardize evaluations and comparisons.
Algorithm Scalability: Improving algorithm efficiency for large-scale directed networks, leveraging frameworks like MapReduce.
Handling Dynamic Networks: Developing methods for evolving networks that adapt to temporal changes in community structures.
Exploring New Data Types: Extending clustering methodologies to accommodate signed or probabilistic networks, capturing richer interaction semantics.

Concluding Remarks

Directed graph clustering remains a vibrant field with extensive applicability. The survey underscores the need for continued development of methodologies that address the unique challenges posed by directionality in graphs, which in turn promise better insight into complex structures across disciplines.
