Attribute Truss Community Search (1609.00090v3)

Published 1 Sep 2016 in cs.DB and cs.DS

Abstract: Recently, community search over graphs has attracted significant attention and many algorithms have been developed for finding dense subgraphs from large graphs that contain given query nodes. In applications such as analysis of protein-protein interaction (PPI) networks, citation graphs, and collaboration networks, nodes tend to have attributes. Unfortunately, previously developed community search algorithms ignore these attributes and result in communities with poor cohesion w.r.t. their node attributes. In this paper, we study the problem of attribute-driven community search, that is, given an undirected graph $G$ where nodes are associated with attributes, and an input query $Q$ consisting of nodes $V_q$ and attributes $W_q$, find the communities containing $V_q$, in which most community members are densely inter-connected and have similar attributes. We formulate our problem of finding attributed truss communities (ATC) as finding all connected and close $k$-truss subgraphs containing $V_q$ that are locally maximal and have the largest attribute relevance score among such subgraphs. We design a novel attribute relevance score function and establish its desirable properties. The problem is shown to be NP-hard. However, we develop an efficient greedy algorithmic framework, which finds a maximal $k$-truss containing $V_q$, and then iteratively removes the nodes with the least popular attributes and shrinks the graph so as to satisfy community constraints. We also build an elegant index to maintain the known $k$-truss structure and attribute information, and propose efficient query processing algorithms. Extensive experiments on large real-world networks with ground-truth communities show the efficiency and effectiveness of our proposed methods.

Authors (2)

Xin Huang
Laks V. S. Lakshmanan

Summary

The paper presents a framework for attribute-driven community search built on Attributed Truss Communities (ATC), which combine structural density with attribute similarity.
It proposes a multi-step greedy algorithm for a non-monotone attribute score function, supported by an index that maintains the k-truss structure and attribute information.
Extensive experiments on real-world networks with ground-truth communities show that the method outperforms existing techniques in both efficiency and effectiveness.

Overview of "Attribute-Driven Community Search" Paper

The paper "Attribute-Driven Community Search" by Xin Huang and Laks V. S. Lakshmanan addresses community search on graphs whose nodes carry attributes. The authors focus on applications such as protein-protein interaction (PPI) networks, citation graphs, and collaboration networks, where node attributes are common. Conventional community search methods ignore these attributes, so the communities they return can be structurally dense yet poorly cohesive with respect to node attributes.

Problem Statement and Formulation

The central problem is termed "attribute-driven community search." Given an undirected graph $G$ whose nodes are associated with attributes, and an input query $Q$ consisting of a set of nodes $V_q$ and a set of attributes $W_q$, the objective is to find communities containing $V_q$ in which members are densely inter-connected and share similar attributes. The authors formalize this as finding "Attributed Truss Communities" (ATC): connected and close $k$-truss subgraphs containing $V_q$ that are locally maximal and have the largest attribute relevance score among such subgraphs.
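To make the structural part of this definition concrete, the following is a minimal Python sketch of the standard $k$-truss condition: every edge of the subgraph must lie in at least $k-2$ triangles within that subgraph. The function name `k_truss` and the toy graph are illustrative assumptions, not code from the paper, and the sketch deliberately leaves out the connectivity, closeness, and query-node requirements of an ATC.

```python
from itertools import combinations


def k_truss(edges, k):
    """Return the edge set of the maximal k-truss of an undirected graph.

    An edge survives only if it lies in at least k - 2 triangles formed by
    other surviving edges. This is a simple fixed-point peel, not an
    optimized truss decomposition.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Support of (u, v) = number of common neighbours.
                if v in adj[u] and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}


# Toy graph: a 4-clique on {1, 2, 3, 4} plus a triangle {4, 5, 6}.
edges = list(combinations([1, 2, 3, 4], 2)) + [(4, 5), (5, 6), (4, 6)]
print(len(k_truss(edges, 3)))  # every edge is in at least one triangle
print(len(k_truss(edges, 4)))  # only the 4-clique edges remain
```

In this toy graph the triangle hanging off node 4 satisfies the 3-truss condition but not the 4-truss condition, which is why raising $k$ tightens the community.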

Algorithmic Solution and Challenges

The problem is shown to be NP-hard, so the authors present a greedy algorithmic framework for finding ATCs efficiently. It is a multi-step process: first find a maximal $k$-truss containing the query nodes, then iteratively remove the nodes with the least popular attributes, shrinking the graph so that it still satisfies the community constraints. The paper also builds an index that maintains the $k$-truss structure and attribute information, which supports efficient query processing.
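The greedy framework can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the authors' algorithm: it omits the closeness (distance) constraint and the index, removes one node per round, picks the node covering the fewest query attributes, and uses an assumed voting-style score (squared count of nodes carrying each query attribute, divided by the subgraph size). All function names are hypothetical.

```python
from itertools import combinations


def truss_peel(adj, k):
    """In place: drop edges in fewer than k - 2 triangles, then isolated nodes."""
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if v in adj[u] and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    for u in [u for u in adj if not adj[u]]:
        del adj[u]


def component_of(adj, src):
    """Nodes reachable from src (iterative graph traversal)."""
    seen, stack = {src}, [src]
    while stack:
        for n in adj[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def attribute_score(nodes, attrs, query_attrs):
    """Assumed score: sum over query attributes of count^2 / |nodes|."""
    return sum(
        sum(1 for v in nodes if w in attrs.get(v, ())) ** 2 for w in query_attrs
    ) / len(nodes)


def greedy_atc(edges, attrs, query_nodes, query_attrs, k):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best, best_score = None, -1.0
    while True:
        truss_peel(adj, k)
        if any(q not in adj for q in query_nodes):
            break  # a query node fell out of the k-truss
        comp = component_of(adj, next(iter(query_nodes)))
        if not query_nodes <= comp:
            break  # query nodes are no longer connected
        adj = {u: adj[u] for u in comp}
        score = attribute_score(comp, attrs, query_attrs)
        if score > best_score:
            best, best_score = set(comp), score
        candidates = comp - query_nodes
        if not candidates:
            break
        # Simplified rule: remove the node covering the fewest query attributes.
        victim = min(
            candidates,
            key=lambda v: (len(attrs.get(v, set()) & query_attrs), str(v)),
        )
        for n in adj.pop(victim):
            adj[n].discard(victim)
    return best, best_score


# Toy input: a 5-clique where only q, a, b carry the query attribute "DB".
edges = list(combinations(["q", "a", "b", "c", "d"], 2))
attrs = {"q": {"DB"}, "a": {"DB"}, "b": {"DB"}, "c": {"ML"}, "d": {"ML"}}
community, score = greedy_atc(edges, attrs, {"q"}, {"DB"}, k=3)
print(sorted(community), score)  # ['a', 'b', 'q'] 3.0
```

On this toy input the two nodes without the query attribute are peeled off first, and the remaining triangle is the highest-scoring candidate seen before the truss constraint fails.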

A further contribution is a novel attribute score function, which measures the popularity of the query attributes within a subgraph through a voting mechanism, thus balancing attribute homogeneity and coverage. A crucial observation by the authors is that this score function is non-monotone, non-submodular, and non-supermodular, so standard approximation techniques that rely on these properties do not apply directly.
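A small numeric example helps show why such a score is non-monotone. The sketch below assumes a voting-style score in which each query attribute contributes the squared number of subgraph nodes carrying it, divided by the subgraph size; this exact formula is an assumption made for illustration, since the text above does not state it, and the node names are hypothetical.

```python
def attribute_score(nodes, attrs, query_attrs):
    """Assumed voting-style score: sum over w in query_attrs of
    (number of nodes carrying w)^2 / |nodes|."""
    if not nodes:
        return 0.0
    return sum(
        sum(1 for v in nodes if w in attrs.get(v, ())) ** 2 for w in query_attrs
    ) / len(nodes)


attrs = {"q": {"DB"}, "a": {"DB"}, "c": {"ML"}}
query_attrs = {"DB"}

base = attribute_score({"q"}, attrs, query_attrs)             # 1^2 / 1
with_match = attribute_score({"q", "a"}, attrs, query_attrs)  # 2^2 / 2
with_miss = attribute_score({"q", "c"}, attrs, query_attrs)   # 1^2 / 2

print(base, with_match, with_miss)  # 1.0 2.0 0.5
```

Adding a node that carries the query attribute raises the score, while adding one that does not lowers it, so the score neither always grows nor always shrinks as the subgraph grows.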

Experimental Evaluation

Extensive experiments on large real-world networks with ground-truth communities indicate that the proposed methods outperform existing methods in both efficiency and effectiveness. The attribute-driven approach finds communities that are both structurally cohesive and attribute-cohesive, and that align closely with the ground-truth communities.

Implications and Future Directions

The paper lays a foundation for community search methods that incorporate attributes, with potential applications in biological networks and social media analysis. The concept of ATC may be extended to more complex node and query structures, including heterogeneous and weighted graphs. Future research could refine the attribute score function and develop approximation methods that address the complexity challenges described above. Adaptive methods for dynamic graphs and further optimization of the index structure are also promising directions.

Overall, the paper contributes to graph-based community search by taking node attributes into account, which makes the returned communities more semantically meaningful and easier to interpret.
