Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 148 tok/s
Gemini 2.5 Pro 44 tok/s Pro
GPT-5 Medium 27 tok/s Pro
GPT-5 High 38 tok/s Pro
GPT-4o 85 tok/s Pro
Kimi K2 210 tok/s Pro
GPT OSS 120B 442 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

Towards Unbiased BFS Sampling (1102.4599v1)

Published 22 Feb 2011 in cs.SI, cs.NI, and stat.ME

Abstract: Breadth First Search (BFS) is a widely used approach for sampling large unknown Internet topologies. Its main advantage over random walks and other exploration techniques is that a BFS sample is a plausible graph on its own, and therefore we can study its topological characteristics. However, it has been empirically observed that incomplete BFS is biased toward high-degree nodes, which may strongly affect the measurements. In this paper, we first analytically quantify the degree bias of BFS sampling. In particular, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes, in a random graph RG(pk) with an arbitrary degree distribution pk. We also show that, for RG(pk), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) suffer from exactly the same bias. Next, based on our theoretical analysis, we propose a practical BFS-bias correction procedure. It takes as input a collected BFS sample together with its fraction f. Even though RG(pk) does not capture many graph properties common in real-life graphs (such as assortativity), our RG(pk)-based correction technique performs well on a broad range of Internet topologies and on two large BFS samples of Facebook and Orkut networks. Finally, we consider and evaluate a family of alternative correction procedures, and demonstrate that, although they are unbiased for an arbitrary topology, their large variance makes them far less effective than the RG(pk)-based technique.

Citations (159)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Questions

We haven't generated a list of open questions mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.