Papers
Topics
Authors
Recent
2000 character limit reached

On the Complexity of Sorted Neighborhood (1501.01696v1)

Published 8 Jan 2015 in cs.CC and cs.DS

Abstract: Record linkage concerns identifying semantically equivalent records in databases. Blocking methods are employed to avoid the cost of full pairwise similarity comparisons on $n$ records. In a seminal work, Hernandez and Stolfo proposed the Sorted Neighborhood blocking method. Several empirical variants have been proposed in recent years. In this paper, we investigate the complexity of the Sorted Neighborhood procedure on which the variants are built. We show that achieving maximum performance on the Sorted Neighborhood procedure entails solving a sub-problem, which is shown to be NP-complete by reducing from the Travelling Salesman Problem. We also show that the sub-problem can occur in the traditional blocking method. Finally, we draw on recent developments concerning approximate Travelling Salesman solutions to define and analyze three approximation algorithms.

Citations (3)

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.