
A Tabu Search based clustering algorithm and its parallel implementation on Spark (1702.01396v3)

Published 5 Feb 2017 in cs.DC and cs.SI

Abstract: The well-known K-means clustering algorithm has been employed widely in different application domains ranging from data analytics to logistics applications. However, the K-means algorithm can be affected by factors such as the initial choice of centroids and can readily become trapped in a local optimum. In this paper, we propose an improved K-means clustering algorithm that is augmented by a Tabu Search strategy, and which is better adapted to meet the needs of big data applications. Our design is further enhanced to take advantage of parallel processing based on the Spark framework. Computational experiments demonstrate the superiority of our Tabu Search based clustering algorithm over a widely used version of the K-means approach embodied in Spark MLlib, comparing the algorithms in terms of scalability, accuracy, and effectiveness.
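The sensitivity to initial centroids mentioned in the abstract can be illustrated with a minimal sketch of standard K-means (Lloyd's iteration). This is not the paper's Tabu Search algorithm or its Spark implementation; the function name `lloyd`, the four-point data set, and both initializations are hypothetical choices made only for illustration.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def lloyd(points, centroids, max_iter=100):
    """Plain K-means (Lloyd's iteration) from the given initial centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    Illustrative only: not the Tabu Search variant proposed in the paper.
    """
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical data: corners of a 4-by-1 rectangle, two clusters wanted.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Poor start: K-means is already at a fixed point that splits the
# rectangle along its long axis, so it never improves.
bad_centroids, bad_sse = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])

# Better start: converges to the left/right split.
good_centroids, good_sse = lloyd(points, [(0.0, 0.0), (4.0, 1.0)])

print(bad_sse, good_sse)  # 16.0 1.0
```

Both runs converge, but the first stays at a solution whose SSE is sixteen times worse. This is the kind of local optimum that a metaheuristic such as Tabu Search is intended to escape.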

Authors (4)
  1. Yinhao Lu (1 paper)
  2. Buyang Cao (1 paper)
  3. Cesar Rego (1 paper)
  4. Fred Glover (18 papers)
Citations (38)
