An effective web document clustering for information retrieval (1211.1107v1)

Published 6 Nov 2012 in cs.IR

Abstract: The size of the web has increased exponentially over the past few years, with thousands of documents related to any given subject available to the user. With this much information available, it is not possible to take full advantage of the World Wide Web without a proper framework for searching through the available data. This organization can be done in many ways. In this paper we introduce a combined approach to clustering web pages that first finds the frequent sets and then clusters the documents. The frequent sets are generated using the Frequent Pattern growth technique. Applying the Fuzzy C-Means algorithm to them, we find clusters whose documents are highly related and have similar features. We used the Gensim package to implement our approach because of its simplicity and robustness. We compared our results with the combined approaches of (Frequent Pattern growth, K-means) and (Frequent Pattern growth, Cosine_Similarity). Experimental results show that our approach is more efficient than these two combined approaches and handles more efficiently the serious limitation of the traditional Fuzzy C-Means algorithm, which is sensitive to the initial centroids and the number of clusters to be formed.
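
The two-stage idea in the abstract (mine frequent term sets, then cluster documents with Fuzzy C-Means) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and does not use Gensim. Several parts are assumptions made for illustration: the helper names `frequent_itemsets`, `featurize`, and `fuzzy_c_means` are hypothetical, a brute-force itemset count stands in for Frequent Pattern growth, each document is encoded as a binary vector over the frequent itemsets, and the toy documents and the parameters (`min_support=3`, `c=2`, `m=2.0`) are invented.

```python
import itertools
import random
from collections import Counter

def frequent_itemsets(docs, min_support, max_size=2):
    """Brute-force stand-in for FP-growth (assumption): count every term
    set of up to max_size terms and keep those meeting min_support."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for size in range(1, max_size + 1):
            counts.update(itertools.combinations(terms, size))
    return sorted(s for s, n in counts.items() if n >= min_support)

def featurize(docs, itemsets):
    """Assumed encoding: one binary feature per frequent itemset."""
    return [[1.0 if set(s) <= set(doc) else 0.0 for s in itemsets]
            for doc in docs]

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0, eps=1e-12):
    """Standard Fuzzy C-Means updates with a random initial membership matrix."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    U = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])  # each row sums to 1
    centers = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by membership^m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * X[i][k] for i in range(n)) / total
                            for k in range(dim)])
        # Membership update: proportional to squared distance^(-1/(m-1)).
        for i in range(n):
            d2 = [max(sum((x - v) ** 2 for x, v in zip(X[i], ctr)), eps)
                  for ctr in centers]  # clamp to avoid division by zero
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            U[i] = [w / total for w in inv]
    return U, centers

# Toy corpus (invented): three documents per topic.
docs = [
    ["fuzzy", "cluster", "means", "paper"],
    ["fuzzy", "cluster", "means", "data"],
    ["fuzzy", "cluster", "means", "model"],
    ["web", "search", "index", "page"],
    ["web", "search", "index", "query"],
    ["web", "search", "index", "rank"],
]
itemsets = frequent_itemsets(docs, min_support=3)
X = featurize(docs, itemsets)
memberships, centers = fuzzy_c_means(X, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
print(labels)
```

Each row of `memberships` holds one document's degree of membership in each cluster, and `labels` takes the strongest one as a hard assignment. Note that this sketch still requires the number of clusters `c` and a random initialization, which are the sensitivities of traditional Fuzzy C-Means that the abstract mentions.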

Citations (4)

Authors (2)
