Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 174 tok/s
Gemini 2.5 Pro 51 tok/s Pro
GPT-5 Medium 38 tok/s Pro
GPT-5 High 34 tok/s Pro
GPT-4o 91 tok/s Pro
Kimi K2 205 tok/s Pro
GPT OSS 120B 438 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

A Novel String Distance Function based on Most Frequent K Characters (1401.6596v1)

Published 25 Jan 2014 in cs.DS and cs.IR

Abstract: This study aims to publish a novel similarity metric to increase the speed of comparison operations. Also the new metric is suitable for distance-based operations among strings. Most of the simple calculation methods, such as string length are fast to calculate but does not represent the string correctly. On the other hand the methods like keeping the histogram over all characters in the string are slower but good to represent the string characteristics in some areas, like natural language. We propose a new metric, easy to calculate and satisfactory for string comparison. Method is built on a hash function, which gets a string at any size and outputs the most frequent K characters with their frequencies. The outputs are open for comparison and our studies showed that the success rate is quite satisfactory for the text mining operations.

Citations (14)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.