Faster Algorithms for Text-to-Pattern Hamming Distances (2310.13174v3)

Published 19 Oct 2023 in cs.DS

Abstract: We study the classic Text-to-Pattern Hamming Distances problem: given a pattern $P$ of length $m$ and a text $T$ of length $n$, both over a polynomial-size alphabet, compute the Hamming distance between $P$ and $T[i\,..\,i+m-1]$ for every shift $i$, under the standard Word-RAM model with $\Theta(\log n)$-bit words.

- We provide an $O(n\sqrt{m})$ time Las Vegas randomized algorithm for this problem, beating the decades-old $O(n\sqrt{m\log m})$ running time [Abrahamson, SICOMP 1987]. We also obtain a deterministic algorithm, with a slightly higher $O(n\sqrt{m}(\log m\log\log m)^{1/4})$ running time. Our randomized algorithm extends to the $k$-bounded setting, with running time $O\big(n+\frac{nk}{\sqrt{m}}\big)$, removing all the extra logarithmic factors from earlier algorithms [Gawrychowski and Uznański, ICALP 2018; Chan, Golan, Kociumaka, Kopelowitz and Porat, STOC 2020].
- For the $(1+\epsilon)$-approximate version of Text-to-Pattern Hamming Distances, we give an $\tilde{O}(\epsilon^{-0.93}n)$ time Monte Carlo randomized algorithm, beating the previous $\tilde{O}(\epsilon^{-1}n)$ running time [Kopelowitz and Porat, FOCS 2015; Kopelowitz and Porat, SOSA 2018]. Our approximation algorithm exploits a connection with $3$SUM, and uses a combination of Fredman's trick, equality matrix product, and random sampling; in particular, we obtain new results on approximate counting versions of $3$SUM and Exact Triangle, which may be of independent interest. Our exact algorithms use a novel combination of hashing, bit-packed FFT, and recursion; in particular, we obtain a faster algorithm for computing the sumset of two integer sets, in the regime where the universe size is close to quadratic in the number of elements. We also prove a fine-grained equivalence between the exact Text-to-Pattern Hamming Distances problem and a range-restricted, counting version of $3$SUM.
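As a concrete reference point for the problem statement, the sketch below computes all $n-m+1$ Hamming distances in two ways: directly from the definition in $O(nm)$ time, and with a simplified version of the frequent/infrequent ("heavy/light") symbol split commonly associated with Abrahamson-style algorithms. It is not the paper's $O(n\sqrt{m})$ algorithm. The function names, the default threshold $\lfloor\sqrt{m}\rfloor$, and the plain-loop correlation for heavy symbols (where a real implementation would use an FFT-based convolution) are all illustrative assumptions.

```python
from collections import defaultdict


def hamming_naive(text, pattern):
    """Reference O(nm) definition: Hamming distance between pattern and
    text[i..i+m-1] for every shift i in 0..n-m."""
    n, m = len(text), len(pattern)
    return [sum(1 for j in range(m) if text[i + j] != pattern[j])
            for i in range(n - m + 1)]


def hamming_heavy_light(text, pattern, threshold=None):
    """Simplified heavy/light split (illustrative sketch, not the paper's method).

    Light symbols (at most `threshold` occurrences in the pattern) are counted
    by pairing every text occurrence with every pattern occurrence of the same
    symbol. Heavy symbols are counted by correlating 0/1 indicator vectors;
    here that is a plain loop standing in for an FFT-based convolution.
    """
    n, m = len(text), len(pattern)
    shifts = n - m + 1
    if shifts <= 0:
        return []
    if threshold is None:
        threshold = max(1, int(m ** 0.5))  # hypothetical default, not tuned

    positions = defaultdict(list)  # symbol -> its positions in the pattern
    for j, c in enumerate(pattern):
        positions[c].append(j)
    heavy = {c for c, ps in positions.items() if len(ps) > threshold}

    matches = [0] * shifts
    # Light symbols: text position k aligned with pattern position j
    # contributes one match at shift i = k - j, if that shift is valid.
    for k, c in enumerate(text):
        if c in heavy:
            continue
        for j in positions.get(c, ()):
            i = k - j
            if 0 <= i < shifts:
                matches[i] += 1
    # Heavy symbols: one correlation per symbol (FFT in a real implementation).
    for c in heavy:
        t_ind = [1 if x == c else 0 for x in text]
        p_ind = [1 if x == c else 0 for x in pattern]
        for i in range(shifts):
            matches[i] += sum(t_ind[i + j] * p_ind[j] for j in range(m))

    # Distance = pattern length minus number of matching aligned positions.
    return [m - x for x in matches]
```

For example, `hamming_naive("abracadabra", "abr")` returns `[0, 3, 3, 2, 3, 2, 3, 0, 3]`, and `hamming_heavy_light` returns the same list for any threshold. With threshold $\tau$, the light part costs $O(n\tau)$ and there are at most $m/\tau$ heavy symbols; if each of their correlations is done by FFT in $O(n\log m)$ time, balancing the two parts at $\tau \approx \sqrt{m\log m}$ gives the $O(n\sqrt{m\log m})$ bound that the abstract attributes to Abrahamson.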

References (78)
  1. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039–1051, 1987. doi:10.1137/0216067.
  2. Pattern matching in the Hamming distance with thresholds. Inf. Process. Lett., 111(14):674–677, 2011. doi:10.1016/j.ipl.2011.04.004.
  3. Efficient matching of nonrectangular shapes. Ann. Math. Artif. Intell., 4:211–224, 1991. doi:10.1007/BF01531057.
  4. A lower-variance randomized algorithm for approximate string matching. Inf. Process. Lett., 113(18):690–692, 2013. doi:10.1016/j.ipl.2013.06.005.
  5. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257–275, 2004. doi:10.1016/S0196-6774(03)00097-X.
  6. Faster knapsack algorithms via bounded monotone min-plus-convolution. In Proc. 49th International Colloquium on Automata, Languages, and Programming (ICALP), volume 229, pages 31:1–31:21, 2022. doi:10.4230/LIPIcs.ICALP.2022.31.
  7. Elliptic curve fast Fourier transform (ECFFT) part I: Low-degree extension in time O(n log n) over all finite fields. In Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 700–737, 2023. doi:10.1137/1.9781611977554.ch30.
  8. Subquadratic algorithms for 3SUM. Algorithmica, 50(4):584–596, 2008. doi:10.1007/s00453-007-9036-3.
  9. Fast and compact regular expression matching. Theor. Comput. Sci., 409(3):486–496, 2008. doi:10.1016/j.tcs.2008.08.042.
  10. Sparse nonnegative convolution is equivalent to dense nonnegative convolution. In Proc. 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1711–1724, 2021. doi:10.1145/3406325.3451090.
  11. Deterministic and Las Vegas algorithms for sparse nonnegative convolution. In Proc. 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3069–3090, 2022. doi:10.1137/1.9781611977073.119.
  12. Top-k-convolution and the quest for near-linear output-sensitive subset sum. In Proc. 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 982–995, 2020. doi:10.1145/3357713.3384308.
  13. Fast n-fold boolean convolution via additive combinatorics. In Proc. 48th International Colloquium on Automata, Languages, and Programming (ICALP), volume 198, pages 41:1–41:17, 2021. doi:10.4230/LIPIcs.ICALP.2021.41.
  14. A fine-grained perspective on approximating subset sum and partition. In Proc. 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1797–1815, 2021. doi:10.1137/1.9781611976465.108.
  15. Faster regular expression matching. In Proc. 36th International Colloquium on Automata, Languages, and Programming (ICALP), pages 171–182, 2009. doi:10.1007/978-3-642-02927-1_16.
  16. The k-mismatch problem revisited. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2039–2052, 2016. doi:10.1137/1.9781611974331.ch142.
  17. Approximating text-to-pattern Hamming distances. In Proc. 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 643–656, 2020. doi:10.1145/3357713.3384266.
  18. Approximate string matching: A simpler faster algorithm. SIAM J. Comput., 31(6):1761–1782, 2002. doi:10.1137/S0097539700370527.
  19. On the change-making problem. In Proc. 3rd SIAM Symposium on Simplicity in Algorithms (SOSA), pages 38–42, 2020. doi:10.1137/1.9781611976014.7.
  20. Reducing 3SUM to convolution-3SUM. In Proc. 3rd Symposium on Simplicity in Algorithms (SOSA), pages 1–7, 2020. doi:10.1137/1.9781611976014.1.
  21. Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM J. Comput., 39(5):2075–2089, 2010. doi:10.1137/08071990X.
  22. Timothy M. Chan. Approximation schemes for 0-1 knapsack. In Proc. 1st Symposium on Simplicity in Algorithms (SOSA), volume 61, pages 5:1–5:12, 2018. doi:10.4230/OASIcs.SOSA.2018.5.
  23. Timothy M. Chan. More logarithmic-factor speedups for 3SUM, (median,+)-convolution, and some geometric 3SUM-hard problems. ACM Trans. Algorithms, 16(1):7:1–7:23, 2020. doi:10.1145/3363541.
  24. Clustered integer 3SUM via additive combinatorics. In Proc. 47th Annual ACM Symposium on Theory of Computing (STOC), pages 31–40, 2015. doi:10.1145/2746539.2746568.
  25. Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm, 2009. URL: https://web.archive.org/web/20160818144748/http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf.
  26. A nearly quadratic-time FPTAS for knapsack. CoRR, abs/2308.07821, 2023. arXiv:2308.07821, doi:10.48550/arXiv.2308.07821.
  27. A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput., 32(6):1654–1673, 2003. doi:10.1137/S0097539702402007.
  28. Pattern matching for spatial point sets. In Proc. 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 156–165, 1998. doi:10.1109/SFCS.1998.743439.
  29. Fredman’s trick meets dominance product: Fine-grained complexity of unweighted APSP, 3SUM counting, and more. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), pages 419–432. ACM, 2023. doi:10.1145/3564246.3585237.
  30. Martin Dietzfelbinger. Universal hashing and k-wise independent random variables via integer arithmetic without primes. In Proc. 13th Annual Symposium on Theoretical Aspects of Computer Science (STACS), volume 1046, pages 569–580, 1996. doi:10.1007/3-540-60922-9_46.
  31. Approximating knapsack and partition via dense subset sums. In Proc. 2023 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2961–2979, 2023. doi:10.1137/1.9781611977554.ch113.
  32. Faster matrix multiplication via asymmetric hashing. CoRR, abs/2210.10173, 2022. To appear in FOCS 2023. arXiv:2210.10173, doi:10.48550/arXiv.2210.10173.
  33. Exploiting word-level parallelism for fast convolutions and their applications in approximate string matching. Eur. J. Comb., 34(1):38–51, 2013. doi:10.1016/j.ejc.2012.07.013.
  34. String matching and other products. In Complexity of Computation, RM Karp (editor), SIAM-AMS Proceedings, volume 7, pages 113–125, 1974.
  35. Michael L. Fredman. New bounds on the complexity of the shortest path problem. SIAM J. Comput., 5(1):83–89, 1976. doi:10.1137/0205006.
  36. Martin Fürer. How fast can we multiply large integers on an actual computer? In Proc. 11th Latin American Symposium on Theoretical Informatics (LATIN), volume 8392, pages 660–670. Springer, 2014. doi:10.1007/978-3-642-54423-1_57.
  37. Improved string matching with k mismatches. SIGACT News, 17(4):52–54, 1986. doi:10.1145/8307.8309.
  38. Threesomes, degenerates, and love triangles. J. ACM, 65(4):22:1–22:25, 2018. doi:10.1145/3185378.
  39. Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discret. Appl. Math., 212:96–103, 2016. doi:10.1016/j.dam.2015.10.040.
  40. Dominance product and high-dimensional closest pair under L∞. In Proc. 28th International Symposium on Algorithms and Computation (ISAAC), volume 92, pages 39:1–39:12, 2017. doi:10.4230/LIPIcs.ISAAC.2017.39.
  41. Towards unified approximate pattern matching for Hamming and L1 distance. In Proc. 45th International Colloquium on Automata, Languages, and Programming (ICALP), volume 107, pages 62:1–62:13, 2018. doi:10.4230/LIPIcs.ICALP.2018.62.
  42. Faster polynomial multiplication over finite fields. J. ACM, 63(6):52:1–52:23, 2017. doi:10.1145/3005344.
  43. Piotr Indyk. Faster algorithms for string matching problems: Matching the convolution bound. In Proc. 39th Annual Symposium on Foundations of Computer Science (FOCS), pages 166–173, 1998. doi:10.1109/SFCS.1998.743440.
  44. Ce Jin. An improved FPTAS for 0-1 knapsack. In Proc. 46th International Colloquium on Automata, Languages, and Programming (ICALP), volume 132, pages 76:1–76:14, 2019. doi:10.4230/LIPIcs.ICALP.2019.76.
  45. The one-way communication complexity of Hamming distance. Theory Comput., 4(1):129–135, 2008. doi:10.4086/toc.2008.v004a006.
  46. Howard J. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53–60, 1993. doi:10.1016/0020-0190(93)90177-B.
  47. Breaking the variance: Approximating the Hamming distance in Õ(1/ε) time per alignment. In Proc. IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 601–613, 2015. doi:10.1109/FOCS.2015.43.
  48. A simple algorithm for approximating the text-to-pattern Hamming distance. In Proc. 1st Symposium on Simplicity in Algorithms (SOSA), volume 61, pages 10:1–10:5, 2018. doi:10.4230/OASIcs.SOSA.2018.10.
  49. Higher lower bounds from the 3SUM conjecture. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1272–1287, 2016. doi:10.1137/1.9781611974331.ch89.
  50. Novel polynomial basis with fast Fourier transform and its application to Reed-Solomon erasure codes. IEEE Trans. Inf. Theory, 62(11):6284–6299, 2016. doi:10.1109/TIT.2016.2608892.
  51. Approximate pattern matching with the L1, L2, and L∞ metrics. Algorithmica, 60(2):335–348, 2011. doi:10.1007/s00453-009-9345-9.
  52. Monochromatic triangles, intermediate matrix products, and convolutions. In Proc. 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 53:1–53:18, 2020. doi:10.4230/LIPIcs.ITCS.2020.53.
  53. François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. In Proc. 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1029–1046, 2018. doi:10.1137/1.9781611975031.67.
  54. Hamming distance completeness. In Proc. 30th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 128, pages 14:1–14:17, 2019. doi:10.4230/LIPIcs.CPM.2019.14.
  55. Efficient string matching with k𝑘kitalic_k mismatches. Theor. Comput. Sci., 43:239–249, 1986. doi:10.1016/0304-3975(86)90178-7.
  56. Fast parallel and serial approximate string matching. J. Algorithms, 10(2):157–169, 1989. doi:10.1016/0196-6774(89)90010-2.
  57. Xiao Mao. (1-ε)-approximation of knapsack in nearly quadratic time. CoRR, abs/2308.07004, 2023. arXiv:2308.07004, doi:10.48550/arXiv.2308.07004.
  58. Jiří Matoušek. Computing dominances in Eⁿ. Inf. Process. Lett., 38(5):277–278, 1991. doi:10.1016/0020-0190(91)90071-O.
  59. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18–31, 1980. doi:10.1016/0022-0000(80)90002-1.
  60. A subquadratic approximation scheme for partition. In Proc. 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 70–88, 2019. doi:10.1137/1.9781611975482.5.
  61. Gene Myers. A Four Russians algorithm for regular expression pattern matching. J. ACM, 39(2):432–448, 1992. doi:10.1145/128749.128755.
  62. Mihai Pătraşcu. Towards polynomial lower bounds for dynamic problems. In Proc. 42nd ACM Symposium on Theory of Computing (STOC), pages 603–610, 2010. doi:10.1145/1806689.1806772.
  63. Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. In Proc. 29th Annual Symposium on Foundations of Computer Science (FOCS), pages 283–290, 1988. doi:10.1109/SFCS.1988.21944.
  64. Approximating approximate pattern matching. In Proc. 30th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 128, pages 15:1–15:13, 2019. doi:10.4230/LIPIcs.CPM.2019.15.
  65. Efficient approximate and dynamic matching of patterns using a labeling paradigm (extended abstract). In Proc. 37th Annual Symposium on Foundations of Computer Science (FOCS), pages 320–328, 1996. doi:10.1109/SFCS.1996.548491.
  66. Tadao Takaoka. Subcubic cost algorithms for the all pairs shortest path problem. Algorithmica, 20(3):309–318, 1998. doi:10.1007/PL00009198.
  67. Mikkel Thorup. Randomized sorting in O(n log log n) time and linear space using addition, shift, and bit-wise boolean operations. J. Algorithms, 42(2):205–230, 2002. doi:10.1006/jagm.2002.1211.
  68. Przemysław Uznański. Approximating text-to-pattern distance via dimensionality reduction. In Proc. 31st Annual Symposium on Combinatorial Pattern Matching (CPM), volume 161, pages 29:1–29:11, 2020. doi:10.4230/LIPIcs.CPM.2020.29.
  69. Przemysław Uznański. Recent advances in text-to-pattern distance algorithms. In Beyond the Horizon of Computability - 16th Conference on Computability in Europe (CiE), volume 12098, pages 353–365, 2020. doi:10.1007/978-3-030-51466-2_32.
  70. Virginia Vassilevska Williams. Problem 2 on problem set 2 of CS367, October 15, 2015. URL: http://theory.stanford.edu/~virgi/cs367/hw2.pdf.
  71. Finding, minimizing, and counting weighted subgraphs. In Proc. 41st Annual ACM Symposium on Theory of Computing (STOC), pages 455–464, 2009. doi:10.1137/09076619X.
  72. Virginia Vassilevska Williams and R. Ryan Williams. Subcubic equivalences between path, matrix, and triangle problems. J. ACM, 65(5):27:1–27:38, 2018. doi:10.1145/3186893.
  73. New bounds for matrix multiplication: from alpha to omega. CoRR, abs/2307.07970, 2023. To appear in SODA 2024. arXiv:2307.07970, doi:10.48550/arXiv.2307.07970.
  74. Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press, 2013.
  75. Improved approximation schemes for (un-)bounded subset-sum and partition. CoRR, abs/2212.02883, 2022. arXiv:2212.02883, doi:10.48550/arXiv.2212.02883.
  76. R. Ryan Williams. Faster all-pairs shortest paths via circuit complexity. SIAM J. Comput., 47(5):1965–1985, 2018. doi:10.1137/15M1024524.
  77. David P. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 167–175, 2004. URL: https://dl.acm.org/doi/10.5555/982792.982817.
  78. Raphael Yuster. Efficient algorithms on sets of permutations, dominance, and real-weighted APSP. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 950–957, 2009. URL: https://dl.acm.org/doi/10.5555/1496770.1496873.