
On Codes for the Noisy Substring Channel (2102.01412v4)

Published 2 Feb 2021 in cs.IT and math.IT

Abstract: We consider the problem of coding for the substring channel, in which information strings are observed only through their (multisets of) substrings. Owing to existing DNA sequencing techniques and applications in DNA-based storage systems, interest in this channel has been renewed in recent years. In contrast to the existing literature, we consider a noisy channel model, motivated by in-vivo storage, in which information is subject to noise before its substrings are sampled. We study two separate noise models: substitutions and deletions. In both cases, we examine families of codes that may be used for error correction and present combinatorial bounds on their sizes. Through a generalization of the concept of repeat-free strings, we show that the additional redundancy required by this imperfect-observation assumption is sublinear, either when the fraction of errors in the observed substring length is sufficiently small, or when that length is sufficiently long. This suggests that, in these cases, the channel model incurs no asymptotic cost in rate. Moreover, we develop an efficient encoder for such constrained strings in some cases. Finally, we show how a similar encoder can be used to avoid the formation of secondary structures in coded DNA strands, even when accounting for imperfect structures.
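To make the substring channel and the repeat-free idea concrete, here is a minimal Python sketch. It covers only the noiseless case and uses the standard, non-generalized definition: a string is k-repeat-free if no substring of length k occurs twice. It does not model the substitution or deletion noise studied in the paper, nor the paper's generalized notion or its encoder. The function names `substring_multiset`, `is_repeat_free`, and `reconstruct` are hypothetical. Being (L-1)-repeat-free is a sufficient condition for recovering a string from the multiset of its length-L substrings, and the reconstruction below walks the single overlap path that this condition guarantees.

```python
from collections import Counter


def substring_multiset(x: str, L: int) -> Counter:
    """Noiseless channel output: the multiset of all length-L substrings of x."""
    return Counter(x[i:i + L] for i in range(len(x) - L + 1))


def is_repeat_free(x: str, k: int) -> bool:
    """Return True if no length-k substring occurs more than once in x."""
    seen = set()
    for i in range(len(x) - k + 1):
        window = x[i:i + k]
        if window in seen:
            return False
        seen.add(window)
    return True


def reconstruct(spectrum: Counter, L: int) -> str:
    """Recover x from its length-L substring multiset, ASSUMING x is (L-1)-repeat-free.

    Each L-substring is treated as an edge from its (L-1)-prefix to its
    (L-1)-suffix; under the assumption these edges form one simple path.
    Raises ValueError when the spectrum visibly violates the assumption.
    """
    if L < 2:
        raise ValueError("L must be at least 2")
    mers = list(spectrum.elements())
    if len(mers) == 1:
        return mers[0]
    # (L-1)-prefix -> the unique L-substring that starts with it.
    next_mer = {m[:-1]: m for m in mers}
    if len(next_mer) != len(mers):
        raise ValueError("repeated (L-1)-prefix: not (L-1)-repeat-free")
    # The first (L-1)-substring of x is a prefix of some L-substring
    # but the suffix of none.
    suffixes = {m[1:] for m in mers}
    starts = [p for p in next_mer if p not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique start: not (L-1)-repeat-free")
    out = cur = starts[0]
    for _ in range(len(mers)):  # append one symbol per L-substring
        m = next_mer.get(cur)
        if m is None:
            raise ValueError("path broken: not (L-1)-repeat-free")
        out += m[-1]
        cur = m[1:]
    return out


x = "ACGTTGCA"
L = 3
print(is_repeat_free(x, L - 1))                       # True
print(reconstruct(substring_multiset(x, L), L) == x)  # True
```

Because every (L-1)-substring of such a string occurs once, each is the prefix of at most one observed L-substring, so a plain dictionary suffices and the walk needs no backtracking; a repeated (L-1)-substring is what would introduce ambiguity into this walk.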
