Lossy Compression of Quality Values via Rate Distortion Theory (1207.5184v1)

Published 21 Jul 2012 in q-bio.GN, cs.IT, math.IT, and q-bio.QM

Abstract: Motivation: Next Generation Sequencing technologies have revolutionized many fields in biology by enabling the fast and cheap sequencing of large amounts of genomic data. The ever-increasing capacity of current sequencing machines holds much promise for future applications of these technologies, but it also creates growing computational challenges in the analysis and storage of these data. A typical sequencing data file may occupy tens or even hundreds of gigabytes of disk space, which is prohibitively large for many users. Raw sequencing data consist of both the DNA sequences (reads) and per-base quality values that indicate the level of confidence in the readout of these sequences. Quality values account for about half of the required disk space in the commonly used FASTQ format, so compressing them can significantly reduce storage requirements and speed up the analysis and transmission of these data.

Results: In this paper we present a framework for the lossy compression of the quality value sequences of genomic read files. Numerical experiments with reference-based alignment using these quality values suggest that we can achieve significant compression with little compromise in performance for several downstream applications of interest, consistent with our theoretical analysis. Our framework also allows compression in a regime (below one bit per quality value) for which there are no existing compressors.
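To make "bits per quality value" and the rate/distortion trade-off more concrete, the following is a toy sketch only. It assumes Phred+33 encoding and uses a hypothetical quality string. The uniform quantizer and the helpers `quantize`, `empirical_entropy` and `mse` are illustrative and are not the authors' method. The zeroth-order empirical entropy of the quantized symbols approximates the rate an ideal entropy coder would need if positions were independent. It ignores correlations between positions, which a practical compressor would exploit. Coarser bins lower the rate (possibly below one bit per value) at the cost of higher mean squared error.

```python
import math
from collections import Counter


def quantize(qualities, step):
    """Toy uniform quantizer: map each Phred score to (roughly) the midpoint
    of its bin of width `step`. With step=1 this is the identity map."""
    return [(q // step) * step + step // 2 for q in qualities]


def empirical_entropy(symbols):
    """Zeroth-order empirical entropy in bits per symbol (assumes i.i.d. symbols)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mse(original, reconstructed):
    """Mean squared error between original and reconstructed quality values."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


# Hypothetical FASTQ quality line, Phred+33 encoded (assumption for illustration).
qual_line = "IIIIIIIHHHGGGG@@@@555##"
qualities = [ord(c) - 33 for c in qual_line]

for step in (1, 5, 10, 42):
    q_hat = quantize(qualities, step)
    rate = empirical_entropy(q_hat)
    dist = mse(qualities, q_hat)
    print(f"step={step:2d}  rate={rate:.3f} bits/QV  MSE={dist:.2f}")
```

Because each coarser quantization here is a deterministic function of the original scores, its empirical entropy can never exceed that of the unquantized sequence. With a single bin, the rate drops to zero.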

Citations (9)
