Papers
Topics
Authors
Recent
Search
2000 character limit reached

Scalable De Novo Genome Assembly Using Pregel

Published 13 Jan 2018 in cs.DC | (1801.04453v1)

Abstract: De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of new genomes. In this paper, we present a toolkit, called PPA-assembler, for de novo genome assembly in a distributed setting. The operations in our toolkit provide strong performance guarantees, and can be assembled to implement various sequencing strategies. PPA-assembler adopts the popular {\em de Bruijn graph} based approach for sequencing, and each operation is implemented as a program in Google's Pregel framework for big graph processing. Experiments on large real and simulated datasets demonstrate that PPA-assembler is much more efficient than the state-of-the-arts and provides good sequencing quality.

Citations (6)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.