Question

What is the benefit of the de Bruijn algorithm for short-read assembly?

4

8.1 years ago

seta ★ 1.9k

Hi all,

First of all, sorry if this question is basic for you. Could you please explain why de Bruijn graph-based assemblers for transcriptome assembly break short reads into shorter fragments (k-mers) and then find overlaps between the k-mers, instead of using the entire read to find overlapping segments? Please let me know what the benefit of such algorithms is relative to the overlap algorithm used by CAP3.

Thanks

Assembly de Bruijn algorithm K-mer • 2.7k views

updated 2.6 years ago by Michael 55k • written 8.1 years ago by seta ★ 1.9k

1

Time... It takes a lot more time to do it the old-fashioned way (with overlaps), because every read has to be checked against every other read.

8.1 years ago by Benn 8.3k

1

If you like tutorials in addition to papers, the Homolog-blog covers some really cool things about de novo assembly in general. Take a look :)

8.1 years ago by Rohit ★ 1.5k

0

This is a slightly older paper, but in case you have not seen it, it may be useful.

8.1 years ago by GenoMax 147k

0

You might find the following papers useful.

Comparison of De Novo Genome Assembly Software

Sequence assembly demystified

8.1 years ago by Sej Modha 5.3k

score 1 · Answer 1 · 2022-04-28

The advantages are in time and space complexity. The time complexity for building the graph is O(n * k), where n is the number of bases sequenced and k is the k-mer length. The space complexity for storing the graph is O(g * k), where g is the genome or transcriptome size, and g is typically much smaller than n (g << n) when coverage is sufficiently high.
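To make the space argument concrete, here is a minimal sketch in Python, not how any particular assembler implements it. The function names `build_de_bruijn` and `count_edges` and the toy reads are made up for illustration. It shows that the graph size depends on the number of distinct k-mers, so repeating the same reads (higher coverage) does not grow the graph.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: nodes are (k-1)-mers, and each
    distinct k-mer contributes one edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        # One pass over each read: a read of length L yields L - k + 1 k-mers.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_edges(graph):
    """Number of distinct k-mers (edges) stored in the graph."""
    return sum(len(successors) for successors in graph.values())


# Hypothetical error-free reads tiling the toy sequence "ATGGCGTGCA".
reads = ["ATGGCG", "TGGCGT", "GGCGTG", "GCGTGC", "CGTGCA"]

low_coverage = build_de_bruijn(reads, k=4)
high_coverage = build_de_bruijn(reads * 3, k=4)  # same reads, 3x the coverage

print(count_edges(low_coverage), count_edges(high_coverage))
```

Both graphs hold the same edges because duplicate k-mers collapse into one entry. Real data contains sequencing errors, which add spurious k-mers, so in practice the graph is somewhat larger than this idealised case suggests.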

Of course, this has to be compared against the alternative, which would be any Overlap-Layout-Consensus (OLC) algorithm design. Such an algorithm has essentially quadratic complexity in the number of reads, in both time and storage, for generating and storing the overlap graph.
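For contrast, a naive all-against-all overlap step might look like the sketch below. This is a deliberately simple illustration, not CAP3's actual algorithm; `suffix_prefix_overlap`, `overlap_graph`, and the `min_len` threshold are hypothetical names chosen for the example. Real OLC assemblers use indexing and filtering to avoid comparing every pair, but the worst case remains quadratic in the number of reads.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no overlap of at least `min_len` bases exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len):
    """Compare every ordered pair of reads and record their overlaps.
    Returns the edge dictionary and the number of comparisons made."""
    edges = {}
    comparisons = 0
    for i, j in permutations(range(len(reads)), 2):
        comparisons += 1
        overlap = suffix_prefix_overlap(reads[i], reads[j], min_len)
        if overlap:
            edges[(i, j)] = overlap
    return edges, comparisons


# Hypothetical toy reads; r reads give r * (r - 1) ordered comparisons.
reads = ["ATGGCG", "TGGCGT", "GGCGTG", "GCGTGC", "CGTGCA"]
edges, comparisons = overlap_graph(reads, min_len=3)
print(comparisons, edges[(0, 1)])
```

With r reads the loop performs r * (r - 1) comparisons, so doubling the number of reads roughly quadruples the work, whereas the k-mer pass in a de Bruijn approach grows linearly with the number of bases.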