Question

K-mer Counting and Constructing BWT Index in String Graph Assembler (SGA)

1

11.9 years ago

ugly.betty77 ★ 1.1k

While going through the code of the String Graph Assembler (SGA), and especially Heng Li's implementation of the BCR algorithm for fast indexing of sequence libraries (https://raw.github.com/lh3/ropebwt/master/bcr.c), I came to realize that constructing the Burrows-Wheeler transform (BWT) in SGA has many similarities with k-mer counting in de Bruijn graph-based genome assemblers. In my naive understanding of the BWT (but not of the BCR code), constructing the BWT index is likely to take more time in regions with highly repetitive k-mers. Does anyone have further insight into the process? Also, are these kinds of code- and algorithm-related questions appropriate for Biostar?
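To make the similarity concrete, here is a toy sketch in Python. It is only an illustration of the analogy, not how SGA or ropebwt actually work, and the function names and the example reads are made up. Sorting all suffixes of the reads (the ordering that underlies a BWT index) puts every occurrence of a given k-mer next to each other, so k-mer counts can be read off as run lengths and agree with a plain hash-based count.

```python
from collections import Counter
from itertools import groupby


def kmer_counts_hash(reads, k):
    """Count k-mers the usual way, with a hash table."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_counts_sorted_suffixes(reads, k):
    """Count k-mers by sorting all suffixes and measuring runs of equal k-prefixes."""
    # Naive: materialise and sort every suffix of every read.
    suffixes = sorted(read[i:] for read in reads for i in range(len(read)))
    # Suffixes shorter than k do not contain a full k-mer, so skip them.
    prefixes = [s[:k] for s in suffixes if len(s) >= k]
    # Equal k-prefixes are adjacent in sorted order, so groupby sees each k-mer once.
    return Counter({kmer: len(list(run)) for kmer, run in groupby(prefixes)})


reads = ["ACGTACGT", "CGTACG"]  # hypothetical toy reads
by_hash = kmer_counts_hash(reads, 3)
by_sort = kmer_counts_sorted_suffixes(reads, 3)
print(by_hash == by_sort)  # the two approaches agree
```

A highly repeated k-mer shows up as one long run in the sorted suffix order, which is why I suspect repeats matter for construction time.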

• 4.4k views

Updated 11.5 years ago by Ketil 4.2k • written 11.9 years ago by ugly.betty77 ★ 1.1k

3

Yes, bioinformatics code and algorithm questions are entirely appropriate. It helps if questions are specific, though; "further insight into the process" is rather vague.

11.9 years ago by Neilfws 49k

score 1 · Answer 1 · 2013-10-27

1

11.5 years ago

Ketil 4.2k

Although I'm not an expert on this, constructing the BWT is essentially the same as constructing a suffix array. Since there are linear-time algorithms for suffix array construction, I don't think repetitive regions will necessarily be slower. It is possible, though, that other algorithms are used because they are faster in practice but slow down in specific cases (much like quicksort).
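As a minimal sketch of that equivalence (plain Python, with a naive sort rather than a linear-time algorithm, and helper names that are just for illustration): once the suffix array is known, the BWT is simply the character preceding each suffix in sorted order.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """BWT[j] is the character just before the j-th smallest suffix.

    For the suffix starting at position 0, index -1 wraps around to the
    final character, which is the '$' sentinel here.
    """
    return "".join(text[i - 1] for i in sa)


text = "GATTACA$"  # '$' is a sentinel that sorts before A, C, G, T
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)
print(sa)   # [7, 6, 4, 1, 5, 0, 3, 2]
print(bwt)  # ACTGA$TA
```

The naive sort compares suffixes character by character, so two suffixes that share a long common prefix (as in a repeat) take longer to compare. That is one way a simple method can slow down on repetitive input even though linear-time constructions exist.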

0

"Since there are linear time algorithms for this"

Thank you, Ketil. I do not think the matter is that simple. Constructing the BWT for a large read library is not trivial, and a number of papers on it came out last year, mostly from Anthony Cox and colleagues. Heng Li released his ropebwt2 algorithm a week ago on the same topic, but I have not had time to study it in enough detail to comment.

11.5 years ago by ugly.betty77 ★ 1.1k