I'm not sure I fully understand De Bruijn graphs when it comes to genome assembly.
I know that assembly with De Bruijn graphs is based on k-mers instead of overlapping sequences. But once the graph is built, finding the Eulerian path through it (i.e., the contig) depends on the fact that reads overlap, doesn't it? I mean, would it be possible to assemble non-overlapping reads with a De Bruijn graph?
Have you tried looking at the bioinformatics algorithms courses on Coursera? They seem pretty good if you want to understand graph-based assembly. They've split them up a bit since I did it, but this one covers sequence assembly using De Bruijn graphs, including lectures and problem sets where you build your own basic assembler. They also address some of the more complex issues around repeats, errors, etc. and how they complicate the graph structures, but they don't expect you to do that yourself!
I'm not quite sure what you mean about assembling non-overlapping reads, though. How would you do that at all? With or without graph-based methods, without an overlap there's no information about what order your reads should be in, and assembly would be basically impossible. Unless you're talking about aligning against a reference?
updated 5.0 years ago by Ram • written 9.1 years ago by 13en
Thanks for the link. I totally agree with you that "With or without graph based methods, without an overlap there's no information about what order your reads should be in and assembly would be basically impossible", but I had a discussion with a De Bruijn (DB) graph assembly tool developer and we didn't understand each other about overlapping reads in the context of DB graphs. Anyway, I just wanted to be sure that assembly is impossible without any overlap information.
The way I see it, there's no read overlap in De Bruijn graphs, just information about shared k-mers (although I suppose you could think of this as overlap). Remember, the reads are split into k-mers for De Bruijn graphs. The graph is based on those k-mers, not the reads.
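To make the "reads are split into k-mers" point concrete, here is a minimal Python sketch. The two reads, the choice of k = 4, and the helper name `kmers` are made up for illustration and don't come from any particular assembler.

```python
def kmers(read, k):
    """Return every k-mer of a read, in order of position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Hypothetical reads: the end of the first matches the start of the second.
reads = ["ATGGCGT", "GCGTGCA"]
k = 4

# Each read is reduced to its set of k-mers; the reads themselves are not
# compared against each other directly.
kmer_sets = [set(kmers(read, k)) for read in reads]

# The only link between the two reads is the k-mer they have in common.
shared = kmer_sets[0] & kmer_sets[1]
print(shared)
```

Here the two reads overlap by four bases, so they share exactly one 4-mer. Reads with no overlap at all would share none.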
There are overlaps in the de Bruijn graph. Assuming nodes are k-mers, edges correspond to all the exact overlaps of length (k-1) between two nodes.
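As a rough illustration of that definition (nodes are k-mers, edges are exact (k-1) overlaps), here is a small Python sketch. The reads and the function names `build_graph` and `count_components` are hypothetical, and it ignores everything a real assembler has to handle (reverse complements, sequencing errors, k-mer coverage).

```python
from collections import defaultdict


def build_graph(reads, k):
    """Nodes are k-mers; there is an edge u -> v whenever the last k-1
    characters of u equal the first k-1 characters of v."""
    nodes = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    return {u: sorted(by_prefix.get(u[1:], ())) for u in nodes}


def count_components(graph):
    """Count weakly connected components (edge direction is ignored)."""
    undirected = {node: set(targets) for node, targets in graph.items()}
    for u, targets in graph.items():
        for v in targets:
            undirected[v].add(u)
    seen, components = set(), 0
    for start in undirected:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(undirected[node] - seen)
    return components


overlapping = build_graph(["ATGGCGT", "GCGTGCA"], 4)  # reads share "GCGT"
disjoint = build_graph(["ATGGCGT", "TTACCAG"], 4)     # reads share nothing

print(count_components(overlapping))
print(count_components(disjoint))
```

With the overlapping reads, the k-mers chain into one connected path, so a single contig can be read off. With the non-overlapping reads, the graph falls apart into one piece per read, and nothing in it says how those pieces should be ordered.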