I have overlapping MiSeq 2x250bp reads (after merging: single-end reads of 400-450bp). The genome size is ~20Mb. I don't think de Bruijn graph-based assemblers are the way to go with such a dataset, are they?
Have you had any experience assembling this kind of data? Maybe a 'good-old-times' (overlap-based) assembler can handle it better?
What is the sequencing library insert size? If it is 500-600bp or below, you may try to find overlaps within each read pair with FLASH. You may also get better overlaps if you error-correct prior to FLASH.
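To make the idea of finding the overlap within a read pair concrete, here is a minimal Python sketch. It is a toy illustration only, not the algorithm of any particular merging tool: it ignores base qualities, and the function names (`revcomp`, `merge_pair`) and the `min_overlap` / `max_mismatch_frac` parameters are made up for this example.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read with its mate into one longer sequence.

    Hypothetical helper: tries the longest candidate overlap first and
    accepts the first one whose mismatch fraction is within the limit.
    Returns None when no acceptable overlap is found.
    """
    # The mate is sequenced from the opposite strand, so flip it first.
    mate = revcomp(read2)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail = read1[-olen:]   # end of the forward read
        head = mate[:olen]     # start of the flipped mate
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1 whole and append the non-overlapping part of the mate.
            return read1 + mate[olen:]
    return None
```

With 2x250bp reads and an insert of 400-450bp, the expected overlap is roughly 50-100bp, so in practice `min_overlap` could be set well above the toy default used here. Pairs for which the function returns None correspond to the reads that stay paired.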
I did, so what I'm playing with is single reads of 350-450bp (100x) and paired reads (2x250bp) that didn't merge correctly (50x).
What organism do you have that is 20Mb? Small end of the eukaryotes? If you have good coverage (this is key), then any of the assembler suites will do. I like Velvet for a genome this size. Someone else might like another assembler. Depending on your sequencing depth and gene space, you'll probably have to do some post-assembly cleanup. I like PAGIT for that.
It's an average fungal genome. The thing is, the genome is quite heterozygous, so de Bruijn graph assemblers (Velvet, SOAP, ABySS) are having a hard time and fragmenting it a lot... I'm more into older-style assemblers like Newbler or Celera. Has anyone tried them with MiSeq?
I work with fungi too. Sounds like 20Mb is in yeast territory, so it's on the smaller side. I am working on assemblies of a few in the 40 to 60 Mb range, and they also have high heterozygosity. We still mainly use de Bruijn-style assemblers, but I also use Newbler on occasion. I think coverage is key. My suggestion is to try Newbler and see how the assembly compares to a de Bruijn assembler like Velvet. Good luck, and let me know if you want to commiserate with me about it!
Thanks Josh. I quickly tried SOAPdenovo some time ago and it performed below my expectations... this is why I want to try something old-style. Maybe I will give ALLPATHS a try, as the Broad designed it with overlapping reads in mind... Anyway, I will keep you posted.
@Leszek, I don't think you can use ALLPATHS this way (with just one library), to my knowledge, unless there is some hack I don't know about. With a genome this size you should be able to benchmark numerous methods in a reasonable amount of time. I agree with Josh's approach: I'd run Newbler and VelvetOptimiser and see how they compare, given your read lengths.
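For comparing the assemblies, a quick contiguity summary per contig FASTA is usually enough for a first pass. Below is a minimal Python sketch, assuming each assembler writes its contigs to a plain FASTA file. The function names are made up for this example, and N50 alone says nothing about misassemblies, so treat it as a rough first check only.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: store the previous one, if any.
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(lengths):
    """Collect a few basic contiguity statistics in a dict."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

Usage would look like `summarize(read_contig_lengths("newbler_contigs.fa"))` next to the same call on the Velvet output; both file names are hypothetical. For a heterozygous genome it is also worth comparing `total_bp` against the expected ~20Mb, since an inflated total can indicate that the two haplotypes were assembled separately.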