Assembling a genome de novo. I have:
- 10X coverage with PacBio reads
- 100X coverage with Illumina short reads (150 bp paired-end reads)
- 20X coverage with long MiSeq reads (max length 800 bp)
Given what I have to work with, what would be the best strategy to assemble the genome and why?
Thank you,
Joe
Edit: genome size is ~1 Gb
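For scale, here is a rough sketch of what those coverage figures mean in raw bases. It assumes the ~1 Gb estimate holds and that every Illumina read is exactly 150 bp; the variable and function names are only illustrative.

```python
GENOME_SIZE_BP = 1_000_000_000  # ~1 Gb estimate (assumed exact here)

# Coverage depth per library, as listed in the question
coverage = {
    "PacBio": 10,
    "Illumina 150 bp PE": 100,
    "MiSeq long": 20,
}


def total_bases(depth, genome_size=GENOME_SIZE_BP):
    """Bases needed to reach `depth`-fold coverage of the genome."""
    return depth * genome_size


yield_bp = {name: total_bases(depth) for name, depth in coverage.items()}

for name, bases in yield_bp.items():
    print(f"{name}: {bases / 1e9:.0f} Gb of sequence")

# Only the Illumina library has a fixed read length, so only its
# read count can be estimated from the numbers given.
illumina_reads = yield_bp["Illumina 150 bp PE"] // 150
print(f"Illumina reads (approx.): {illumina_reads:,}")
```

The PacBio and MiSeq read counts are left out because the post gives no mean read length for either (800 bp is only the MiSeq maximum).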
You should specify the genome type. Some tools will not be able to work on big genomes.
We have a similar set of data, and I was wondering what you decided to use in the end. I would also appreciate it if you could share your experience. Thanks
I ended up using DBG2OLC
What led me there: https://github.com/PacificBioscience...Bio-Long-Reads
The publication: http://arxiv.org/ftp/arxiv/papers/1410/1410.2801.pdf
The code: http://sourceforge.net/projects/dbg2olc/
I'm quite pleased with the results of DBG2OLC.
I corresponded with the authors, managed to closely replicate the results from their paper, and made some pretty decent draft assemblies of my own with minimal data. Fast performance and good results.