I am interested in de novo assembly of Illumina reads belonging to an insect with a genome size of about 300Mbp. Can anyone help me choose the assembler program that I should use? Is there a manual?
You usually need to run a few different assemblers and see what works best with your data. If you have 2x150bp reads from a single PCR-free library based on gel-free fragment selection, you could try DISCOVAR de novo. Although the DISCOVAR de novo authors recommend 250-base reads, reads as short as 150 bases may work. Other options might be SPAdes, SGA, ABySS 2, Meraculous2, and MaSuRCA.
In my experience you can get medium-sized insect genome assemblies with good gene content and contiguity by correcting reads with BFC, assembling contigs with SPAdes (turning off its error correction module, BayesHammer), scaffolding with SGA, and fixing errors with Pilon. If your insect genome is actually much larger than 300Mbp, using SPAdes is probably not a good idea. Platanus is another option, especially if the genome is highly heterozygous, although in my experience you get very poor results with a single paired-end library; you would need reads from at least one mate-pair library. ALLPATHS-LG is another alternative if the paired-end reads overlap and you have at least one mate-pair library. If you can also sequence long reads, you could try a hybrid assembly with SPAdes or other assemblers, too.
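To illustrate the "SPAdes without BayesHammer" step, here is a minimal sketch that only builds the command line; it does not run anything. The file names, output directory, and thread count are placeholders I made up, and the helper function is hypothetical. It assumes the reads were already corrected with another tool such as BFC, and it relies on SPAdes' `--only-assembler` option to skip its own read error correction.

```python
import shlex


def spades_assembly_only_cmd(reads_1, reads_2, out_dir, threads=16):
    """Build a SPAdes command that skips its read error correction step."""
    return [
        "spades.py",
        "--only-assembler",  # skip SPAdes' built-in read error correction
        "-1", reads_1,       # forward reads (assumed already corrected)
        "-2", reads_2,       # reverse reads (assumed already corrected)
        "-o", out_dir,       # output directory
        "-t", str(threads),  # number of threads
    ]


# Placeholder file names, not real data.
cmd = spades_assembly_only_cmd(
    "corrected_R1.fq.gz", "corrected_R2.fq.gz", "spades_out"
)
print(shlex.join(cmd))
```

You could pass the resulting list to `subprocess.run(cmd, check=True)` once SPAdes is installed. Check the SPAdes manual for the options that suit your library before running it.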
SPAdes (http://bioinf.spbau.ru/spades) might be a better option.
Best de novo assembler for insect genome?
Minia is supposed to be a good choice if you have access to limited compute resources.
Please provide some more details on your data: coverage, whether these are actually genomic reads, PE insert size, etc.
Hi
When you want to perform an assembly, a few parameters should be considered: 1) the type of genomic library (SE/PE/mate-pair), 2) the insert size, 3) the read length, 4) the sequencing platform, and 5) the quality of your raw reads.
If you have low-coverage data, go for an assembler that works well with low coverage.
You can go for k-mer-based de Bruijn graph assemblers such as Velvet and SOAPdenovo, which are very popular and robust software and can give you better N50 statistics.
All are command-line tools and simple to use.
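Since N50 comes up when comparing assemblers, here is a minimal sketch of how it is usually computed: sort the contig lengths from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembly length. The contig lengths below are made-up numbers for illustration only.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least half
        # of the assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Made-up contig lengths, not from a real assembly.
contig_lengths = [100, 80, 60, 40, 20]
print(n50(contig_lengths))
```

Keep in mind that N50 only measures contiguity, so it is worth looking at gene content and misassemblies as well when comparing results.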
Thank you very much. I have not received the data yet, so I don't know the error rate. The reads are 150bp PE at 100X coverage, from an Illumina HiSeq platform (non-human sample).
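As a quick sanity check on those numbers, the expected amount of data can be estimated with coverage = (number of reads x read length) / genome size. This is only a rough sketch: it assumes the 300Mbp genome size from the question is accurate and that every read is a full 150 bases.

```python
def reads_for_coverage(genome_size_bp, coverage, read_length_bp):
    """Estimate how many reads are needed to reach a target coverage."""
    total_bases = genome_size_bp * coverage
    return total_bases // read_length_bp


# Values taken from the thread: ~300 Mbp genome, 100X coverage, 150 bp reads.
n_reads = reads_for_coverage(300_000_000, 100, 150)
n_pairs = n_reads // 2  # paired-end: two reads per fragment

print(f"{n_reads:,} reads, i.e. {n_pairs:,} read pairs")
```

Under these assumptions that is about 30 Gbp of sequence in total, which is useful to know when judging whether an assembler will fit in the memory you have available.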