I am conducting a de novo assembly of a ~33 Mb genome using 454 and Illumina reads. I cannot use MIRA, since I have ~80M Illumina reads (which would require ~160 GB of memory). So far I have found that it is usually most efficient to first assemble the 454 reads with Newbler and the Illumina reads with Velvet, and then combine the results using a third assembly program. I have been using CAP3 for this last step, but I'm not satisfied with the results.
Statistics for the intermediate and final assemblies are shown below. The problem is that the CAP3 results are worse than the intermediate ones: CAP3 seems to throw most of the contigs away (its output totals only ~5.2 Mb, versus ~32 Mb from Newbler and ~35 Mb from Velvet). Two questions:
- Should I use some specific options for CAP3 when conducting the final assembly?
- Are there any ready-made pipelines for doing this kind of 'integration' more effectively?
Statistics for the CAP3 output:
Number of contigs 826
Total size of contigs 5220088
Longest contig 37928
Mean contig size 6320
Median contig size 3734
N50 contig length 12593
L50 contig count 130
Statistics for Newbler output:
Number of contigs 1942
Total size of contigs 32110351
Longest contig 170575
Mean contig size 16535
Median contig size 8447
N50 contig length 37018
L50 contig count 272
Statistics for Velvet output:
Number of contigs 4939
Total size of contigs 34602711
Longest contig 134827
Mean contig size 7006
Median contig size 3463
N50 contig length 15446
L50 contig count 662
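
For anyone who wants to reproduce or sanity-check the figures above, here is a minimal Python sketch of how such statistics can be computed from a list of contig lengths. It is not necessarily the tool that produced my numbers. The function name `assembly_stats` is made up for illustration, and it assumes the common definition of N50: the length of the contig at which the cumulative sum of lengths, sorted longest first, reaches half of the total assembly size. L50 is the number of contigs needed to get there.

```python
import statistics


def assembly_stats(lengths):
    """Return basic contiguity statistics for a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs given")
    # Sort longest first so the cumulative sum walks down the size ranking.
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Assumed definition: N50 is reached once half the total is covered.
        if running * 2 >= total:
            n50, l50 = length, count
            break
    return {
        "contigs": len(ordered),
        "total": total,
        "longest": ordered[0],
        "mean": round(total / len(ordered)),
        "median": statistics.median(ordered),
        "n50": n50,
        "l50": l50,
    }


if __name__ == "__main__":
    # Toy input; replace with the contig lengths parsed from a FASTA file.
    print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

With the toy input the total is 54, and the three longest contigs (10 + 9 + 8 = 27) cover half of it, so N50 is 8 and L50 is 3.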