I have an Illumina 150 bp whole-genome library of a plant species (genome size ~7.5 Gb). Using this library, I:
1) assembled genome contigs (using SPAdes, SOAP and Platanus)
and
2) mapped the same library to the contigs from 1) using bwa mem
Using splitsam.sh from BBMap to get the total number of reads, the number of mapped reads and the number of unmapped reads, I found that unmapped reads made up 50-90% of the total.
While I am surprised by the high percentage of unmapped reads, I am wondering what percentage of unmapped reads is typical when mapping reads back to genome contigs, particularly for plant genomes if possible.
Thank you very much!
What percentage of reads was aligned by bwa mem? That should have told you how many reads did not align. If you are not getting good alignments of the original reads to the assembly, then it is possible that the assembly is incorrect.
You could use bbmap.sh to map the reads; the stats file should give you a clear idea of how many reads are mapping.
I used samtools flagstat .bam on the mapping results for the contigs from SPAdes, and it gave me the following output:
Only 16.51% of the total reads were mapped (which agrees with splitsam.sh)... I believe that is too little?
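One thing that may be worth ruling out when comparing these percentages: bwa mem can emit supplementary alignments for split reads, and the totals reported by samtools flagstat include secondary and supplementary records, so alignment counts and read counts are not always the same thing. Below is a minimal sketch, assuming samtools is installed and a BAM file is available, that counts primary records only and reports mapped and unmapped percentages. The function names and the file name are hypothetical and only for illustration.

```shell
#!/bin/sh
# pct PART TOTAL: print PART as a percentage of TOTAL with two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "NA"; exit }   # avoid division by zero
        printf "%.2f\n", 100 * part / total
    }'
}

# read_mapping_summary BAM: summarise mapping using primary records only.
# -F 0x900 drops secondary (0x100) and supplementary (0x800) alignments,
# so each sequenced read (each mate, for paired-end data) is counted once.
# -F 0x904 additionally drops unmapped reads (0x4).
read_mapping_summary() {
    bam="$1"
    total=$(samtools view -c -F 0x900 "$bam")
    mapped=$(samtools view -c -F 0x904 "$bam")
    unmapped=$((total - mapped))
    echo "total reads:    $total"
    echo "mapped reads:   $mapped ($(pct "$mapped" "$total")%)"
    echo "unmapped reads: $unmapped ($(pct "$unmapped" "$total")%)"
}

# Usage (hypothetical file name):
# read_mapping_summary reads_vs_contigs.bam
```

If the numbers from this kind of count still agree with flagstat and splitsam.sh, then the low mapping rate is not a counting artifact and the assembly itself is the more likely place to look.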