2.9 years ago
onkar
I have resequencing data (Illumina DNA-seq) for several cultivars/varieties of the same plant.
I mapped the reads to a published contig-level genome assembly and got a high mapping percentage (81-95%). Now I have a new chromosome-level genome assembly (same variety), but mapping the same resequencing reads to this genome gives a very low mapping percentage (46-58%).
Any idea why this is happening and what may have gone wrong?
The mapping stats for one sample are below.
With Old Reference Genome:
145054599 + 0 in total (QC-passed reads + QC-failed reads)
1611499 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
142412496 + 0 mapped (98.18% : N/A)
143443100 + 0 paired in sequencing
71721550 + 0 read1
71721550 + 0 read2
131411528 + 0 properly paired (91.61% : N/A)
139931536 + 0 with itself and mate mapped
869461 + 0 singletons (0.61% : N/A)
7957102 + 0 with mate mapped to a different chr
3840456 + 0 with mate mapped to a different chr (mapQ>=5)
With New Reference Genome:
146109857 + 0 in total (QC-passed reads + QC-failed reads)
2666757 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
81681317 + 0 mapped (55.90% : N/A)
143443100 + 0 paired in sequencing
71721550 + 0 read1
71721550 + 0 read2
60751250 + 0 properly paired (42.35% : N/A)
66110132 + 0 with itself and mate mapped
12904428 + 0 singletons (9.00% : N/A)
4092218 + 0 with mate mapped to a different chr
3161215 + 0 with mate mapped to a different chr (mapQ>=5)
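A side note on reading these numbers: in the output above, the "in total" and "mapped" lines both include the secondary alignments (total minus secondary equals the "paired in sequencing" count), so the printed percentage is not quite the fraction of reads that mapped. Here is a rough Python sketch that recomputes the rate over primary reads only. The function names are made up for illustration, and only the four flagstat lines it needs are copied from above.

```python
import re

# The relevant flagstat lines, copied from the two runs above.
OLD_REF = """\
145054599 + 0 in total (QC-passed reads + QC-failed reads)
1611499 + 0 secondary
0 + 0 supplementary
142412496 + 0 mapped (98.18% : N/A)
"""
NEW_REF = """\
146109857 + 0 in total (QC-passed reads + QC-failed reads)
2666757 + 0 secondary
0 + 0 supplementary
81681317 + 0 mapped (55.90% : N/A)
"""


def flagstat_count(text, label):
    """Return the QC-passed count of the first line whose description starts with label."""
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ \d+ (.*)", line)
        if m and m.group(2).startswith(label):
            return int(m.group(1))
    raise KeyError(label)


def primary_mapping_rate(text):
    """Percent of primary reads mapped, with secondary/supplementary records removed."""
    total = flagstat_count(text, "in total")
    extra = flagstat_count(text, "secondary") + flagstat_count(text, "supplementary")
    mapped = flagstat_count(text, "mapped")
    return 100.0 * (mapped - extra) / (total - extra)


for name, text in (("old", OLD_REF), ("new", NEW_REF)):
    print(f"{name} reference: {primary_mapping_rate(text):.2f}% of primary reads mapped")
```

The correction is small here (roughly 98.2% vs 55.1%), so the drop between the two references is real and not an artifact of how secondary alignments are counted.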
What program and code are you using to map the reads?
BWA
I also tried CLC Genomics Workbench and Bowtie; the results are similar in both cases. The stats shown here are from BWA mapping followed by samtools flagstat.
Does anyone have any thoughts?
There are several possibilities. The chromosome-level assembly may have dropped some sequences, or the difference may be related to repeat sequences.
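A quick first check for dropped sequence is to compare the basic size of the two assemblies. The sketch below is plain Python with no external libraries; the function names and the toy FASTA record are illustrative assumptions, and in practice you would pass open handles to your two reference FASTA files.

```python
import io


def read_fasta_stats(handle):
    """Return {sequence_name: (length, n_count)} for every record in a FASTA handle."""
    stats = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = fields[0] if fields else f"seq{len(stats) + 1}"
            stats[name] = (0, 0)
        elif name is not None:
            length, n_count = stats[name]
            stats[name] = (length + len(line), n_count + line.upper().count("N"))
    return stats


def summarize(stats):
    """Collapse per-sequence stats into assembly-level totals."""
    total = sum(length for length, _ in stats.values())
    gaps = sum(n for _, n in stats.values())
    return {
        "sequences": len(stats),
        "total_bp": total,
        "n_bp": gaps,
        "non_n_bp": total - gaps,
    }


# Toy input standing in for a real assembly file.
toy = io.StringIO(">ctg1 example\nACGT\nNNAC\n>ctg2\nACG\n")
print(summarize(read_fasta_stats(toy)))
```

If `non_n_bp` is much smaller for the chromosome-level assembly than for the contig-level one, that would point toward sequence that was left out during scaffolding, for example unplaced contigs that were not included in the released file.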
One starting point would be to compare the two genome assemblies and see what is different. Another is to look at which reads are no longer mapping: maybe these reads are highly duplicated sequences that mapped to a sequence in the previous assembly?
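To find the reads that mapped to the old assembly but not to the new one, something along these lines could work. It is only a sketch: it reads SAM text (for example the output of `samtools view`), keeps everything in memory, and so is meant for a small subset of reads and not the full 143 million. The function names and the toy records are assumptions for illustration.

```python
def primary_mapped_status(sam_lines):
    """Return {(read_name, mate): is_mapped} using primary SAM records only."""
    status = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        status[(fields[0], mate)] = not (flag & 0x4)  # 0x4 = read unmapped
    return status


def lost_reads(old_sam, new_sam):
    """Reads mapped against the old reference but unmapped against the new one."""
    old = primary_mapped_status(old_sam)
    new = primary_mapped_status(new_sam)
    return sorted(key for key, mapped in old.items() if mapped and new.get(key) is False)


# Toy records: read pair r2 maps to the old reference only.
old_sam = [
    "r1\t99\tctg1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tctg1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
    "r2\t99\tctg7\t500\t60\t4M\t=\t600\t104\tACGT\tIIII",
    "r2\t147\tctg7\t600\t60\t4M\t=\t500\t-104\tACGT\tIIII",
]
new_sam = [
    "r1\t99\tchr1\t900\t60\t4M\t=\t1000\t104\tACGT\tIIII",
    "r1\t147\tchr1\t1000\t60\t4M\t=\t900\t-104\tACGT\tIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(lost_reads(old_sam, new_sam))
```

From the list of lost reads you can check which old contigs they came from. If most of them cluster on a few contigs, those contigs are good candidates for sequence that is missing from the new assembly.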