Variation in mapping percentage with genome
2.9 years ago
onkar ▴ 10

I have resequencing data (Illumina DNA-seq) for several cultivars/varieties of the same plant.

I mapped the reads to a published contig-level genome assembly and got a high mapping percentage (81-95%). Now I have a new chromosome-level genome assembly (same variety), but mapping the same resequencing reads to this genome gives a much lower mapping percentage (46-58%).

Any idea why this is happening and what may have gone wrong?

The mapping stats for one sample are below.

With the old reference genome:

145054599 + 0 in total (QC-passed reads + QC-failed reads)
1611499 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
142412496 + 0 mapped (98.18% : N/A)
143443100 + 0 paired in sequencing
71721550 + 0 read1
71721550 + 0 read2
131411528 + 0 properly paired (91.61% : N/A)
139931536 + 0 with itself and mate mapped
869461 + 0 singletons (0.61% : N/A)
7957102 + 0 with mate mapped to a different chr
3840456 + 0 with mate mapped to a different chr (mapQ>=5)

With the new reference genome:

146109857 + 0 in total (QC-passed reads + QC-failed reads)
2666757 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
81681317 + 0 mapped (55.90% : N/A)
143443100 + 0 paired in sequencing
71721550 + 0 read1
71721550 + 0 read2
60751250 + 0 properly paired (42.35% : N/A)
66110132 + 0 with itself and mate mapped
12904428 + 0 singletons (9.00% : N/A)
4092218 + 0 with mate mapped to a different chr
3161215 + 0 with mate mapped to a different chr (mapQ>=5)
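
The "mapped" line in flagstat output counts secondary alignments in both the total and the mapped count, so the percentage on primary reads is slightly different. A minimal sketch for recomputing it from the counts above, assuming the flagstat text format shown here (the function names are made up for illustration):

```python
import re

# One flagstat line: "<QC-passed> + <QC-failed> <label>"
LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing percentage such as " (98.18% : N/A)"
PCT = re.compile(r"\s*\((?:[\d.]+%|N/A) : .*\)$")


def parse_flagstat(text):
    """Return {label: QC-passed count} for each flagstat line."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue
        label = PCT.sub("", m.group(3))
        if label.startswith("in total"):
            label = "total"
        counts[label] = int(m.group(1))
    return counts


def primary_mapped_percent(text):
    """Mapped percentage after removing secondary/supplementary records."""
    c = parse_flagstat(text)
    extra = c.get("secondary", 0) + c.get("supplementary", 0)
    return 100.0 * (c["mapped"] - extra) / (c["total"] - extra)


OLD_STATS = """145054599 + 0 in total (QC-passed reads + QC-failed reads)
1611499 + 0 secondary
0 + 0 supplementary
142412496 + 0 mapped (98.18% : N/A)"""

NEW_STATS = """146109857 + 0 in total (QC-passed reads + QC-failed reads)
2666757 + 0 secondary
0 + 0 supplementary
81681317 + 0 mapped (55.90% : N/A)"""

print(round(primary_mapped_percent(OLD_STATS), 2))
print(round(primary_mapped_percent(NEW_STATS), 2))
```

On these numbers the correction is small, so secondary alignments alone do not explain the drop.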
samtools Mapping BWA • 1.1k views

What program and code are you using to map the reads?


BWA

I also tried CLC Genomics Workbench and Bowtie, and the results are similar in both cases. The stats shown here are from BWA mapping followed by samtools flagstat.


Anyone? Any thoughts?


There are several possibilities. The chromosome-level assembly may have dropped some sequences, or the difference may be related to repeat sequences.
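
Whether sequence was dropped can be checked roughly by comparing the size of the two references. A minimal sketch, assuming both assemblies are plain (uncompressed) FASTA; `assembly_summary` is a made-up helper name:

```python
def assembly_summary(fasta_lines):
    """Count sequences, total bases and N bases in FASTA-formatted lines."""
    n_seqs = 0
    total_bp = 0
    n_bp = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_seqs += 1  # header line starts a new sequence
        else:
            total_bp += len(line)
            n_bp += line.upper().count("N")  # gap / unknown bases
    return {"sequences": n_seqs, "total_bp": total_bp, "n_bp": n_bp}


# Tiny in-memory example; a real run would pass an open file handle instead.
demo = [">contig1", "ACGTNNNN", "ACGT", ">contig2", "NNAC"]
print(assembly_summary(demo))
```

For real data, an open file can be passed in, e.g. `assembly_summary(open("new_assembly.fa"))` (the file name is hypothetical). If the new assembly has a clearly smaller total length, reads from the missing regions have nowhere to map.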

One starting point would be to compare the two genome assemblies and see what is different, or to look at which reads are no longer mapping. Maybe these reads come from highly duplicated sequences that mapped to a sequence present in the previous assembly?
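
To find the reads that no longer map, one option is to compare the primary alignments from the two runs. A rough sketch, assuming SAM-formatted text lines as input (for example from `samtools view` on each BAM); the helper names are made up, and holding every read name in memory is only practical on a subsample:

```python
def mapped_primary_reads(sam_lines):
    """Set of (read name, mate number) with a mapped primary alignment."""
    reads = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a full SAM record
            continue
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        reads.add((fields[0], mate))
    return reads


def lost_reads(old_sam_lines, new_sam_lines):
    """Reads mapped against the old reference but not against the new one."""
    return mapped_primary_reads(old_sam_lines) - mapped_primary_reads(new_sam_lines)


# Toy records: read pair r1 maps to the old reference only.
old_demo = [
    "r1\t99\tctg1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tctg1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
    "r2\t99\tctg2\t50\t60\t4M\t=\t150\t104\tACGT\tIIII",
]
new_demo = [
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t99\tchr1\t500\t60\t4M\t=\t600\t104\tACGT\tIIII",
]
print(sorted(lost_reads(old_demo, new_demo)))
```

The old-reference positions of the lost reads then show whether they cluster on particular contigs or in repeats.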
