Overlapping reads: how should they be handled during variant calling and haplotype calling, and do they affect mapping?
5 weeks ago
ayeraselvan ▴ 10

I have Illumina paired-end reads. My plan is to map them to the reference, sort them, remove duplicates, and use GATK for variant calling and haplotype calling. However, I am concerned about overlapping mates in my paired-end reads.

When working with Illumina paired-end reads for variant calling and haplotype phasing, should I merge overlapping reads using PEAR before mapping to the reference genome (using bwa-mem2)? PEAR produces assembled single reads and unassembled paired-end reads, which would then be mapped separately, sorted, and deduplicated before being merged for variant calling. Is this approach necessary, or can I skip merging and map the paired-end reads directly?

From what I have read about GATK, it accounts for overlapping reads by halving the base quality of the overlapping bases, but I am not sure. Could you give me some input on this?

What is your expected fragment size? Why would you assume you have any overlapping reads?

All my reads are 150 bp long. When I used PEAR to check whether any pairs could be merged (i.e., whether the mates overlap), I found that almost 10 percent of them overlapped.
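Whether mates overlap depends only on read length and fragment (insert) size: with 2 × 150 bp reads, any fragment shorter than 300 bp yields an overlap. A minimal sketch, assuming the insert size is the outer fragment length (as reported by |TLEN| in a BAM) and both mates are full length; the function name is hypothetical:

```python
def mate_overlap(read_len: int, insert_size: int) -> int:
    """Number of genomic bases covered by both mates of a pair.

    Assumes both mates are read_len long and insert_size is the outer
    fragment length. If the fragment is shorter than one read, each mate
    reads through into adapter, so the overlap is capped at the fragment.
    """
    if insert_size >= 2 * read_len:
        return 0  # mates never meet in the middle
    return min(2 * read_len - insert_size, insert_size)


# Illustrative fragment sizes for 2 x 150 bp reads
for frag in (350, 300, 280, 200, 120):
    print(frag, mate_overlap(150, frag))
```

So a library whose fragment distribution has a tail below 300 bp will naturally produce some overlapping pairs, which is consistent with PEAR finding a minority of mergeable pairs.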

From the PEAR paper: https://pmc.ncbi.nlm.nih.gov/articles/PMC3933873/

By merging paired-end reads, the overlapping region between them can also be deployed for correcting sequencing errors and potentially yield sequences of higher quality.

OK, but this benefit would only extend to the overlapping region (10-20 bp?) of 10% of your reads.

first processing step in a plethora of sequence analysis pipelines.

Uhhh, I don't think so. Merging overlaps is very useful for de novo sequence assembly with short reads, but it is pretty rare for resequencing.
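To put a rough number on "10-20 bp of 10% of your reads": each overlapping pair sequences its overlap twice, so one copy of those bases is redundant out of the 2 × 150 bases sequenced per pair. A back-of-the-envelope sketch, assuming 10% of pairs overlap by an average of 20 bp (both figures taken loosely from this thread, not measured); the function name is hypothetical:

```python
def redundant_base_fraction(frac_pairs_overlapping: float,
                            mean_overlap_bp: float,
                            read_len: int = 150) -> float:
    """Fraction of all sequenced bases that re-read the same fragment.

    Each overlapping pair reads mean_overlap_bp genomic bases twice; one
    copy is redundant, out of 2 * read_len bases sequenced per pair.
    """
    return frac_pairs_overlapping * mean_overlap_bp / (2 * read_len)


print(f"{redundant_base_fraction(0.10, 20):.2%}")
```

Under these assumptions well under 1% of sequenced bases are affected, which is why the error-correction benefit of merging is small for resequencing.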

Hi Jeremy, thank you for the comment. Since this is a resequencing project, I want to make sure that the overlapping reads won't affect variant calling. As far as I have read about GATK4, it halves the base quality in overlapping regions. Does this mean I can proceed with mapping using bwa-mem2 without running PEAR? Thank you :)

I think that is to account for the fact that you are looking at the same DNA fragment twice. If you show up to vote for president twice, the polling station needs to halve each of your votes or ignore one.

https://gatk.broadinstitute.org/hc/en-us/community/posts/360060145512-does-GATK4-correct-variant-calls-from-overlapping-paired-reads
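The "one vote per fragment" idea can be illustrated on a single pileup position. This is a toy sketch of the principle only, not GATK's implementation: GATK adjusts the base qualities of overlapping mates rather than dropping reads, and its exact rule should be checked in the GATK documentation and the thread above. Here both mates are assumed to share a fragment name (the BAM QNAME); agreeing mates count once and disagreeing mates are ignored. The function name is hypothetical:

```python
from collections import Counter


def fragment_aware_counts(observations):
    """Count allele support at one position, one vote per fragment.

    observations: iterable of (fragment_name, base) tuples, one per read
    covering the position. Mates of the same pair share fragment_name.
    Mates that agree contribute a single vote; mates that disagree are
    treated as a likely sequencing error and contribute nothing.
    """
    by_fragment = {}
    for name, base in observations:
        by_fragment.setdefault(name, []).append(base)
    counts = Counter()
    for bases in by_fragment.values():
        if len(set(bases)) == 1:
            counts[bases[0]] += 1
    return counts


pileup = [("fragA", "T"), ("fragA", "T"),  # overlapping mates that agree
          ("fragB", "C"),                  # single read, no overlap
          ("fragC", "T"), ("fragC", "C")]  # overlapping mates that disagree
print(fragment_aware_counts(pileup))
```

Naively counting reads would report T three times and C twice; counting fragments reports one T and one C, so the overlapping pair no longer inflates the evidence for T.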
