I have Illumina paired-end reads. My plan is to map them to the reference, sort them, remove duplicates, and use GATK for variant calling and haplotype calling. But I am concerned about overlapping reads within my read pairs.
When working with Illumina paired-end reads for variant calling and haplotype phasing, should I merge overlapping reads using PEAR before mapping to the reference genome (using bwa-mem2)? PEAR produces assembled single reads and unassembled paired-end reads, which would then be mapped separately, sorted, and deduplicated before merging and variant calling. Is this approach necessary, or can I proceed without merging the overlapping reads and map the paired-end reads directly?
As far as I have read, GATK accounts for overlapping reads by halving the base quality in the overlapping region? I am not sure; can you give me some input on this?
what is your expected fragment size? why would you assume you had any overlapping reads?
All my reads are 150 bp long. When I used PEAR to check whether any read pairs overlap and can be merged, I found that almost 10 percent were overlapping.
from the PEAR paper https://pmc.ncbi.nlm.nih.gov/articles/PMC3933873/
ok but this benefit would only extend to the overlapping region (10-20bp?) of 10% of your reads
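To put rough numbers on that, here is a minimal Python sketch of the overlap arithmetic. The function name `overlap_bp` is made up for illustration, adapters and trimming are ignored, and the 15 bp overlap is just the midpoint of the 10-20 bp guess above, not a measured value.

```python
def overlap_bp(read_len: int, fragment_len: int) -> int:
    """Bases covered by both mates of a pair (adapters and trimming ignored)."""
    # Fragment shorter than one read: both mates cover the whole fragment.
    # Otherwise the mates share 2 * read_len - fragment_len bases, or none.
    return max(0, min(fragment_len, 2 * read_len - fragment_len))

# With 2 x 150 bp reads, mates only overlap when the fragment is under 300 bp.
for frag in (500, 300, 285, 200, 100):
    print(frag, overlap_bp(150, frag))

# Rough share of all sequenced bases that sit inside an overlap, assuming
# 10% of pairs overlap by about 15 bp (both figures are guesses from this thread).
pairs_overlapping = 0.10
share = pairs_overlapping * 2 * overlap_bp(150, 285) / (2 * 150)
print(f"{share:.1%}")
```

Under those assumptions only about one percent of the sequenced bases fall in an overlap, so whatever merging gains you is confined to a small part of the data.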
uhhh I don't think so. merging overlaps is very useful for de novo sequence assembly with short reads but it is pretty rare for resequencing.
Hi Jeremy, thank you for the comment. Since this is a resequencing project, I want to make sure that the overlapping reads won't affect variant calling. As far as I have read about GATK4, it halves the base quality in the regions that overlap. Does this imply that I can proceed with mapping with bwa-mem2 without using PEAR? Thank you :)
I think that is to account for the fact that you are looking at the same DNA fragment twice. If you show up to vote for president twice, the polling station needs to halve each of your votes, or ignore one.
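A toy Python sketch of the voting idea, to make it concrete. This is not GATK's code: the function name `adjust_overlap`, the cap of 20, and the choice to zero out disagreeing bases are all assumptions made for illustration, so check the GATK documentation for what the caller really does.

```python
def adjust_overlap(base1: str, qual1: int, base2: str, qual2: int, cap: int = 20):
    """Toy model: adjusted (qual1, qual2) for one position covered by both mates."""
    if base1 == base2:
        # Both mates saw the same fragment, so limit how much each one counts
        # (cap is a hypothetical value chosen for this example).
        return min(qual1, cap), min(qual2, cap)
    # The mates disagree about the same molecule: trust neither observation.
    return 0, 0

print(adjust_overlap("A", 37, "A", 35))  # agreeing bases, both down-weighted
print(adjust_overlap("A", 37, "C", 35))  # conflicting bases, both ignored
```

Either way, the point is that the two mates are not independent evidence, so the caller handles this after mapping and you do not need to merge the reads yourself beforehand.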
https://gatk.broadinstitute.org/hc/en-us/community/posts/360060145512-does-GATK4-correct-variant-calls-from-overlapping-paired-reads