Question

Remove duplicate reads before variant calling?

1

5.5 years ago

O.rka ▴ 750

I have a very large fastq file that I want to do variant calling on. Would it make sense to remove duplicate reads before mapping using bowtie2? After this I was going to use freebayes.

What effect will removing duplicate reads have on my resulting variant file compared to keeping all of the reads?

Will I have lower confidence variant calls?

snp • 2.7k views

updated 5.5 years ago by i.sudbery 21k • written 5.5 years ago by O.rka ▴ 750

2

Define "very large fastq file". IMO it's not a good idea to remove duplicate reads before alignment. Instead, mark duplicate reads on the aligned file (BAM), e.g. using Picard MarkDuplicates, and then call variants on the duplicate-marked BAM file.

5.5 years ago by Nicolas Rosewick 11k

score 2 · Answer 1 · 2020-01-16

Removing duplicates before alignment (using something like tally) will save you time on the mapping and shouldn't affect the results. However, you will still have to mark or remove duplicates after alignment using Picard MarkDuplicates, because two reads can be PCR duplicates without having exactly the same sequence (for example, when one copy carries a sequencing error).
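To illustrate why exact-sequence deduplication is not enough, here is a toy sketch in Python. The reads, coordinates, and function names are all made up for illustration, and the position-based function is a heavy simplification of what a tool like MarkDuplicates does (real tools also consider things like read pairs, clipping, and base qualities when deciding which copy to keep).

```python
# Hypothetical toy reads: (name, sequence, chromosome, leftmost position, strand).
# None of this is real data; it only illustrates the idea.
reads = [
    ("r1", "ACGTACGTAC", "chr1", 100, "+"),
    ("r2", "ACGTACGTAC", "chr1", 100, "+"),  # exact copy of r1
    ("r3", "ACGTACGAAC", "chr1", 100, "+"),  # same fragment as r1, one sequencing error
    ("r4", "TTGACCGTAA", "chr1", 250, "+"),  # unrelated fragment
]


def dedup_by_sequence(reads):
    """Keep the first read seen for each exact sequence (pre-alignment style)."""
    seen = set()
    kept = []
    for name, seq, _chrom, _pos, _strand in reads:
        if seq not in seen:
            seen.add(seq)
            kept.append(name)
    return kept


def dedup_by_position(reads):
    """Keep the first read seen for each (chromosome, position, strand).

    Simplified stand-in for post-alignment duplicate marking.
    """
    seen = set()
    kept = []
    for name, _seq, chrom, pos, strand in reads:
        key = (chrom, pos, strand)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


print(dedup_by_sequence(reads))  # ['r1', 'r3', 'r4']
print(dedup_by_position(reads))  # ['r1', 'r4']
```

The sequence-based pass keeps r3 because it differs from r1 by one base, while the position-based pass treats it as a duplicate of r1.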

To be honest, unless your fastq is VERY big, alignment is rarely the most time-consuming part of variant calling, so it might make sense to align everything and then run MarkDuplicates, since you have to do this anyway.
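As a rough sketch of the "align everything, then MarkDuplicates" route, the Python snippet below only builds the shell commands and prints them; it does not run anything. The file names, the sample name, single-end input (`-U`), the read-group settings, and the availability of a `picard` wrapper script on the PATH (as opposed to `java -jar picard.jar`) are all assumptions, so adjust them to your setup.

```python
import shlex


def build_commands(index, fastq, reference, prefix, sample="sample1"):
    """Return the shell commands for align -> mark duplicates -> call variants.

    All paths and the sample name are placeholders supplied by the caller.
    """
    q = shlex.quote
    sorted_bam = f"{prefix}.sorted.bam"
    marked_bam = f"{prefix}.markdup.bam"
    metrics = f"{prefix}.markdup_metrics.txt"
    vcf = f"{prefix}.vcf"
    return [
        # Align single-end reads and coordinate-sort the output.
        # The read group is an assumption added so downstream tools have a sample name.
        f"bowtie2 -x {q(index)} -U {q(fastq)} --rg-id {q(sample)} --rg SM:{q(sample)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        # Flag duplicates in the sorted BAM (reads are marked, not deleted).
        f"picard MarkDuplicates I={q(sorted_bam)} O={q(marked_bam)} M={q(metrics)}",
        # Index the duplicate-marked BAM.
        f"samtools index {q(marked_bam)}",
        # Call variants on the duplicate-marked BAM.
        f"freebayes -f {q(reference)} {q(marked_bam)} > {q(vcf)}",
    ]


commands = build_commands("ref_index", "reads.fastq.gz", "ref.fa", "sample1")
for command in commands:
    print(command)
```

As far as I know, freebayes skips reads flagged as duplicates by default, so marking (rather than physically removing) them should be enough; check `freebayes --help` for your version to confirm.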