I have a very large FASTQ file that I want to do variant calling on. Would it make sense to remove duplicate reads before mapping with bowtie2? After mapping, I plan to call variants with freebayes.
What effect will removing duplicate reads have on my resulting variant file compared to keeping all of the reads?
Will I get lower-confidence variant calls?
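Define "very large FASTQ file". IMO it's not a good idea to remove duplicate reads before alignment. Reads with identical sequence are not necessarily PCR duplicates, and duplicate-marking tools identify duplicates by alignment position, which is only known after mapping. You should mark duplicate reads (e.g. using Picard MarkDuplicates) on the aligned file (BAM) and then call variants on the duplicate-marked BAM file.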
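To make the suggested order concrete, here is a rough sketch that only builds the command lines and prints them; it does not run anything. The file names (`reads.fastq`, `ref_index`, `ref.fa`, `sample`), the `build_pipeline` helper, and the location of `picard.jar` are all made up for illustration, and I'm assuming single-end reads since you mention a single FASTQ file. Depending on your Picard version you may need the newer `-I/-O/-M` argument style, and you may want to add read-group information during alignment, so treat this as a starting point rather than a tested pipeline.

```python
import shlex


def build_pipeline(fastq, index, reference, prefix):
    """Return the pipeline commands, in order, as argument lists.

    Nothing is executed here; all paths are hypothetical placeholders.
    """
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    marked_bam = f"{prefix}.markdup.bam"
    metrics = f"{prefix}.markdup_metrics.txt"
    return [
        # 1. Align ALL reads; the FASTQ is not deduplicated beforehand.
        #    -U assumes single-end reads (an assumption, adjust for paired-end).
        ["bowtie2", "-x", index, "-U", fastq, "-S", sam],
        # 2. Coordinate-sort the alignments and write a BAM file.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Flag duplicates on the aligned reads (legacy KEY=value syntax).
        #    Reads are flagged, not removed, unless you ask Picard to remove them.
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"I={sorted_bam}", f"O={marked_bam}", f"M={metrics}"],
        # 4. Index the duplicate-marked BAM.
        ["samtools", "index", marked_bam],
        # 5. Call variants; freebayes writes VCF to stdout, so redirect it.
        ["freebayes", "-f", reference, marked_bam],
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for cmd in build_pipeline("reads.fastq", "ref_index", "ref.fa", "sample"):
        print(shlex.join(cmd))
```

As far as I know, freebayes skips reads flagged as duplicates by default (there is a `--use-duplicate-reads` option to include them), so marking is enough and you don't have to physically remove anything. Check `freebayes --help` for your version to confirm.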