Hello,
I'm calling variants with Samtools and marking duplicates in my BAM file with Picard MarkDuplicates. My question is: is it enough to mark the duplicates with Picard and then call variants with Samtools on the marked BAM file, or do the duplicates actually need to be removed from the BAM file using the REMOVE_DUPLICATES=true argument in Picard? In other words, does Samtools pileup recognize the marked duplicates? Also, when calling the consensus sequence with Samtools, do I need to worry about duplicates, or can I just use the original BAM file without duplicates marked?
Thanks in advance for your help. (This group is great and I've learned a lot from it).
The 1000 Genomes procedure seems to remove duplicates by default, and their data are low coverage. It would be good to see a specific comparison...
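On the question of whether marked duplicates are "recognized": in the SAM format, a duplicate is marked by setting the 0x400 bit in the FLAG field, and that is the bit Picard MarkDuplicates sets when it marks rather than removes. Recent versions of samtools mpileup list DUP among the flags skipped by default, but it is worth checking the documentation for the version you actually run. As a rough sanity check, here is a minimal Python sketch that counts how many reads in SAM text carry that bit. The function names and the example reads are made up for illustration; only the 0x400 value comes from the SAM specification.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: "PCR or optical duplicate"


def is_duplicate(flag: int) -> bool:
    """Return True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(sam_lines):
    """Count (total reads, reads flagged as duplicates) in SAM text lines.

    Header lines (starting with '@') and blank lines are skipped.
    """
    total = 0
    dups = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if is_duplicate(flag):
            dups += 1
    return total, dups


# Hypothetical records: read2 has the same flags as read1 plus 0x400 (99 + 1024).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
]

print(count_duplicates(example))  # (2, 1)
```

To try it on real data, you could feed it the text output of samtools view on your marked BAM file (for example by looping over sys.stdin instead of the example list). If the duplicate count is zero after running MarkDuplicates, the marking step probably did not do what you expected.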