I ran FastQC and found a 33% duplication level in my sample. It is single-end data with an average coverage of 10x. I used samtools rmdup and Picard MarkDuplicates, and the duplication level dropped to 1%. I have a few questions about removing duplicates:
1. Do both samtools and Picard remove duplicates based on position alone? How is Picard MarkDuplicates different from rmdup? (They give very similar results.) I'm just curious which one is better.
2. I am not sure whether it is advisable to remove duplicates from single-end data. How do the above programs treat single-end reads?
3. When I run samtools rmdup, it prints:
[bam_rmdupse_core] 3566092 / 20492754 = 0.1740
My final deduplicated .bam has 20979669 reads, which is more than the denominator itself. I don't understand what the denominator in the line above (20492754) is counting. Any comments or suggestions are appreciated.
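To make "based on position alone" concrete for single-end reads, here is a minimal sketch of position-based duplicate detection. It is not the samtools or Picard implementation. It assumes each read has already been reduced to a hypothetical record holding the chromosome, the 5' position (for reverse-strand reads this would be the alignment end, assumed precomputed here), the strand, and a score used to pick which copy to keep (a stand-in for something like mapping quality). The summary line uses the same "removed / examined = fraction" shape as the rmdup message, with the denominator defined here as the number of reads passed in. Note that the sketch cannot tell PCR duplicates apart from independent reads that happen to share a start position and strand.

```python
from collections import namedtuple

# Hypothetical read record for illustration; not a pysam/htslib type.
Read = namedtuple("Read", ["name", "chrom", "pos5", "strand", "score"])


def dedup_single_end(reads):
    """Keep one read per (chrom, 5' position, strand) key.

    Among reads sharing a key, the read with the highest score is kept;
    on a tie, the first read seen is kept. Returns (kept_reads, n_removed).
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.pos5, read.strand)
        current = best.get(key)
        if current is None or read.score > current.score:
            best[key] = read
    kept = list(best.values())  # dicts preserve first-insertion order of keys
    return kept, len(reads) - len(kept)


def report(reads):
    """Print 'removed / examined = fraction' and return the kept reads."""
    kept, removed = dedup_single_end(reads)
    examined = len(reads)
    print(f"{removed} / {examined} = {removed / examined:.4f}")
    return kept


if __name__ == "__main__":
    reads = [
        Read("r1", "chr1", 100, "+", 60),
        Read("r2", "chr1", 100, "+", 37),  # same key as r1, lower score: removed
        Read("r3", "chr1", 100, "-", 60),  # same position, other strand: kept
        Read("r4", "chr1", 250, "+", 60),
    ]
    kept = report(reads)  # prints "1 / 4 = 0.2500"
    print([r.name for r in kept])  # prints "['r1', 'r3', 'r4']"
```

Because strand is part of the key, a forward and a reverse read starting at the same coordinate are not treated as duplicates of each other in this sketch.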
Also see this thread on SEQanswers.