samtools markdup vs samtools rmdup
4.8 years ago
a.abnousi ▴ 30

I know samtools rmdup is obsolete and markdup should be used instead. My old pipeline used rmdup and now I'm trying to upgrade it to use markdup.

When I compare the two with default settings, rmdup removes more reads on my test dataset (188M vs 185M remaining). According to the manual, markdup by default removes PCR duplicates but not optical duplicates, and I think rmdup does the same (rmdup has no option for handling optical duplicates).

Where does this difference come from? How can I reproduce results similar to samtools rmdup using samtools markdup?

Thanks!

samtools rmdup markdup • 5.7k views

If it is not documented, then it is unlikely that rmdup did that. Still, why bother with something like this? I recommend just using markdup (since it is the currently recommended tool within samtools) and then proceeding with the analysis. One can spend a lot of time on these low-level details, but in the end there is no benefit in overthinking it.
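For reference, here is a minimal sketch of the markdup workflow as given in the samtools manual (collate, fixmate -m, sort, markdup). It only builds the command lines and does not run them. The file names and the `markdup_commands` helper are made up for illustration. Passing `remove=True` adds `-r`, which makes markdup remove duplicates instead of only flagging them. This is not guaranteed to reproduce rmdup's read counts.

```python
import shlex


def markdup_commands(in_bam, prefix, remove=False):
    """Return argv lists for the collate -> fixmate -> sort -> markdup chain.

    `in_bam` and `prefix` are hypothetical names chosen by the caller.
    """
    collated = f"{prefix}.collated.bam"
    fixmated = f"{prefix}.fixmate.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    out_bam = f"{prefix}.markdup.bam"

    markdup = ["samtools", "markdup"]
    if remove:
        markdup.append("-r")  # remove duplicates instead of only flagging them
    markdup += [sorted_bam, out_bam]

    return [
        ["samtools", "collate", "-o", collated, in_bam],    # group reads by name
        ["samtools", "fixmate", "-m", collated, fixmated],  # add mate score tags
        ["samtools", "sort", "-o", sorted_bam, fixmated],   # coordinate sort
        markdup,
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in markdup_commands("input.bam", "sample", remove=True):
        print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the paths have been checked.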


I was wondering the same thing recently. Since I need to do variant calling, should we remove duplicate reads or just mark them? Will it affect the variant calling? If I only run markdup, will the duplicates be ignored?


Yes, a proper variant calling tool will ignore duplicates if these are marked as such.
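To make "marked as such" concrete: a marked duplicate has bit 0x400 set in the SAM FLAG field, and downstream tools can skip reads with that bit. The sketch below filters plain-text SAM lines on that bit. The function names and the two example records are invented for illustration, and whether a given caller skips flagged reads by default should be checked in its documentation.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def is_duplicate(flag):
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def non_duplicate_records(sam_lines):
    """Yield alignment lines whose FLAG does not have the duplicate bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if not is_duplicate(int(fields[1])):  # FLAG is the second SAM column
            yield line


if __name__ == "__main__":
    # Two made-up records: the second has the duplicate bit set (99 + 1024).
    records = [
        "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    ]
    for rec in non_duplicate_records(records):
        print(rec.split("\t")[0])
```

On a real BAM file, `samtools view -F 0x400` applies the same kind of filter.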
