samtools markdup vs PICARD's MarkDuplicates when removing duplicated reads
5.4 years ago
nanoide ▴ 120

Hi there,

I'm currently analyzing some ATAC-seq data. Duplicated reads were removed by first running samtools fixmate -m and then samtools markdup -rs. I'm seeing many discarded reads, and I cannot repeat this step right now (maybe in the future). Are there any known differences between this method and others, such as PICARD's MarkDuplicates?

Would it be worth trying other methods for removing duplicated reads, or should the percentage be about the same?

Any advice would be appreciated.

Thanks

ATAC-seq samtools Markduplicates • 9.4k views
5.4 years ago
ATpoint 86k

As far as I know, the methods perform similarly for most applications, and the differences mainly affect edge cases such as supplementary alignments. Another option would be samblaster, which I use in my ATAC-seq pipeline. In the end it probably does not matter. The advantage of samblaster/samtools over Picard is that they use far less memory and can be used in Unix pipes. In ATAC-seq it is not uncommon to have some duplication, probably due to mitochondrial contamination. I would not worry too much about that and would rather check whether the downstream analysis indicates good quality: number of callable peaks, fraction of reads in peaks (FRiP), and a good signal-to-noise ratio when inspecting reads in a genome browser (distinct peaks without much noise).
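
To illustrate the "Unix pipes" point, here is a minimal sketch of the fixmate/markdup steps from the question chained into one pipeline. It follows the general pattern from the samtools documentation, but the helper name `build_markdup_cmd`, the file names, and the thread count are placeholders I made up, and the `-u` (uncompressed output) options assume a reasonably recent samtools. The helper only prints the command so you can check it before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints a samtools duplicate-marking pipeline for inspection.
# build_markdup_cmd is a hypothetical helper, not part of samtools.
build_markdup_cmd() {
    local in_bam="$1" out_bam="$2" threads="${3:-4}"
    # collate groups mates by read name; fixmate -m adds the mate score tag
    # that markdup relies on; sort restores coordinate order; markdup -s
    # marks duplicates and prints summary statistics.
    printf 'samtools collate -@ %s -O -u %s' "$threads" "$in_bam"
    printf ' | samtools fixmate -@ %s -m -u - -' "$threads"
    printf ' | samtools sort -@ %s -u -' "$threads"
    printf ' | samtools markdup -@ %s -s - %s\n' "$threads" "$out_bam"
}

# Print the pipeline for placeholder file names.
build_markdup_cmd sample.bam sample.markdup.bam
```

Piping the printed line into `bash` would execute it. Adding `-r` to the markdup step removes duplicates instead of only flagging them, which is what the `-rs` in the question does; marking without removing keeps the option of filtering later.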


Thanks for the insights! Regards

5.4 years ago
predeus ★ 2.1k

Unless something has changed dramatically, you should use Picard and not samtools to mark duplicates in an aligned file. Even Heng Li (the author of samtools) said that he does not recommend using samtools markdup. The topic was discussed quite a lot, for example, here: http://seqanswers.com/forums/showthread.php?t=6854

If you need to remove the duplicates, make sure you set the appropriate flag in Picard MarkDuplicates.
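
For reference, the flag in question is `REMOVE_DUPLICATES`. Below is a rough sketch of what such a call could look like; `build_picard_cmd`, the `picard.jar` location, and all file names are hypothetical placeholders. It uses the older `KEY=value` argument style, and newer Picard releases also accept a `-I ... --REMOVE_DUPLICATES true` form. As written it only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: prints a Picard MarkDuplicates command for inspection.
# build_picard_cmd and the picard.jar path are hypothetical placeholders.
build_picard_cmd() {
    local in_bam="$1" out_bam="$2" metrics="$3" remove="${4:-false}"
    # I = input BAM, O = output BAM, M = duplication metrics file.
    # REMOVE_DUPLICATES=true drops duplicates instead of only flagging them.
    printf 'java -jar picard.jar MarkDuplicates'
    printf ' I=%s O=%s M=%s' "$in_bam" "$out_bam" "$metrics"
    printf ' REMOVE_DUPLICATES=%s\n' "$remove"
}

# Print the command for placeholder file names, with removal switched on.
build_picard_cmd sample.sorted.bam sample.dedup.bam sample.dup_metrics.txt true
```

Leaving the fourth argument off keeps the default of `false`, so duplicates are only marked and can still be filtered downstream.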


This is a discussion from 2010 about samtools rmdup, not markdup. rmdup is now deprecated, with markdup being a recent replacement. To the best of my knowledge (correct me if I am wrong), a good benchmark of markdup vs. Picard is still missing, but as said above, I would be surprised if it made a notable difference for a standard paired-end dataset.


Good call - like I said, "unless something has changed"!

I agree, the results should be very comparable.


Thank you both for the comments, regards
