Duplicates in ATAC-seq data
7.4 years ago
rbronste ▴ 420

I'm looking for opinions on duplicate removal in ATAC-seq data for analyses such as footprinting. In my experience, Picard is extremely aggressive here and removes far more duplicates than samtools rmdup. Yet MACS2, run on the small number of remaining reads (fewer than 1M), still calls hundreds of thousands of significant peaks, which I find confusing.

ATAC-seq picard samtools rmdup duplicates • 4.9k views
7.4 years ago
James Ashmore ★ 3.5k

When I analyse ATAC-seq data I normally remove duplicates using Picard (remember to set OPTICAL_DUPLICATE_PIXEL_DISTANCE to suit the flowcell). If you have fewer than 1M reads, I'd be skeptical about the peaks MACS2 is calling. Without seeing the data, I can imagine that the signal is very striated across the genome and that MACS2 is calling these tiny spikes as peaks.
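
For anyone wanting something concrete, here is a rough sketch of how that Picard call could be assembled from Python. The helper name, the file names, and the `picard` wrapper executable (as installed by some package managers) are assumptions for illustration, not anything from the post above. The pixel distances are the ones I understand the Picard documentation to give: 100 as the default for unpatterned flowcells and 2500 as the suggested value for patterned ones. Check them against your own Picard version.

```python
import shlex
from typing import List

# Values as described in the Picard MarkDuplicates documentation
# (assumption: verify against the Picard version you actually run).
UNPATTERNED_PIXEL_DISTANCE = 100
PATTERNED_PIXEL_DISTANCE = 2500


def build_markduplicates_cmd(
    in_bam: str,
    out_bam: str,
    metrics_file: str,
    patterned_flowcell: bool,
    remove_duplicates: bool = True,
) -> List[str]:
    """Hypothetical helper: build a Picard MarkDuplicates command line.

    The input BAM is expected to be coordinate-sorted. Nothing is executed
    here; the caller decides how to run the returned argument list.
    """
    pixel_distance = (
        PATTERNED_PIXEL_DISTANCE if patterned_flowcell else UNPATTERNED_PIXEL_DISTANCE
    )
    return [
        "picard",  # assumed wrapper script; could also be `java -jar picard.jar`
        "MarkDuplicates",
        f"I={in_bam}",
        f"O={out_bam}",
        f"M={metrics_file}",
        f"REMOVE_DUPLICATES={'true' if remove_duplicates else 'false'}",
        f"OPTICAL_DUPLICATE_PIXEL_DISTANCE={pixel_distance}",
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = build_markduplicates_cmd(
        "sample.sorted.bam",
        "sample.dedup.bam",
        "sample.dup_metrics.txt",
        patterned_flowcell=True,
    )
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once Picard is on your PATH. Setting `remove_duplicates=False` only flags duplicates, which leaves the option of comparing peak calls with and without them.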

7.4 years ago

As I recall, we've normally removed duplicates when working with ATAC-seq data. Regarding samtools vs. Picard, I think even the samtools authors would say "use Picard" for paired-end data.
