Duplicates in ATAC-seq data

7.8 years ago

rbronste ▴ 420

I'm wondering about opinions on duplicate removal for applications such as footprinting in ATAC-seq data. In my experience, Picard is extremely aggressive here and removes far more duplicates than samtools rmdup. Yet calling peaks with MACS2 on the small number of remaining reads (fewer than 1M) still gives hundreds of thousands of significant peaks, which I find confusing.

ATAC-seq picard samtools rmdup duplicates

updated 7.8 years ago by James Ashmore ★ 3.5k • written 7.8 years ago by rbronste ▴ 420

score 1 · Answer 1 · 2017-07-19

When I analyse ATAC-seq data I normally remove duplicates using Picard (also remember to set OPTICAL_DUPLICATE_PIXEL_DISTANCE to suit the flowcell type). If you have fewer than 1M reads, I'd be skeptical about the peaks MACS2 is calling. Without seeing the data, I suspect the signal is very sparse and striated across the genome, and that MACS2 is calling these tiny spikes as peaks.
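Roughly speaking, Picard treats two reads in the same duplicate set as optical duplicates when they come from the same tile and their cluster x/y coordinates differ by no more than OPTICAL_DUPLICATE_PIXEL_DISTANCE (default 100; Picard's documentation suggests 2500 for patterned flowcells). The Python snippet below is only a simplified sketch of that distance check, not Picard's implementation. It assumes Casava 1.8+ style read names whose last four colon-separated fields are lane, tile, x and y; the helper functions and the read names are hypothetical.

```python
from typing import List, NamedTuple


class Location(NamedTuple):
    """Flowcell position of a cluster, parsed from an Illumina read name."""
    lane: int
    tile: int
    x: int
    y: int


def parse_location(read_name: str) -> Location:
    # Assumes Casava 1.8+ style names: instrument:run:flowcell:lane:tile:x:y,
    # optionally followed by a space and a comment field.
    fields = read_name.split(" ")[0].split(":")
    lane, tile, x, y = (int(f) for f in fields[-4:])
    return Location(lane, tile, x, y)


def is_optical_pair(a: Location, b: Location, pixel_distance: int) -> bool:
    # Same lane and tile, and within pixel_distance on both axes.
    return (
        a.lane == b.lane
        and a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )


def find_optical_duplicates(names: List[str], pixel_distance: int = 100) -> List[int]:
    """Return indices of reads lying close to an earlier read in the same duplicate set."""
    locs = [parse_location(n) for n in names]
    flagged = set()
    for i in range(len(locs)):
        for j in range(i + 1, len(locs)):
            if is_optical_pair(locs[i], locs[j], pixel_distance):
                flagged.add(j)
    return sorted(flagged)


# One hypothetical duplicate set (same fragment coordinates): three clusters on one tile.
dup_set = [
    "M1:7:FC:1:1101:1000:2000",
    "M1:7:FC:1:1101:1050:2080",
    "M1:7:FC:1:1101:3000:2000",
]
print(find_optical_duplicates(dup_set, pixel_distance=100))   # [1]
print(find_optical_duplicates(dup_set, pixel_distance=2500))  # [1, 2]
```

Raising the distance classifies more reads as optical copies. In MarkDuplicates this distinction shows up in the duplication metrics and the library-size estimate; optical and PCR duplicates are both still flagged as duplicates.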

score 0 · Answer 2 · 2017-07-19

7.8 years ago

Devon Ryan 105k

As I recall, we normally remove duplicates from ATAC-seq data. Regarding samtools vs. Picard, I think even the samtools authors would say "use Picard" for paired-end data.
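To illustrate what paired-end-aware duplicate marking means, here is a minimal Python sketch loosely modelled on Picard's documented approach. Two pairs count as duplicates when the 5' ends and orientations of both mates match, and the pair with the highest summed base quality is kept. This is not Picard's or samtools' code. The `ReadPair` fields are hypothetical, soft clipping and inter-chromosomal pairs are ignored, and `base_quality_sum` stands in for Picard's default scoring.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical, simplified view of a properly paired fragment."""
    name: str
    chrom: str   # both mates assumed on the same chromosome in this sketch
    pos1: int    # 5' coordinate of mate 1
    strand1: str
    pos2: int    # 5' coordinate of mate 2
    strand2: str
    base_quality_sum: int


def pair_key(p: ReadPair) -> Tuple:
    # Sort the two mate ends so the key does not depend on which mate is read 1.
    ends = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    return (p.chrom, ends[0], ends[1])


def mark_duplicates(pairs: List[ReadPair]) -> Dict[str, bool]:
    """Map read name -> True if it duplicates a better pair with the same fragment ends."""
    groups: Dict[Tuple, List[ReadPair]] = {}
    for p in pairs:
        groups.setdefault(pair_key(p), []).append(p)
    is_dup: Dict[str, bool] = {}
    for members in groups.values():
        # Keep the highest-scoring pair as the representative; flag the rest.
        best = max(members, key=lambda m: m.base_quality_sum)
        for p in members:
            is_dup[p.name] = p is not best
    return is_dup


pairs = [
    ReadPair("a", "chr1", 100, "+", 300, "-", 900),
    ReadPair("b", "chr1", 300, "-", 100, "+", 950),  # same fragment, mates swapped
    ReadPair("c", "chr1", 100, "+", 310, "-", 990),  # same mate-1 end, different mate-2 end
]
print(mark_duplicates(pairs))  # {'a': True, 'b': False, 'c': False}
```

A key built from only one mate's 5' end, as single-end duplicate marking would use, would also collapse pairs `a` and `c`, even though they come from fragments of different lengths.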