Question

PCR duplicate removal in single-end atac-seq data

13 months ago

sarahmanderni ▴ 130

Hi,

How important is it to remove PCR duplicates (for example with samtools or Picard) before MACS peak calling for SINGLE-end ATAC-seq data? I need to decide carefully because these are patient-derived samples with few cells, so we didn't have the usual number of cells required for loading (5000 to 25000 cells per sample). Now I am wondering whether there was simply not enough material to sequence and we just ended up with PCR duplicates. The other issue, of course, is that this is single-end data. Thanks!

ATAC-seq pcr-duplicate • 1.1k views

updated 13 months ago by ATpoint 88k • written 13 months ago by sarahmanderni ▴ 130

score 2 · Accepted Answer · 2024-06-13

I think it is always important to remove duplicates for peak calling, as otherwise you might be calling a lot of "signal" that is actually just a pileup of PCR artifacts. Depending on the peak caller, the software might remove duplicates automatically (macs3, for example, can do that), but I personally always make a dedicated BAM file with duplicates removed (along with any other filtering criteria I find reasonable).
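To make concrete what "duplicate" means for single-end reads, here is a minimal sketch in plain Python. It assumes the usual convention that reads sharing chromosome, strand and 5' end are treated as copies of one fragment; the `Read` tuple and the function names are made up for illustration and are not the API of samtools or Picard. Note that without a mate, independent fragments that happen to start at the same position cannot be told apart from PCR copies, so some real signal is collapsed as well.

```python
from collections import namedtuple

# Hypothetical minimal read record: 0-based, half-open coordinates.
Read = namedtuple("Read", ["chrom", "start", "end", "strand"])


def five_prime_key(read):
    """Return the key that defines a single-end duplicate.

    The 5' end is the leftmost base for '+' reads and the rightmost
    coordinate for '-' reads.
    """
    pos = read.start if read.strand == "+" else read.end
    return (read.chrom, read.strand, pos)


def remove_duplicates(reads):
    """Keep the first read seen for each (chrom, strand, 5' end) key."""
    seen = set()
    kept = []
    for read in reads:
        key = five_prime_key(read)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def duplication_rate(reads):
    """Fraction of reads that would be discarded as duplicates."""
    if not reads:
        return 0.0
    return 1.0 - len(remove_duplicates(reads)) / len(reads)


reads = [
    Read("chr1", 100, 150, "+"),
    Read("chr1", 100, 150, "+"),  # exact copy: duplicate
    Read("chr1", 100, 140, "+"),  # same 5' end, shorter read: still a duplicate
    Read("chr1", 100, 150, "-"),  # minus strand, 5' end at 150: kept
    Read("chr1", 120, 170, "+"),  # different 5' end: kept
]

deduped = remove_duplicates(reads)
print(len(deduped), "reads kept; duplication rate:", duplication_rate(reads))
```

A very high duplication rate on the real data would point to the low-complexity library the question is worried about, which is one more reason to look at this number before peak calling rather than leaving it to the peak caller.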

I understand that few cells might lead to a small reduction in quality, but in my hands (over the years that we have been doing ATAC-seq) the assay is very robust against fluctuations in input material, as long as the cells were viable and the protocol was performed correctly. I definitely recommend against including more noise in the analysis as compensation for experimental shortcomings. In the end this just accumulates uncertainty, which is always an issue with a suboptimal experimental setup anyway. I don't think it helps.

The fact that it is single-end is just unfortunate and frankly a bad design decision, as the typical ATAC-seq banding pattern (beyond the Bioanalyzer/TapeStation quality control), which you only see in the fragment sizes of paired-end data, is a valuable QC metric, especially when input material is low and the experimental outcome is uncertain. Companies such as Novogene offer very cost-effective PE150bp sequencing these days, and at least in our hands we never found any provider that (for plain sequencing of this type) could beat that price, plus you get the full paired-end information.

My recommendation is to remove duplicates, then make a bigWig track and just look at the data in IGV. If there is good separation between peaks and noise, it's fine. Otherwise, you will have to decide whether the data can give you anything. Feel free to post the tracks, then I can give feedback. Another important QC metric is FRiP (fraction of reads in peaks), see: FRIP score ATAC-seq
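For reference, FRiP is just the number of (deduplicated) reads that overlap a called peak divided by the total number of reads. The sketch below shows the arithmetic on toy intervals; the tuple layout and the function names are assumptions for illustration, and on real data you would count overlaps between the filtered BAM and the peak file with a dedicated tool instead of a Python loop.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based, half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def frip(read_intervals, peak_intervals):
    """Fraction of reads overlapping any peak.

    Both arguments are lists of (chrom, start, end) tuples.
    """
    if not read_intervals:
        return 0.0

    # Group peaks by chromosome so each read is only compared
    # against peaks on its own chromosome.
    peaks_by_chrom = {}
    for chrom, start, end in peak_intervals:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))

    in_peaks = 0
    for chrom, start, end in read_intervals:
        candidates = peaks_by_chrom.get(chrom, [])
        if any(overlaps(start, end, p_start, p_end) for p_start, p_end in candidates):
            in_peaks += 1
    return in_peaks / len(read_intervals)


example_peaks = [("chr1", 1000, 1500), ("chr2", 200, 400)]
example_reads = [
    ("chr1", 990, 1040),   # overlaps the chr1 peak
    ("chr1", 1200, 1250),  # inside the chr1 peak
    ("chr1", 5000, 5050),  # background
    ("chr2", 400, 450),    # starts exactly where the chr2 peak ends: no overlap
]

print("FRiP:", frip(example_reads, example_peaks))
```

The linear scan over peaks is fine for a toy example but too slow for genome-wide data, where a sorted or interval-tree based lookup is the usual choice.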