Hi
I am new to ATAC-Seq and ChIP-Seq and I have a question on removing duplicates when it comes to paired-end ATAC-Seq and paired-end ChIP-Seq pipelines.
I have seen some papers where clumpify is used in the first step to remove duplicates followed by bwa-mem and passing the bam files (after shifting for ATAC-Seq) to MACS2 with no usage of Picard MarkDuplicates.
https://www.cell.com/cell-genomics/pdf/S2666-979X(23)00019-8.pdf
Single nucleus multiomics identifies ZEB1 and MAFB as candidate regulators of Alzheimer’s disease-specific cis-regulatory elements
ChIP-seq analysis
Prior to analysis, reads were processed to remove optical duplicates with clumpify (BBMap v38.20; https://sourceforge.net/projects/bbmap/) [dedupe = t optical = t dupedist = 2500]
I have also seen this post, "Did you remove ChIP-seq duplicates", where Picard MarkDuplicates is used.
Hence I am curious: are there benefits to one approach over the other? (Approach 1: run clumpify as the first step, with no MarkDuplicates; Approach 2: align first, then run MarkDuplicates on the BAM files.)
If using clumpify, are the settings above correct (dedupe=t optical=t dupedist=2500)?
- In "Removing fastq duplicates", GenoMax suggests hdist=0 to make it strict.
- Going by "Yes .. BBMap can do that! - Part III clumpify (mark (and dedupe) duplicates without alignment), mutate (create mutant genomes) and other miscellaneous tools", should I use dedupe=t optical=f or dedupe=t optical=t?
I would like to seek your advice and guidance on the above for both ATAC-Seq and ChIP-Seq.
- Would anything change if the data were single-end instead of paired-end, as alluded to here: Did you remove ChIP-seq duplicates?
Thanks in advance.
Using clumpify.sh will allow you to do duplicate removal in an alignment-independent manner. It will also allow you to remove just optical (really clustering) duplicates, which is what the paper you linked above seems to be doing. It will also make the input files smaller, thus reducing the time required for alignment.
Keep the statement from @Ian Sudbery's answer linked above in mind.
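To make the two clumpify modes discussed in this thread concrete, here is a minimal sketch that assembles the command line for a paired-end FASTQ pair. The helper name clumpify_cmd and the file names are hypothetical; the dedupe, optical and dupedist options and the value 2500 are the ones quoted from the paper above. Whether 2500 suits your data depends on the sequencer, so treat it as an assumption to check against the BBMap documentation.

```python
import shlex


def clumpify_cmd(r1, r2, out1, out2, optical_only=True, dupedist=2500):
    """Build (not run) a clumpify.sh argument list for paired-end reads.

    Hypothetical helper for illustration only. dupedist defaults to the
    value quoted in the paper; it is an assumption, not a recommendation.
    """
    args = [
        "clumpify.sh",
        f"in1={r1}",
        f"in2={r2}",
        f"out1={out1}",
        f"out2={out2}",
        "dedupe=t",
    ]
    if optical_only:
        # Remove only duplicates that sit close together on the flowcell.
        args += ["optical=t", f"dupedist={dupedist}"]
    else:
        # Remove all duplicates regardless of flowcell position.
        args.append("optical=f")
    return args


# Placeholder file names; substitute your own.
optical = clumpify_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                       "sample_R1.clumped.fastq.gz", "sample_R2.clumped.fastq.gz")
all_dups = clumpify_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                        "sample_R1.dedup.fastq.gz", "sample_R2.dedup.fastq.gz",
                        optical_only=False)

print(shlex.join(optical))
print(shlex.join(all_dups))
```

The printed lines can be pasted into a shell, or the list can be passed to subprocess.run(args, check=True) on a machine where BBMap is installed. Building the list separately from running it makes it easy to compare the optical-only and all-duplicates variants side by side before committing to one.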
Thanks GenoMax for your answer.