I am using Diffbind to call differential peaks on an ATAC seq dataset of condition A vs B each having 3 replicates.
The total number of peaks post union is - 106700 and of these 33197 (~30%) come out as significant even with a stringent FDR cutoff of 0.01.
Is this high proportion of significant peaks expected? I do understand it depends on the data but still the number seems very high. Should any additional steps should be added to the protocol below to increase the specificity?
The protocol I followed was as follows:-
Peaks were called for replicate individually with macs2 using the -format BAMPE to use the fragment size information in BAM for extension(reads are paired-end). The p-value cutoff here was kept lenient at 0.01 to allow more peaks early on.
DESeq2 was then used within Diffbind for normalization and finding the differential peaks.
The
--summits
option was not used in Diffbind as size is not a particular concern.
Thanks!.
Some details about the experiment itself, so what are the conditions, which organism, which treatment etc., would help.
Thanks for the reply!. The experiment is a gene knockout experiment and the organism here is Mus musculus(using mm10 as reference). There is no input/control. It is comparison of wild type vs the knockout each having 3 replicates.
Which gene was knocked out? Depending on the gene function, e.g. a histone deacetylase or methyltransferase, it might make sense that you see large-scale effects on chromatin accessability. Did the quality control steps look ok, so do you see the nucleosomal pattern in the insert sizes of the paired-end sequencing, and do the replicates correlate well among each other?
Thanks again. Apologies was out of office.
If your aim is to call different open regions I'd first be concerned about using BAMPE when you call peaks. When you use BAMPE, MACS2 will create a coverage vector from the fragment (created by filling in the space between the two mapped ends). However, pairs can be mapped either side of a nucleosome so when you pileup the fragment, you're actually calling peaks in both closed and open regions.
Thanks for the reply! I am new to peak based analysis in general. I chose BAMPE over a specific shift/extsize based on the MACS2 authors recommendation here
The other parameters that I have seen being used for ATAC are
--shift -75/-100 extsize 150/200
. But will these not then result in further shifting the reads which are mapped on either side of a nucleosome even more away from the centre and going into closed regions nearby? I realize it may be a naive question but I have not been able to get a good visualization of the part of the nucleosome the reads are coming from in ATAC.Also what parameters would you recommend ideally for a differential ATAC analysis.