Question

Diffbind - Very high proportion of significant peaks

0

Entering edit mode

7.0 years ago

kautilya ▴ 430

I am using Diffbind to call differential peaks on an ATAC seq dataset of condition A vs B each having 3 replicates.

The total number of peaks post union is - 106700 and of these 33197 (~30%) come out as significant even with a stringent FDR cutoff of 0.01.

Is this high proportion of significant peaks expected? I do understand it depends on the data but still the number seems very high. Should any additional steps should be added to the protocol below to increase the specificity?

The protocol I followed was as follows:-

Peaks were called for replicate individually with macs2 using the -format BAMPE to use the fragment size information in BAM for extension(reads are paired-end). The p-value cutoff here was kept lenient at 0.01 to allow more peaks early on.
DESeq2 was then used within Diffbind for normalization and finding the differential peaks.
The--summits option was not used in Diffbind as size is not a particular concern.

Thanks!.

diffbind atac-seq deseq2 • 4.9k views

ADD COMMENT • link 7.0 years ago by kautilya ▴ 430

1

Entering edit mode

Some details about the experiment itself, so what are the conditions, which organism, which treatment etc., would help.

ADD REPLY • link 7.0 years ago by ATpoint 87k

0

Entering edit mode

Thanks for the reply!. The experiment is a gene knockout experiment and the organism here is Mus musculus(using mm10 as reference). There is no input/control. It is comparison of wild type vs the knockout each having 3 replicates.

ADD REPLY • link 7.0 years ago by kautilya ▴ 430

1

Entering edit mode

Which gene was knocked out? Depending on the gene function, e.g. a histone deacetylase or methyltransferase, it might make sense that you see large-scale effects on chromatin accessability. Did the quality control steps look ok, so do you see the nucleosomal pattern in the insert sizes of the paired-end sequencing, and do the replicates correlate well among each other?

ADD REPLY • link 7.0 years ago by ATpoint 87k

0

Entering edit mode

Thanks again. Apologies was out of office.

The gene which was knocked out is the part of an nucleosome remodeling complex so i think the large-scale effects can be expected.
The insert sizes of at least 4 of 6 samples shows a clear increase around 170-200 bp and another small one at ~400bp. This corroborates with excepted mono/di-nucleosomal peaks. Example
Also the replicate concordance using IDR is around 30% if directly compared and ~65% when a pooled peak list for a condition is used, at a cutoff of 0.01. I am still unclear on what are the expected ranges for good IDR scores though.

ADD REPLY • link 7.0 years ago by kautilya ▴ 430

1

Entering edit mode

If your aim is to call different open regions I'd first be concerned about using BAMPE when you call peaks. When you use BAMPE, MACS2 will create a coverage vector from the fragment (created by filling in the space between the two mapped ends). However, pairs can be mapped either side of a nucleosome so when you pileup the fragment, you're actually calling peaks in both closed and open regions.

ADD REPLY • link 7.0 years ago by James Ashmore ★ 3.5k

0

Entering edit mode

Thanks for the reply! I am new to peak based analysis in general. I chose BAMPE over a specific shift/extsize based on the MACS2 authors recommendation here

The other parameters that I have seen being used for ATAC are --shift -75/-100 extsize 150/200. But will these not then result in further shifting the reads which are mapped on either side of a nucleosome even more away from the centre and going into closed regions nearby? I realize it may be a naive question but I have not been able to get a good visualization of the part of the nucleosome the reads are coming from in ATAC.

Also what parameters would you recommend ideally for a differential ATAC analysis.

ADD REPLY • link 7.0 years ago by kautilya ▴ 430

score 4 · Answer 1 · 2018-04-10

I have certainly see this level of differentially Some questions:

Do you expect the conditions to change the open chromatin landscape quite a bit?
How are the differential sites divided between the two conditions? Do they mostly show increases in one condition, or are there many changes in both directions?

It may help to look at the MA plots using dba.plotMA(). You can also compare with bNormalized=FALSE to see if the impact of nomalization. If the MA plots show a string shift in one direction, that usually indicates that one of the conditions has a big impact on chromatin state.

You can also look at some of the sites in more detail to see if they are convincingly different. A couple ways to do this:

Get the report with dba.report() and set bCounts=TRUE to see the fold change and the distribution of read counts.
Look at all six conditions in a browser (eg. IGV). Check some of the differential sites to see what the pileups look like.

Cheers-

Rory