Question

DiffBind - normalization methods with biologically-relevant differences in signal-to-noise


6.5 years ago

reskejak ▴ 40

I have been using DiffBind for differential-accessibility analysis of ATAC-seq data and have run into the seemingly infamous normalization issue: our results differ greatly depending on whether counts are normalized by full library read depth or by the depth of reads within consensus peaks (this is my understanding of how the bFullLibrarySize parameter affects count normalization).
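To make the distinction concrete, here is a minimal Python sketch of the two denominators. It is not DiffBind's implementation (the exact computation depends on the DiffBind version and on whether DESeq2 or edgeR is used); the sample names and read totals are hypothetical, and each factor is simply the sample's total divided by the geometric mean across samples.

```python
from statistics import geometric_mean  # Python 3.8+

# Hypothetical per-sample read totals (not real data).
full_library = {"A1": 40_000_000, "A2": 38_000_000, "B1": 42_000_000, "B2": 39_000_000}
reads_in_peaks = {"A1": 3_600_000, "A2": 3_300_000, "B1": 12_000_000, "B2": 11_500_000}


def scaling_factors(totals):
    """Divide each sample's total by the geometric mean so the factors centre on 1."""
    centre = geometric_mean(totals.values())
    return {sample: total / centre for sample, total in totals.items()}


sf_full = scaling_factors(full_library)     # idea behind bFullLibrarySize=TRUE
sf_peaks = scaling_factors(reads_in_peaks)  # idea behind bFullLibrarySize=FALSE

for sample in full_library:
    frip = reads_in_peaks[sample] / full_library[sample]
    print(f"{sample}: FRiP={frip:.2f}  full-library factor={sf_full[sample]:.2f}  "
          f"in-peak factor={sf_peaks[sample]:.2f}")
```

In this toy case group B's higher FRiP barely moves its full-library factors but makes its in-peak factors more than three times group A's, so the two settings divide B's peak counts by very different amounts.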

Below are two MA plots, from bFullLibrarySize=TRUE and bFullLibrarySize=FALSE using DESeq2, and the results are clearly different. Judging by the density of background sites in the plots, the =FALSE setting would appear to be more appropriate. We do observe variability in signal-to-noise between samples (FRiP ranges from 0.08 to 0.30). However, this variability may reflect real biology differing between the experimental groups. It is also one reason we have not relied on edgeR calculations, as I recall reading that Rory stated edgeR is not appropriate when signal-to-noise varies strongly between samples within an experiment. Does anyone have suggestions for interpreting these results, or for which setting to favor?
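A quick worked example shows how far a FRiP difference alone can move a peak along the M axis of an MA plot. It assumes two samples sequenced to the same (hypothetical) depth with the FRiP values quoted above, and a peak that holds the same share of each sample's in-peak reads, i.e. one that is not differential relative to the other peaks. M is the log2 ratio of normalized counts and A their mean log2 abundance on a per-million scale.

```python
import math

depth = 30_000_000            # hypothetical: same total reads in both samples
frip_low, frip_high = 0.08, 0.30
peak_share = 1e-5             # hypothetical share of in-peak reads falling in this peak


def peak_count(frip):
    """Reads in this peak when it keeps the same share of in-peak reads."""
    return depth * frip * peak_share


def ma(x, y):
    """MA-plot coordinates: M = log2 ratio, A = mean log2 abundance."""
    return math.log2(x / y), 0.5 * (math.log2(x) + math.log2(y))


high, low = peak_count(frip_high), peak_count(frip_low)

# Full-library normalization: both samples scaled by the same depth (per million).
m_full, a_full = ma(1e6 * high / depth, 1e6 * low / depth)
# Reads-in-peaks normalization: each sample scaled by its own in-peak total.
m_peaks, a_peaks = ma(1e6 * high / (depth * frip_high), 1e6 * low / (depth * frip_low))

print(f"full library:   M = {m_full:.2f}, A = {a_full:.2f}")
print(f"reads in peaks: M = {m_peaks:.2f}, A = {a_peaks:.2f}")
```

Under full-library normalization this peak gets M = log2(0.30/0.08), about 1.91, purely from the FRiP difference; under reads-in-peaks normalization its M is 0.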

We have a number of other flow cells in which the outputs of bFullLibrarySize=TRUE and FALSE also differ, so we are looking for guidance on which setting to use when signal-to-noise varies between samples. I'm also planning to try csaw in the near future, so I can compare the results with that method as well.

[MA plot: bFullLibrarySize=TRUE]

[MA plot: bFullLibrarySize=FALSE]

DiffBind ATAC signal normalization ChIP-Seq

Written 6.5 years ago by reskejak ▴ 40; updated 6.5 years ago by Devon Ryan 104k

score 3 · Answer 1 · 2018-06-17


6.5 years ago

Devon Ryan 104k

To be frank, neither of the options offered by DiffBind is very robust. We've switched to the csaw package for analyzing differential accessibility, since it's more flexible and provides more robust methods.

The assumptions behind the bFullLibrarySize option are as follows:

True: There is no great difference in experimental efficiency between groups (and ideally not between samples, but as long as the inter-sample variability is similar within groups you should be OK).
False: There is no global change in signal within peaks between groups.
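The sketch below (hypothetical numbers, simplified normalizations rather than DiffBind's code) illustrates why you have to judge these assumptions from knowledge of the experiment: the same observed counts are consistent with a purely technical efficiency difference and with a genuine global gain in accessibility, and each setting gives the right answer for only one of them.

```python
import math

depth = 20_000_000                 # hypothetical reads per library, both groups
peak_share = [0.4, 0.3, 0.2, 0.1]  # hypothetical share of in-peak reads per peak
frip_a, frip_b = 0.10, 0.20        # group B has twice the fraction of reads in peaks

# Observed counts. Either story produces exactly these numbers:
#   technical:  B's libraries are more efficient; true log2FC = 0 at every peak
#   biological: accessibility doubles at every peak in B; true log2FC = 1
counts_a = [depth * frip_a * s for s in peak_share]
counts_b = [depth * frip_b * s for s in peak_share]


def log2fc(norm_a, norm_b):
    """Per-peak log2 fold change (B over A) after dividing by each sample's normalizer."""
    return [math.log2((b / norm_b) / (a / norm_a)) for a, b in zip(counts_a, counts_b)]


lfc_full = log2fc(depth, depth)                   # TRUE-style: full library size
lfc_peaks = log2fc(sum(counts_a), sum(counts_b))  # FALSE-style: reads in peaks

print("full library:  ", [round(x, 2) for x in lfc_full])
print("reads in peaks:", [round(x, 2) for x in lfc_peaks])
```

Full-library normalization reports every peak as up by 1 log2 unit, which is correct only in the biological scenario; reads-in-peaks normalization reports 0, which is correct only in the technical one.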

If there is an efficiency difference between groups AND you suspect a global shift in accessibility, then neither setting is appropriate and you'll need to come up with a different way to normalize the samples (e.g., spiking a foreign DNA source into all of the libraries and normalizing to it).
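Spike-in normalization can be sketched as follows, assuming the same amount of foreign DNA was added to every sample so that spike-in read counts track technical yield only. The sample names and counts are hypothetical, and this is not a DiffBind or csaw API, only the arithmetic behind the idea.

```python
from statistics import mean

# Hypothetical reads aligned to the spike-in (foreign) genome in each library.
spike_in_reads = {"ctrl_1": 500_000, "ctrl_2": 520_000, "trt_1": 480_000, "trt_2": 510_000}
# Hypothetical raw counts for one consensus peak.
peak_counts = {"ctrl_1": 200, "ctrl_2": 210, "trt_1": 390, "trt_2": 400}

# Assumes an equal spike-in amount per sample, so spike-in reads reflect
# technical yield only; centre the factors on 1.
centre = mean(spike_in_reads.values())
size_factors = {s: n / centre for s, n in spike_in_reads.items()}

# Neither full library size nor reads in peaks enters the factor, so a global
# shift in the endogenous signal is not normalized away.
normalized = {s: peak_counts[s] / size_factors[s] for s in peak_counts}

for s, value in normalized.items():
    print(f"{s}: factor={size_factors[s]:.3f} normalized={value:.1f}")
```

The approach stands or falls on the spike-in being added consistently; any pipetting variation in the spike-in goes straight into the size factors.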

You'll need to assess for yourself whether any of the above assumptions fit your biological experiment.



I recently started using DiffBind for differential accessibility analysis of ATAC-seq. Could you elaborate on why the DiffBind options are not robust, and in which respects csaw is better? I am still a newbie at this :)

5.1 years ago by mickey_95 ▴ 110


Just out of interest, to revive an old thread: looking at the second (normalized) MA plot above, would you say the apparent trended bias starting at around logCPM ~ 6 is worth worrying about, e.g. correcting with loess as suggested in csaw?

I have seen this kind of trended bias in some datasets recently, and I am still not sure whether I should bother correcting it or stick with the default TMM if the bias is not too strong (that is, if it does not create large fold changes in the high-logCPM range). If the bias above is technical rather than biological, it would probably make the regions at the top right (FC > 0 with large logCPM, and therefore high power) significant even though they are an artifact. Any thoughts?
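One way to look at this empirically is to compare a single global offset with a trend fitted along A. The sketch below simulates a hypothetical technical bias that starts at A = 6 and compares (a) a trimmed mean of M, a simplified and unweighted stand-in for TMM, with (b) a local linear fit with tricube weights, i.e. loess without robustness iterations. Neither function is csaw's or edgeR's actual code.

```python
import random


def trimmed_mean(values, trim=0.3):
    """Mean of M after dropping the lowest and highest `trim` fraction.
    A simplified, unweighted stand-in for a TMM-style global offset."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def local_linear_fit(xs, ys, frac=0.3):
    """Loess-style trend without robustness iterations: at each x0, fit a
    tricube-weighted straight line to the nearest frac * n points."""
    k = max(3, int(frac * len(xs)))
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # neighbourhood radius
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


rng = random.Random(0)
A = [rng.uniform(2, 10) for _ in range(500)]  # hypothetical logCPM values
# Hypothetical technical bias: none below A = 6, growing linearly above it.
M = [(0.0 if a < 6 else 0.4 * (a - 6)) + rng.gauss(0, 0.2) for a in A]

offset = trimmed_mean(M)
M_global = [m - offset for m in M]                               # one offset for all peaks
M_trended = [m - t for m, t in zip(M, local_linear_fit(A, M))]  # offset varies with A


def mean_high(ms, cutoff=8.0):
    """Mean M among high-abundance peaks (A > cutoff)."""
    high = [m for a, m in zip(A, ms) if a > cutoff]
    return sum(high) / len(high)


print(f"global offset:  mean M above A=8 is {mean_high(M_global):.2f}")
print(f"trended offset: mean M above A=8 is {mean_high(M_trended):.2f}")
```

With the global offset, peaks above A = 8 keep a mean M well above 0, so high-abundance regions would look up-regulated; subtracting the fitted trend brings their mean M close to 0. The trade-off is that the trend fit would equally remove a genuine abundance-dependent difference, so it is only appropriate if the bias is believed to be technical.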

4.2 years ago by ATpoint 85k


I don't expect correcting this to cause too much of a change in the results, but it'd be worth having a look and testing that empirically.

4.2 years ago by Devon Ryan 104k