Question

Huge difference between differential peaks with EdgeR and DESeq2 using Diffbind

0

Entering edit mode

7.5 years ago

daniele.avancini ▴ 70

Hi all,

I'm analyzing H3K27 peaks in two different cell populations and I want to get the differential peaks between the two using Diffbind. Peaks were called using MAC2 with the --broad flag. I run the analysis through DiffBind using the code:

K27<-dba(sampleSheet = "sampleSheet_K27.csv")
K27<-dba.count(K27)
K27<-dba.contrast(K27,categories = DBA_CONDITION, minMembers = 2)
K27<-dba.analyze(K27, method = DBA_ALL_METHODS)

The analysis goes well but I noticed that there is a big difference between the differential regions identified by using EdgeR and DESeq2: none in the case of EdgeR and 1984 using DESeq2. I expect to see differences between the two methods principally because of their different normalization processes but this difference, in my humble opinion, seems to be too high to be caused only by normalization.

Do you have any idea where this difference may rely on? Is there something I do wrong? Or do I take the result from DESeq2?

Thanks in advance.

EDIT: below the MA plot as requested using the code:

> par(mfrow=c(3,1))
> dba.plotMA(K27,bNormalized=FALSE)
> dba.plotMA(K27, method=DBA_DESEQ2)
> dba.plotMA(K27, method=DBA_EDGER)

MA plots

ChIP-Seq R Diffbind edgeR DESeq2 • 5.8k views

ADD COMMENT • link 7.5 years ago by daniele.avancini ▴ 70

0

Entering edit mode

Can you post an MA plot of the samples you're comparing (using both edgeR and DESeq2). By memory edgeR and DESeq2 have different normalisation strategies, and I believe one of them is better suited to data where there is a global shift in fold change towards one condition. You'll be able to see this on an MA plot.

ADD REPLY • link 7.5 years ago by James Ashmore ★ 3.5k

score 6 · Accepted Answer · 2017-10-16

I suspect you are actually correct in your initial suspicion that the difference is in normalization. We most often see this behavior when there is a lot more signal in one sample group than the other.

You can check for this by comparing three plots:

> par(mfrow=c(3,1))
> dba.plotMA(K27,bNormalized=FALSE)
> dba.plotMA(K27, method=DBA_DESEQ2)
> dba.plotMA(K27, method=DBA_EDGER)

You may see that in the non-normalized plot, most of the fold changes are above or below the zero line, and that the normalization in the edgeR analysis is more extensive than for the DESeq2 analysis. The default edgeR normalization, TMM, assumes that most of the enriched regions don't change in their degree of enrichment between the two sample groups. When you invoke the DESe2 analysis with the default of bFullLibrarySize=TRUE, only a simple normalization for sequencing depth is carried out. It is for this reason that DiffBind changes the default method to DESeq2 with bFullLibrarySize=TRUE a few releases back.

If you do see that the fold changes in the non-normalized data mostly have the same sign (affinity shifts all in one direction), then the edgeR TMM normalization is not appropriate and you should not use those results. The DESeq2 results will be more reliable.

It may also be worth trying a variant of the analysis by setting the summits parameter in the call to dba.count(). For a broad area of enrichment, you may want to use a number like summits=500. This will consider only the most-enriched 1000bp regions of each "peak". While the entire region is not considered, comparing the same 1000bp window in each region lessens the impact of background reads in broad peak-calling.

-R