Question

Strange results from Diffbind-Deseq2 differential peak binding of cfChip-seq with blocking

0

Entering edit mode

22 months ago

Oladele • 0

Hi All,

I am working on cfChip-seq with the goal of comparing differentially bound peaks between two timepoints. Please note that same patients (n = 9) donated samples on the two occasions. Our quick PCA analysis showed a lot of variation coming from patient (i.e samples from the same patients seem to cluster together). So we decided to perform the diffbind (v2, Deseq2) analysis with/without blocking with hopes of removing patient-specific significant peaks.

However, I surprisingly observed a drastic increase in diffbound peaks (fdr<0.05, FC≥2) when I blocked on patient (~9000 peaks) vs without blocking (~2000 peaks). This is difficult to explain and I wonder if anyone had similar experience previously. Any suggestion on how to handle this will be appreciated.

Here is a section of my code:

DBdatacontrast <- dba.contrast(DBdataCounts, minMembers=2, categories = DBA_CONDITION, block=DBA_TREATMENT)
DBAnalysisDeseq2 <- dba.analyze(DBdatacontrast, method = DBA_DESEQ2)

Thanks!

Diffbind cfChip-seq Blocking Deseq2 • 850 views

ADD COMMENT • link updated 22 months ago by Ram 44k • written 22 months ago by Oladele • 0

Ram · Answer 1 · 2023-01-24

It is certainly possible to see a substantial increase in confidentially identifying binding changes when applying the "correct" model. In your case, when you do not include the matched samples in the model, you are seeing many fewer significantly differentially bound sites because of the unmodeled similarities between the matched samples in different conditions. When you do include this factor in the model, more consistent changes between timepoints are identified once the sources of variation due to matched patient samples is accounted for. The increase in confidences score (lower FDRs) is an indication that the two-factor model is a better fit for your experimental setup.

Can I ask why you are using such an old version of DiffBind? DiffBind_2.x.y is several years out of date at this point. More recent versions are more likely to accurately model your data. In addition, instead of using the block feature, you can now set up the exact model you want explicitly. In your case, something like:

DBdatacontrast < dba.contrast(DBdataCounts, design="~Treatment + Condition", 
                              contrast=c("Condition","Latertime", "Earliertime") )