Strange results from Diffbind-Deseq2 differential peak binding of cfChip-seq with blocking
1
0
Entering edit mode
22 months ago
Oladele • 0

Hi All,

I am working on cfChip-seq with the goal of comparing differentially bound peaks between two timepoints. Please note that same patients (n = 9) donated samples on the two occasions. Our quick PCA analysis showed a lot of variation coming from patient (i.e samples from the same patients seem to cluster together). So we decided to perform the diffbind (v2, Deseq2) analysis with/without blocking with hopes of removing patient-specific significant peaks.

However, I surprisingly observed a drastic increase in diffbound peaks (fdr<0.05, FC≥2) when I blocked on patient (~9000 peaks) vs without blocking (~2000 peaks). This is difficult to explain and I wonder if anyone had similar experience previously. Any suggestion on how to handle this will be appreciated.

Here is a section of my code:

DBdatacontrast <- dba.contrast(DBdataCounts, minMembers=2, categories = DBA_CONDITION, block=DBA_TREATMENT)
DBAnalysisDeseq2 <- dba.analyze(DBdatacontrast, method = DBA_DESEQ2)

Thanks!

Diffbind cfChip-seq Blocking Deseq2 • 847 views
ADD COMMENT
1
Entering edit mode
22 months ago
Rory Stark ★ 2.1k

It is certainly possible to see a substantial increase in confidentially identifying binding changes when applying the "correct" model. In your case, when you do not include the matched samples in the model, you are seeing many fewer significantly differentially bound sites because of the unmodeled similarities between the matched samples in different conditions. When you do include this factor in the model, more consistent changes between timepoints are identified once the sources of variation due to matched patient samples is accounted for. The increase in confidences score (lower FDRs) is an indication that the two-factor model is a better fit for your experimental setup.

Can I ask why you are using such an old version of DiffBind? DiffBind_2.x.y is several years out of date at this point. More recent versions are more likely to accurately model your data. In addition, instead of using the block feature, you can now set up the exact model you want explicitly. In your case, something like:

DBdatacontrast < dba.contrast(DBdataCounts, design="~Treatment + Condition", 
                              contrast=c("Condition","Latertime", "Earliertime") )
ADD COMMENT
0
Entering edit mode

Hi Rory, Thanks for the prompt and detailed response. Yes, our lab has been using V2 for a while and have been reluctant changing to V3. But thank you so much for bringing this up as we will definely be exploring V3 now.

ADD REPLY

Login before adding your answer.

Traffic: 2684 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6