Hi all I am using DiffBind (R/3.4.1, DiffBind_2.4.8) to identify differentially accessible peaks from ATACseq (peaks called using MACS2 --no model, with broad region calling off) of a given cell type for three different conditions which I shall call A (2 samples), B (5 samples) and C (2 samples). I wish to compare ‘condition’ A to B as well as compare B to C. My code is as follows:
B_C<-dba(sampleSheet = " B_C.csv")
B_C <-dba.count(B_C)
B_C <-dba.contrast(B_C,categories = DBA_CONDITION, minMembers = 2)
B_C <-dba.analyze(B_C, method = DBA_ALL_METHODS)
And
A_B<-dba(sampleSheet = " A_B.csv")
A_B <-dba.count(A_B)
A_B <-dba.contrast(A_B,categories = DBA_CONDITION, minMembers = 2)
A_B <-dba.analyze(A_B, method = DBA_ALL_METHODS)
I have two questions/concerns:
1) The B-C comparison analysis yields sensible results but the results of the A-B comparison analysis seem spurious – the affinity has shifted exclusively in one direction, with all peaks differentially accessible in the condition A compared to B (regardless of whether EdgeR or DESeq2 is used). Even peaks which feature in 4 out of the 5 B samples but neither of the 2 A samples come up as differentially accessible in A compared to B. Do you have any suggestions as to what might be causing this discrepancy and how I can overcome this problem?
2) Is there a way in which to compare differential accessibility between the A condition and those peaks in the B group which are themselves differentially accessible compared to C i.e. A versus B[!C]? I haven’t been able to identify a way in which to do this from the DiffBind manual. It would be possible to take the DiffBind results of B versus C and then implement DiffBind a second time using these results in comparison to condition A which, however, would preclude the use of replicates as part of the second DiffBind run. An alternative solution I have thought about would be to carry out A-B and B-C separately (as outlined in ‘question 1’ above), combine the regions from both of these analyses into a single BED file and then perform clustering e.g. using the k means algorithm to identify clusters in which A is enriched compared to B or vice versa but B is enriched compared to C. Your thoughts would be welcome.
Thanks in advance.
Hi Rory
Thank you for your prompt response. Your solution for my second question sounds excellent. As regards question 1), by "affinity shift in one direction" I do indeed mean that all fold values have the same sign. The problem persists even when summits=250 in the call to dba.contrast. Setting bCounts=TRUE in the call to dba.report reveals that the B samples have a count of 0 in all but a few of the ~140,000 peaks, with the two A samples having a count of 0.28 and 1 respectively, again for every peak; this clearly doesn't seem right. (In contrast, for the B-C comparison the counts for a given peak range from 0 to the hundreds.) Interestingly, when bNormalized=FALSE, the fold values all switch sign. Is there any other info I can provide you to help figure out the underlying issue? (The data are unpublished, so I'm unable to send you the DBA object.)
Thanks again.
Can you confirm which version you are using from
sessionInfo()
?I'm limited in what I can do without examining the object, but here's a few things to look at.
Something is going on in counting the A sample peaks. Look at them un-normalized (
bNormalized=FALSE
). If they are all 1 then nothing is being counted (the 0.28 is a normalized value). You can also look at a non-normalized MA plot.Possible reasons for this include:
Inconsistent chromosome naming. You can check this by looking at
this will list all the chromosome names - there should only be one name for each chromosome.
bSubControl=FALSE
in the call todba.analyze()
, or use alternative scores indba.count()
(score=DBA_SCORE_READS
).Hope this helps!
Hi Rory
Thank you! I will get back to you once I've tested your suggestions.
Hi Rory, could you please help me with a question related to dba.contrast available at: How to make complex contrasts in DiffBind dba.contrast