DiffBind - Erroneous Affinity Shift In One Direction
1
0
Entering edit mode
6.8 years ago

Hi all I am using DiffBind (R/3.4.1, DiffBind_2.4.8) to identify differentially accessible peaks from ATACseq (peaks called using MACS2 --no model, with broad region calling off) of a given cell type for three different conditions which I shall call A (2 samples), B (5 samples) and C (2 samples). I wish to compare ‘condition’ A to B as well as compare B to C. My code is as follows:

B_C<-dba(sampleSheet = " B_C.csv")

B_C <-dba.count(B_C)

B_C <-dba.contrast(B_C,categories = DBA_CONDITION, minMembers = 2)

B_C <-dba.analyze(B_C, method = DBA_ALL_METHODS)

And

A_B<-dba(sampleSheet = " A_B.csv")

A_B <-dba.count(A_B)

A_B <-dba.contrast(A_B,categories = DBA_CONDITION, minMembers = 2)

A_B <-dba.analyze(A_B, method = DBA_ALL_METHODS)

I have two questions/concerns:

1) The B-C comparison analysis yields sensible results but the results of the A-B comparison analysis seem spurious – the affinity has shifted exclusively in one direction, with all peaks differentially accessible in the condition A compared to B (regardless of whether EdgeR or DESeq2 is used). Even peaks which feature in 4 out of the 5 B samples but neither of the 2 A samples come up as differentially accessible in A compared to B. Do you have any suggestions as to what might be causing this discrepancy and how I can overcome this problem?

2) Is there a way in which to compare differential accessibility between the A condition and those peaks in the B group which are themselves differentially accessible compared to C i.e. A versus B[!C]? I haven’t been able to identify a way in which to do this from the DiffBind manual. It would be possible to take the DiffBind results of B versus C and then implement DiffBind a second time using these results in comparison to condition A which, however, would preclude the use of replicates as part of the second DiffBind run. An alternative solution I have thought about would be to carry out A-B and B-C separately (as outlined in ‘question 1’ above), combine the regions from both of these analyses into a single BED file and then perform clustering e.g. using the k means algorithm to identify clusters in which A is enriched compared to B or vice versa but B is enriched compared to C. Your thoughts would be welcome.

Thanks in advance.

ChIP-Seq diffbind • 2.8k views
ADD COMMENT
1
Entering edit mode
6.8 years ago
Rory Stark ★ 2.1k

Hello, I am the author of DiffBind.

1) There are a number of things that could account for the A-B differences. You say all the shifts are in one direction, which I assume means that the fold-changes of all the differentially bound (DB) sites have the same sign? It may be instructive to look at the read counts for the DB sites, by setting bCounts=TRUE in the call to dba.report().You can try this normalized or non-normalized (bNormalized=FALSE). This may shed some light as to what is going on. You can also look at the MA plots using dba.plotMA() to confirm the shift is visible in the data.

One possibility is that the accessible regions are longer in one conditions, leading to the merged peaks being very wide, with relatively few read counts in the other condition. In this case it would be wroth running an analysis that re-centers the sites and standardized their lengths by setting e.g. summits=250 in the call to dba.count(). You will then do analysis on 501bp regions centered around the point of maximal pileup which may prove a better comparison.

You can also send me the DBA object after analysis to have a look at what may be going on. rory.stark [@] cruk.cam.ac.uk

2) The way I would do the second comparison is as follows:

  1. Run the B vs C analysis, then retrieve the DB sites by calling dba.report().
  2. Reload all the peaks and retrieve the condition A consensus using dba() and dba.peakset(bRetrieve=TRUE).
  3. Add the DB sites and the A consensus sites to a temporary DBA object and retrieve their union.
  4. Reload all the peaks and re-run dba.count() with consensus peakset set to the union
  5. Run A vs P analysis on this binding matrix

Here is a complete script using the sample data. In this case, the first contrast is between MCF7 Responsive and all Resistant, and the second contrast is between Resistant and Responsive samples using the union of those peaks. Note that the minOverlap parameter in the call to dba() determines how many of the "A" condition peaks you keep (minOverlap=1 will keep all of them):

# Run first contrast and retrieve DB sites (B vs C)
data(tamoxifen_counts)
tamoxifen <- dba.contrast(tamoxifen,group1=3:5,group2=tamoxifen$masks$Resistant)
tamoxifen <- dba.analyze(tamoxifen)
DBsites   <- dba.report(tamoxifen)

# Re-load peaks and retrieve non-MCF7 Responsive sites (A)
data(tamoxifen_peaks)
RespNonMCF7 <- dba(tamoxifen,
                   mask=tamoxifen$masks$Responsive & !tamoxifen$masks$MCF7, 
                   minOverlap=2)
RespNonMCF7Sites <- dba.peakset(RespNonMCF7,bRetrieve=TRUE)

# Add A sites and DB (B vs C) sites to temp DBA object and get union
tamDBA <- dba.peakset(tamoxifen,peaks=RespNonMCF7Sites,consensus=TRUE )
tamDBA <- dba.peakset(tamDBA,peaks=DBsites,consensus=TRUE)
tamDBA <- dba(tamDBA,mask=tamDBA$masks$Consensus, minOverlap=1)
consSites <- dba.peakset(tamDBA, bRetrieve=TRUE)

#  Re-load all original peaks and run analysis using union as consensus
data(tamoxifen_peaks)
tamoxifen <- dba.count(tamoxifen, peaks=consSites)
tamoxifen <- dba.contrast(tamoxifen, group1=tamoxifen$masks$Resistant,
                          group2=tamoxifen$masks$Responsive,
                          name1="Resistant", name2="Responsive")
tamoxifen <- dba.analyze(tamoxifen)
ADD COMMENT
0
Entering edit mode

Hi Rory

Thank you for your prompt response. Your solution for my second question sounds excellent. As regards question 1), by "affinity shift in one direction" I do indeed mean that all fold values have the same sign. The problem persists even when summits=250 in the call to dba.contrast. Setting bCounts=TRUE in the call to dba.report reveals that the B samples have a count of 0 in all but a few of the ~140,000 peaks, with the two A samples having a count of 0.28 and 1 respectively, again for every peak; this clearly doesn't seem right. (In contrast, for the B-C comparison the counts for a given peak range from 0 to the hundreds.) Interestingly, when bNormalized=FALSE, the fold values all switch sign. Is there any other info I can provide you to help figure out the underlying issue? (The data are unpublished, so I'm unable to send you the DBA object.)

Thanks again.

ADD REPLY
0
Entering edit mode

Can you confirm which version you are using from sessionInfo()?

I'm limited in what I can do without examining the object, but here's a few things to look at.

Something is going on in counting the A sample peaks. Look at them un-normalized (bNormalized=FALSE). If they are all 1 then nothing is being counted (the 0.28 is a normalized value). You can also look at a non-normalized MA plot.

Possible reasons for this include:

  • Bad bam files (no aligned reads of sufficient mapping quality)
  • Inconsistent chromosome naming. You can check this by looking at

    myDBA$chrmap

this will list all the chromosome names - there should only be one name for each chromosome.

  • Signal in the control track. By default the control reads are subtracted from the ATAC reads, so they could wipe out the ATAC signal. You can check by setting bSubControl=FALSE in the call to dba.analyze(), or use alternative scores in dba.count() (score=DBA_SCORE_READS).

Hope this helps!

ADD REPLY
0
Entering edit mode

Hi Rory

Thank you! I will get back to you once I've tested your suggestions.

ADD REPLY
0
Entering edit mode

Hi Rory, could you please help me with a question related to dba.contrast available at: How to make complex contrasts in DiffBind dba.contrast

ADD REPLY

Login before adding your answer.

Traffic: 2687 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6