Question

DiffBind missing peaks and FRiP

0

Entering edit mode

2.8 years ago

Christine ▴ 20

I noticed that when I first read in my sample sheet, I get a DBA object with 1226 sites (consensus peaks), yet after running dba.count() on this same object, it is down to 1188 sites. I did not apply any blacklists/greylists.

Also, the FRiP column is missing after I run dba.count(). Does anyone have ideas on what could have gone wrong?

There are 5 biological replicates of wildtype and 6 of mutant, but only one sequencing run was done per biological replicate (so no technical replicates). Peaks were called with MACS2 --broad. I made .bed files from the first 6 columns of the .broadPeak output.

> WS
11 Samples, 1226 sites in matrix (13233 total):
    ID Condition Replicate Intervals
1  705       Mut         1      1284
2  709        WT         1       319
3  719       Mut         1      8909
4  720       Mut         1      1994
5  721        WT         1         6
6  722        WT         1        15
7  728        WT         1        49
8  729       Mut         1        38
9  739       Mut         1         0
10 744        WT         1      2542
11 748       Mut         1         1
> plot(WS)
> WS <- dba.count(WS)
Computing summits...
Re-centering peaks...
> WS
11 Samples, 1188 sites in matrix:
    ID Condition Replicate    Reads
1  705       Mut         1 18209602
2  709        WT         1 17207846
3  719       Mut         1 14440635
4  720       Mut         1 19380340
5  721        WT         1 14392339
6  722        WT         1 16246924
7  728        WT         1 18220300
8  729       Mut         1 16485420
9  739       Mut         1 15516176
10 744        WT         1 16693032
11 748       Mut         1 15745849

DiffBind ChIP-seq • 3.4k views

ADD COMMENT • link updated 20 months ago by Rory Stark ★ 2.1k • written 2.8 years ago by Christine ▴ 20

1

Entering edit mode

dba.count will filter peaks who's "cpm < 1" (dba.count(filter = 1)) by default, this may decreasing peak numbers.

ADD REPLY • link 2.8 years ago by MatthewP ★ 1.4k

score 1 · Answer 1 · 2022-02-22

1

Entering edit mode

2.8 years ago

Rory Stark ★ 2.1k

As MatthewP suggested, filtering can remove peak intervals with low overlapping read counts. Try calling dba.count() with filter=0 to compare how many peaks are filtered in this way.

Less commonly, it is possible for peaks to be merged when re-centering around summits. This happens if peaks overlap after being recentered (and having 401bp widths using the default setting of summits=200.) In this case the overlapping peaks are merged into a single peak which is them re-centered around the consensus summit in that region. You can try running dba.count() with summits=FALSE to see is any peaks are being merged in this way.

Running dba.count() with both filter=0 and summits=FALSE should result in the same number of peaks before and after counting (using the default consensus of minOverlap=2).

I have checked in a fix for the intermittent disappearing FRiP issue, which should be in effect from DiffBind_3.4.11 onwards.

ADD COMMENT • link 2.8 years ago by Rory Stark ★ 2.1k

0

Entering edit mode

Thank you so much! I got the same number of peaks with the parameters filters=0 and summits=FALSE.

Regarding the FRiP issue, is there a workaround that I could use to retrieve these values until DiffBind_3.4.11 is released? It is a bit strange because I am able to get the FRiP column when running the tamoxifen vignette, so I just want to be sure it is not due to some issue in my own data.

ADD REPLY • link 2.8 years ago by Christine ▴ 20

1

Entering edit mode

DiffBind_3.4.11 should be available within the week, but here is a "backdoor" workaround in the meantime:

WS <- dba.count(WS)
WS$SN <- DiffBind:::pv.Signal2Noise(WS)

(Don't tell anyone, it's a secret)

ADD REPLY • link 2.8 years ago by Rory Stark ★ 2.1k

1

Entering edit mode

That's a quick turnaround, thank you so much - looking forward to the new release!

Unfortunately I keep getting NULL when I run that workaround line, but was thinking if I do

WS.norm <- dba.normalize(WS, normalize=DBA_NORM_LIB, library=DBA_LIBSIZE_PEAKREADS, background=FALSE)
norm <- dba.normalize(WS.norm, bRetrieve = TRUE)

If I'm understanding correctly, lib.sizes should then give me the number of reads falling within the peaks, and I can divide by the total read count...?

Also, if using spikein=TRUE, does lib.sizes then include the Drosophila reads? Looks like lib.sizes is gaining reads somehow when I run with spikein=TRUE.

ADD REPLY • link 2.7 years ago by Christine ▴ 20

0

Entering edit mode

This is a clever way to get the reads in peak numbers, and a better workaround than my suggestion using an internal function!

Yes, the lib.sizes values include the spike-in reads as well as the primary reads. Note that if you use normalize=DBA_NORM_LIB with spike-in, only the spike-in library sizes will be used.

ADD REPLY • link 20 months ago by Rory Stark ★ 2.1k

score 0 · Answer 2 · 2023-03-31

0

Entering edit mode

20 months ago

gyspace • 0

Hi, I also can not get FRiP score through dba.count. Have you solved the problem?