Diffbind dba.count doesn't give FRiP score
2
0
Entering edit mode
20 months ago
gyspace • 0

Hi I am completely new to DiffBind, R and programming in general. I want to use diffbind to analyze peaks called with macs2. When I use dba.count(), I can not get FRiP score at all. I googled this question, but didn't work.

Here is the format of sample.csv:

SampleID Replicate bamReads Peaks PeakCaller

Here is the code:

library(DiffBind)

sample1 <- dba(sampleSheet="Z:/sample.csv")
sample1

10 Samples, 931 sites in matrix (15048 total):
ID Replicate Intervals
1  cfDNA21         1      2425
2  cfDNA22         1      2769
3  cfDNA23         1      1887
4  cfDNA24         1       587
5  cfDNA25         1      1269
6  cfDNA26         1      1082
7  cfDNA27         1       994
8  cfDNA28         1      2529
9  cfDNA29         1      2427
10 cfDNA30         1      1637

tam.counts <- dba.count(sample1)
(Sample: Z:/cfDNA30.bam125    Reads will be counted as Paired-end.
 Warning messages:1: In serialize(data, node$con) :'package:stats' may not be available when loading)

tam.counts

10 Samples, 443 sites in matrix:

ID Replicate     Reads
1  cfDNA21         1 1267699.0
2  cfDNA22         1  751270.0
3  cfDNA23         1  410182.5
4  cfDNA24         1  264162.5
5  cfDNA25         1  239933.0
6  cfDNA26         1  172226.0
7  cfDNA27         1  196008.0
8  cfDNA28         1  753888.5
9  cfDNA29         1 1561326.5
10 cfDNA30         1  340597.0

There is no FRiP column, I dont know why. Someone encountered the same problem, but it was not successfully solved. Can anyone answer this question?

Diffbind FRiP • 2.5k views
ADD COMMENT
1
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.
code_formatting

ADD REPLY
0
Entering edit mode

Thank you very much! I will try next time.

ADD REPLY
2
Entering edit mode
19 months ago
Rory Stark ★ 2.1k

I've had a look at some sample data and I can see what is happening.

The issue is that only a tiny fraction of reads overlap any consensus. All of the calculated FRiP scores round down to 0.00 so they are not reported.

In your original post, we can see that while there are more than 15,000 peaks, less than 1,000 of them overlap in any 2 of the 10 samples, indicating that there is not much of a consistent signal in these samples. Only a few reads overlap the peaks in any one sample; more than 99.99% in each bam file do not map to any of the consensus peaks.

A number of things could be causing this. I ran a quick ChIPQC report on the sample data and it showed that a high proportion of the reads in the bam file have very low mapq scores; about 70% of the scores are zero (usually indicating multi-mapped, but could also be low-quality sequencing) and 75% overall are filtered out given the default mapQCth=15. Would you expect to have a very high rate of non-unique mapping in this dataset? Also, no reads are marked as duplicated, which either means that no duplicate marking was done, or that they have been removed in some way (as there are usually at least some duplicates).

There could also be an issue upstream in the peak calling phase, or it could just be that there is no real enrichment in these samples and they are essentially reporting a "background" (like we would expect to see in an Input control).

Hope this helps!

ADD COMMENT
0
Entering edit mode

Thank you for your help! It is very useful! After bw2 mapping, there were about 10% reads aligned concordantly >1 times in sample1. Then I used picard MarkDuplicates REMOVE_DUPLICATES=true I=sample.bam O=sample.rmdup.bam M=rmdup.metrc.csvto remove dups. It seems too many reads with MAPQ>0 were removed by picard? Maybe I need to count the distribution of MAPQ. I used macs2 callpeak -t sample.bam -g hs -f BAMPE -n sample --outdir ./callpeak/ 2>./callpeak/sample.callpeak.log to call peaks and I checked some of peaks in IGV. I found that some of reads in the positions of the peaks had MAPQ=0. If I want to set default mapQCth=15 to mapQCth=0, should I use dba.count(mapQCth=0)? By the way, what should I do If I want to analyze the difference in peaks between two sets of samples with diffbind? Do I need to put the information of the two sets of samples into one sheet to create one dba, or do I need to create two dbas seperately?

ADD REPLY
0
Entering edit mode

You can set mapQCth when calling dba.count() as you suggest, or alternatively you can set it as a configuration parameter:

sample1$config$minQCth <- 0

prior to calling dba.count().

With multiple sample groups, you can create a single DBA object with an appropriate design formula, and test the different contrasts against the single model.

ADD REPLY
1
Entering edit mode
20 months ago
Rory Stark ★ 2.1k

It would be helpful to know what version of DiffBind are you running (output of sessionInfo()). There have been some fixes in this area in more recent versions.

ADD COMMENT
0
Entering edit mode

DiffBind 3.8.4. It should be the lasted version.

ADD REPLY
1
Entering edit mode

I can't reproduce this in the current version.

One possibility is that all of the FRiP values are equal to 1, meaning all of your reads overlap at least one consensus peak. In this case, the FRiP values are not reported when you print out the DBA object. You can check for this as follows:

dba.show(tam.counts)$FRiP

The value should either be NULL or a vector of length 10 containing all 1 values. If it is all 1s, that is the issue there. If it is NULL, I can give you some ways to calculate the FRiP yourself, and if you can give me access to some of your data (samplesheet and peak/bam files for 2-3 samples), I can track down what is happening.

ADD REPLY
0
Entering edit mode

Thank you! I tried the command and dba.show() returned NULL. So how can I send my sample data?

ADD REPLY
0
Entering edit mode

If you can make it available on a server (like Dropbox, GDrive, iCloud) and email me a link.

ADD REPLY
0
Entering edit mode

I found your email in https://bioinfotraining.bio.cam.ac.uk/staff/rory-stark-phd, is the email right? I sent a google drive link to you. If the link is unavaliable, please let me know.

ADD REPLY

Login before adding your answer.

Traffic: 1817 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6