Error in DiffBind. Please HELP!
3.4 years ago
a.hayat20 • 0

Hi all,

I have set up this CSV file for my differential analysis, but I keep getting an error that I do not know how to solve. This is the sample sheet for DiffBind: [image of the sample sheet not included]

This is the error I get:

Error in Summary.factor(c(1L, 47442L, 58553L, 69664L, 80775L, 91886L,  : 
  ‘max’ not meaningful for factors

Any help would be very much appreciated!

Thanks

Analysis NGS ATAC R DiffBind • 3.4k views

Code?

Thanks! Sorry, here is the code:

db.object = dba(sampleSheet="diffbind_low_high_21072021.csv")

I have previously used this with no problem, but it no longer works. I am not sure exactly what is going wrong.

3.4 years ago
Rory Stark ★ 2.1k

Looks like you are using _summits.bed files from MACS. Two issues here:

  1. The summits files only contain a single base for each peak, not a region, so they really aren't the right ones to use. The _peaks.bed or _peaks.xls files are usually used, or the narrowPeak or broadPeak files.
  2. You haven't included a specification of the peak file format with the PeakCaller and/or PeakFormat columns in the sample sheet. The default format is raw, which has the score in the fourth column, while a bed format has the score in the fifth column.

Dear Rory,

Thanks for your response! I changed the summits files to narrowPeak files and _peaks.xls files, but I still get this error:

Low_HER2 Low_HER2 72H DMSO 1 raw
Error in if (file.info(peaks)$size > 0) { :
  missing value where TRUE/FALSE needed

I did not fully understand your second comment. Could you please explain it in a little more detail for a novice like me?

Thanks so much!

You need to tell DiffBind the format of the peak files so it knows where to find the score. If you are using the _peaks.xls files, you should include a column in your sample sheet labelled PeakCaller with the values set to macs. If you are using the narrowPeak files, the values should be narrow. You can see an example (using bed format) as follows:

samples <- read.csv(paste(system.file('extra',package='DiffBind'),
                          "/tamoxifen.csv",sep=""))
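
If you would rather build a sheet from scratch, a minimal sketch might look like the one below. The column names (SampleID, Condition, Replicate, bamReads, Peaks, PeakCaller) are ones DiffBind sample sheets use, but every ID and file path here is made up for illustration, and a real sheet usually carries more metadata columns (see ?dba).

```r
# Hypothetical two-sample sheet; every ID and path below is made up for illustration.
samples <- data.frame(
  SampleID   = c("Low_1", "High_1"),
  Condition  = c("Low", "High"),
  Replicate  = c(1, 1),
  bamReads   = c("bam/Low_1.bam", "bam/High_1.bam"),
  Peaks      = c("peaks/Low_1_peaks.xls", "peaks/High_1_peaks.xls"),
  PeakCaller = "macs",   # use "narrow" instead for .narrowPeak files
  stringsAsFactors = FALSE
)

# Write the sheet to a temporary CSV and read it back to confirm the layout.
sheet_path <- file.path(tempdir(), "samplesheet_example.csv")
write.csv(samples, sheet_path, row.names = FALSE)
print(read.csv(sheet_path))
```

The point is only that PeakCaller is an ordinary column with one value per sample, matching the kind of peak file listed in the Peaks column of the same row.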

Based on the error you are seeing, it looks like the peak files in your sample sheet can't actually be found. Try this:

samples <- read.csv("diffbind_low_high_21072021.csv")
file.info(samples$Peaks)

If any of the reported rows contain NA values (for example in the size column), that file could not be found, and you need to fix the sample sheet (or change the working directory) so the peak files can be read.

Thank you so much for your help. I used the .xls files and labelled the column PeakCaller. This worked!

The next step in my pipeline is:

db.object = dba.count(db.object, bParallel=TRUE, fragmentSize=0, score=DBA_SCORE_RPKM_FOLD)

But I get the following error:

Error in pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score, :
  Some read files could not be accessed. See warnings for details.
In addition: Warning message:
not accessible

From what I have read, this looks as if it could not find my bam files, or as if they were not indexed. But I am sure the working directory is set correctly and that the files are indexed, so I am not sure what might be happening.

Thank you!

DiffBind will create index files if they are not there (and you have permission to write in the directory), so this is probably not the issue.

If you set bParallel=FALSE, it will print out the message for each file, so you can more easily narrow down which files have a problem. You can also run these lines to confirm that the files are really there:

file.access(db.object$class[10, ], mode = 4)
file.access(db.object$class[11, ], mode = 4) # if you have control files

If any of the files report a value of -1, the file is not accessible from your current working directory.
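
To see what those return values look like without touching your own data, here is a small base R sketch. It uses a temporary file and a made-up file name (missing.bam), both of which are assumptions for the demo only.

```r
# One file that exists and one that does not (both names are illustrative).
readable_file <- tempfile(fileext = ".bam")
file.create(readable_file)
absent_file <- file.path(tempdir(), "missing.bam")

# mode = 4 tests read permission; 0 means success and -1 means failure.
status <- file.access(c(readable_file, absent_file), mode = 4)
print(status)

# Keep only the paths that cannot be read.
not_readable <- c(readable_file, absent_file)[status != 0]
print(not_readable)
```

Running the same filter on the bam paths from your sample sheet should point to the exact files that DiffBind cannot open.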

Hey Rory, is there some sort of standard formatting guidance that people could tap into? I'm currently searching through Biostars and any online tutorial I can get my hands on, trying to figure out how to compile a sample sheet correctly for use with DiffBind, and it's not really outlined plainly in any of the workshop notes or online vignettes. I'm contemplating pausing the Differential Binding Analysis of ChIP-seq Experiments workshop to try to write down the headings visible on your screen, which is a possibility, but if there's some documentation about formatting that csv file, that would be ideal.

2.5 years ago
Rory Stark ★ 2.1k

The columns in the sample sheet are documented in the help page for the dba() function, in the section explaining the use of the sampleSheet parameter:

?dba

There is a copy of the example sample sheet here:

samplesheet.path <- system.file("extra/tamoxifen.csv", package="DiffBind")

The two most common issues people have setting up a sample sheet for an experiment are as follows.

The first issue is using relative paths in the sample sheet that are not accessible from the working directory when calling dba().

You can check this by verifying that the files are in the correct place:

basedir <- system.file("extra", package="DiffBind")
samplesheet.path <- file.path(basedir, "tamoxifen.csv")
samples <- read.csv(samplesheet.path)
file.exists(file.path(getwd(),samples$Peaks))

In this example the files are actually relative to basedir, not the current working directory:

file.exists(file.path(basedir, samples$Peaks))

You can fix this in a few ways. One is by specifying the "home" directory for the data:

myDBA <- dba(sampleSheet="tamoxifen.csv", dir=basedir)

Another is to change the working directory:

setwd(basedir)
myDBA <- dba(sampleSheet=samplesheet.path)

The final way to fix this is to use full (absolute) file paths in the samplesheet.
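
As a rough sketch of that last option, the helper below prefixes a base directory onto any path that is not already absolute. make_absolute is a hypothetical helper, not part of DiffBind, and the sample IDs, file names, and the /project/atac directory are all made up for illustration.

```r
# Hypothetical helper (not part of DiffBind): prefix relative paths with a base directory.
make_absolute <- function(paths, base_dir) {
  paths <- as.character(paths)                      # guard against factor columns
  is_abs <- grepl("^(/|~|[A-Za-z]:)", paths)        # Unix root, home shortcut, or drive letter
  ifelse(is_abs, paths, file.path(base_dir, paths))
}

# Made-up sample sheet with one relative and one absolute peak path.
samples <- data.frame(
  SampleID = c("s1", "s2"),
  Peaks    = c("peaks/s1_peaks.xls", "/data/peaks/s2_peaks.xls"),
  stringsAsFactors = FALSE
)
samples$Peaks <- make_absolute(samples$Peaks, base_dir = "/project/atac")
print(samples$Peaks)
```

The same call can be applied to the bamReads and bamControl columns, after which the data frame can be written back out with write.csv(..., row.names = FALSE) and used as the sample sheet.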

The second issue is having a mismatch between the format of the peak files and the format specified using the PeakCaller column (and possibly the PeakFormat and ScoreCol columns if present). The important thing is to ensure that the column where DiffBind is expecting to see peak scores is actually a numeric value. For example, if the PeakCaller values are bed, the fifth column of the peak file should be a numeric score.
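
A quick way to check this yourself is to read one peak file in base R and look at which columns are numeric. The sketch below uses a made-up three-line file in BED layout (name in column 4, score in column 5); substitute the path to one of your own peak files.

```r
# Hypothetical three-line peak file in BED layout; the values are made up.
bed_lines <- c(
  "chr1\t100\t101\tpeak_1\t12.5",
  "chr1\t500\t501\tpeak_2\t8.1",
  "chr2\t900\t901\tpeak_3\t30.0"
)
peak_file <- tempfile(fileext = ".bed")
writeLines(bed_lines, peak_file)

peaks <- read.table(peak_file, sep = "\t", header = FALSE, stringsAsFactors = FALSE)

# Report, column by column, whether the values were read as numbers.
numeric_cols <- vapply(peaks, is.numeric, logical(1))
print(numeric_cols)

score_is_numeric_raw <- is.numeric(peaks[[4]])  # column a "raw" file is expected to use
score_is_numeric_bed <- is.numeric(peaks[[5]])  # column a "bed" file is expected to use
```

Here column 4 holds peak names, so treating this file as raw would hand DiffBind text where it expects scores, which may be what sits behind an error like "'max' not meaningful for factors".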
