The columns in the sample sheet are documented in the help page for the dba()
function, in the section explaining the use of the sampleSheet
parameter:
?dba
There is a copy of the example sample sheet here:
samplesheet.path <- system.file("extra/tamoxifen.csv", package="DiffBind")
The two most common issues people have setting up a sample sheet for an experiment are as follows.
The first issue is using relative paths in the sample sheet that are not accessible from the working directory when calling dba().
You can check this by verifying that the files are in the correct place:
basedir <- system.file("extra", package="DiffBind")
samplesheet.path <- file.path(basedir, "tamoxifen.csv")
samples <- read.csv(samplesheet.path)
file.exists(file.path(getwd(),samples$Peaks))
In fact, the files are relative to basedir, not the working directory:
file.exists(file.path(basedir, samples$Peaks))
You can fix this in a few ways. One is by specifying the "home" directory for the data:
myDBA <- dba(sampleSheet="tamoxifen.csv", dir=basedir)
Another is to change the working directory:
setwd(basedir)
myDBA <- dba(sampleSheet=samplesheet.path)
The final way to fix this is to use full (absolute) file paths in the sample sheet.
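If editing the sample sheet by hand is inconvenient, the paths can be made absolute in R before calling dba(). The following is only a minimal sketch: make_paths_absolute is a hypothetical helper (not part of DiffBind), the toy data frame contains made-up values, and the default column names assume the sheet uses Peaks, bamReads and bamControl for its file paths.

```r
# Hypothetical helper (not part of DiffBind): prepend a base directory
# to the relative paths found in the path columns of a sample sheet.
make_paths_absolute <- function(samples, basedir,
                                cols = c("Peaks", "bamReads", "bamControl")) {
  for (col in intersect(cols, names(samples))) {
    paths <- as.character(samples[[col]])
    # Only touch non-empty entries that do not already look absolute
    fix <- !is.na(paths) & nzchar(paths) & !grepl("^(/|~|[A-Za-z]:)", paths)
    paths[fix] <- file.path(basedir, paths[fix])
    samples[[col]] <- paths
  }
  samples
}

# Toy sample sheet with made-up values, just to show the effect
toy <- data.frame(SampleID = c("a", "b"),
                  Peaks = c("peaks/a.bed", "/data/b.bed"),
                  stringsAsFactors = FALSE)
fixed <- make_paths_absolute(toy, "/home/user/project")
print(fixed$Peaks)
```

As far as I know, the sampleSheet parameter also accepts a data frame, so the result could be passed along as dba(sampleSheet=fixed) rather than being written back to a file.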
The second issue is having a mismatch between the format of the peak files and the format specified using the PeakCaller column (and possibly the PeakFormat and ScoreCol columns, if present). The important thing is to ensure that the column where DiffBind is expecting to see peak scores actually contains numeric values. For example, if the PeakCaller values are bed, the fifth column of the peak file should be a numeric score.
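A quick way to check this is to read a peak file and test the score column directly. This is a rough sketch rather than anything DiffBind does internally: score_is_numeric is a hypothetical helper, the two BED files are made-up examples written to temporary paths, and it assumes tab-delimited files without a header or track line.

```r
# Hypothetical helper: TRUE if the given column of a tab-delimited
# peak file exists and is read as numeric.
score_is_numeric <- function(peak_file, score_col = 5) {
  peaks <- read.delim(peak_file, header = FALSE, comment.char = "#",
                      stringsAsFactors = FALSE)
  ncol(peaks) >= score_col && is.numeric(peaks[[score_col]])
}

# Made-up two-line BED files, for illustration only
good <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\tpeak1\t37.5",
             "chr1\t300\t400\tpeak2\t12"), good)
bad <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200\tpeak1\t.",
             "chr1\t300\t400\tpeak2\t."), bad)

print(score_is_numeric(good))  # numeric scores in column 5
print(score_is_numeric(bad))   # "." placeholders instead of scores
```

On a real experiment the same check could be run over every entry of samples$Peaks, for example with sapply(), to find which file has the unexpected format.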
Code?
Thanks! Sorry, here is the code:
db.object = dba(sampleSheet="diffbind_low_high_21072021.csv")
I have previously used this with no problem, but it seems I cannot get away with it now. I am not sure exactly what is going wrong.