I have 3 chip-seq biological replicates, each with its own input control. I was interested in using diffBind to performs some IDR-style analysis - e.g. take only the peaks that come up in more than one sample.
experiment <- dba(sampleSheet="exp_samples.csv")
pdf('overlap_venn.pdf')
dba.plotVenn(experiment, experiment$masks$macs)
dev.off()
I got a nice Venn diagram:
I wanted to perform a sanity check and had a look at the peaks occurring only in A (sample1):
experiment.OL = dba.overlap(experiment, experiment$masks$macs)
a_peaks <- as.data.frame(experiment.OL$onlyA, row.names = NULL, optional = FALSE)
> dim(a_peaks)
[1] 500 6
> head(a_peaks)
seqnames start end width strand score
1 GL172637.1 196748 197840 1093 * 0.09639324
..
I decided to check the first peak and, sure enough I found it in the xls file produced for this sample by macs14:
chr start end length summit tags =-10*LOG10(pvalue) fold_enrichment FDR(%)
GL172637.1 196748 197840 1093 299 38 62.55 48.57 100
However, what confuses me is the score computed by diffBind (0.09639324). In the diffBind's documentation it says "The scores associated with each site are derived from the peak caller condence score, and are a measure of condence in the peak call (occupancy), not a measure of how strong or distinct the peak is." I don't know how to interpret this. Also, it is not clear to me how is this score related to its FDR (100%). Can anyone help?
I have 4 samples each with individual scores in the range of 100s and also a consenus peakset of all 4 samples where i tried to to convert the scores as per the above into decimal range, but it still does not fit. Am i missing something?