For ChIP-seq analysis I have been using DiffBind until now, but I want to switch to DESeq2 because I want to control for multiple covariates, which, AFAIK, the DiffBind package does not currently offer (it allows only one blocking factor at a time, although I can concatenate several blocking factors into one). I have raw counts generated from the BAMs over the MACS2 peaks, and I can run the full analysis. However, I have a question about library size normalisation:
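To make the difference between several blocking factors and one concatenated blocking factor concrete, here is a minimal Python/NumPy sketch of the two model matrices (not R; the factors batch and sex and the balanced 2 x 2 x 2 layout are hypothetical). The concatenated factor needs one coefficient per combination of blocking levels, so it implicitly also fits the batch-by-sex interaction, whereas an additive design estimates a single shift per blocking factor.

```python
import numpy as np
from itertools import product

# Hypothetical balanced layout: two blocking factors (batch, sex) crossed with condition.
samples = list(product(["b1", "b2"], ["F", "M"], ["ctrl", "treat"]))
batch = np.array([s[0] for s in samples])
sex = np.array([s[1] for s in samples])
cond = np.array([s[2] for s in samples])

def dummies(values, levels):
    # Treatment coding: one 0/1 column per non-reference level (levels[0] is the reference).
    return np.column_stack([(values == lv).astype(float) for lv in levels[1:]])

intercept = np.ones((len(samples), 1))

# Additive design, analogous to ~ batch + sex + condition.
X_add = np.hstack([intercept,
                   dummies(batch, ["b1", "b2"]),
                   dummies(sex, ["F", "M"]),
                   dummies(cond, ["ctrl", "treat"])])

# One concatenated blocking factor (batch_sex), analogous to ~ block + condition.
block = np.array([f"{b}_{s}" for b, s in zip(batch, sex)])
X_cat = np.hstack([intercept,
                   dummies(block, sorted(set(block))),
                   dummies(cond, ["ctrl", "treat"])])

print(X_add.shape, X_cat.shape)  # (8, 4) (8, 5)
```

With two two-level blocking factors the difference is only one extra coefficient, but it grows quickly as more factors or levels are concatenated, which costs residual degrees of freedom.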
DiffBind does this by default, and the DESeq2 vignette suggests that DESeq2 also normalises for library size by default. The difference I see is that DiffBind takes the library size from the BAM files, probably the total number of mapped reads, whereas DESeq2, which never sees the BAMs, presumably uses the column-wise sums of the sample read counts. Those sums only include reads that fall in the regions called by MACS2, not the whole BAM file, right? Fundamentally, will the two differ? I can imagine that there are reads in the BAM files outside the MACS2 peaks, which are therefore never counted by, say, featureCounts. I would really appreciate any comments from the community on this!
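To see why column sums over peaks and full BAM library sizes can disagree, here is a minimal Python/NumPy sketch with made-up peak counts and made-up total mapped reads. It also computes DESeq2-style median-of-ratios size factors; for reference, DESeq2's default size factors come from this median-of-ratios method rather than from plain column sums. median_of_ratios is a re-implementation for illustration only, not DESeq2 code.

```python
import numpy as np

# Hypothetical consensus-peak count matrix (rows = peaks, columns = samples),
# e.g. featureCounts over MACS2 peaks. All values are made up.
counts = np.array([
    [120, 240,  90, 300],
    [ 45,  80,  40, 110],
    [300, 610, 250, 700],
    [ 10,  25,   8,  30],
    [ 60, 130,  55, 160],
], dtype=float)

# Hypothetical total mapped reads per BAM (what a full-library-size approach uses).
total_mapped = np.array([20e6, 25e6, 18e6, 30e6])

# Reads in peaks: the column sums of the peak count matrix.
reads_in_peaks = counts.sum(axis=0)

# Fraction of reads in peaks (FRiP); it usually varies between samples,
# so column sums and full library sizes need not be proportional.
frip = reads_in_peaks / total_mapped

def median_of_ratios(counts):
    """DESeq2-style size factors: per sample, the median ratio of its counts to
    the per-peak geometric mean across samples (peaks with a zero are skipped)."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    log_ratios = log_counts[usable] - log_geo_means[usable][:, None]
    return np.exp(np.median(log_ratios, axis=0))

sf_mor = median_of_ratios(counts)
# Full-library-size factors, scaled to a geometric mean of 1 for comparison.
sf_full = total_mapped / np.exp(np.log(total_mapped).mean())

print("reads in peaks:", reads_in_peaks)
print("FRiP:", np.round(frip, 6))
print("median-of-ratios size factors:", np.round(sf_mor, 3))
print("full-library size factors:", np.round(sf_full, 3))
```

Whenever FRiP differs between samples, the peak-based and BAM-based factors diverge, and that is where the two normalisation strategies can lead to different results.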
Also, when DiffBind applies this default normalisation (bFullLibrarySize = TRUE) and then calls DESeq2 for the differential analysis, DESeq2 also performs its own normalisation. So when someone uses the DiffBind package, does the count matrix get normalised by library size twice: once with the BAM read counts (DiffBind) and again with the total counts (DESeq2)?
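Whether anything is applied twice depends on how DiffBind passes the library sizes to DESeq2, which is not verified here for any DiffBind version. In DESeq2 itself, normalisation enters through one size factor per sample: if size factors have already been assigned (e.g. with sizeFactors(dds) <- ...), DESeq() uses them instead of estimating new ones. The Python sketch below mimics that logic; get_size_factors and normalize are hypothetical helpers, not DESeq2 functions, and the counts and library sizes are made up.

```python
import numpy as np

def median_of_ratios(counts):
    """DESeq2-style median-of-ratios size factors (rows = peaks, columns = samples)."""
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)  # drop peaks with a zero in any sample
    log_ratios = log_counts[usable] - log_geo_means[usable][:, None]
    return np.exp(np.median(log_ratios, axis=0))

def get_size_factors(counts, supplied=None):
    """Hypothetical helper mirroring DESeq2's behaviour: pre-assigned size
    factors are used as they are; otherwise they are estimated from the counts.
    Either way, exactly one factor per sample is applied."""
    if supplied is not None:
        return np.asarray(supplied, dtype=float)
    return median_of_ratios(counts)

def normalize(counts, size_factors):
    # Normalised counts: each sample's raw counts divided by its size factor, once.
    return counts / size_factors[None, :]

counts = np.array([[100, 200], [50, 100], [30, 60]], dtype=float)
total_mapped = np.array([10e6, 30e6])  # hypothetical BAM library sizes
# Full-library-size factors, scaled to a geometric mean of 1.
sf_full = total_mapped / np.exp(np.log(total_mapped).mean())

estimated = get_size_factors(counts)                   # estimated from peak counts
supplied = get_size_factors(counts, supplied=sf_full)  # taken from the BAMs
print(normalize(counts, estimated))
print(normalize(counts, supplied))
```

In this toy example the peak counts of sample 2 are exactly twice those of sample 1 while its BAM is three times larger, so the two routes give different normalised counts; the point for the "twice" question is that only one set of factors ends up in the model.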
My main question is: can I trust DESeq2's library size normalisation compared with DiffBind's approach, and use only DESeq2 instead of DiffBind to analyse my data?
One possible solution could be to give DESeq2 the total mapped read numbers as an extra column, treat it as a continuous variable, and include that column in the design. Does that sound logical? Has anyone done it this way?
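A minimal Python/NumPy sketch of what such a design looks like as a model matrix; the R equivalent would be a design like ~ log_lib + condition, where the column name log_lib, the sample sheet and the read numbers are all hypothetical. The log library size is centred and scaled, which DESeq2 itself advises for numeric design variables.

```python
import numpy as np

# Hypothetical sample sheet: condition and total mapped reads per BAM.
condition = np.array(["ctrl", "ctrl", "ctrl", "treat", "treat", "treat"])
total_mapped = np.array([20e6, 25e6, 18e6, 30e6, 22e6, 27e6])

# Centre and scale the log library size to keep the fit well-conditioned.
log_lib = np.log(total_mapped)
log_lib_z = (log_lib - log_lib.mean()) / log_lib.std()

# Model matrix analogous to ~ log_lib + condition:
# intercept, continuous covariate, and a 0/1 indicator for "treat".
X = np.column_stack([
    np.ones(len(condition)),
    log_lib_z,
    (condition == "treat").astype(float),
])
print(X.shape)  # (6, 3): 6 samples x 3 coefficients
print(np.linalg.matrix_rank(X))
```

Note the design choice this implies: a covariate in the design gets its own fitted coefficient, whereas size factors act as a fixed offset (coefficient 1 on the log scale). If the goal is simply to normalise by total mapped reads, a more direct route in DESeq2 is usually to assign size factors proportional to those totals (for example, total mapped reads divided by their geometric mean) via sizeFactors(dds), rather than estimating a coefficient for them.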
Thank you again for taking the time to read my post! Stay safe!
Dear Rory,
Thank you very much for your swift reply! I understand completely now, and I will do my analysis accordingly. One last question regarding normalisation: if I use DESeq2 normalisation for ChIP-seq analysis, would the results be equivalent to DiffBind's simple normalisation? Also, as far as I know, DiffBind cannot use multiple blocking factors, and I believe you suggested (I cannot find the post now, sorry!) using other software (like DESeq2 directly) to model the covariates of complex experimental designs and do the differential binding analysis. But I really love using DiffBind, and I think it is a fantastic package for ChIP-seq analysis, like a Swiss Army knife! Will a future update of DiffBind include functions for modelling complex experimental designs?
Dear Rory, one more question: is this default library size normalisation in DiffBind done on the raw counts?
Default is normalize counts adjusted as follows: