My goal is to make PCA and correlation plots of my RNA-Seq BAM files. Some useful discussions on BioStars, such as this one, have helped guide my steps.
In another BioStars post, responding to a question on library size normalization, user ATpoint indicates that size factors must be calculated as follows:
## edgeR::calcNormFactors
library(edgeR)
tmp.NormFactors <- calcNormFactors(object = raw.counts, method = "TMM", doWeighting = FALSE)
## raw library size:
tmp.LibSize <- colSums(raw.counts)
## calculate size factors:
SizeFactors <- tmp.NormFactors * tmp.LibSize / 1000000
In my analyses, I used DESeq2 instead of edgeR, after importing SALMON quantification with tximport, following the syntax instructions at BioConductor, as follows:
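library(DESeq2)
# Sample table: one row per sample, one column per covariate
Design <- DataFrame(cbind(BiolRep, Genotype, TimePoints))
dim(Design)
#[1] 144 3
rownames(Design) <- colnames(txi.salmon$counts)
design_formula <- ~ TimePoints * Genotype
# Pass the sample table defined above (Design) as colData
dds <- DESeqDataSetFromTximport(txi.salmon, colData = Design, design = design_formula)
# Per-sample size factors estimated from the count matrix
NormValues <- estimateSizeFactorsForMatrix(counts(dds))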
So my 1st question is this:
To use DESeq2-based size factors for converting BAM to BigWig with bamCoverage from deepTools, I would still need to calculate SizeFactors as follows, rather than use just the (inverse of the) NormValues, am I right?
SizeFactors <- NormValues * LibSize / 1000000
And my 2nd question is:
With SizeFactors calculated as above, I'd then have to use the inverse of those values to obtain my final normalized BAM files as inputs for use with deepTools, with the following syntax, am I right?
bamCoverage -b $BAM_IN -o $BigWig_OUT --normalizeUsing None --scaleFactor $(1/Size_factor) --effectiveGenomeSize $ACGTtotalCount
Could you please confirm or correct the approach I have indicated above? Thanks in advance!
Thanks, Devon. Just to be doubly sure I understood you right, LibSize is not relevant or factored into the sizeFactor value, just the calcNormFactors values, yes? (i.e. before its inverse is used with bamCoverage)
Correct, you don't need to account for library size.
Yes that is true, as Devon says. The DESeq2 factors already have the lib.size-part incorporated while in edgeR you have to calculate it manually.
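The recipe confirmed above (pass the reciprocal of each DESeq2 size factor to --scaleFactor, with --normalizeUsing None) could be scripted roughly as follows. This is only a minimal sketch under stated assumptions: the two-column table of BAM names and size factors is hypothetical, its values are made up for illustration, and the bamCoverage commands are printed rather than executed. In practice the table would be exported from R (for example by writing out NormValues).

```shell
#!/bin/sh
set -eu

# Hypothetical input: one line per sample, "<bam file> <DESeq2 size factor>".
# The two rows below are made-up values for illustration only.
sf_table=$(mktemp)
printf '%s %s\n' sampleA.bam 0.80 sampleB.bam 1.25 > "$sf_table"

# Print the reciprocal of a size factor (the value handed to --scaleFactor).
inverse_sf() {
  awk -v sf="$1" 'BEGIN { printf "%.4f", 1 / sf }'
}

# Dry run: print one bamCoverage command per sample.
# Remove the leading "echo" to actually run the commands.
while read -r bam sf; do
  scale=$(inverse_sf "$sf")
  echo bamCoverage -b "$bam" -o "${bam%.bam}.bw" --normalizeUsing None --scaleFactor "$scale"
done < "$sf_table"

rm -f "$sf_table"
```

No library-size term appears anywhere in the calculation, in line with the point that the DESeq2 factors already account for it.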
Thank you for confirming _/\_
On a related topic - for multiBigwigSummary, is it possible to specify --bwfiles and --labels as 2 text files containing the respective lists, rather than explicitly at the command line? I have ~150 input BW files, so syntax clarity may become an issue, hence this query. This is a very minor issue though, if I can even call it that :) TIA!
No, there's no way to feed the file names in via a file, since we kind of assume that anyone handling that many files is using something like Snakemake to automatically generate the command. As an aside, it's tough to interpret any plots with that many samples.
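Short of a full workflow manager, the command can still be generated from two list files at the shell level. The sketch below is an assumption-laden illustration rather than a deepTools feature: the file names bw_files.txt and labels.txt, their contents, and the output name summary.npz are all hypothetical, and the approach only works when paths and labels contain no whitespace. It prints the assembled command instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical list files, one entry per line, in matching order.
workdir=$(mktemp -d)
printf '%s\n' sampleA.bw sampleB.bw sampleC.bw > "$workdir/bw_files.txt"
printf '%s\n' A B C > "$workdir/labels.txt"

# Stop early if the two lists differ in length.
n_bw=$(wc -l < "$workdir/bw_files.txt" | tr -d ' ')
n_labels=$(wc -l < "$workdir/labels.txt" | tr -d ' ')
if [ "$n_bw" -ne "$n_labels" ]; then
  echo "bw_files.txt and labels.txt have different lengths" >&2
  exit 1
fi

# Join each list into a single space-separated string.
bw_args=$(paste -s -d ' ' "$workdir/bw_files.txt")
label_args=$(paste -s -d ' ' "$workdir/labels.txt")

# Dry run: assemble and print the command.
cmd="multiBigwigSummary bins --bwfiles $bw_args --labels $label_args -o summary.npz"
echo "$cmd"

rm -rf "$workdir"
```

Once the printed command looks right, it can be pasted into a terminal or saved to a script and run from there.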
I agree - the plotPCA and heatmap images generated were hard to interpret; I had to use much smaller and more meaningful subsets to be able to 'see' anything. Thanks very much for your help.