Hello,
I'm trying to use SVA to calculate SVs in my DESeq2 analysis.
In a couple of examples, I've seen people generate model matrices using the raw data, but actually calculate SVs on normalized data. For example, this post: Designing of model.matrix for batch correction of Time Course data ?
Shouldn't you be using the raw data matrix in each step?
Furthermore, if you're a DESeq2 library user and you do decide to use normalized data for the svaseq() step, do you use RLog-normalized data, or size factor-corrected data? (I suspect you shouldn't use the Rlog-normalized data because svaseq() can't take negative values, and rlogging the data can produce values <0 - but then is the size factor multiplication alone really successfully "normalizing" the data in a useful way, since it doesn't account for skew?)
RLog-normalized data generated by:
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = pathToHTSeq,
design = ~Genotype)
myData_rlg <- assay(rlog(myData))
Size factor-corrected generated by:
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = pathToHTSeq,
design = ~Genotype)
tst <- estimateSizeFactors(myData)
tst2<-counts(tst, normalized=TRUE)
Thanks!
For reference, my current SVA code is:
# import raw data
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
directory = pathToHTSeq,
design = ~Genotype)
# Make rawCounts variable
rawCounts <-data.frame( counts(myData) )
#import needed library
library('sva')
# make a full model matrix
mod <- model.matrix(~ Genotype, colData(myData))
# make a null model to compare it to
mod0 <- model.matrix(~ 1, colData(myData))
# perform SVA without defining how many non-Genotype batch effects you think there are
svseq <- svaseq( as.matrix(rawCounts), mod, mod0)
print(svseq)
# perform SVA when you specifically expect 8 SVAs (number of SVAs must be less than number of samples)
nSurr <- 5
svseq <- svaseq( as.matrix(rawCounts), mod, mod0, nSurr)
I'm meeting the seem question, you can have a look at https://www.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#pre-filtering-the-dataset where sva input takes a normalized counts through counts(data, normalized = T)
Great! Thank you very much for your update.