SVA Input: Should SVs be calculated on normalized data?
1
1
Entering edit mode
6.6 years ago

Hello,

I'm trying to use SVA to calculate SVs in my DESeq2 analysis.

In a couple of examples, I've seen people generate model matrices using the raw data, but actually calculate SVs on normalized data. For example, this post: Designing of model.matrix for batch correction of Time Course data ?

Shouldn't you be using the raw data matrix in each step?

Furthermore, if you're a DESeq2 library user and you do decide to use normalized data for the svaseq() step, do you use RLog-normalized data, or size factor-corrected data? (I suspect you shouldn't use the Rlog-normalized data because svaseq() can't take negative values, and rlogging the data can produce values <0 - but then is the size factor multiplication alone really successfully "normalizing" the data in a useful way, since it doesn't account for skew?)

RLog-normalized data generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
myData_rlg <- assay(rlog(myData))

Size factor-corrected generated by:

myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                     directory = pathToHTSeq,
                                     design = ~Genotype)
tst <- estimateSizeFactors(myData)
tst2<-counts(tst, normalized=TRUE)

Thanks!

For reference, my current SVA code is:

# import raw data
myData <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                         directory = pathToHTSeq,
                                         design = ~Genotype)

# Make rawCounts variable
rawCounts <-data.frame( counts(myData) ) 


#import needed library
library('sva')

# make a full model matrix
mod  <- model.matrix(~ Genotype, colData(myData))

# make a null model to compare it to
mod0 <- model.matrix(~   1, colData(myData))

# perform SVA without defining how many non-Genotype batch effects you think there are
svseq <- svaseq( as.matrix(rawCounts), mod, mod0) 
print(svseq)

# perform SVA when you specifically expect 8 SVAs (number of SVAs must be less than number of samples)
nSurr <- 5
svseq <- svaseq( as.matrix(rawCounts), mod, mod0, nSurr)
R SVA RNA-Seq • 3.7k views
ADD COMMENT
0
Entering edit mode

I'm meeting the seem question, you can have a look at https://www.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#pre-filtering-the-dataset where sva input takes a normalized counts through counts(data, normalized = T)

ADD REPLY
0
Entering edit mode

Great! Thank you very much for your update.

ADD REPLY
0
Entering edit mode
11 months ago
Garrett • 0

Hi,

I was interested in comparing PCA plots from “non-SV’d” data to data to which I had applied SVA.

For this, I had needed to apply rlog() (this is what we use in our pipeline).

It is important to note that sva::svaseq() is nearly identical to sva::sva(), with the exception of a log transform on the input data.

Because of this, I just used sva::sva() on rlog’d data (that I had filtered to non-negative data)

I can add code later today.

ADD COMMENT
0
Entering edit mode

Did you notice that you're answering a 6 year old post? I ask because you do not acknowledge that or state how your answer addresses the actual question.

Follow up: Did you even intend on your post being an answer or do you just have a question related to the top level post?

ADD REPLY

Login before adding your answer.

Traffic: 1727 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6