Question

sva + edgeR - differential expression analysis - RNA-seq data

4

9.3 years ago

mrodrigues.fernanda ▴ 60

Dear list,

I am performing an RNA-seq analysis for differential gene expression and I have a question regarding the use of the package sva for the estimation of unknown batch effects.

The sva vignette shows examples of estimating surrogate variables with the package and then performing DE analysis with limma (I am referring to section 6 of the sva vignette: "Adjusting for surrogate variables using the limma package").

Is it possible to do the same using edgeR instead of limma?

Or is sva not compatible with edgeR?

Sorry if this is a dumb question. I am a little new to the bioinformatics world.

Thank you!

RNA-Seq edgeR sva • 8.2k views

updated 2.7 years ago by Ram 45k • written 9.3 years ago by mrodrigues.fernanda ▴ 60

1

For RNA-seq data, you should use the svaseq() function instead of sva(). That's true whether you're using limma voom, edgeR or DESeq. The author also recommends scaling and normalizing the counts before running SVA.

Basic example, assuming counts is your raw count matrix (genes in rows, samples in columns) and clin is a data frame of sample annotations with a Condition column:

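library(edgeR)
library(sva)
y <- DGEList(counts)
y <- calcNormFactors(y)                        # TMM normalization factors
mod <- model.matrix(~ Condition, data=clin)    # full model: intercept + condition
mod0 <- model.matrix(~ 1, data=clin)           # null model: intercept only
svobj <- svaseq(cpm(y), mod, mod0)             # estimate surrogate variables from normalized CPM
des <- cbind(mod, svobj$sv)                    # append the surrogate variables to the design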

You can now proceed with des as your design matrix.
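As a rough sketch of what "proceeding with des" could look like in edgeR: the count object y goes into the dispersion and fitting functions together with des, and the test targets the condition coefficient rather than the surrogate variable columns. Everything simulated here (the counts, the clin data frame, the hidden batch) is made up for illustration, n.sv = 1 is fixed by hand only so the toy example is deterministic about the number of columns, and the quasi-likelihood pipeline is one reasonable edgeR choice rather than the only one.

```r
library(edgeR)
library(sva)

set.seed(1)
# Hypothetical stand-ins for `counts` and `clin`: 1000 genes, 12 samples
n_genes <- 1000
clin <- data.frame(Condition = factor(rep(c("ctrl", "trt"), each = 6)))
hidden <- rep(c(0, 1), times = 6)        # unrecorded batch, used only to simulate data
mu <- matrix(100, n_genes, 12)
mu[1:200, hidden == 1] <- 300            # hidden batch shifts the first 200 genes
counts <- matrix(rnbinom(n_genes * 12, mu = mu, size = 10), n_genes, 12,
                 dimnames = list(paste0("g", 1:n_genes), paste0("s", 1:12)))

# Same steps as in the answer above
y <- DGEList(counts)
y <- calcNormFactors(y)
mod  <- model.matrix(~ Condition, data = clin)
mod0 <- model.matrix(~ 1, data = clin)
svobj <- svaseq(cpm(y), mod, mod0, n.sv = 1)   # n.sv fixed here; omit it to let sva estimate it
sv <- as.matrix(svobj$sv)
colnames(sv) <- paste0("SV", seq_len(ncol(sv)))
des <- cbind(mod, sv)

# edgeR steps: the DGEList `y` is the data, `des` is the design
y   <- estimateDisp(y, des)
fit <- glmQLFit(y, des)
res <- glmQLFTest(fit, coef = 2)         # column 2 of des is the Condition effect
topTags(res)
```

If svaseq() is left to estimate the number of surrogate variables and finds none, it returns sv = 0, so it is worth checking svobj$n.sv before binding anything onto the design.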

8.6 years ago by d.watson ▴ 10

0

Hi Watson, thanks for the reply. I have a question about the next steps. Once we have des as the design matrix, the next call is disp = estimateDisp(???, design). What should we pass as the data argument to estimateDisp?

7.7 years ago by pwwang ▴ 40

score 1 · Answer 1 · 2015-12-29

In my (shallow) understanding, no: the sva manual suggests a log(g[ij] + c) transformation, whereas edgeR models read counts with the negative binomial distribution and specifically states that only raw read counts should be used. You may use sva + voom + limma, or include the batch effects in your GLM and proceed with edgeR.
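For the second option, a minimal sketch of putting a known batch into the edgeR GLM might look like the following. It assumes the batch is recorded in a Batch column of the sample annotation; the clin data frame and the simulated counts are hypothetical placeholders, not data from this thread.

```r
library(edgeR)

set.seed(2)
# Hypothetical sample annotation with a recorded batch
clin <- data.frame(
  Condition = factor(rep(c("ctrl", "trt"), each = 6)),
  Batch     = factor(rep(c("b1", "b2"), times = 6))
)
# Simulated raw counts: 500 genes, 12 samples
counts <- matrix(rnbinom(500 * 12, mu = 100, size = 10), 500, 12)

y <- DGEList(counts)
y <- calcNormFactors(y)

# Batch enters the design as an additive term; the raw counts stay untouched
design <- model.matrix(~ Batch + Condition, data = clin)

y   <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = ncol(design))   # last column is the Condition effect
topTags(res)
```

Putting Batch before Condition in the formula makes the condition effect the last coefficient, which is what glmQLFTest() tests by default.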

score 1 · Answer 2 · 2015-12-29

1

9.3 years ago

Devon Ryan 105k

You end up just adding columns to your model matrix in edgeR. Here's a similar discussion about SVA and DESeq2: "Batch effect in DESeq2 - multiple factor or SVA?"

The same principles apply.

9.3 years ago by Devon Ryan 105k