Question

Batch Effect Correction in RNA-seq Data Using SVA and DESeq2

0

Entering edit mode

11 months ago

Nelo ▴ 20

I am analyzing publicly available bulk RNA-seq data to identify differential expression between control and stress-treated samples. The metadata does not explicitly mention batch information, but my PCA plot suggests possible batch effects.

PCA plot is attached here:

enter image description here

I am using DESeq2 for differential expression analysis, and I plan to use the sva package for batch effect correction.

My current code:

counts <- read.delim("counts.csv", header = TRUE, row.names =1, sep = ",", check.names=FALSE)
colData <- read.delim("colData.csv", header= TRUE,sep = ",")
dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ treatment)
vsd <- varianceStabilizingTransformation(dds, blind=T)
dds <- DESeq(dds)
res <- results(dds)
res_ordered <- res[order(res$padj),]
de_genes <- rownames(res_ordered)[which(res_ordered$padj <= 0.01 & abs(res_ordered$log2FoldChange) >= 1)]
de_expression <- assay(vsd)[de_genes, ]
upregulated_genes <- rownames(res_ordered)[res_ordered$log2FoldChange >= 1 & res_ordered$padj  <= 0.01]
downregulated_genes <- rownames(res_ordered)[res_ordered$log2FoldChange <= -1 & res_ordered$padj  <= 0.01]

My specific question is:

How do I correctly use sva to detect and adjust for hidden batch effects given that I do not have explicit batch information?

Are there any additional steps or considerations I should be aware of when using sva? For example, how can I validate that the surrogate variables identified by sva are effectively correcting for batch effects? Is it necessary to run ComBat in conjunction with sva, or is sva sufficient in this scenario?

Thank you very much in advance!

SVA Batch DESeq2 RNA-seq • 957 views

ADD COMMENT • link updated 11 months ago by Kevin Blighe 89k • written 11 months ago by Nelo ▴ 20

score 0 · Answer 1 · 2024-08-25

0

Entering edit mode

11 months ago

Kevin Blighe 89k

The sva package can automatically detect the stat. significant number of surrogate variables. May you please follow the guidance, as elaborated here:

https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#using-sva-with-deseq2

Kevin

ADD COMMENT • link 11 months ago by Kevin Blighe 89k