Question

Batch correction in DESeq2---Normalized Counts

1

Entering edit mode

4.4 years ago

mtsompana ▴ 10

I am analyzing bulk RNA-seq data from samples sequenced at different time points (different library preparation and sequencing strategy).

I am using the code below to accommodate for batch effects so I can do clustering:

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Batch + Condition)

dds <- dds[ rowSums(counts(dds)) > 10, ]

rld<-rlog(dds,blind = FALSE)

assay(rld) <- limma::removeBatchEffect(assay(rld), rld$Batch)

I need to get *normalized gene counts (not rlog counts) after I correct for batch effects. I need this so I can compare levels of expression for a specific gene among the different samples. I am not sure how to do this.*

I know how to do it with the original dds object:

dds.normCts <- estimateSizeFactors(dds)

cts.norm <- counts(dds.normCts,normalized=TRUE)

write.csv(cts.norm, file = "normalizedCounts.csv")

Thank you for your time and help in advance!

RNA-Seq • 3.7k views

ADD COMMENT • link updated 4.4 years ago by ATpoint 88k • written 4.4 years ago by mtsompana ▴ 10

0

Entering edit mode

If you are looking for gene level comparison consider only the normalized values don't take rlog

ADD REPLY • link 4.4 years ago by DareDevil ★ 4.4k

score 2 · Answer 1 · 2021-02-10

2

Entering edit mode

4.4 years ago

ATpoint 88k

If you want to start with the normalized counts without the extended rlog procedure on top of it (rlog are normalized counts plus variance-stabilized and fold changes being moderated/shrunken, see manual) then simply use:

normbatch <- limma::removeBatchEffect(log2(counts(dds, normalized=TRUE)+1), dds$Batch)

ADD COMMENT • link 4.4 years ago by ATpoint 88k

0

Entering edit mode

Thank you so much. Do you recommend using the rlog values instead?

ADD REPLY • link 4.4 years ago by mtsompana ▴ 10

0

Entering edit mode

I think I figured it out. Could you please confirm this is correct. Thank you!!

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Batch + Condition3)

dds <- dds[ rowSums(counts(dds)) > 10, ]

dds <- DESeq(dds)

rld<-rlog(dds,blind = FALSE)

assay(rld) <- limma::removeBatchEffect(assay(rld), rld$Batch)

counts_afterbatchcorrection <- assay(rld)

write.csv(cts.norm, file = "normalizedCountsafterbatchcorrection.csv")

ADD REPLY • link 4.4 years ago by mtsompana ▴ 10

0

Entering edit mode

Yeah, this is the same as in your original question, is it? Looks ok to me. rlog has a couple of nice features and it is recommended by the DESeq2 author for downstream applications. I use it a lot.