Question

DESeq2 Batch correction enquiry?

0

Entering edit mode

5.1 years ago

prithvi.mastermind ▴ 60

I'm using DESEq2 package for obtaining normalized data from raw read counts of RNA-Seq data. I'm not interested to do any downstream analysis like DEGs identification and just want the normalized and transformed expression matrix. In the steps, I also incorporated the following for any batch effect.

Mnemiopsis_count_data = read.table(file = "COUNT_DATA.txt", header = T, sep = "\t")
Mnemiopsis_col_data = read.table(file = "COL_DATA.txt", header = T, sep = "\t")
boxplot(Mnemiopsis_count_data)
hist(Mnemiopsis_count_data[,1]) # Plotting only the first sample (column 1)
pseudoCount = log2(Mnemiopsis_count_data + 1)
boxplot(pseudoCount)
hist(pseudoCount[,1])

dds = DESeqDataSetFromMatrix(countData = Mnemiopsis_count_data, colData = Mnemiopsis_col_data, design = ~ condition) # we're testing for the different condidtions
rld = rlogTransformation(dds)

par( mfrow = c( 1, 2 ) )
plot(log2( 1 + counts(dds)[ , 1:2] ), pch=16, cex=0.3, main = "log2")
plot(assay(rld)[ , 1:2],pch=16, cex=0.3, main = "rlog")


dds = estimateSizeFactors(dds)
sizeFactors(dds)
plotPCA(rld, intgroup=c("condition"))
plot(pseudoCount[,4], pseudoCount[,7], pch = 20, xlab = "Aboral 4", ylab = "Oral 3") + abline(0,1, col = "red")

vsd <- vst(dds)
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch)
plotPCA(vsd)

dds = DESeq(dds)
res <- results(dds)
mcols(res, use.names=TRUE)
summary(res)
norm_counts = counts(dds, normalized = TRUE) # Extract the normalized counts

I have some queries regarding batch correction:

1) Does the vsd object contains our normalized expression matrix and how can I save it in R? 2) If I'm using rlog instead of vst for transformation, then also do I need a batch correction? 3) Does batch correction changes our normalized and transformed expression matrix? If yes, what does it do? 4) Does the above script indicates the batch effects or does it corrects them also? Please share R codes if any modifications are required. 5) Also, do I have to increment the normalized data by +1 before saving?

Thanks

RNA-Seq R software error next-gen • 2.5k views

ADD COMMENT • link updated 5.1 years ago by ATpoint 88k • written 5.1 years ago by prithvi.mastermind ▴ 60

0

Entering edit mode

What is the content of vsd$batch? I cannot see that from the code. It should be a vector that indicates which sample belongs to which batch. This is somewhat related to:

4) Does the above script indicates the batch effects or does it corrects them also?

The script visualizes a potential batch via the PCA you should use, together with information from the wetlab such as the day of library prep etc. to check whether variation across datapoints and biological replicates could be due to a batch effect. The PCA after the batch correction with limma can then be used to check if this was successful. Is this your script?

ADD REPLY • link 5.1 years ago by ATpoint 88k

0

Entering edit mode

Thank you sir for your prompt response. No this is not my script. Its an example script from DESEq2 page which I ran for understanding the concept. the vsd$batch is showing NULL. what does this mean?

ADD REPLY • link 5.1 years ago by prithvi.mastermind ▴ 60

0

Entering edit mode

It does not seem that you are following the DESeq2 workflow / vignette - can you confirm which one you are following? As an example, it makes little sense to create and use your pseudoCount object, as it is just logged raw counts + 1 (?).

Which variable contains your batch information? - where is it stored?

For all intents and purposes, this is a simple workflow that you could use, assuming that batch is a variable in your colData:

# create DESeq object
# include batch in design formula so that our test statistics for condition..
# ...are controlled for batch
dds <- DESeqDataSetFromMatrix(countData = Mnemiopsis_count_data,
  colData = Mnemiopsis_col_data,
  design = ~ condition + batch)

# normalise
dds <- DESeq(dds)

# extract normalised counts
norm_counts = counts(dds, normalized = TRUE)

# regularised log transformation
rld <- rlogTransformation(dds)

# variance -stabilised transformation
vsd <- vst(dds)

# eliminate batch effect directly from vsd
assay(vsd) <- limma::removeBatchEffect(assay(vsd), colData(dds)[['batch']])

ADD REPLY • link 5.1 years ago by Kevin Blighe 89k

0

Entering edit mode

Thank you sir for your help. But how can i save expression data that has been batch corrected after running the last command: assay(vsd) <- limma::removeBatchEffect(assay(vsd), colData(dds)[['batch']])

ADD REPLY • link 5.1 years ago by prithvi.mastermind ▴ 60

0

Entering edit mode

Take a look at the write.table() function. The expression data itself will be stored in assay(vsd)

ADD REPLY • link 5.1 years ago by Kevin Blighe 89k

0

Entering edit mode

Sir after running this step

# normalise
dds <- DESeq(dds)

How can I normalize and log-transform the data? Please help
rld <- rlogTransformation(dds) # for rlog transormation

norm_counts = counts(rld, normalized = TRUE) #for normalization

Is this step relevant?

ADD REPLY • link updated 5.1 years ago by GenoMax 152k • written 5.1 years ago by prithvi.mastermind ▴ 60

0

Entering edit mode

This is getting a bit beyond the scope of biostars. log2() is a basic R function, please get more background before doing analysis in R, and most importantly, please use online tutorials such as the DESeq2 manual to learn how to do RNA-seq analyssis and which steps are necessary and why. The idea of this forum is not spoon-feeding.

ADD REPLY • link 5.1 years ago by ATpoint 88k

0

Entering edit mode

I completely understand the idea of the respective forum sir, but i'm a complete beginner in R and trying to dig in the concepts. Help from your end would be a great motivation for me.

i framed this code after reading:

dds <- DESeqDataSetFromMatrix(countData = CC, colData = Col_data, design = ~ Condition)
dds = DESeq(dds)
keep <- rowSums(counts(dds)) >= 10 #keeping genes below having count above 10
dds <- dds[keep,]
dds = estimateSizeFactors(dds)
sizeFactors(dds)
vsd <- assay(vst(dds, blind=FALSE))

I have applied vst transormation. Could you please tell me if the vaues in vsd are normalized or not? or do we have to perform normalization.

ADD REPLY • link updated 5.1 years ago by Kevin Blighe 89k • written 5.1 years ago by prithvi.mastermind ▴ 60

0

Entering edit mode

In your above code, vsd will contain normalised and transformed expression levels - yes.

ADD REPLY • link 5.1 years ago by Kevin Blighe 89k

0

Entering edit mode

Thanks, Kevin. Before creating the DESeQ object i read my data like this.

Count_data <- read.table(file = "BRCA_DESeQ.txt", header = T, sep = "\t")
CC <- (Count_data[,2:1218])
Col_data = read.table(file = "COL.txt", header = T, sep = "\t")
dds = DESeqDataSetFromMatrix(countData = Count_data[,2:1218], colData = Col_data, design = ~ Condition)
dds = DESeq(dds)

where Col_data contains the sample information and Count_data contains expression data where the first column has the genes. I read the expression data without genes and assigned to variable CC. This was done because the samples in col data and count data must have the same dimensions

The vsd variable has normalized and log-transformed values without genes. Please help how can I get the genes with data also.

ADD REPLY • link updated 5.1 years ago by GenoMax 152k • written 5.1 years ago by prithvi.mastermind ▴ 60