Question on batch effect for a bulk RNA seq analysis
1
0
Entering edit mode
8 days ago
akb • 0

Hi,

I am not sure if this is the correct method of correcting batch effect when the batch effect is seen with bulk RNA-seq data from three experimental groups. The three experimental groups are: knockout1, knockout2 and the control.

Based on some other posts, I generated a column for the batch in the metadata dataframe:

sample                              | condition | batch
sample_knockout_1_n1.   | knockout1 | n1
sample_knockout_1_n2    | knockout1 | n2
sample_knockout_1_n3.   | knockout1 | n3
sample_knockout_2_n1.   | knockout2 | n1
sample_knockout_2_n2    | knockout2| n2
sample_knockout_2_n3.   | knockout2 | n3
sample_control_n1           | control       | n1
sample_control_n2           | control       | n2
sample_control_n3           | control       | n3

Then, I just accounted for the batch in the design parameter here:

dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ batch + condition)

If someone can please explain what this type of batch correction is doing when the results are being analyzed via DEseq2. And if there are any other steps that I need to do besides adjust the design parameter. Thank you.

RNA-seq batch-effect • 513 views
ADD COMMENT
0
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or use one of (a) the option highlighted in the image below/ (b) fenced code blocks for multi-line code. Fenced code blocks are useful in syntax highlighting. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.
code_formatting

ADD REPLY
0
Entering edit mode

Are there any reading materials on the internal method that batch correction is being done by DESeq? I wanted to ensure that I understand it to ensure the results are valid when I specify the batch in the design parameter. Thank you.

ADD REPLY
0
Entering edit mode

You should really read about linear modelling and how the model can be adjusted to account for batch effects. This is not something specific to DESeq2.

If available, I'd recommend finding a local statistician or expert to explain linear models to you. At minimum, consider watching the statquest video on design matrices.

ADD REPLY
0
Entering edit mode

Ok great! Thank you for providing some guidance on what topics I should focus on learning.

ADD REPLY
0
Entering edit mode
8 days ago
ATpoint 85k

This is a good experimental design since you have each condition in each batch. What this is doing is to internally correct for potential per-gene baseline differences between batches. No, for DE analysis including into the design is sufficient. For visualization you could first regress the batch from the log2-scale counts (e.g. the output of vst()) using removeBatchEffect from limma, see also DESeq2 vignette.

ADD COMMENT
0
Entering edit mode

Thank you, I have not tried the vst() method, but I did use removeBatchEffect when plotting the PCA and it did show correction of the batch, but not sure if I can trust these results.

Also, just want to confirm that if I wanted to do a differential gene expression analysis between an individual replicate sample_knockout_1_n1 and control_n1, I don't think it can be done with DEseq2. What else method can I use to do this individual replicate comparison?

ADD REPLY
0
Entering edit mode

In short, you shouldn't really. Any results from a one to one comparison will be very unreliable. Why do you want to do this?

ADD REPLY
0
Entering edit mode

Ok, good to know. Both DESeq and edgeR manuals advise against doing the one to one comparison as well. I think the reasoning was to see the genes expressed in each replicate, but we are not doing this comparison anymore since the results are not reliable.

ADD REPLY

Login before adding your answer.

Traffic: 2019 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6