Hi,
I am not sure whether this is the correct way to account for a batch effect seen in bulk RNA-seq data from three experimental groups: knockout1, knockout2 and the control.
Based on some other posts, I generated a column for the batch in the metadata dataframe:
sample               | condition | batch
sample_knockout_1_n1 | knockout1 | n1
sample_knockout_1_n2 | knockout1 | n2
sample_knockout_1_n3 | knockout1 | n3
sample_knockout_2_n1 | knockout2 | n1
sample_knockout_2_n2 | knockout2 | n2
sample_knockout_2_n3 | knockout2 | n3
sample_control_n1    | control   | n1
sample_control_n2    | control   | n2
sample_control_n3    | control   | n3
Then, I just accounted for the batch in the design parameter here:
dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ batch + condition)
Could someone please explain what this type of batch correction does when the results are analyzed with DESeq2? Are there any other steps I need to take besides adjusting the design parameter? Thank you.
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or use fenced code blocks for multi-line code. Fenced code blocks are useful in syntax highlighting. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.
Are there any reading materials on how DESeq2 handles batch correction internally? I want to understand the method so that I can be confident the results are valid when I specify the batch in the design parameter. Thank you.
You should really read about linear modelling and how the model can be adjusted to account for batch effects. This is not something specific to DESeq2.
If available, I'd recommend finding a local statistician or expert to explain linear models to you. At minimum, consider watching the StatQuest video on design matrices.
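To make the design-matrix idea concrete, here is a minimal sketch in base R (no DESeq2 needed) that builds the matrix implied by `~ batch + condition` for the nine samples in the question. The `metadata` data frame is rebuilt by hand from the table in the post, and setting `control` as the reference level is my assumption about the intended comparison. The point is that putting batch in the design does not change the counts. Each batch gets its own coefficient, so the condition coefficients are estimated from differences within batches.

```r
# Rebuild the sample table from the question (values copied from the post).
metadata <- data.frame(
  condition = factor(
    rep(c("knockout1", "knockout2", "control"), each = 3),
    levels = c("control", "knockout1", "knockout2")  # assumption: control is the reference
  ),
  batch = factor(rep(c("n1", "n2", "n3"), times = 3))
)
rownames(metadata) <- paste0(
  "sample_",
  rep(c("knockout_1", "knockout_2", "control"), each = 3),
  "_n", 1:3
)

# Same formula as the design passed to DESeqDataSetFromTximport().
design <- model.matrix(~ batch + condition, data = metadata)

# Columns: an intercept (control in batch n1), two batch offsets (n2, n3)
# and two condition effects (knockout1 and knockout2, each versus control).
print(design)

# Every condition appears in every batch, so all five coefficients can be
# estimated (the matrix has full column rank).
print(qr(design)$rank == ncol(design))
```

With this design, the comparison of interest would typically be extracted after running `DESeq()` with something like `results(dds, contrast = c("condition", "knockout1", "control"))`. The counts themselves stay untouched, so plots such as a PCA will still show the batch effect unless it is removed separately for visualization only.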
Ok great! Thank you for providing some guidance on what topics I should focus on learning.