Hello!
Please tell me if my reasoning is correct and how I can run the proper analysis using DESeq2.
I have two batches of RNA-Seq samples (format: TYPE_condition_batch):
WT_BASE_batch1
KO_BASE_batch1
WT_LEARN_batch2
KO_LEARN_batch2
I need to run DGE between WT_BASE and WT_LEARN, and I don't care about KO in this case. However, in order to remove the batch effect I need to use KO in my model, right? Is my model below correct?
~batch + condition + condition:type
How can I get rid of the batch effect and only compare DEGs between Wild Types (WT)?
Thank you!
I have another question: I realised that I can actually split the LEARN condition into two different conditions, NEW_room and FAMILIAR_room. In this case, could I remove the batch effect more easily? Like ~batch+condition?
I would have:
WT_BASE_batch1
WT_NEW_batch2
WT_FAMILIAR_batch2
I was also thinking about adding a third batch, which is WT_BASE_batch3, but I am not sure if it helps much... In all cases the "batch" is the time of sequencing (April, May, September). All the technical parameters (tissue extraction, library construction and size, sequencing etc.) are the same, so I suppose the effect should not be huge; however, removing it is probably necessary.
That's a bad experimental design. You have to make sure you sequence all of the different conditions together. I don't think it will have a large effect since the libraries were constructed together, but this can't be emphasized enough. If you search this forum you'll see multiple examples of this mistake. It has nothing to do with bioinformatics, just standard experimental design.
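To make the confounding concrete, here is a minimal sketch in plain Python (not DESeq2 itself) that builds the dummy-coded design matrix for ~batch + condition and checks its rank. If the rank is lower than the number of columns, the batch and condition effects cannot be separated, and DESeq2 will refuse such a design because the model matrix is not full rank. The helper names (matrix_rank, design_matrix, check) and the "balanced" layout are hypothetical illustrations, not part of the original experiment; one row per group is used because replicates do not change the rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero entry in this column at or below the current row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_matrix(samples, condition_levels, batch_levels):
    """Dummy coding for ~ batch + condition; the first level is the reference."""
    rows = []
    for condition, batch in samples:
        row = [1]  # intercept
        row += [int(batch == b) for b in batch_levels[1:]]
        row += [int(condition == c) for c in condition_levels[1:]]
        rows.append(row)
    return rows


def check(samples, condition_levels, batch_levels):
    """Return (rank, number of columns) of the design matrix."""
    x = design_matrix(samples, condition_levels, batch_levels)
    return matrix_rank(x), len(x[0])


# Original layout: BASE only in batch1, LEARN only in batch2 (WT and KO alike).
original = [("BASE", "batch1"), ("BASE", "batch1"),
            ("LEARN", "batch2"), ("LEARN", "batch2")]
original_result = check(original, ["BASE", "LEARN"], ["batch1", "batch2"])

# Follow-up layout: LEARN split into NEW and FAMILIAR, plus a BASE-only batch3.
split = [("BASE", "batch1"), ("NEW", "batch2"),
         ("FAMILIAR", "batch2"), ("BASE", "batch3")]
split_result = check(split, ["BASE", "NEW", "FAMILIAR"],
                     ["batch1", "batch2", "batch3"])

# Hypothetical balanced layout: every condition sequenced in every batch.
balanced = [("BASE", "batch1"), ("LEARN", "batch1"),
            ("BASE", "batch2"), ("LEARN", "batch2")]
balanced_result = check(balanced, ["BASE", "LEARN"], ["batch1", "batch2"])

for name, (rank, n_cols) in [("original", original_result),
                             ("split", split_result),
                             ("balanced", balanced_result)]:
    status = "full rank" if rank == n_cols else "NOT full rank (confounded)"
    print(f"{name}: rank {rank} of {n_cols} columns -> {status}")
```

In the original layout the batch2 column is identical to the LEARN column, so adding type or condition:type terms cannot make the matrix full rank. Splitting LEARN or adding a BASE-only batch does not help either, because no batch contains both BASE and a learning condition. Only a layout like the hypothetical balanced one lets the model estimate batch and condition separately.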
Yeah, I know. The problem is that I came in after the experiments were performed and was asked to analyse the data... So I am just trying to do it the best way possible.
Yes, that's the story of bioinformatics. Rule of thumb: include as many samples as you can for better dispersion estimates, and ask the questions you want using contrasts.