Hi,
I have a couple of questions regarding my RNA-Seq experiment, but I will start with a hopefully easy one.
I have three batch effects in my design. Unfortunately, one of these effects contains two batches with only one sample each. I found a post here which recommended not using such batch effects (see the next-to-last comment in that post). My question is: is this right, and can someone explain why? (It makes sense to me, but I cannot explain why.)
My second question also concerns batches. Let's assume I can use two of my batch variables (these are factor variables with at least 3 samples per batch). How do I define the design for DESeq2?
I read in the post referenced above that one should put the variable of interest at the end of the design formula, like:
~batch1 + batch2 + condition
Again, is this correct?
Many thanks for your help. Once this is clarified I will continue with my other questions.
Thanks!
Hi Devon, thanks for the quick reply. By the way, I'm referring to your comment. I also found the answer to question two in the DESeq2 manual, sorry about that. Regarding the number of samples in a batch: can you clarify what you mean by removing? Removing from the whole analysis? But then I will lose important information, won't I? In my experiment I compare two conditions with 6 biological replicates in each group. My idea was not to consider the related batch effect in the design.
Yes, remove it from the whole analysis. If a batch is in the model and contains only one sample, that sample doesn't contribute anything to the analysis, since it can only be used to calculate the effect of its own batch, which you don't care about. If you don't include that batch variable in the model then certainly go ahead and include everything. Just have a look at some clustering to ensure that the batches don't have an appreciable effect.
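To make the point about single-sample batches concrete, here is a minimal sketch in plain Python. It assumes an ordinary least-squares fit on made-up numbers for a single gene, not the negative binomial GLM that DESeq2 actually fits, and the names `ols`, `design` and `samples` are hypothetical helpers for this illustration only. The idea it shows should carry over, though: the batch term for a one-sample batch absorbs that sample completely, so the condition estimate is the same whether or not the sample is in the data.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]


def design(rows, batch_levels):
    """Columns: intercept, one indicator per non-reference batch, condition (last)."""
    return [[1.0]
            + [1.0 if batch == lvl else 0.0 for lvl in batch_levels[1:]]
            + [float(cond)]
            for batch, cond, _ in rows]


# Made-up (batch, condition, expression) values; batch "C" has a single sample.
samples = [
    ("A", 0, 10.0), ("A", 1, 12.0), ("A", 0, 11.0),
    ("B", 0, 14.0), ("B", 1, 17.0), ("B", 1, 16.0),
    ("C", 1, 30.0),
]

# Fit 1: all samples, with batch in the model (~ batch + condition).
full = ols(design(samples, ["A", "B", "C"]), [y for _, _, y in samples])

# Fit 2: the single-sample batch removed entirely.
kept = [s for s in samples if s[0] != "C"]
reduced = ols(design(kept, ["A", "B"]), [y for _, _, y in kept])

cond_full = full[-1]        # condition coefficient, all samples
cond_reduced = reduced[-1]  # condition coefficient, batch C dropped

# Fitted value for the lone batch-C sample: intercept + batch C + condition.
singleton_residual = 30.0 - (full[0] + full[2] + full[3])

print("condition effect with the singleton:   ", round(cond_full, 6))
print("condition effect without the singleton:", round(cond_reduced, 6))
print("residual of the singleton sample:      ", round(singleton_residual, 6))
```

In this toy fit the two condition estimates agree and the lone sample has essentially zero residual, which is the sense in which it "can only be used to calculate the batch effect". If the batch variable is left out of the design instead, the sample does carry information about the condition, which is why keeping it is fine in that case.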