I have some data divided into 3 conditions: A, B, C. I would like to run a DE analysis with only the first two conditions, removing the samples with condition C. One solution I found is to solve the problem at the root by removing the samples directly from the raw data I am reading, but could that be dangerous? (The sample order may get corrupted.) On the other hand, if I subset the dds object after creating it, I will still have three conditions in my levels, since I created it with all the samples, and some errors occur.
What should I do?
UPDATE: Using results() with the contrast argument is not an option for me because, as I understand it, DESeq() estimates its statistics using all samples. So I thought that would be an incorrect way to do my analysis.
Why do you need to discard the samples with condition C? (Biological or technical reasons?)
I actually read that DESeq2 computes its estimates gene by gene, so they suggest creating the dds with all the samples and then subsetting later. I found the answer in the vignette FAQ.
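On the worry about sample order and leftover levels: a minimal base-R sketch, using made-up toy data (the names counts, coldata and keep are only illustrative). One logical vector subsets both the count matrix and the sample table, and droplevels() removes the empty level C.

```r
# Toy data (hypothetical): 4 genes x 6 samples, two samples per condition.
counts <- matrix(1:24, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:6)))
coldata <- data.frame(condition = factor(rep(c("A", "B", "C"), each = 2)),
                      row.names = colnames(counts))

# One logical vector drives both subsets, so the sample order cannot drift.
keep <- coldata$condition %in% c("A", "B")
counts_ab  <- counts[, keep, drop = FALSE]
coldata_ab <- coldata[keep, , drop = FALSE]

# Remove the now-empty level "C" so the design only knows about A and B.
coldata_ab$condition <- droplevels(coldata_ab$condition)

# Guard against any mismatch between count columns and sample rows.
stopifnot(identical(colnames(counts_ab), rownames(coldata_ab)))
levels(coldata_ab$condition)  # "A" "B"
```

The same pattern should carry over to an existing DESeqDataSet: dds <- dds[, keep] followed by dds$condition <- droplevels(dds$condition), done before calling DESeq().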
Your issue is described in the vignette FAQ: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups
You need to check the within-group variability of your samples in condition C with a PCA. If it is much larger than in A and B, you should start with the reduced dataset (A and B only); otherwise you can fit all the samples together and use contrasts.
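A rough sketch of both routes, assuming DESeq2 is installed. The data here are simulated with makeExampleDESeqDataSet() and the three conditions are assigned by hand, so the numbers mean nothing; only the calls matter. Whether the spread of C counts as "huge" is a visual judgement on the PCA plot, not something the code decides.

```r
library(DESeq2)

# Simulated data (hypothetical): 1000 genes x 9 samples, relabelled A/B/C.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 9)
dds$condition <- factor(rep(c("A", "B", "C"), each = 3))

# Step 1: inspect within-group spread on a PCA of transformed counts.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
plotPCA(vsd, intgroup = "condition")

# Route 1 (C looks comparable to A and B): fit all samples, then contrast.
dds_all <- DESeq(dds)
res_contrast <- results(dds_all, contrast = c("condition", "B", "A"))

# Route 2 (C is much more variable): drop C and its level before fitting.
dds_ab <- dds[, dds$condition %in% c("A", "B")]
dds_ab$condition <- droplevels(dds_ab$condition)
dds_ab <- DESeq(dds_ab)
res_subset <- results(dds_ab, contrast = c("condition", "B", "A"))
```

In both routes the contrast c("condition", "B", "A") reports log2 fold changes of B over A; the routes differ only in which samples feed the dispersion estimates.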