My question is about the design formula used for differential expression analysis in tools such as DESeq2, edgeR, Sleuth etc.
My dataset looks like this (the real data has more replicates; the table is shortened here):
sample tissue family replicate condition
a11c a 1 1 c
a12c a 1 2 c
a21c a 2 1 c
a22c a 2 2 c
b11c b 1 1 c
b12c b 1 2 c
b21c b 2 1 c
b22c b 2 2 c
a11t a 1 1 t
a12t a 1 2 t
a21t a 2 1 t
a22t a 2 2 t
b11t b 1 1 t
b12t b 1 2 t
b21t b 2 1 t
b22t b 2 2 t
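The sample names encode tissue, family, replicate and condition. As a quick sanity check on the layout, here is a minimal Python sketch that rebuilds the sheet and confirms that every tissue/condition/family cell has the same number of samples, i.e. that the design is fully crossed and balanced. It assumes exactly the shortened table above (2 tissues x 2 families x 2 replicates x 2 conditions), not the full dataset.

```python
from collections import Counter
from itertools import product

# Shortened layout from the table above; the real data has more replicates.
TISSUES = ["a", "b"]
FAMILIES = ["1", "2"]
REPLICATES = ["1", "2"]
CONDITIONS = ["c", "t"]  # c = control, t = treated


def build_sample_sheet():
    """One dict per sample; the name concatenates tissue, family, replicate, condition."""
    rows = []
    # Loop order reproduces the row order of the table (all controls first).
    for cond, tissue, family, rep in product(CONDITIONS, TISSUES, FAMILIES, REPLICATES):
        rows.append({
            "sample": f"{tissue}{family}{rep}{cond}",
            "tissue": tissue,
            "family": family,
            "replicate": rep,
            "condition": cond,
        })
    return rows


sheet = build_sample_sheet()
# Count samples in every tissue x condition x family cell.
cells = Counter((r["tissue"], r["condition"], r["family"]) for r in sheet)
balanced = len(set(cells.values())) == 1

print(len(sheet), "samples,", len(cells), "cells, balanced:", balanced)
# prints: 16 samples, 8 cells, balanced: True
```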
I have 2 tissues (a and b) under 2 conditions, control (c) and treated (t), and the samples also come from different families. I am not really interested in differentially expressed genes/transcripts (DEGs/DETs) between tissues; I am interested in DEGs/DETs between control and treated in both tissues. What is the correct way to specify this model? Which of these designs should I use?
~tissue+condition
~tissue*condition
~tissue:condition
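To see what each candidate formula actually estimates, here is a small Python sketch (an illustration only, not what DESeq2, edgeR or Sleuth run internally) that hand-builds the design matrix for the shortened sample sheet, assuming R's default treatment contrasts with the first level (tissue a, condition c) as reference. The third formula is represented by a hypothetical combined factor `group` (tissue and condition pasted together, fitted without an intercept), giving one coefficient per tissue/condition cell; this is an assumption about the intent of `~tissue:condition`, not a claim about how R expands that exact formula. For each model the sketch prints the rank of the design and which coefficients make up the treated-vs-control difference in each tissue.

```python
from fractions import Fraction
from itertools import product

# Shortened sample sheet (2 tissues x 2 families x 2 replicates x 2 conditions).
samples = [
    {"tissue": t, "family": f, "condition": c}
    for c, t, f, _rep in product("ct", "ab", "12", "12")
]

GROUP_LEVELS = ("a_c", "a_t", "b_c", "b_t")  # hypothetical combined factor levels

COLUMNS = {
    "~tissue+condition": ["(Intercept)", "tissueb", "conditiont"],
    "~tissue*condition": ["(Intercept)", "tissueb", "conditiont", "tissueb:conditiont"],
    "~0+group": ["group" + lv for lv in GROUP_LEVELS],
}


def row_values(r, formula):
    """One design-matrix row, treatment coded with tissue a and condition c as reference."""
    tb = 1 if r["tissue"] == "b" else 0     # tissue b vs a
    ct = 1 if r["condition"] == "t" else 0  # treated vs control
    if formula == "~tissue+condition":
        return [1, tb, ct]
    if formula == "~tissue*condition":
        return [1, tb, ct, tb * ct]         # extra column: tissue-specific treatment effect
    if formula == "~0+group":
        cell = f"{r['tissue']}_{r['condition']}"
        return [1 if cell == lv else 0 for lv in GROUP_LEVELS]
    raise ValueError(f"formula not handled by this sketch: {formula}")


def rank(matrix):
    """Matrix rank by exact Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def treatment_contrast(formula, tissue):
    """Treated-minus-control design row within one tissue, i.e. the coefficients
    whose combination is the treatment effect in that tissue."""
    treated = row_values({"tissue": tissue, "condition": "t"}, formula)
    control = row_values({"tissue": tissue, "condition": "c"}, formula)
    return [a - b for a, b in zip(treated, control)]


for formula, names in COLUMNS.items():
    X = [row_values(r, formula) for r in samples]
    print(formula, "rank", rank(X), "of", len(names))
    for tissue in "ab":
        effect = dict(zip(names, treatment_contrast(formula, tissue)))
        print("  treated vs control in tissue", tissue, effect)
```

Under this coding the additive model forces the same treatment effect (`conditiont`) in both tissues, whereas with the interaction the effect is `conditiont` in tissue a and `conditiont + tissueb:conditiont` in tissue b; the group parameterization expresses the same two comparisons as differences between cell columns.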
Since I am not very interested in DEGs between tissues, would it make sense to split the data into two datasets by tissue and analyse each one separately?
subset(df, tissue == "a")  # then fit ~condition
subset(df, tissue == "b")  # then fit ~condition
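Splitting by tissue changes how much data each fit sees. A rough Python sketch, assuming the shortened table above and counting residual degrees of freedom simply as samples minus coefficients (plain arithmetic, not a description of how any of these packages estimate dispersion): each per-tissue `~condition` fit has 8 samples and 2 coefficients, while one joint `~tissue*condition` fit on all 16 samples has 4 coefficients.

```python
from collections import Counter
from itertools import product

# Shortened sample sheet (2 tissues x 2 families x 2 replicates x 2 conditions).
samples = [
    {"tissue": t, "family": f, "condition": c}
    for c, t, f, _rep in product("ct", "ab", "12", "12")
]


def subset(rows, tissue):
    """Python analogue of R's subset(df, tissue == "a")."""
    return [r for r in rows if r["tissue"] == tissue]


def residual_df(n_samples, n_coefficients):
    """Residual degrees of freedom: samples minus estimated coefficients."""
    return n_samples - n_coefficients


# Separate analyses: ~condition in each tissue -> (Intercept), conditiont.
per_tissue = {}
for tissue in "ab":
    rows = subset(samples, tissue)
    per_tissue[tissue] = {
        "n": len(rows),
        "conditions": dict(Counter(r["condition"] for r in rows)),
        "residual_df": residual_df(len(rows), 2),
    }

# Joint analysis: ~tissue*condition on all samples -> 4 coefficients.
joint_df = residual_df(len(samples), 4)

print(per_tissue)
print("joint ~tissue*condition residual df:", joint_df)
```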
Family is an additional variable that is not critical but would nevertheless be interesting to inspect. Can I simply add it to the original model? Also, does the order of the terms matter?
~tissue+condition+family
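With `family` treated as a factor, `~tissue+condition+family` under R's default treatment coding has the coefficients `(Intercept)`, `tissueb`, `conditiont` and `family2`. Note that `family` is stored as 1/2 in the table; if R reads it as integers it is modelled as a numeric covariate unless converted with `factor()`. The Python sketch below (a hand-built illustration, not any package's code) checks that reordering the terms of an additive formula only permutes the design columns, so the fitted coefficients are the same. What can differ is which coefficient a tool reports by default: DESeq2's `results()`, for example, reports the last variable in the design unless a contrast or name is given, and its vignette recommends putting the variable of interest last.

```python
from itertools import product

# Shortened sample sheet (2 tissues x 2 families x 2 replicates x 2 conditions).
samples = [
    {"tissue": t, "family": f, "condition": c}
    for c, t, f, _rep in product("ct", "ab", "12", "12")
]

# Treatment coding, first level as reference (tissue a, condition c, family 1).
# family is handled as a categorical factor with levels "1" and "2".
TERMS = {
    "tissue": ("tissueb", lambda r: int(r["tissue"] == "b")),
    "condition": ("conditiont", lambda r: int(r["condition"] == "t")),
    "family": ("family2", lambda r: int(r["family"] == "2")),
}


def design(rows, term_order):
    """Additive design (~t1 + t2 + ...) as column names plus one row per sample."""
    names = ["(Intercept)"] + [TERMS[t][0] for t in term_order]
    matrix = [[1] + [TERMS[t][1](r) for t in term_order] for r in rows]
    return names, matrix


def as_columns(names, matrix):
    """Map column name -> column values, so designs can be compared regardless of order."""
    return {name: tuple(row[j] for row in matrix) for j, name in enumerate(names)}


names1, X1 = design(samples, ["tissue", "condition", "family"])
names2, X2 = design(samples, ["family", "tissue", "condition"])

print(names1)
print(names2)
print("same columns, only reordered:", as_columns(names1, X1) == as_columns(names2, X2))
```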
Are there any other considerations for this kind of analysis? Thanks.