Question

RNA Seq: Differential expression model formula

8.8 years ago

firestar ★ 1.7k

My question is about the model formula used in differential expression analyses in applications such as DESeq2, edgeR, Sleuth etc.

My dataset looks like this. There are more replicates, but the table is reduced here.

sample  tissue  family  replicate  condition
a11c    a       1       1          c
a12c    a       1       2          c
a21c    a       2       1          c
a22c    a       2       2          c
b11c    b       1       1          c
b12c    b       1       2          c
b21c    b       2       1          c
b22c    b       2       2          c
a11t    a       1       1          t
a12t    a       1       2          t
a21t    a       2       1          t
a22t    a       2       2          t
b11t    b       1       1          t
b12t    b       1       2          t
b21t    b       2       1          t
b22t    b       2       2          t

I have 2 tissues (a and b) and 2 treatments (control and treated). I also have families. I am not really interested in differentially expressed genes/transcripts (deg/det) between tissues. I am interested in deg/det between control and treated in both tissues. What is the correct way to specify this model?

~tissue+condition
~tissue*condition
~tissue:condition

Since I am not that interested in degs between tissues, would it make sense to split the data into 2 datasets by tissue and analyse each separately?

subset(df,tissue=="a")
~condition
subset(df,tissue=="b")
~condition

Family is an additional variable that is not critical, but it would be interesting to inspect. Can I just add it to the original model? Also, does the order of the terms matter?

~tissue+condition+family

Any other considerations for such analyses? Thanks.

RNA-Seq R DESeq edgeR • 2.3k views


score 3 · Accepted Answer · 2016-06-22

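Use ~tissue*condition. You may not care about things like the tissue effect, but it will still be there in the data, so the model needs to account for it. The interaction term also lets the treatment effect differ between the two tissues.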
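To make the difference between the additive and interaction designs concrete, here is a minimal sketch in base R. It rebuilds the reduced sample sheet from the question and inspects the design matrix columns with `model.matrix`. The object names (`samples`, `design_add`, `design_int`, and the two contrast vectors) are made up for illustration; the coefficient names that DESeq2 or edgeR report may be formatted differently.

```r
# Rebuild the reduced sample sheet from the question (16 samples).
# expand.grid() turns the character vectors into factors, with
# "a" and "c" as the reference levels.
samples <- expand.grid(
  replicate = 1:2,
  family    = 1:2,
  tissue    = c("a", "b"),
  condition = c("c", "t")
)

# Additive design: a single treatment effect shared by both tissues.
design_add <- model.matrix(~ tissue + condition, data = samples)

# Interaction design: the treatment effect may differ between tissues.
design_int <- model.matrix(~ tissue * condition, data = samples)

print(colnames(design_add))
print(colnames(design_int))

# With the interaction design, the treated-vs-control effect is
#   in tissue a (reference): conditiont
#   in tissue b:             conditiont + tissueb:conditiont
treated_vs_control_in_a <- c(0, 0, 1, 0)
treated_vs_control_in_b <- c(0, 0, 1, 1)
names(treated_vs_control_in_a) <- colnames(design_int)
names(treated_vs_control_in_b) <- colnames(design_int)
```

Contrast vectors of this shape are what you would hand to the contrast argument of your tool of choice to get the control vs treated comparison within each tissue from a single fitted model.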

Regarding splitting, while you can do that, you'll have decreased power (there won't be as much variance shrinkage), so I would suggest that you keep everything in.
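One rough way to see part of the cost of splitting is to count residual degrees of freedom (samples minus model coefficients) for each option. The sketch below uses only base R on the reduced sample sheet from the question; `residual_df` is a hypothetical helper, and it does not reproduce the dispersion shrinkage that DESeq2 or edgeR actually perform.

```r
# Reduced sample sheet from the question (16 samples).
samples <- expand.grid(
  replicate = 1:2,
  family    = 1:2,
  tissue    = c("a", "b"),
  condition = c("c", "t")
)

# Hypothetical helper: residual degrees of freedom for a given design.
residual_df <- function(design, data) {
  nrow(data) - ncol(model.matrix(design, data = data))
}

# All samples in one model: 16 samples, 4 coefficients.
df_full <- residual_df(~ tissue * condition, samples)

# One tissue only: 8 samples, 2 coefficients.
tissue_a <- subset(samples, tissue == "a")
df_split <- residual_df(~ condition, tissue_a)

print(c(full = df_full, split = df_split))
```

Each per-tissue fit estimates its variability from half as many residual degrees of freedom as the combined fit, which is one reason to keep everything in a single model.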

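You can certainly add family to any of the designs as you showed. If you do that, please ensure that family is a factor. I don't think it'll cause a problem for your current experiment, since with only two families a numeric 1/2 coding spans the same model as a two-level factor. But if you have more than two families and family is left numeric, it will be fitted as a single continuous covariate rather than as separate groups, and you'll get some messed up results.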
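A small base R sketch of why the factor conversion matters, assuming a hypothetical experiment with three families coded 1, 2, 3 (the variable names are made up for illustration):

```r
# Hypothetical: three families, two samples each, coded as numbers.
family_num <- rep(1:3, each = 2)

# Left numeric, family becomes one slope: family 3 is forced to differ
# from family 1 by exactly twice as much as family 2 does.
design_num <- model.matrix(~ family_num)

# As a factor, each non-reference family gets its own coefficient.
family_fac <- factor(family_num)
design_fac <- model.matrix(~ family_fac)

print(colnames(design_num))
print(colnames(design_fac))
```

In the question's sample sheet the equivalent step would be something like `df$family <- factor(df$family)` before building the design.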