Need help with design/workflow please?
4 weeks ago
BioinfGuru ★ 2.1k

Hi all,

I have produced count data for bulk RNA-seq samples in the following groups: Trial 1 (high + low) and Trial 2 (high + low). There are at least 5 replicates in each of the 4 groups.

The main interest is differential expression between conditions (high and low), but we would also like to know the effect of trial, because trial 1 is a different cross breed to trial 2. My intention was to use DESeq2 with the following design: ~trial + condition + trial:condition. I believe that would be correct. Then I can quantify how the effect of condition differs between trials.
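To make the design formula concrete, here is a minimal sketch in plain Python (not DESeq2 code) of what the four coefficients of ~trial + condition + trial:condition represent for a single gene. The group means are invented, and the choice of trial1 and low as reference levels is an assumption; R orders factor levels alphabetically by default, so `high` would be the reference unless the factor is releveled.

```python
# Hypothetical mean log2 expression of one gene in each of the 4 groups.
# All numbers are invented for illustration only.
group_means = {
    ("trial1", "low"): 5.0,
    ("trial1", "high"): 6.0,
    ("trial2", "low"): 5.5,
    ("trial2", "high"): 8.0,
}


def interaction_model_coefficients(means):
    """Coefficients of ~trial + condition + trial:condition under treatment
    coding, ASSUMING trial1 and low are the reference levels."""
    intercept = means[("trial1", "low")]
    # trial2 vs trial1, measured within the low condition
    trial_effect = means[("trial2", "low")] - intercept
    # high vs low, measured within trial1 (the "main" condition effect)
    condition_in_trial1 = means[("trial1", "high")] - intercept
    # high vs low, measured within trial2
    condition_in_trial2 = means[("trial2", "high")] - means[("trial2", "low")]
    return {
        "intercept": intercept,
        "trial2_vs_trial1": trial_effect,
        "high_vs_low_in_trial1": condition_in_trial1,
        # interaction = difference between the two condition effects
        "interaction": condition_in_trial2 - condition_in_trial1,
    }


coefficients = interaction_model_coefficients(group_means)
for name, value in coefficients.items():
    print(f"{name}: {value}")
```

In this parameterization the condition coefficient is the high vs low effect in the reference trial only, and the interaction term is the extra high vs low effect in the other trial, so a test on the interaction asks directly whether the condition effect differs between trials.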

However, for simplicity, I am considering whether it would be easier to run 2 separate analyses in DESeq2, one for trial 1 and another for trial 2, with a simple design: ~condition. Then I could just compare the lists of differentially expressed genes afterward. I would be unable to quantify the effect of trial, but who cares, I would have the DEGs for each trial. It feels far more intuitive to me to do it this way to see the clear difference between trial 1 and trial 2, rather than having the added headache of what is basically a batch effect of interest.

Any advice would be appreciated,

Kenneth

deseq2 batch workflow • 372 views

because trial 1 is a different cross breed to trial 2

I would expect a lot of biological variation if your parental lines are diverse, so analyzing each trial separately is better.


Thanks JC... just to play devil's advocate...

What if instead of different parental lines, trial 1 was all male and trial 2 all female? How would it be any different? I would have thought that would produce the same amount of biological variation, and it is a common enough reason for using an interaction term in DESeq2. Why do we not separate those?

4 weeks ago

However, for simplicity, I am considering whether it would be easier to run 2 separate analyses in DESeq2, one for trial 1 and another for trial 2, with a simple design: ~condition. Then I could just compare the lists of differentially expressed genes afterward. I would be unable to quantify the effect of trial, but who cares, I would have the DEGs for each trial. It feels far more intuitive to me to do it this way to see the clear difference between trial 1 and trial 2, rather than having the added headache of what is basically a batch effect of interest.

I think I see what you mean there. However, by running separate models you make your analysis less efficient, since the samples in the two trials don't share their information (assuming that the variance in the two trials is about the same, which I guess is a reasonable assumption).
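As a rough illustration of the efficiency point, the sketch below compares the residual degrees of freedom available for estimating variability under the two approaches. It uses a normal-theory analogy (a sample variance with df degrees of freedom has relative standard deviation sqrt(2/df)) and assumes exactly 5 replicates per group and equal variance in both trials. DESeq2 itself fits negative binomial dispersions with shrinkage across genes, so treat this only as intuition, not as what the package computes.

```python
import math


def relative_sd_of_variance_estimate(residual_df):
    """Relative standard deviation of a sample variance for normal data
    with the given residual degrees of freedom: sqrt(2 / df)."""
    return math.sqrt(2.0 / residual_df)


replicates_per_group = 5  # assumption: the thread says "at least 5"

# Separate ~condition model for one trial: 10 samples, 2 coefficients.
df_separate = 2 * replicates_per_group - 2
# Joint ~trial + condition + trial:condition model: 20 samples, 4 coefficients.
df_joint = 4 * replicates_per_group - 4

for label, df in (("separate", df_separate), ("joint", df_joint)):
    print(label, df, round(relative_sd_of_variance_estimate(df), 3))
```

Under these assumptions the joint model has twice the residual degrees of freedom per gene, so its variability estimate is less noisy. If the two trials really have very different variances, that advantage shrinks, which is the concern raised about diverse parental lines.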

Perhaps even more important, in my opinion running separate analyses looks simpler and more intuitive, but in fact it makes things more complicated. Intersecting gene lists requires you to set FDR cutoffs, which are somewhat arbitrary and usually have no biological interpretation. I like to think that "differentially expressed genes" are a property of the experiment, not a biological property of the samples. If you double the sample size you will get many more DEGs even if the biology is the same. It's easier to run one model and then you have a more solid and quantitative answer to questions like "which genes show evidence to respond to condition after accounting for trial?", "which genes show interaction between condition and trial?"
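The sample-size point can be sketched with a toy calculation. This is a simple two-group z-test with a known standard deviation, not the test DESeq2 performs, and the effect size, standard deviation, and 0.05 cutoff are all invented for illustration.

```python
import math


def two_sided_p(effect, sd, n_per_group):
    """Two-sided p-value of a z-test comparing two group means,
    assuming a known, common standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(effect) / standard_error
    # 2 * (1 - Phi(z)) written with the complementary error function
    return math.erfc(z / math.sqrt(2.0))


effect, sd = 1.0, 1.0  # hypothetical log2 fold change and standard deviation
cutoff = 0.05          # hypothetical significance threshold

p_small = two_sided_p(effect, sd, n_per_group=5)
p_large = two_sided_p(effect, sd, n_per_group=10)

print("n=5 per group: significant?", p_small < cutoff)
print("n=10 per group: significant?", p_large < cutoff)
```

The same gene with the same true effect falls on different sides of the cutoff depending only on the number of replicates. That is why a gene appearing in one trial's DEG list and not the other's is weak evidence of a trial-specific response, whereas an interaction term tests that difference directly.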


It's easier to run one model and then you have a more solid and quantitative answer to questions like "which genes show evidence to respond to condition after accounting for trial?", "which genes show interaction between condition and trial?"

Yeah, those are almost verbatim the original questions I wrote before starting, with one more, "which genes in Q1 + Q2 are tissue dependent?" (5 datasets, 1 for each tissue). Our hypothesis would be that trial does cause a difference but really we would need to test this... and I have no idea how to do that without looking at the interaction ... unless someone knows of some kind of magical statistical test (which of course will point me right back at DESeq2 and the interaction).

assuming that the variance in the two trials is about the same, which I guess is a reasonable assumption

JC made a reasonable point above: because of the different parental lines, the biological variability (variance) of the 2 trials may differ quite a lot, which was the argument for separating the 2 trials. Any thoughts on that?

PS: It is concerning to me that I seem to be agreeing with both sides of the argument... decision tennis, clearly a lot more to learn.
