Question

Need help with design/workflow please?


9 months ago

BioinfGuru ★ 2.1k

Hi all,

I have produced count data for bulk RNA-seq samples with the following groups: trial 1 (high + low) and trial 2 (high + low). There are at least 5 replicates in each of the 4 groups.

The main interest is differential expression between conditions (high vs low), but we would also like to know the effect of trial, because trial 1 is a different cross breed from trial 2. My intention was to use DESeq2 with the design ~trial + condition + trial:condition, which I believe is correct. The interaction term would then let me quantify how the effect of condition differs between trials.
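To make the interaction design concrete, here is a small Python sketch (not DESeq2 itself) of the treatment-coded model matrix that R builds for ~trial + condition + trial:condition. The level names T1/T2 and low/high, the choice of trial 1 and low as reference levels (in R you would set these with `relevel`), and the log2 group means are all hypothetical. With four coefficients for four groups, each coefficient is a difference of group means: the condition coefficient is the high-vs-low effect in trial 1, and the interaction is how much that effect changes in trial 2.

```python
# Hypothetical sketch: treatment-coded design for ~ trial + condition + trial:condition.
# Assumptions (not from the post): reference levels are trial "T1" and condition "low",
# and the group means below are made-up log2 values for a single gene.

GROUPS = [("T1", "low"), ("T1", "high"), ("T2", "low"), ("T2", "high")]
COLUMNS = ["Intercept", "trial_T2", "condition_high", "trial_T2:condition_high"]


def design_row(trial, condition):
    """One row of the model matrix for a sample in the given group."""
    t2 = 1 if trial == "T2" else 0
    high = 1 if condition == "high" else 0
    return [1, t2, high, t2 * high]


def coefficients(mu):
    """Solve the 4 coefficients from the 4 group means (log2 scale).

    With one parameter per group the model is saturated, so each
    coefficient is a simple difference of group means.
    """
    b0 = mu[("T1", "low")]
    b_trial = mu[("T2", "low")] - b0
    b_cond = mu[("T1", "high")] - b0
    b_inter = (mu[("T2", "high")] - mu[("T2", "low")]) - b_cond
    return [b0, b_trial, b_cond, b_inter]


# Hypothetical log2 mean expression of one gene in each group.
mu = {("T1", "low"): 5.0, ("T1", "high"): 7.0,
      ("T2", "low"): 6.0, ("T2", "high"): 6.5}

beta = coefficients(mu)
for group in GROUPS:
    row = design_row(*group)
    fitted = sum(x * b for x, b in zip(row, beta))  # reproduces the group mean
    print(group, row, fitted)

print(dict(zip(COLUMNS, beta)))
# High-vs-low effect in trial 1 is beta[2]; in trial 2 it is beta[2] + beta[3].
print("condition effect, T1:", beta[2])
print("condition effect, T2:", beta[2] + beta[3])
```

So the trial 2 condition effect is the sum of the condition and interaction coefficients, which is why it has to be obtained by combining those two terms in a contrast rather than read off either coefficient alone.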

However, for simplicity, I am considering whether it would be easier to run two separate analyses in DESeq2, one for trial 1 and another for trial 2, each with the simple design ~condition, and then compare the lists of differentially expressed genes afterwards. I would be unable to quantify the interaction, but who cares: I would have the DEGs for each trial. This feels far more intuitive to me as a way to see the clear difference between trial 1 and trial 2, rather than dealing with the added headache of what is basically a batch effect of interest.

Any advice would be appreciated,

Kenneth


"because trial 1 is a different cross breed to trial 2"

I would expect a lot of biological variation if your parental lines are diverse, so analyzing each trial separately is better.

Reply by JC (13k), 9 months ago


Thanks JC... just to play devil's advocate:

What if, instead of different parental lines, trial 1 were all male and trial 2 all female? How would that be any different? I would have thought it would produce the same amount of biological variation, and sex is a common enough reason for using an interaction term in DESeq2. Why do we not run separate analyses in that case?

Reply by BioinfGuru (2.1k), 9 months ago

score 1 · Answer 1 · 2024-08-14

"However, I am considering for simplicity, would it just be easier to run 2 separate analyses in deseq2, one for trial 1 and another for trial 2, with a simple design: ~condition. Then I could just compare the differentially expressed lists afterward. I would be unable to quantify the effect, but who cares, I would have the DEGs for each trial. It feels far more intuitive to me to do it this way to see the clear difference between trial 1 and trial 2, rather than having the added headache of what is basically a batch effect of interest."

I think I see what you mean there. However, by running separate models you make your analysis less efficient, since the samples in the two trials don't share their information (this assumes that the variance is about the same in the two trials, which I guess is a reasonable assumption).
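A rough way to see the efficiency point: the interaction design has one coefficient per group, so the fitted group means are essentially the same as in two separate ~condition models. What differs is how many samples inform each gene's dispersion estimate. The sketch below simply counts residual degrees of freedom, assuming exactly 5 replicates per group (the post says "at least 5") and, as above, similar dispersion in both trials. DESeq2's actual dispersion estimation (Cox-Reid adjusted likelihood plus shrinkage toward a fitted trend) is more involved, so treat this as an approximation of the idea rather than of the implementation.

```python
# Hypothetical sketch: residual degrees of freedom per gene available for
# estimating dispersion, assuming exactly 5 replicates in each of the 4 groups.
REPS = 5


def residual_df(n_samples, n_coefficients):
    """Samples left over after fitting the mean model."""
    return n_samples - n_coefficients


# One model, ~ trial + condition + trial:condition: 20 samples, 4 coefficients.
joint = residual_df(4 * REPS, 4)

# Two models, ~ condition within each trial: 10 samples, 2 coefficients each.
separate = residual_df(2 * REPS, 2)

print("joint model:", joint)
print("each separate model:", separate)
```

Under these assumptions, the joint model estimates each gene's dispersion from twice as many residual degrees of freedom as either separate model.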

Perhaps even more important, in my opinion running separate analyses looks simpler and more intuitive, but in fact it makes things more complicated. Intersecting gene lists requires you to set an FDR cutoff, which is somewhat arbitrary and usually has no biological interpretation. I like to think that "differentially expressed genes" are a property of the experiment, not a biological property of the samples: if you double the sample size you will get many more DEGs even if the biology is the same. It's easier to run one model, and then you have a more solid and quantitative answer to questions like "which genes show evidence of responding to condition after accounting for trial?" and "which genes show an interaction between condition and trial?"
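A small numeric illustration of why comparing lists can mislead: a gene can pass the cutoff in trial 1 and fail it in trial 2 even though the two effects are not significantly different from each other. The log2 fold changes and standard errors below are hypothetical, and the test is a plain Wald z-test with a normal approximation, assuming the two trial estimates are independent (they come from separate samples). It is a sketch of the statistical point, not of DESeq2's internals.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical high-vs-low log2 fold changes and standard errors for one gene.
lfc_t1, se_t1 = 1.0, 0.4
lfc_t2, se_t2 = 0.6, 0.4

p_t1 = two_sided_p(lfc_t1 / se_t1)  # z is about 2.5
p_t2 = two_sided_p(lfc_t2 / se_t2)  # z is about 1.5

# Wald test of the difference between trials, i.e. the interaction,
# assuming the two estimates are independent.
diff = lfc_t1 - lfc_t2
se_diff = math.sqrt(se_t1 ** 2 + se_t2 ** 2)
p_diff = two_sided_p(diff / se_diff)

print(f"trial 1: p = {p_t1:.3f}")
print(f"trial 2: p = {p_t2:.3f}")
print(f"difference between trials: p = {p_diff:.3f}")
```

With a 0.05 cutoff this gene would appear only in the trial 1 list, yet the direct test of the difference gives no evidence that trial changes the response; the interaction term answers that question directly.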