Question

Comparing two RNA-seq runs (each has enough replicates) in DESeq2

5 months ago

bibrgr • 0

Does anyone have advice for comparing the output of two RNA-seq runs? Basically, I'm comparing two conditions (control and treatment) from tumor samples. I have one run with 6 samples control/treatment and a second one that was done with 8 samples control/treatment. It's the same setup for both, but I'm expecting some batch effect, not just because they're separate runs but because they're tumors, which can vary in their growth between experiments.

I don't necessarily need to combine them into one big experiment (is that even feasible?), but I'm thinking about ways to present the data together. Should I make a Venn diagram of hits from each run? Can I plot LFC vs. LFC, and would that even be helpful? Most of the analysis I've already generated was with the 6-sample cohort, but since I also have this 8-sample cohort it seems a waste not to use the data. I'm looking for ideas on how to increase confidence in hits with the data and how to "show my work" with plots etc.

Edit: The combined analysis doesn't have to be "in" DESeq2 (that's what I used for each individual run), just looking for ideas.

DESeq2 • 784 views

ADD COMMENT • link updated 5 months ago by james.hawley ▴ 80 • written 5 months ago by bibrgr • 0

score 0 · Answer 1 · 2024-11-14

5 months ago

james.hawley ▴ 80

It sounds like the design formula for this experiment looks like this:

Count ~ Condition + Run

Do you have any other factors that are worth including that may contribute to any batch effects? For example, do you have kidney cells in the first batch and lung cells in the second batch?

Splitting samples across multiple runs is common for larger datasets, so this is okay. You don't need to combine the counts together to smooth out batch effects. In my opinion, it is better to explicitly include the batch label, even if your individual runs are well-designed (e.g. 3 control + 3 treatment in the first run and 4 control + 4 treatment in the second run).

You can combine all the counts together into a single DESeqDataSet object using the DESeqDataSetFromMatrix() function if you want to run DESeq() and get results from the differential analysis. I would recommend doing that and specifying the design parameter, using the design formula above. Note that DESeq2 takes a one-sided formula, so in practice this is written as ~ Run + Condition, with the variable of interest last.
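A minimal sketch of that setup, in case it helps. The counts are simulated stand-ins, and `counts_run1`, `counts_run2`, the sample names, and the 3 + 3 / 4 + 4 split are all assumptions for illustration, so swap in your own matrices and sample table:

```r
library(DESeq2)

set.seed(1)

# Simulated stand-ins for the two raw count matrices (genes x samples).
# Run 2 is simulated ~30% shallower, as described in the thread.
n_genes <- 500
mu <- 2^runif(n_genes, 4, 10)
counts_run1 <- matrix(rnbinom(n_genes * 6, mu = rep(mu, 6), size = 5),
                      nrow = n_genes)
counts_run2 <- matrix(rnbinom(n_genes * 8, mu = rep(0.7 * mu, 8), size = 5),
                      nrow = n_genes)
rownames(counts_run1) <- rownames(counts_run2) <- paste0("gene", seq_len(n_genes))
colnames(counts_run1) <- paste0("run1_s", 1:6)
colnames(counts_run2) <- paste0("run2_s", 1:8)

# Keep genes present in both runs, in the same order, then bind the columns.
shared <- intersect(rownames(counts_run1), rownames(counts_run2))
counts <- cbind(counts_run1[shared, ], counts_run2[shared, ])

# Sample table: one row per column of `counts` (assumed 3 + 3 and 4 + 4 layout).
coldata <- data.frame(
  condition = factor(c(rep(c("control", "treatment"), each = 3),
                       rep(c("control", "treatment"), each = 4)),
                     levels = c("control", "treatment")),
  run = factor(rep(c("run1", "run2"), times = c(6, 8))),
  row.names = colnames(counts)
)

# Batch term first, variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ run + condition)
dds <- DESeq(dds)

# Treatment vs. control, adjusted for run.
res <- results(dds, contrast = c("condition", "treatment", "control"))
```

The inputs have to be raw, un-normalized counts from both runs, quantified against the same annotation so that the gene identifiers line up.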

From there, you could test out batch effect adjustment/removal algorithms and use something like a principal component analysis (PCA) plot or relative log expression (RLE) plot to visually diagnose whether you have any weird data points that can't be explained by the treatment condition or sequencing run.
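One way to do the PCA check, as a rough sketch rather than a recipe. It uses a simulated dataset from `makeExampleDESeqDataSet()` with made-up `run` and `condition` labels, and `limma::removeBatchEffect()` as one example of a batch adjustment method (my choice for illustration, others exist):

```r
library(DESeq2)
library(limma)

set.seed(1)

# Simulated stand-in for the combined 14-sample dataset (labels are assumptions).
dds <- makeExampleDESeqDataSet(n = 1000, m = 14)
dds$run <- factor(rep(c("run1", "run2"), times = c(6, 8)))
dds$condition <- factor(c(rep(c("control", "treatment"), each = 3),
                          rep(c("control", "treatment"), each = 4)),
                        levels = c("control", "treatment"))
design(dds) <- ~ run + condition

# Variance-stabilized values for plotting; blind = FALSE uses the design.
# (vst() is a faster wrapper around the same transformation for large datasets.)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# PCA before any adjustment: do samples separate by run, by condition, or both?
p_before <- plotPCA(vsd, intgroup = c("condition", "run"))

# Remove the run effect from the transformed values, protecting condition.
vsd_adj <- vsd
cond_design <- model.matrix(~ condition, data = as.data.frame(colData(vsd)))
assay(vsd_adj) <- limma::removeBatchEffect(assay(vsd),
                                           batch = vsd$run,
                                           design = cond_design)

# PCA after adjustment: remaining outliers are not explained by run.
p_after <- plotPCA(vsd_adj, intgroup = c("condition", "run"))
```

The adjusted values are only for visualization. The differential test itself should still run on the raw counts with the run term in the design.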


> Do you have any other factors that are worth including that may contribute to any batch effects? For example, do you have kidney cells in the first batch and lung cells in the second batch?

No, the experimental setup was very similar - same treatment, same type of cells, etc.

> You can combine all the counts together into a single DESeqDataSet object

Would this be OK even if the sequencing depth was different for each run? The run type was the same, but the total reads per sample were about 30% lower in the second run.

ADD REPLY • link 5 months ago by bibrgr • 0

Shouldn't matter. DESeq2 will normalize for sequencing depth, and including the batch variable in your design should mitigate most other technical effects.

Slap 'em all together and look at the PCA. It'll tell you if there are real issues to try to deal with or not.
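To illustrate the depth normalization point, here is a base R sketch of the median-of-ratios idea behind DESeq2's size factors. It is a simplified re-implementation on simulated Poisson counts for intuition only (in practice `estimateSizeFactors()` or `DESeq()` handles this), and the depths and sample layout are assumptions:

```r
set.seed(1)

# Simulated counts: 6 samples at full depth, 8 samples ~30% shallower.
n_genes <- 2000
mu <- 2^runif(n_genes, 2, 12)
run <- rep(c("run1", "run2"), times = c(6, 8))
depth <- ifelse(run == "run1", 1, 0.7)
counts <- sapply(depth, function(d) rpois(n_genes, mu * d))

# Per-gene log geometric mean across samples (the pseudo-reference sample).
log_geo_means <- rowMeans(log(counts))

# Genes with a zero in any sample give -Inf and are left out.
keep <- is.finite(log_geo_means)

# Size factor per sample: median ratio of its counts to the reference.
size_factors <- apply(counts, 2, function(x) {
  exp(median((log(x) - log_geo_means)[keep]))
})

# Depth-normalized counts: divide each column by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

# Average size factor per run; the shallower run should come out smaller.
print(tapply(size_factors, run, mean))
```

Because each sample gets its own size factor, a run-wide difference in depth is absorbed at this step and doesn't need separate handling.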

ADD REPLY • link 5 months ago by jared.andrews07 ★ 18k

Yeah, ideally this is a case where differences in total counts don't matter much. If you have more than 30 M reads per sample, then you _should_ have sufficient coverage in every sample to avoid additional problems.

Ideally, everything would have been sequenced at the same depth in the same lane, but practically that's often not what happens. Those diagnostic plots should help identify whether there's still a problem after including Run in the design.

ADD REPLY • link 5 months ago by james.hawley ▴ 80