Question

Should I merge my technical replicates for bulk RNA seq?

0

2.1 years ago

bioinfo ▴ 150

Hello,

I have samples from 5 patients. From each patient I have 3 samples. I did bulk RNA seq analysis on the samples. I used DESeq2 for DEG analysis. I would just like to confirm that I have done it correctly.

I have not merged the 3 samples from each patient. Instead, I kept them separate, and when I ran the DESeq2 analysis I set up the comparison as shown below.

Group 1              Group 2
Patient1_Sample_1    All other samples
Patient1_Sample_2
Patient1_Sample_3

Does that seem ok? Would it be better to merge the 3 samples from each patient and then run DESeq2?

Thank you

RNA-seq DESeq2 • 2.6k views

updated 21 months ago by Ram 44k • written 2.1 years ago by bioinfo ▴ 150

2

Have you performed any sort of exploratory analysis (such as PCA) to assess variation among your technical replicates?

If your technical replicates (samples coming from a single patient) show no relevant variation, you can simply collapse them with the DESeq2 function collapseReplicates.
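For intuition, collapsing technical replicates amounts to summing the raw counts of the libraries that belong to the same sample, gene by gene. The Python sketch below only illustrates that arithmetic and is not a replacement for collapseReplicates; the gene names, library names, counts, and the helper `collapse_replicates` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw (un-normalised) read counts: gene -> {library: count}.
raw_counts = {
    "GeneA": {"P1_S1": 10, "P1_S2": 12, "P1_S3": 11,
              "P2_S1": 30, "P2_S2": 28, "P2_S3": 33},
    "GeneB": {"P1_S1": 0, "P1_S2": 1, "P1_S3": 0,
              "P2_S1": 5, "P2_S2": 4, "P2_S3": 6},
}

# Hypothetical mapping from each library to the sample it was prepared from.
library_to_sample = {
    "P1_S1": "P1", "P1_S2": "P1", "P1_S3": "P1",
    "P2_S1": "P2", "P2_S2": "P2", "P2_S3": "P2",
}


def collapse_replicates(counts, groupby):
    """Sum raw counts over all libraries that map to the same sample."""
    collapsed = {}
    for gene, per_library in counts.items():
        totals = defaultdict(int)
        for library, n in per_library.items():
            totals[groupby[library]] += n
        collapsed[gene] = dict(totals)
    return collapsed


collapsed = collapse_replicates(raw_counts, library_to_sample)
for gene, per_sample in collapsed.items():
    print(gene, per_sample)
```

Note that the sum is taken over raw counts, not normalised values, so the collapsed column still behaves like a single, deeper library.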

2.1 years ago by Marco Pannone ▴ 810

0

I agree with Marco Pannone. You need to do something like PCA to assess variation among your technical replicates before deciding whether or not to collapse them. Why do you have technical replicates? And if your exploratory analysis does show significant variation among the technical replicates for a single patient, you'll need to understand why (e.g., was it a batch effect, a different library kit, a different sequencer, etc.).

2.1 years ago by snowpin • 0

score 5 · Accepted Answer · 2022-10-23

Generally, in RNAseq, we collapse technical replicates. However, by "technical replicate", we generally mean multiple libraries built from the same biological sample. In bulk RNAseq we use replicates to estimate the dispersion. The dispersion between repeated samplings from the same library is approximately zero (unless something has gone wrong), and the variance between them is well described by the Poisson distribution.

It's not clear from your description whether what you have is repeated samplings from the same biological sample, or repeated biological samplings from the same patient. Thus, it's not clear whether what you describe as technical replicates are what the bulk-RNAseq analysis world refers to as technical replicates. One might investigate this by plotting the mean against the variance for each gene within each patient. If the variance is approximately equal to the mean, then you should regard them as technical replicates and collapse them. If the variance is clearly larger than the mean, then you shouldn't.
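The mean-versus-variance check can be sketched numerically as well as plotted. The Python snippet below is a minimal illustration, assuming the counts for one patient have already been scaled to comparable library sizes; the gene names, values, and the helper `dispersion_index` are hypothetical. It computes variance divided by mean for each gene: values near 1 are what Poisson sampling would give, while values well above 1 point to extra (biological) variation.

```python
from statistics import mean, variance

# Hypothetical depth-normalised counts for ONE patient: gene -> replicate values.
patient_counts = {
    "GeneA": [90, 100, 110],   # variance close to the mean
    "GeneB": [40, 100, 160],   # variance far above the mean
    "GeneC": [0, 0, 0],        # not expressed, so the ratio is undefined
}


def dispersion_index(values):
    """Return sample variance / mean, or None when the mean is zero."""
    m = mean(values)
    if m == 0:
        return None
    return variance(values) / m


for gene, values in patient_counts.items():
    print(gene, mean(values), variance(values), dispersion_index(values))
```

With only three replicates, any single gene's ratio is very noisy, so it is the overall trend across many genes that matters, not individual values.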

However, if you decide they are not technical replicates and therefore do not collapse them, then you can't just use DESeq2 to do the analysis. You have two sources of variance, patient-to-patient variation and sample-to-sample variation, and the samples are nested within the patients. You effectively have a nested, or mixed-effects, model, which DESeq2 isn't really designed to handle.

You might look to limma-voom, which can account for the correlation between samples from the same patient by treating patient as a blocking factor via its duplicateCorrelation function.

A word of warning: if you are doing a DE analysis of one patient against all others because that patient has something special about them (e.g. they have the disease, while the others don't), what you will get is genes that are DE in that patient, not genes that are DE between disease and normal, as you have no way to disentangle patient effects from condition effects. Furthermore, because you have only one replicate for condition A at the patient level (patient 1), you will be assuming that the patient-level variance is the same in condition A and condition B, and that only the means have changed.
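The confounding can be made concrete with a toy sample sheet. The Python sketch below assumes, purely for illustration, that Patient1 is the only patient in condition "A"; the sample names and condition labels are hypothetical. It shows that the condition indicator is identical to the Patient1 indicator for every sample, so no model can separate the two effects.

```python
# Hypothetical sample sheet: 5 patients x 3 samples, only Patient1 in condition "A".
samples = [
    {
        "sample": f"Patient{p}_Sample_{s}",
        "patient": f"Patient{p}",
        "condition": "A" if p == 1 else "B",
    }
    for p in range(1, 6)
    for s in range(1, 4)
]

# Indicator columns as they would appear in a design matrix.
is_condition_a = [int(row["condition"] == "A") for row in samples]
is_patient1 = [int(row["patient"] == "Patient1") for row in samples]

# Identical columns mean the two effects are perfectly confounded.
confounded = is_condition_a == is_patient1
print("condition confounded with patient:", confounded)

# Number of independent patients (the real replicates) per condition.
patients_per_condition = {}
for row in samples:
    patients_per_condition.setdefault(row["condition"], set()).add(row["patient"])
n_patients = {cond: len(pats) for cond, pats in patients_per_condition.items()}
print(n_patients)
```

Although condition "A" has three samples, it has only one patient, which is why the patient-level variance in that condition cannot be estimated from the data.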