Question

Strategies for merging RNAseq datasets

5.3 years ago

Nico80 ▴ 80

I have two RNAseq experiments that were run on the same type of samples, treated in two different ways and sampled over time.

Experiment 1 has the following groups:

Control (untreated) - Treatment A - 1 week recovery from treatment A - 4 weeks recovery from treatment A

Experiment 2 has the following groups:

Control (untreated) - Treatment B - 1 week recovery from treatment B - 4 weeks recovery from treatment B

RNAseq for the two experiments was performed at different times, but with exactly the same protocol.

I have used DESeq2 to analyse the two experiments separately, and I would now like to merge and compare them, but I am not sure what the best approach is.

I have tried feeding all of the aligned BAM files into a single DESeq2 analysis, but when I then run PCA on the samples, they cluster by experiment and, most annoyingly, the controls do not overlap.

Any suggestions for how to proceed would be greatly appreciated. I am thinking of something along the lines of what the Seurat package uses to integrate multiple scRNAseq datasets, but I am not sure whether that could be applied to my situation.

Tags: RNA-Seq, R


score 1 · Answer 1 · 2020-05-20

5.3 years ago

Asaf 10k

I can think of two options here. If you can assume that the controls are more or less similar in gene expression levels, you can include batch (i.e. experiment) as a term in the model and correct for it that way. If you think too much differs between the two experiments, there is no basis for normalizing the two batches together; in that case you can go down the meta-analysis path.

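To see whether a batch term explains the separation in the PCA, one common check is to regress the batch effect out of log-scale expression values while protecting the condition effects, then rerun PCA. This is the approach limma's removeBatchEffect takes, and it is often applied to vst-transformed DESeq2 data before plotting. The sketch below is a simplified NumPy reimplementation of that idea. The function name, the simulated data and the +5 batch shift are assumptions for illustration. The corrected matrix is only meant for visualisation; for differential expression, keep batch as a term in the DESeq2 design instead.

```python
import numpy as np


def remove_batch_effect(logexpr, batch, condition):
    """Return logexpr with a fitted additive batch effect subtracted.

    Hypothetical helper (a simplified take on limma::removeBatchEffect).
    logexpr:   genes x samples array on a log-like scale (e.g. vst output).
    batch:     per-sample batch labels (here, which experiment).
    condition: per-sample group labels; fitted jointly so they are protected.
    """
    logexpr = np.asarray(logexpr, dtype=float)
    batch = np.asarray(batch)
    condition = np.asarray(condition)
    # Treatment coding: the alphabetically first level of each factor is the reference.
    cond_cols = [(condition == lvl).astype(float) for lvl in np.unique(condition)[1:]]
    batch_cols = [(batch == lvl).astype(float) for lvl in np.unique(batch)[1:]]
    if not batch_cols:  # a single batch: nothing to remove
        return logexpr.copy()
    X = np.column_stack([np.ones(batch.size)] + cond_cols + batch_cols)
    # One least-squares fit per gene, solved together: logexpr.T ~ X @ beta
    beta, *_ = np.linalg.lstsq(X, logexpr.T, rcond=None)
    batch_beta = beta[-len(batch_cols):, :]     # batch coefficients x genes
    batch_design = np.column_stack(batch_cols)  # samples x batch coefficients
    return logexpr - (batch_design @ batch_beta).T


# Hypothetical simulated data: 50 genes, 3 replicates per group,
# and every gene in experiment 2 shifted up by 5.
rng = np.random.default_rng(0)
batch = ["exp1"] * 6 + ["exp2"] * 6
condition = ["ctrl"] * 3 + ["A"] * 3 + ["ctrl"] * 3 + ["B"] * 3
logexpr = rng.normal(size=(50, 12))
logexpr[:, 6:] += 5.0

corrected = remove_batch_effect(logexpr, batch, condition)
# Per-gene gap between the two sets of controls, before and after correction
gap_before = logexpr[:, 6:9].mean(axis=1) - logexpr[:, 0:3].mean(axis=1)
gap_after = corrected[:, 6:9].mean(axis=1) - corrected[:, 0:3].mean(axis=1)
print(np.abs(gap_before).max(), np.abs(gap_after).max())
```

Because treatment A only appears in experiment 1 and treatment B only in experiment 2, the batch coefficient in this layout is estimated entirely from the two sets of controls. The correction is therefore only as trustworthy as the assumption that the controls are comparable. limma centres the batch effect with sum-to-zero coding, whereas this sketch shifts every batch onto the reference one; PCA centres each gene anyway, so the plot is the same.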

Thank you, Asaf. In theory, the controls should be very similar (same tissue from mice of the same strain/sex/age etc.). I will try adding a batch effect to the model. Is there a way of including it as a random effect in DESeq2? I seem to recall this not being an option.

I am not familiar with meta-analysis; any suggested resources?

Reply by Nico80 ▴ 80 · 5.3 years ago

It's a fixed effect, not a random effect: you add a batch column to the sample metadata and include that column in the model formula. I don't have much experience with DE meta-analysis, but I think any method for combining p-values should work here.

Reply by Asaf 10k · 5.3 years ago
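Combining p-values can be sketched with Fisher's method, applied gene by gene to the p-values from the two separate DESeq2 runs. Under the joint null, X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom, and for even degrees of freedom its upper tail has a closed form, so no statistics library is needed. This minimal pure-Python version assumes the two experiments' p-values are independent and non-missing. DESeq2 can report NA p-values (for example after independent filtering), and those would need to be dropped first. If SciPy is available, scipy.stats.combine_pvalues(..., method='fisher') performs the same calculation. The gene names and values below are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-squared with 2k degrees of freedom under the
    joint null. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{i<k} (X/2)**i / i!, which is computed directly here.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)**i / i!, built up incrementally
        tail += term
    return min(1.0, math.exp(-half_x) * tail)


# Hypothetical per-gene p-values from the two separate DESeq2 analyses
p_exp1 = {"geneX": 0.05, "geneY": 0.6}
p_exp2 = {"geneX": 0.05, "geneY": 0.01}
combined = {g: fisher_combine([p_exp1[g], p_exp2[g]]) for g in p_exp1}
print(combined)
```

Fisher's method ignores the sign of the fold change. A gene that goes up with treatment A and down with treatment B can therefore still get a small combined p-value, so it is worth comparing the signs of log2FoldChange between the two results as well. With thousands of genes, the combined p-values also need a multiple-testing adjustment.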

Yes, I know how to add those in DESeq2, thank you. What I meant by the random effect comment is that, as far as I am aware, DESeq2 only allows for fixed effects, while batch is really a random effect in this type of design.

Reply by Nico80 ▴ 80 · 5.3 years ago

It is a random effect, but since every batch you will analyse is represented in your samples, I think you can treat it as a fixed effect. (I'm not a statistician.)

Reply by Asaf 10k · 5.3 years ago
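A fixed batch term only works if the model can separate batch from condition, and in this design the shared controls are what make that possible. Treatment A only appears in experiment 1 and treatment B only in experiment 2. Without controls in both experiments, the batch column would be a linear combination of the condition columns. The sketch below builds a treatment-coded matrix for ~ batch + condition, roughly what R's model.matrix produces for a DESeq2 design, and checks whether it has full column rank. DESeq2 refuses to fit a design whose model matrix is not full rank. The sample labels and the three replicates per group are hypothetical.

```python
import numpy as np


def design_matrix(batch, condition, ref_condition="ctrl"):
    """Treatment-coded matrix for ~ batch + condition (intercept included)."""
    batch = np.asarray(batch)
    condition = np.asarray(condition)
    b_levels = list(np.unique(batch))  # first batch is the reference
    c_levels = [c for c in np.unique(condition) if c != ref_condition]
    cols = [np.ones(batch.size)]
    cols += [(batch == lvl).astype(float) for lvl in b_levels[1:]]
    cols += [(condition == lvl).astype(float) for lvl in c_levels]
    return np.column_stack(cols)


def full_rank(X):
    return np.linalg.matrix_rank(X) == X.shape[1]


# Hypothetical labels mirroring the two experiments in the question
GROUPS = {
    "exp1": ["ctrl", "A", "A_1wk", "A_4wk"],
    "exp2": ["ctrl", "B", "B_1wk", "B_4wk"],
}


def samples(groups, reps=3, drop=()):
    """Expand groups into per-sample (batch, condition) labels, skipping `drop`."""
    batch, condition = [], []
    for b, conds in groups.items():
        for c in conds:
            if (b, c) in drop:
                continue
            batch += [b] * reps
            condition += [c] * reps
    return batch, condition


# Both experiments have controls: batch and all conditions are estimable.
X_shared = design_matrix(*samples(GROUPS))
# Without the experiment-2 controls, the batch column equals B + B_1wk + B_4wk.
X_no_ctrl2 = design_matrix(*samples(GROUPS, drop={("exp2", "ctrl")}))
print(full_rank(X_shared), full_rank(X_no_ctrl2))  # True False
```

The same construction shows that, in the merged model, any direct comparison between treatment A and treatment B relies entirely on the batch estimate coming from the two sets of controls.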

score 1 · Answer 2 · 2020-05-20

From what I understand (not being an expert!), the power of the single-cell integration frameworks comes from the large number of data points (i.e. cells), which does not apply to bulk data. Also, the resulting integrated values are not suitable for anything except clustering, because the data transformations create dependencies between values (which is not compatible with any DE method) and can notably change the magnitude and even the direction of fold changes. As far as I can tell, integration is really only useful for creating a unified clustering landscape; for anything else one would go back to the unintegrated values.

That said, why don't you try a meta-analysis? You could do a rank-based meta-analysis (see here and the corresponding R package) to check whether the significant genes are consistent in rank space between experiments. Alternatively, you could combine p-values using something like Fisher's method, provided that the sample sizes of the two experiments are comparable.
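As an illustration of the rank-based idea, a simple rank-product score can be sketched. It is not necessarily what the linked package implements. First, rank the genes shared by both experiments within each experiment by p-value. Then score each gene by the geometric mean of its ranks, so that genes near the top in both experiments get small scores. The gene names and p-values are hypothetical. This sketch does not compute the significance (usually obtained by permutation) that dedicated rank-product methods provide.

```python
import math


def ranks(scores):
    """Rank genes 1..n by ascending score (e.g. p-value); ties broken by name."""
    order = sorted(scores, key=lambda g: (scores[g], g))
    return {g: r for r, g in enumerate(order, start=1)}


def rank_product(*experiments):
    """Geometric mean of per-experiment ranks for genes present in all experiments."""
    common = set.intersection(*(set(e) for e in experiments))
    # Rank within each experiment using only the shared genes
    per_exp = [ranks({g: e[g] for g in common}) for e in experiments]
    k = len(experiments)
    return {g: math.exp(sum(math.log(r[g]) for r in per_exp) / k) for g in common}


# Hypothetical p-values from the two separate DESeq2 analyses
exp1 = {"g1": 1e-8, "g2": 0.02, "g3": 0.4, "g4": 0.9}
exp2 = {"g1": 1e-5, "g2": 0.6, "g3": 0.03, "g4": 0.8}
rp = rank_product(exp1, exp2)
for gene in sorted(rp):
    print(gene, round(rp[gene], 2))
```

Ranking on p-values alone again ignores direction. Ranking on the signed log2FoldChange (once for up-regulated and once for down-regulated genes) keeps "consistent" meaning "changing the same way in both experiments".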