Question

Can We Compare Two Different RNA-Seq Experiments?

7

11.7 years ago

k.nirmalraman ★ 1.1k

Hi,

I am trying to analyze data from two different RNA-Seq experiments (different runs, same platform). I would like to normalize the data from both experiments together to gain some insight into cell-type-specific expression profiles (for a preliminary evaluation).

In such a case, can someone tell me how to do the normalization (are there any established methods)? Any pointers on the possible challenges and on how to approach this would be of great help.

Thanks in advance!

rna-seq normalization • 17k views

ADD COMMENT • link updated 11.1 years ago by Mikael Huss 4.8k • written 11.7 years ago by k.nirmalraman ★ 1.1k

0

I am planning something similar in my work. It would be helpful if you could share your experience.

ADD REPLY • link 5.6 years ago by Arindam Ghosh ▴ 530

Ram · Answer 1 · 2013-03-25

4

11.7 years ago

Damian Kao 16k

Here is a good paper that compares several different normalization methods:

ADD COMMENT • link updated 5.8 years ago by Ram 44k • written 11.7 years ago by Damian Kao 16k

0

Hi Damian,

Thanks for the link. It is a very informative paper. Nevertheless, I was wondering whether it would be possible to normalize two different RNA-Seq experiments together, so that one can perform a DE-type analysis.

I understand this will bring all the limitations of a poor experimental design :( But this is only to arrive at some candidate genes that can then be validated.

ADD REPLY • link updated 5.8 years ago by Ram 44k • written 11.1 years ago by k.nirmalraman ★ 1.1k

Ram · Answer 2 · 2013-10-10

2

11.1 years ago

Mikael Huss 4.8k

Not sure I understand your question properly, but in case I do ...

1. Download FASTQ files for the two experiments
2. Map them in the same way (e.g. STAR)
3. Quantify in the same way (e.g. HTSeq)
4. Merge all the counts into a single table
5. Use some scaling normalization method (e.g. TMM) on everything
6. Use some DE package (e.g. limma) to call differentially expressed genes
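
To make the "merge the counts, then scale everything together" steps concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: it uses median-of-ratios size factors (the DESeq-style scaling) as a simple stand-in for TMM, which in practice you would get from edgeR. The function names and the `{sample: {gene: count}}` table layout are hypothetical, not part of any package.

```python
import math
from statistics import median


def merge_counts(*tables):
    """Merge per-experiment tables {sample: {gene: count}} into one table.

    Only genes quantified in every sample are kept, so the two experiments
    must have been quantified against the same annotation.
    """
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample name: {sample}")
            merged[sample] = counts
    genes = sorted(set.intersection(*(set(c) for c in merged.values())))
    return genes, {s: [c[g] for g in genes] for s, c in merged.items()}


def size_factors(genes, counts):
    """Median-of-ratios scaling factors (a stand-in for TMM, not TMM itself)."""
    samples = list(counts)
    # Use only genes with a nonzero count in every sample.
    usable = [i for i in range(len(genes))
              if all(counts[s][i] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in all samples")
    # Log of the per-gene geometric mean across all samples.
    log_ref = {i: sum(math.log(counts[s][i]) for s in samples) / len(samples)
               for i in usable}
    return {s: math.exp(median(math.log(counts[s][i]) - log_ref[i]
                               for i in usable))
            for s in samples}


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy data: sample B1 was sequenced twice as deeply as sample A1.
experiment_a = {"A1": {"g1": 10, "g2": 20, "g3": 30, "g4": 5}}
experiment_b = {"B1": {"g1": 20, "g2": 40, "g3": 60, "g5": 7}}

genes, counts = merge_counts(experiment_a, experiment_b)
factors = size_factors(genes, counts)
normalized = normalize(counts, factors)
```

Note that scaling like this only corrects for sequencing depth and composition. It does not remove differences between the two experiments that come from library prep or handling.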

Does that help ..?

ADD COMMENT • link updated 5.8 years ago by Ram 44k • written 11.1 years ago by Mikael Huss 4.8k

2

The recent update that the experiments were done in different runs throws a bit of a kink in that. It's often the case that different library prep or RNA extraction dates produce a batch effect. Since the cell types (or some other factor) are presumably partitioned by this batch, any DE calls will be confounded by this. I don't know of any great way to get around that sort of thing without having at least one other batch of one of the cell types (or whatever) so that the batch effect might at least be estimated.
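
As a rough way to check this before running any DE analysis, the sketch below tests whether batch and cell type can be separated at all. It assumes a simple additive model (batch plus cell type, no interaction), in which cell-type comparisons are estimable only if the batches and cell types are linked through shared samples. The function name and the `(batch, cell_type)` sample layout are made up for illustration.

```python
def batch_and_cell_type_separable(samples):
    """Return True if batches and cell types form one connected group.

    `samples` is a list of (batch, cell_type) pairs, one per sample.
    If every cell type sits in its own batch, the groups are disconnected
    and the batch effect cannot be told apart from the cell-type effect.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for batch, cell_type in samples:
        # Each sample links its batch to its cell type.
        parent[find(("batch", batch))] = find(("cell_type", cell_type))

    roots = {find(node) for node in list(parent)}
    return len(roots) == 1


# Each cell type was sequenced in its own run: fully confounded.
confounded = [("run1", "T cell"), ("run1", "T cell"),
              ("run2", "B cell"), ("run2", "B cell")]

# One extra T cell sample in run2 links the two runs.
bridged = confounded + [("run2", "T cell")]

print(batch_and_cell_type_separable(confounded))
print(batch_and_cell_type_separable(bridged))
```

A True result only says the batch effect could in principle be estimated. With a single bridging sample the estimate would still be very noisy.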

ADD REPLY • link updated 5.8 years ago by Ram 44k • written 11.1 years ago by Devon Ryan 104k

2

In that case (that the cell types [etc] are partitioned by batch) it's hard, yes. I didn't get the impression that that was necessarily the case, but if it is, then it is of course hard to get around the confounding. Apart from that, this recent paper found that reproducibility was good between labs and runs if you stick to the exact same library prep protocol. Even though there is some bias from the RNA extraction, it seems manageable. If the different labs had all done their own library preps, I think the results would have looked a lot different.

ADD REPLY • link updated 5.8 years ago by Ram 44k • written 11.1 years ago by Mikael Huss 4.8k

2

I agree but will add that as I understand it they only looked at the sequencing steps in that comparison. The cultures were grown up at the same lab and then a frozen pellet was shipped. There is also variability in the growth that causes batch effects so you would expect that they would have gotten poorer results if, e.g. the cell lines were grown at each site for a while.

I would be super cautious about designing a study where one experiment is the test and the other is the control. You don't know how much slop you have. That said, I would probably do it, but only as a pilot or to support another, better-designed finding.

And only if you have the same library prep: the GC and length biases that you would get from different protocols would be super hard to sort out from biology. It is possible that it would take so long to do it properly that it would cost more than redoing the experiment with the data you really want.

Doing a badly designed experiment is never cheaper, though the up front costs make it appear so.

ADD REPLY • link updated 5.8 years ago by Ram 44k • written 11.1 years ago by Michele Busby ★ 2.2k