Question

Joint analysis of public RNA-seq samples coming from different extraction protocols

0

9.9 years ago

a.james ▴ 240

Hello All,

It would be really great if someone could advise me on the following topic.

I have samples from two different extraction protocols: mRNA and total RNA extraction. The sequenced samples are from different cell lineages (public data).

I have noticed that the data from the two methods differ, and I wonder whether the variation is an artifact or a biological difference.

Does anyone have experience with this mixed sample set-up? Before proceeding with read quantification and differential expression analysis, what steps would be required?

Thanks

python RNA-Seq sequencing R genome • 2.6k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.9 years ago by a.james ▴ 240

1

If the samples are different, then before proceeding to differential expression you should check for batch effects to see whether all samples behave in a similar pattern. You can use edgeR with SVA or ComBat to check for batch effects. More information: https://support.bioconductor.org/p/64299/
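As a quick first look before reaching for those tools, here is a minimal Python sketch that compares how well samples correlate within a protocol versus between protocols. The function names and input layout (`expr`, `protocol`) are made up for illustration and are not part of edgeR, SVA or ComBat. Note that if protocol and cell lineage coincide, a low between-protocol correlation cannot tell you which of the two is responsible.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def within_vs_between(expr, protocol):
    """Mean pairwise correlation within and between protocols.

    expr:     sample name -> list of log expression values (same gene order)
    protocol: sample name -> protocol label (hypothetical labels, e.g. "polyA")
    """
    within, between = [], []
    for s1, s2 in combinations(sorted(expr), 2):
        r = pearson(expr[s1], expr[s2])
        if protocol[s1] == protocol[s2]:
            within.append(r)
        else:
            between.append(r)
    return mean(within), mean(between)
```

A between-protocol mean that is much lower than the within-protocol mean suggests the protocol (or something confounded with it) dominates the variation.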

ADD REPLY • link 9.9 years ago by EagleEye 7.6k

0

OK, thank you. I have tried it.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 9.9 years ago by a.james ▴ 240

0

Please mention whether this helped.

ADD REPLY • link 9.9 years ago by EagleEye 7.6k

0

Sorry, it didn't help. The issue is not a batch effect that can be removed as in the linked solution, so it's still an open question.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 9.9 years ago by a.james ▴ 240

Ram · Accepted Answer · 2015-09-28

It is always difficult to use public data because there are all sorts of confounders. I would never attempt to compare samples from two different protocols if one is the control and one is the test, because you have not blocked one of your confounders. If you have some controls of each and some tests of each you might be OK, but the analysis will be very difficult and could well cost more than running a new, properly designed experiment.
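To make the "some controls of each and some tests of each" condition concrete, here is a small Python sketch that tabulates samples by protocol and condition and checks that every combination is present. The function names and the labels in the example are hypothetical; this only checks the design and says nothing about how large the protocol effect is.

```python
from collections import Counter
from itertools import product


def design_table(samples):
    """Count samples in each (protocol, condition) cell.

    samples: list of (protocol, condition) pairs, one per sample.
    """
    return Counter(samples)


def is_fully_crossed(samples):
    """True if every protocol contains at least one sample of every condition."""
    table = design_table(samples)
    protocols = {p for p, _ in samples}
    conditions = {c for _, c in samples}
    return all(table[(p, c)] > 0 for p, c in product(protocols, conditions))


# Example: protocol perfectly predicts condition, so the two are confounded.
confounded = [("polyA", "control"), ("polyA", "control"), ("total", "test")]
print(is_fully_crossed(confounded))  # False
```

If this returns False, the protocol effect and the condition effect cannot be separated from the data alone.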

By mRNA I assume you mean poly A selection vs some sort of Total RNA extraction where the rRNA is otherwise depleted. But the principles are the same for anything. Here are some differences that will introduce slop into your quantifications:

The poly A extraction will pull out almost exclusively processed RNA. The total RNA protocol will pull out both processed and unprocessed (with and without introns), so you will have a big difference there from the start.

You will also introduce a length bias. If the poly A selected RNA is degraded at all, you will lose reads from the start of your long transcripts because the 5' end will no longer be attached to the poly A tail. This will show up as differential expression compared to the total RNA, which doesn't have this problem.

Depending on the method, the difference in GC content can be enormous. Here are two methods with the same data: http://michelebusby.tumblr.com/image/62718357939

Points represent the reads aligned to transcripts in each method. In the DSNLite method, transcripts with high GC content were systematically lost.

Here is a paper we did on some of the issues you see when you do different methods, where that data was from.

http://www.nature.com/nmeth/journal/v10/n7/fig_tab/nmeth.2483_F6.html

If you make that chart and highlight high and low GC genes, you will see whether you have a GC artifact. Do the same with long and short transcripts and you will see whether you have length bias. Then run everything through RNA-SeQC and look at your intron/exon/intergenic rates.
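A numeric companion to those charts, as a rough Python sketch: take the per-gene log2 ratio between the two protocols and compare its median in the lowest and highest fraction of genes ranked by GC content or by length. The dictionary keys (`polyA`, `total`, `gc`, `length`), the pseudocount and the 25% cut-off are assumptions for illustration, and the expression values are assumed to be already normalized.

```python
import math
from statistics import median


def log2_ratio(a, b, pseudocount=1.0):
    """log2 of (a + pseudocount) / (b + pseudocount)."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def bias_by_covariate(genes, covariate, frac=0.25):
    """Median polyA/total log2 ratio in low- and high-covariate genes.

    genes:     list of dicts with keys "polyA", "total" (normalized
               expression) plus the covariate, e.g. "gc" or "length"
    covariate: key to rank genes by
    frac:      fraction of genes taken from each end of the ranking
    Returns (median in lowest fraction, median in highest fraction).
    """
    ranked = sorted(genes, key=lambda g: g[covariate])
    k = max(1, int(len(ranked) * frac))

    def group_median(group):
        return median(log2_ratio(g["polyA"], g["total"]) for g in group)

    return group_median(ranked[:k]), group_median(ranked[-k:])
```

If the two medians are far apart for `gc` or `length`, the difference between protocols tracks that covariate, which points to artifact and not biology.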

If everything is not clean, that suggests substantial artifact, and you can't prove much biology. You might be able to get it past a reviewer (people do), but you shouldn't.

You can, perhaps, use this for pilot data for a grant, though, if you convey that you controlled for these things and understood what you were looking at.