Question

Normalization in RNA-seq

1

9.2 years ago

Xin ▴ 70

Hi dear friends

I used the TopHat and Cufflinks suite to find differentially expressed genes.

I just followed the standard pipeline: QC -> TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff -> CummeRbund

Then my boss asked me: "Did you do normalization?" and "Did you use a t-test, a Bayesian test, or some other test to normalize the data?" I said no.

What should I normalize and why?

If any of the Cufflinks tools does normalization, what kind of test does it use?

RNA-Seq • 3.6k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 9.2 years ago by Xin ▴ 70

6

If you don't know the answer to "what should I normalize and why?" then you shouldn't be doing the analysis.

Also, I hope your boss didn't ask if you used a t-test to normalize data. That'd be nonsensical.

ADD REPLY • link 9.2 years ago by Devon Ryan 105k

4

The question he should have asked is "Did you read the Cuffdiff paper or just look at the pipeline diagram?!"

ADD REPLY • link 9.2 years ago by Asaf 10k

2

Or perhaps the PI should also read the widely used Cuffdiff paper.

ADD REPLY • link 9.2 years ago by Chirag Nepal ★ 2.4k

1

I cannot resist commenting on Istvan's views. An increasing number of investigators are getting confused by RNA-seq analysis. I agree there are still a lot of challenges in RNA-seq analysis, but the many commercial companies claiming that their tool can do everything are creating more chaos. We have a lot of problems when a blind analysis is carried out, the lab then invests a lot of resources, and comes back saying that the results look weird or do not work. Sometimes, if one sample is causing trouble, the blind analysis will cause lots of problems. I think the day is not far off when some studies may need to be analyzed again. Having said all this, I think we have to be reasonably accommodating of simple questions about RNA-seq and encourage learners on this forum to go back and teach their PIs some good science.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by kanwarjag ★ 1.2k

0

This is a good point - we just need to make sure to articulate this correctly.

RNA-Seq is a field that is rife with contradictions and lofty claims - time and again I am confused by what the RNA-Seq data shows, though we've run hundreds of analyses. Some analyses work with no effort whatsoever - others are a jumbled mess, and the results are a mess, and the tools should recognize it and warn about it, but they do not. They produce p-values like nobody's business. In each case, by that time the experiment is over, and tens of thousands of dollars and years of manpower have been put into it. It is too simplistic to say - well, you should not have done this or that, or you should have learned more about this or that before you even started the work. Once you dig into this deeper, it is less clear where the responsibilities lie.

Bioinformatics publications always make a tool sound a lot better than it actually is. Are biologists actually responsible for deciphering what percent of that paper is actually valid?

The real change in bioinformatics needs to come from us - where we devise and publish software and protocols that actually work reliably - not just kind of work if one jumps through all kinds of hoops. The first step of that change comes from us, when we say: yeah, it is awful that a tool can give you nonsensical results, and it is not your fault that it does not work as it claims to.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by Istvan Albert 102k

0

Hi,

You may want to have a look at this video to understand RNA-Seq normalisation and differential expression.

ADD REPLY • link 9.2 years ago by Thibault D. ▴ 700

0

To help yourself and the BioStars community, you should first show us what you found in your searches on PubMed/Google/SEQanswers/BioStars/etc. about your question. Then you'll come up with "real" questions, e.g. why use RPKMs/TPMs/etc. Cheers.

ADD REPLY • link 9.2 years ago by Israel Barrantes ▴ 790

Ram · Answer 1 · 2015-11-08

I'd like to urge everyone to be nicer and gentler to newcomers. Be supportive and a little more generous. You have managed to hurt the feelings of someone who came here for help.

RNA-Seq is a very complicated topic. The concept of normalization has been changed and reworked many times over. Most of the concepts and explanations that one would find via Google searches are only partially right, and some have been proven to be incorrect.

As for the original poster: if you used the TopHat pipeline then it has automatically applied a whole slew of normalizations and statistical tests that the original authors thought were appropriate.
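To make "what should I normalize and why?" a bit more concrete, here is a toy sketch of the two most basic corrections: sequencing depth and transcript length. It is only an illustration of RPKM and TPM on made-up numbers (the gene names, lengths, and counts are all hypothetical), and it is not meant to reproduce what Cuffdiff does internally.

```python
# Toy illustration of why raw read counts need normalizing.
# All gene names, lengths, and counts below are made up.

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    return {
        gene: counts[gene] / (lengths_bp[gene] / 1e3) / (total_reads / 1e6)
        for gene in counts
    }

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total_rate = sum(rate.values())
    return {gene: rate[gene] / total_rate * 1e6 for gene in counts}

lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 500}
sample1 = {"geneA": 100, "geneB": 200, "geneC": 50}
# Same relative expression as sample1, but sequenced twice as deeply.
sample2 = {gene: 2 * n for gene, n in sample1.items()}

rpkm1 = rpkm(sample1, lengths_bp)
rpkm2 = rpkm(sample2, lengths_bp)
tpm1 = tpm(sample1, lengths_bp)

for gene in sample1:
    print(gene, sample1[gene], sample2[gene], round(rpkm1[gene]), round(rpkm2[gene]))
```

The raw counts differ twofold between the two samples, and geneB has twice the reads of geneA only because it is twice as long. After normalization those differences go away, which is the point: without it, depth and length differences would look like expression differences. Real tools go further than this (for example, correcting for a few highly expressed genes dominating the library), so treat this as the starting idea only.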