Normalization of RNA-seq data between different types of libraries
10.4 years ago
ktian.whu ▴ 50

Dear members,

I have several RNA-seq data sets from published articles, and we think integrating them may be interesting. However, they come from different types of libraries (single-end/paired-end, poly(A)+/rRNA depletion). How can I normalize them so that their expression values can be compared directly?

I read an article that combined different types of data and normalized them only by RPKM. Is this a good method?

rpkm RNA-Seq fpkm • 6.5k views

I really wonder why so many RPKM/FPKM questions have come up lately.


Any update on this work? I am planning something similar.

10.4 years ago
Michael 55k

FPKM/RPKM is not a normalization method. It is a unit, meant for representing some sort of molar concentration of transcripts.
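For readers who have not seen the unit written out, here is a minimal sketch of the usual FPKM arithmetic. The helper name `fpkm`, the toy counts, and the use of the summed counts as "total mapped fragments" are assumptions for illustration only, not any tool's actual implementation.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    Assumption: the library size is taken as the sum of the counts passed in.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    per_million = counts.sum() / 1e6
    return counts / lengths_kb / per_million

# Hypothetical toy library: three genes with fragment counts and lengths in bp.
toy_counts = [500, 1500, 8000]
toy_lengths = [1000, 3000, 2000]
toy_fpkm = fpkm(toy_counts, toy_lengths)
```

Note that the denominator is a property of the whole library, so every value in `toy_fpkm` changes if the total changes, even when a gene's own count does not.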

In my experience, if you have very different libraries, FPKM will confound biological comparisons instead of making the libraries more comparable. Calculating FPKM (imho) does not remove technical bias, but adds an unknown proportionality factor to each library, which depends on library composition. Therefore, if you have very different protocols, length-scaling the abundances will not help. Problems with using FPKM have been discussed in several other posts.

Some more problems:

  • FPKM/RPKM has never been peer-reviewed; it was introduced as an ad-hoc measure in a supplement
  • One of the authors of that paper states that it should not be used because of faulty arithmetic
  • All reviews so far have shown it to be an inferior scale for differential expression (DE) analysis of genes
  • Length normalization is mostly dispensable imo in DE analysis, because a gene's length is the same in every sample and cancels out of between-sample comparisons
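The last point can be checked with a few lines of arithmetic. This is only a sketch with made-up numbers (the counts, library sizes, and gene length are all hypothetical): dividing both samples by the same gene length leaves the log fold change untouched.

```python
import numpy as np

length_kb = 2.5                   # hypothetical gene length, identical in both samples
count_a, count_b = 200.0, 800.0   # hypothetical fragment counts for one gene
depth_a, depth_b = 20e6, 40e6     # hypothetical library sizes (mapped fragments)

# Counts per million: depth scaling only.
cpm_a = count_a / (depth_a / 1e6)
cpm_b = count_b / (depth_b / 1e6)

# FPKM: the same values, additionally divided by the gene length.
fpkm_a = cpm_a / length_kb
fpkm_b = cpm_b / length_kb

# The length factor appears in numerator and denominator and cancels.
lfc_cpm = np.log2(cpm_b / cpm_a)
lfc_fpkm = np.log2(fpkm_b / fpkm_a)
```

Both `lfc_cpm` and `lfc_fpkm` come out as the same number, so for a gene-level between-sample comparison the length term adds nothing.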

Using FPKM-scaled values for your use case might be the worst possible application.
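To illustrate the composition-dependent factor mentioned above, here is a toy sketch. All numbers are invented, and the assumption that an rRNA-depleted library captures an extra RNA class that a poly(A)+ library misses is a simplification for the example.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    # Assumption: library size is the sum of the counts passed in.
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    return counts / lengths_kb / (counts.sum() / 1e6)

lengths = [1000, 1000, 1000]  # genes A, B, and a non-polyadenylated RNA C

# Same hypothetical sample, two protocols. Genes A and B yield identical
# counts in both; only the rRNA-depleted library also captures RNA C.
polya_counts = [1000, 1000, 0]
ribo_depleted_counts = [1000, 1000, 2000]

fpkm_polya = fpkm(polya_counts, lengths)
fpkm_ribo = fpkm(ribo_depleted_counts, lengths)

# Ratio for gene A, whose count did not change between protocols.
ratio_gene_a = fpkm_ribo[0] / fpkm_polya[0]
```

Gene A's FPKM halves between the two libraries although its count is unchanged, purely because the library total differs. In real data that factor is unknown and differs per library.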


Would it be valid to quantile-normalize a data frame of gene-level FPKM values obtained from Cufflinks when the samples in it come from different types of libraries?


From my data, it doesn't seem that quantile normalization cures it. Of course, this is a very limited view. I can only say that the observations on our data agree with the published concerns, and we seem to observe exactly the library-dependent bias that was mentioned. In principle, all the points I made above apply even more to quantile-normalized FPKM (a non-peer-reviewed combination of two methods, etc.).
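For reference, a minimal sketch of what quantile normalization does to a genes-by-samples matrix. The function name and the toy values are hypothetical, and ties are broken arbitrarily by sort order rather than averaged as some implementations do.

```python
import numpy as np

def quantile_normalize(matrix):
    """Give every column (sample) the same empirical distribution."""
    matrix = np.asarray(matrix, dtype=float)
    order = np.argsort(matrix, axis=0)   # row indices that sort each column
    ranks = np.argsort(order, axis=0)    # rank of each value within its column
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = np.sort(matrix, axis=0).mean(axis=1)
    return reference[ranks]

# Hypothetical FPKM values: rows are genes, columns are two library types.
expr = np.array([[2.0, 10.0],
                 [4.0, 30.0],
                 [6.0, 20.0]])
normalized = quantile_normalize(expr)
```

After this step both columns contain the same set of values, but each sample keeps its own gene ranking. A protocol effect that shifts individual genes relative to the others within a library therefore survives, which fits the observation above.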


Thanks for sharing this. It helps.

10.4 years ago
Ming Tommy Tang ★ 4.5k

If you want to find differentially expressed genes, you need to include the library type as a factor in the design when you use DESeq or limma.

You may also use sva to remove batch effects.
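The idea of putting library type into the design can be sketched with plain least squares. This is not DESeq or limma, only a conceptual stand-in on noise-free, made-up log-expression values for a single gene; the sample layout and effect sizes are assumptions.

```python
import numpy as np

# Hypothetical design: 8 samples, unbalanced between condition and library type.
condition = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)  # 1 = treated
library = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)    # 1 = rRNA-depleted

# Simulated log-expression: baseline 5, true condition effect 1, library effect 2.
y = 5.0 + 1.0 * condition + 2.0 * library

# Naive comparison that ignores library type.
naive_effect = y[condition == 1].mean() - y[condition == 0].mean()

# Linear model with library type as an additional covariate.
design = np.column_stack([np.ones_like(y), condition, library])
coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
adjusted_effect = coef[1]
```

Here the naive difference is inflated by the library effect, while the model with the library covariate recovers the simulated condition effect. This only works when condition and library type are not completely confounded: if every treated sample came from one protocol and every control from the other, no model could separate the two.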

10.4 years ago
Floris Brenk ★ 1.0k

When combining these datasets, please be extremely cautious when interpreting the results, and design your analysis so that library-preparation bias cannot explain your findings.

For more information about the biases in library prep read this:

Genome Biol. 2014 Jun 30;15(6):R86. [Epub ahead of print] IVT-seq reveals extreme bias in RNA-sequencing.

Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, Pizarro A, Kim J, Irizarry R, Thomas RS, Grant GR, Hogenesch JB.
