Difference or ratio: which should I use for RPKM data?
6.6 years ago
Yijun Tian ▴ 20

Hello everyone,

I am trying to find evidence that gene A regulates gene B, using cDNA chip (i.e. hgu133a) and RNA-seq (i.e. TCGA RNA-seq HiSeq) datasets. I have already found a high correlation between the mRNA levels of the two genes (the correlation coefficient R for log2-transformed expression is at least 0.5 in all of the above datasets). My assumption is that if A is activated, the A/B ratio will be smaller across all samples within each dataset. For the chip data I took the ratio of probeset intensities, and it worked well for survival prediction. For the RPKM data, however, only the direct difference (Δ) between the A and B read counts predicts well. Do I have any justification for using Δ instead of the ratio for RPKM data? Can anyone recommend a relevant reference?
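
To make the two candidate scores concrete, here is a minimal Python sketch. The expression values are made up, and the function name and `pseudocount` parameter are my own assumptions, not taken from any package. It only illustrates that on the log2 scale a ratio becomes a difference, which is not the same thing as the raw difference Δ.

```python
import numpy as np

def log2_ratio(a, b, pseudocount=1.0):
    """Return log2((a + pseudocount) / (b + pseudocount)) for each sample.

    The pseudocount is an assumption of this sketch; it only guards
    against taking the log of zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.log2(a + pseudocount) - np.log2(b + pseudocount)

# Hypothetical expression values for genes A and B in four samples.
gene_a = np.array([120.0, 300.0, 80.0, 640.0])
gene_b = np.array([60.0, 100.0, 80.0, 160.0])

# Ratio score: a difference of log2 values.
log_ratio = log2_ratio(gene_a, gene_b, pseudocount=0.0)

# Difference score: raw Δ, which grows with the overall expression level.
raw_difference = gene_a - gene_b
```

Samples 1 and 4 show the contrast: the ratio changes by a factor of two between them, while the raw difference changes eightfold because both genes are more highly expressed in sample 4.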

Thank you!

RPKM Probeset rna-seq • 2.0k views
6.2 years ago

Difficult to answer. All that I know is that RPKM data is not ideal: the normalisation method that produces RPKM expression values was one of the first developed for RNA-seq, but it has since been shown to be ineffective for cross-sample comparisons. Some have even questioned its use for within-sample comparisons. I would re-process your HTSeq counts using DESeq2, edgeR, or limma/voom.
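
For a feel of what between-sample normalisation does, below is a simplified Python sketch of the median-of-ratios idea that DESeq2 uses for its size factors. It is an illustration on a made-up count matrix, not a substitute for the package, and the function name is hypothetical.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a genes x samples count matrix.

    Simplified sketch: genes with a zero count in any sample are skipped,
    and at least one gene must remain.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across samples (the reference sample).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per sample: median over genes of the log ratio to the reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Made-up counts: sample 2 was sequenced about twice as deeply as sample 1.
counts = np.array([
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],
])

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors
```

After dividing by the size factors, the first three genes have the same value in both samples, which is what you would expect if only sequencing depth differed.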

Kevin


Well, I think the ratio might be the better approach, since normalizing the RPKM or even TPM of gene A to that of gene B (both genes are clearly expressed, and their variation across samples is equal) may give a more accurate evaluation of my prediction. The difference method seems too crude for my question... I am now calculating the ratio from the estimated counts produced by rsem-calculate-expression, and it works well.
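
One property of the ratio that may be relevant here: any per-sample scaling factor, such as the library size in the RPKM formula, cancels out of A/B but not out of A - B. A small Python sketch with made-up counts, gene lengths, and library sizes (all hypothetical):

```python
import numpy as np

def rpkm(counts, lengths_bp, library_size):
    """Reads per kilobase of transcript per million mapped reads."""
    return counts * 1e9 / (lengths_bp * library_size)

# Hypothetical gene lengths (bp) and read counts for gene A and gene B.
lengths = np.array([2000.0, 1000.0])
counts = np.array([400.0, 100.0])

# Same counts for A and B, but two different totals of mapped reads.
small_lib = rpkm(counts, lengths, library_size=10e6)
large_lib = rpkm(counts, lengths, library_size=20e6)

# The library size cancels in the ratio...
ratio_small = small_lib[0] / small_lib[1]
ratio_large = large_lib[0] / large_lib[1]

# ...but not in the difference.
diff_small = small_lib[0] - small_lib[1]
diff_large = large_lib[0] - large_lib[1]
```

Both ratios are 2, whereas the difference drops from 10 to 5 when the library size doubles, so the difference is more exposed to how each sample was scaled.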

Thank you!


Did you mean 'not ideal'?


lol - yes, you already know I am somewhat against RPKM. Will modify.


An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.
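
As a rough illustration of how the units differ, here is a Python sketch that computes RPKM and TPM from the same made-up count matrix. The function names are hypothetical, and the library size is assumed to be the column sum of the matrix, which is a simplification.

```python
import numpy as np

def rpkm_matrix(counts, lengths_bp):
    """RPKM for a genes x samples matrix; library size = column sum (assumption)."""
    library_size = counts.sum(axis=0)
    return counts * 1e9 / (lengths_bp[:, None] * library_size)

def tpm_matrix(counts, lengths_bp):
    """TPM: length-normalised rates rescaled to sum to one million per sample."""
    rate = counts / lengths_bp[:, None]
    return rate / rate.sum(axis=0) * 1e6

# Made-up data: three genes, two samples with the same total read count.
lengths = np.array([1000.0, 2000.0, 4000.0])
counts = np.array([
    [100.0, 10.0],
    [100.0, 10.0],
    [100.0, 280.0],
])

rpkm_values = rpkm_matrix(counts, lengths)
tpm_values = tpm_matrix(counts, lengths)

rpkm_totals = rpkm_values.sum(axis=0)  # differs between the two samples
tpm_totals = tpm_values.sum(axis=0)    # one million in every sample
```

TPM fixes the per-sample total while RPKM does not, but as the quote above says, neither unit is comparable across experiments without between-sample normalisation.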
