Hello everyone,
I am trying to find evidence that gene A regulates gene B in cDNA chip (i.e. hgu133a) and RNA-seq (i.e. TCGA-RNA HiSeq) datasets. I have already found a high correlation between the two genes' mRNA levels (the correlation coefficient R for log2-transformed expression is at least 0.5 in all of the above datasets). My assumption is that if A is activated, the A/B ratio will be smaller across all samples within each dataset. I computed the ratio for the chip data (probeset intensity) and it worked well for survival prediction, but for the RPKM data I found that only the direct difference (Δ) between the A and B read counts predicts well. So do I have a reason to use Δ instead of the ratio for RPKM data? Does anyone have a relevant reference to recommend?
Thank you!
Well, I now think the ratio might be the better way, since normalizing the RPKM or even TPM of gene A to that of gene B (both genes are clearly expressed and their variation across samples is similar) may give a more accurate evaluation of my prediction. The difference method seems too crude for my question... I am now using the estimated counts calculated by rsem-calculate-expression to compute my ratio, and it works well.
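For what it is worth, here is a minimal sketch of how the ratio and the difference relate once values are log2-transformed: the difference of two log2 values is the log2 of their ratio, so on the log scale the two approaches coincide. The function name `log2_ratio`, the `pseudocount` parameter, and the expression values are all hypothetical and only for illustration; they are not taken from any dataset mentioned above.

```python
import numpy as np

def log2_ratio(a, b, pseudocount=1.0):
    """Per-sample log2(A/B), computed as a difference of log2 values.

    The pseudocount (an assumption, not part of the original analysis)
    avoids taking log2(0) for samples with zero expression.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.log2(a + pseudocount) - np.log2(b + pseudocount)

# Hypothetical expression values for genes A and B across four samples.
gene_a = np.array([120.0, 80.0, 15.0, 300.0])
gene_b = np.array([60.0, 160.0, 15.0, 75.0])

# With no pseudocount, the log2 difference equals log2 of the raw ratio.
scores = log2_ratio(gene_a, gene_b, pseudocount=0.0)
raw_ratio_log2 = np.log2(gene_a / gene_b)
print(scores)
print(np.allclose(scores, raw_ratio_log2))
```

A difference of untransformed values, by contrast, scales with overall expression level and sequencing depth, which may be one reason it behaves differently from the ratio.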
Thank you!
Did you mean 'not ideal'?
lol - yes, you already know I am somewhat against RPKM. Will modify.
An update (6th October 2018):
You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, values normalized this way are not comparable across samples:
Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units
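To illustrate the comparability problem, here is a small sketch assuming a hypothetical genes-by-samples RPKM matrix. It uses the standard rescaling of each sample's RPKM values so that they sum to one million (TPM); the function name `rpkm_to_tpm` and the numbers are made up for the example.

```python
import numpy as np

def rpkm_to_tpm(rpkm):
    """Rescale each sample (column) of an RPKM matrix to sum to 1e6."""
    rpkm = np.asarray(rpkm, dtype=float)
    return rpkm / rpkm.sum(axis=0, keepdims=True) * 1e6

# Hypothetical RPKM values: rows are genes, columns are two samples.
rpkm = np.array([
    [10.0, 20.0],
    [30.0, 20.0],
    [60.0, 160.0],
])

tpm = rpkm_to_tpm(rpkm)

# RPKM column totals differ between samples (100 vs 200), so the same
# RPKM value represents a different share of each sample's transcripts.
print(rpkm.sum(axis=0))
# TPM column totals are identical by construction.
print(tpm.sum(axis=0))
# The first gene looks two-fold higher in sample 2 by RPKM, but it makes
# up the same proportion of both samples after rescaling.
print(tpm[0])
```

Note that TPM only fixes the per-sample total; it is not a replacement for the count-based normalization used by dedicated differential expression tools.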