Hi all
I have a question about comparing RNA-seq datasets that were processed with different normalization methods (in this case FPKM-UQ vs TPM). For reference, the data are TCGA data harvested from cBioPortal (FPKM-UQ) and GTEx TPMs downloaded from the GTEx portal.
I am not comparing counts between datasets. I am comparing the counts for gene A vs gene X within an individual patient, basically obtaining a ratio of gene counts. It is this gene A:gene X ratio that I have been comparing between datasets.
My question is: is it reasonable to expect that, for the most part, the ratio between gene A and gene X will remain the same (or thereabouts) regardless of whether the values are FPKM-UQ or TPM?
PS: I am aware that the ideal approach is to re-analyze raw counts using an appropriate between-sample normalization protocol, but this technique lets me obtain preliminary data from many datasets quickly and identify studies of interest for further in-depth analysis.
This doesn't quite answer the question I am asking... I understand that the normalization techniques differ.
The question is: would the ratio between gene X and gene Y differ significantly if the raw counts were processed to give FPKM-UQ, FPKM, TPM or RSEM values?
The literal answer to 'would the ratio differ' is: quite possibly. Hence my comment above that quantile-normalized data shouldn't be compared in this way.
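To separate out the part that comes from per-sample scaling alone, here is a minimal sketch. It assumes the same raw counts and the same gene lengths feed every method, which will not generally hold across two different pipelines. The counts, lengths, and function names are made up for illustration, and the FPKM-UQ function is a simplified version that just scales by the sample's 75th-percentile count.

```python
import numpy as np

# Hypothetical raw counts and gene lengths (bp) for ONE sample; not real data.
counts = np.array([500.0, 1200.0, 30.0, 8000.0, 250.0])
lengths_bp = np.array([2000.0, 1500.0, 900.0, 4000.0, 1000.0])


def fpkm(counts, lengths_bp):
    # Length-normalize, then scale by the sample's total count.
    return counts * 1e9 / (lengths_bp * counts.sum())


def fpkm_uq(counts, lengths_bp):
    # Simplified assumption: scale by the 75th percentile of this sample's
    # counts instead of the total. Real pipelines restrict the gene set.
    upper_quartile = np.percentile(counts, 75)
    return counts * 1e9 / (lengths_bp * upper_quartile)


def tpm(counts, lengths_bp):
    # Length-normalize first, then rescale so the sample sums to one million.
    rate = counts / lengths_bp
    return rate / rate.sum() * 1e6


def within_sample_ratio(values, i=0, j=1):
    # Ratio of gene i to gene j inside the same sample.
    return values[i] / values[j]


ratios = {
    name: within_sample_ratio(func(counts, lengths_bp))
    for name, func in [("FPKM", fpkm), ("FPKM-UQ", fpkm_uq), ("TPM", tpm)]
}

for name, value in ratios.items():
    print(f"{name}: gene0/gene1 = {value:.4f}")
```

Each of these three methods divides every gene in the sample by the same per-sample constant, so the constant cancels in a within-sample ratio. Any difference seen between real datasets would then have to come from something else, such as alignment, quantification, gene models, or a normalization that is not a simple per-sample rescaling.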
Quantile normalization is not the same as the other normalization concepts of TPM/FPKM/RPKM, and even for those it is not recommended to compare across types. You can see a quick explanation of QN here. In the example grids, if you were to compare Row A/Row B as a geneA/geneB ratio, you get different ratios from the un-QN data:
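For a self-contained version of the same effect, here is a rough sketch of quantile normalization on a tiny made-up grid (genes in rows, samples in columns). The values and the `quantile_normalize` helper are hypothetical, and ties are not handled, so treat it as an illustration rather than a reference implementation.

```python
import numpy as np


def quantile_normalize(matrix):
    # Rank of each value within its own column (0 = smallest); assumes no ties.
    ranks = np.argsort(np.argsort(matrix, axis=0), axis=0)
    # Reference distribution: mean across samples of the sorted columns.
    reference = np.sort(matrix, axis=0).mean(axis=1)
    # Replace each value with the reference value at its rank.
    return reference[ranks]


# Hypothetical expression grid: rows are geneA..geneD, columns are 3 samples.
expr = np.array([
    [5.0, 40.0, 3.0],   # geneA
    [2.0, 10.0, 4.0],   # geneB
    [3.0, 20.0, 6.0],   # geneC
    [4.0, 80.0, 8.0],   # geneD
])

normalized = quantile_normalize(expr)

# geneA/geneB within each sample, before and after quantile normalization.
ratio_before = expr[0] / expr[1]
ratio_after = normalized[0] / normalized[1]

print("geneA/geneB before QN:", ratio_before)
print("geneA/geneB after QN: ", ratio_after)
```

Because QN replaces each value with a rank-based reference value, the within-sample ratio depends on where the two genes rank and on the other samples in the batch, not only on the two original values.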