GDC provides RNA-seq quantification in multiple forms:
For mRNA-Seq data, the GDC generates gene level and exon level quantification in Fragments Per Kilobase of transcript per Million mapped reads (FPKM). To facilitate cross-sample comparison and differential expression analysis, the GDC also provides Upper Quartile normalized FPKM (UQ-FPKM) values and raw mapping count.
I downloaded both the FPKM and FPKM-UQ data for the TCGA-GBM dataset. The distributions of FPKM-UQ values look more comparable across samples than those of the FPKM values, which makes sense.
The sums for each sample of FPKM values range from 200k to 318k, so the highest sample has about 60% more. For FPKM-UQ, the sums range from 4x10^9 to 9x10^9, so the highest sample is more than double the lowest. UQ normalization actually increases that difference. Does that imply that the total number of transcripts is 2x more in some samples compared to others?
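A minimal sketch of why the per-sample sums can diverge, assuming the formulas FPKM = count * 10^9 / (total protein-coding count * gene length) and FPKM-UQ = count * 10^9 / (75th-percentile count * gene length). Those formulas are an assumption about how the GDC pipeline defines the two values and should be checked against the GDC documentation for the data release in use. The counts and gene lengths below are synthetic, not TCGA-GBM data, and the total count over all simulated genes stands in for the protein-coding total. Under these assumptions, the FPKM-UQ sum equals the FPKM sum times (total count / upper-quartile count), so it tracks how much of the library sits in a few very highly expressed genes rather than the total number of transcripts.

```python
import numpy as np

def fpkm(counts, lengths):
    # Assumed definition: scale by the sample's total read count
    # (stand-in for the protein-coding total) and by gene length.
    return counts * 1e9 / (counts.sum() * lengths)

def fpkm_uq(counts, lengths):
    # Assumed definition: scale by the 75th-percentile gene count instead.
    upper_quartile = np.percentile(counts, 75)
    return counts * 1e9 / (upper_quartile * lengths)

rng = np.random.default_rng(0)
n_genes = 2000

# Synthetic gene lengths (bp) and read counts, for illustration only.
lengths = rng.integers(500, 5000, size=n_genes).astype(float)
counts_a = np.round(rng.lognormal(mean=4.0, sigma=1.5, size=n_genes))

# Sample B: identical to A except that the 20 most highly expressed genes
# get 20x more reads. The upper quartile is unchanged by this.
counts_b = counts_a.copy()
top_genes = np.argsort(counts_a)[-20:]
counts_b[top_genes] *= 20

for name, counts in [("A", counts_a), ("B", counts_b)]:
    total = counts.sum()
    uq = np.percentile(counts, 75)
    print(
        f"sample {name}: total/UQ = {total / uq:.1f}, "
        f"sum FPKM = {fpkm(counts, lengths).sum():.3e}, "
        f"sum FPKM-UQ = {fpkm_uq(counts, lengths).sum():.3e}"
    )
```

In this toy setup the two samples share the same upper quartile, yet sample B has a much larger FPKM-UQ sum simply because a handful of genes dominate its library. That suggests the spread in FPKM-UQ sums reflects differences in the shape of the count distribution, not necessarily a 2x difference in total transcript abundance.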
Hi Igor!
Did you ever find a plausible answer to this question? I know it has been a while, but I am facing the same problem and cannot find a good source on it.
Cheers