Question

Differences between FPKM and FPKM-UQ files in gene expression analysis

8

8.2 years ago

alcs417 ▴ 100

Hi guys, I am planning to perform a pan-cancer gene expression analysis across several cancer types. However, I found that the TCGA data portal has been replaced by GDC. After carefully checking the harmonized data in GDC, I am now wondering which file I should use for gene expression analysis, FPKM or FPKM-UQ? What are the differences between the two file types? Previously, I used the files with suffix "rsem.genes.normalized_results" to perform the gene expression analysis. Is FPKM the same as the "*.rsem.genes.normalized_results" file? If so, when shall we use FPKM-UQ? Any help would be really appreciated. Thanks

TCGA; RNA-seq; FPKM; FPKM-UQ • 22k views

ADD COMMENT • link updated 2.4 years ago by LayneSadler ▴ 90 • written 8.2 years ago by alcs417 ▴ 100

1

FPKM and FPKM-UQ are calculated by GDC just for legacy reasons: people are used to working with FPKM data, and UQ provides a method for normalization. However, for any serious analysis, using count data with DESeq/edgeR is encouraged.

ADD REPLY • link 7.2 years ago by Zhenyu Zhang ★ 1.2k

0

Just to add, FPKM can still be really useful. As it normalizes reads to correct for transcript size, it can be useful to correct across potential differences in input RNA quality. Just make sure you know what you are working with. If you are doing some heavy CPM cutoff, obviously FPKM will drastically alter your results in a negative sense. Zhenyu is not wrong, but I felt it worth adding the caveat.

ADD REPLY • link 7.0 years ago by SpaceMenEatSpacePlants • 0

1

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

"The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

"The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one."
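To illustrate the "relative measurement" point, here is a minimal sketch with invented toy numbers (they come from neither cited source). Both hypothetical samples are sequenced to the same depth, and genes A, B and C are assumed to be unchanged in absolute abundance; only gene D is strongly induced in sample 2. Because D takes a larger share of the fragments, the FPKM values of the unchanged genes drop.

```python
# Toy illustration (hypothetical numbers): FPKM is relative to the library.
lengths_bp = {"A": 1000, "B": 2000, "C": 1000, "D": 1000}

# Both samples have 1000 fragments in total. In sample 2, gene D is
# strongly induced and absorbs 600 of them, so A, B and C receive
# fewer fragments even though their true abundance is assumed unchanged.
sample1 = {"A": 200, "B": 400, "C": 200, "D": 200}
sample2 = {"A": 100, "B": 200, "C": 100, "D": 600}


def fpkm_per_gene(counts, lengths):
    """FPKM = count * 1e9 / (total count in sample * gene length in bp)."""
    total = sum(counts.values())
    return {gene: c * 1e9 / (total * lengths[gene]) for gene, c in counts.items()}


fpkm1 = fpkm_per_gene(sample1, lengths_bp)
fpkm2 = fpkm_per_gene(sample2, lengths_bp)

for gene in lengths_bp:
    print(gene, fpkm1[gene], fpkm2[gene])
```

In this toy case the FPKM of A, B and C is halved in sample 2 purely because of the change in D, which is the kind of composition effect that between-sample normalization methods try to address.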

ADD REPLY • link 5.8 years ago by Kevin Blighe 88k

0

Shouldn't the upper-quartile normalization make FPKM-UQ suitable for cross-sample comparison?

https://pubmed.ncbi.nlm.nih.gov/29112707/

Here they tried different normalization methods, including FPKM-UQ, without finding big differences.

ADD REPLY • link 4.2 years ago by demoraesdiogo2017 ▴ 110

1

...'they' == a single author? The bias in FPKM expression units exists from the very beginning when these units are created. No further transformation can then mitigate this bias without first reverse-engineering these to raw counts.

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k

0

They compare methods applied to UQ-normalized files; they do not compare methods that operate on raw counts to produce the normalized counts in the first place.

ADD REPLY • link 4.2 years ago by ATpoint 85k

score 3 · Answer 1 · 2016-10-07

3

8.1 years ago

joshualevipayne ▴ 70

This link might also be helpful:

https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/

ADD COMMENT • link 8.1 years ago by joshualevipayne ▴ 70

1

To add the actual answer: "The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample."
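As a rough sketch of what that definition means in practice, the snippet below computes both units for one toy sample. The gene names, counts and lengths are hypothetical, all genes are assumed to be protein-coding, and the percentile interpolation rule (`method="inclusive"`) is an assumption; GDC's pipeline may compute the 75th percentile differently.

```python
from statistics import quantiles

# Hypothetical toy sample: fragment counts and gene lengths (bp).
# All five genes are assumed to be protein-coding.
counts = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "g5": 1000}
lengths_bp = {"g1": 1000, "g2": 2000, "g3": 1500, "g4": 4000, "g5": 5000}

# Denominator for plain FPKM: total protein-coding count in the sample.
total_count = sum(counts.values())

# Denominator for FPKM-UQ: 75th percentile of the per-gene counts.
# quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
upper_quartile = quantiles(counts.values(), n=4, method="inclusive")[2]


def fpkm(count, length_bp, total):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (total * length_bp)


def fpkm_uq(count, length_bp, uq):
    """Same as FPKM, with the total count replaced by the upper quartile."""
    return count * 1e9 / (uq * length_bp)


for gene in counts:
    print(
        gene,
        fpkm(counts[gene], lengths_bp[gene], total_count),
        fpkm_uq(counts[gene], lengths_bp[gene], upper_quartile),
    )
```

Within a single sample the two units differ only by the constant factor `total_count / upper_quartile`, so gene-to-gene ratios inside the sample are identical; the choice of denominator only matters when values are compared across samples.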

ADD REPLY • link 6.8 years ago by Michael Schubert ★ 7.1k

0

The link is dead.

ADD REPLY • link 2.4 years ago by LayneSadler ▴ 90

score 0 · Answer 2 · 2016-09-25

RPKM (reads per kilobase per million mapped reads)

Upper Quartile (UQ)

See this link:

http://qian.human.cornell.edu/Files/nmeth.3208.pdf

and this paragraph inside:

"Quantification of Ribo-seq and QTI-seq. Reads per kilobase per million reads (RPKM) value was calculated to quantify the ribosome occupancy of mRNA for CHX profiling (ref 20). A window centering the predicted TIS codon (−1, +4) was summarized to represent the abundance of translation initiation signal. To facilitate the comparison between different experimental conditions, we applied upper quartile (UQ) normalization to each predicted TIS codon on the basis of the population of total QTI-seq read count of each individual mRNA (ref 35). The fold changes of translational signal between two experimental conditions for both LTM and CHX profiling data were normalized by fold changes of RNA-seq FPKM values of the corresponding mRNAs".

"In statistics and the theory of probability, quantiles are cutpoints dividing the range of a probability distribution into contiguous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one less quantile than the number of groups created. Thus quartiles are the three cut points that will divide a dataset into four equal-size groups (cf. depicted example). Common quantiles have special names: for instance quartile, decile (creating 10 groups: see below for more). The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points." WIKI
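A small sketch of the quartile idea from the quoted passage, using Python's standard library on made-up values. The `method="inclusive"` interpolation rule is one of several conventions and is chosen here only for illustration.

```python
from statistics import quantiles

# Nine made-up observations.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for quartiles: three cut points that split the data into
# four groups (one fewer cut point than the number of groups).
cuts = quantiles(values, n=4, method="inclusive")

lower_quartile, median, upper_quartile = cuts
print(cuts)
```

The last cut point, the upper quartile, is the 75th percentile, which is the quantity "UQ" refers to in upper quartile normalization.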