Question

RNA-Seq FPKM Quantile Normalization

0

11.4 years ago

J.F.Jiang ▴ 930

Hi all,

Here is the situation: I have gene expression values from RNA-seq, given as FPKM.

However, for some genes, more than 50% of the samples have no value, that is, FPKM = 0.

In this situation, how can we do quantile normalization?

Thanks

rnaseq fpkm normalization • 9.8k views

ADD COMMENT • link updated 11.4 years ago by Damian Kao 16k • written 11.4 years ago by J.F.Jiang ▴ 930

0

Why do you want to do quantile normalization? Also: how many genes does this happen to?

ADD REPLY • link 11.4 years ago by Steve Lianoglou 5.2k

0

I just want to do the eQTL calculation, which needs quantile normalization to make the expression distribution approximately normal.

However, some genes have many missing values across the samples, so quantile normalization cannot be applied to them directly.

My question is how to deal with this situation: should I just remove those genes, or is there another method?
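For illustration, here is a minimal Python sketch of the "just remove those genes" option followed by quantile normalization across samples. The function names, the 50% threshold, and the tiny FPKM matrix are assumptions made up for this example (rows are genes, columns are samples); this is one possible way to do it, not an established pipeline.

```python
import numpy as np

def filter_genes_by_zero_fraction(fpkm, max_zero_fraction=0.5):
    """Keep genes (rows) whose fraction of zero-FPKM samples is <= the threshold."""
    zero_fraction = (fpkm == 0).mean(axis=1)
    keep = zero_fraction <= max_zero_fraction
    return fpkm[keep], keep

def quantile_normalize(matrix):
    """Force every sample (column) onto the same reference distribution.

    The reference is the mean of the sorted columns. Ties within a column are
    broken by order of appearance rather than averaged (a simplification).
    """
    order = np.argsort(matrix, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(matrix, order, axis=0)
    reference = sorted_vals.mean(axis=1)  # mean value at each rank
    normalized = np.empty_like(matrix, dtype=float)
    # Write the reference value for each rank back to the original positions.
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized

# Hypothetical FPKM matrix: 5 genes x 3 samples; the last gene is zero in 2 of 3 samples.
fpkm = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
    [0.0, 0.0, 1.2],
])
filtered, keep = filter_genes_by_zero_fraction(fpkm, max_zero_fraction=0.5)
normalized = quantile_normalize(filtered)
print(keep)
print(normalized)
```

After this step every column of `normalized` contains the same set of values, only in a different order. The threshold is a judgment call and would need to be chosen for the data at hand.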

ADD REPLY • link 11.4 years ago by J.F.Jiang ▴ 930

1

You could consider using a GLM approach such as edgeR for doing your QTL analysis.

ADD REPLY • link 11.4 years ago by Sean Davis 27k

0

Thanks, I will consider it if the approach I am using fails. By the way, are you the one from NCI? Your name looks familiar.

ADD REPLY • link 11.4 years ago by J.F.Jiang ▴ 930

score 1 · Answer 1 · 2013-06-27

What you are seeing is either 1) not enough sequencing depth to resolve the expression of lowly expressed genes, or 2) that many genes really aren't being expressed. How many reads do you have per sample?

I would generate a rarefaction plot of increasing subsets of your reads vs. the number of genes with more than X reads. For example, a plot where you take 1, 2, 3, 4, ... million reads and see how many genes have more than 10 reads mapping for each increasing subset.

If you see a plateau, then you probably do have enough read coverage and what you are seeing is probably a biological effect. If no plateau, then you might not have enough read depth.
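The numbers behind such a plot could be computed along these lines. This is only a sketch: it assumes you already have a per-gene read count vector for one sample, the function name and the toy counts are made up, and it subsamples the counts (drawing reads without replacement) instead of re-mapping subsets of the raw reads.

```python
import numpy as np

def rarefaction_curve(gene_counts, depths, min_reads=10, seed=0):
    """For each subsampled depth, count genes with more than `min_reads` reads.

    gene_counts: 1-D integer array, reads mapped to each gene in the full library.
    depths: read totals to subsample to (each must be <= gene_counts.sum()).
    """
    rng = np.random.default_rng(seed)
    gene_counts = np.asarray(gene_counts, dtype=np.int64)
    detected = []
    for depth in depths:
        # Draw `depth` reads without replacement from the full library.
        sub = rng.multivariate_hypergeometric(gene_counts, depth)
        detected.append(int((sub > min_reads).sum()))
    return detected

# Hypothetical per-gene read counts for one sample (8 genes, 1278 reads in total).
library = np.array([0, 3, 15, 40, 200, 1000, 12, 8])
depths = [250, 500, 1000, int(library.sum())]
curve = rarefaction_curve(library, depths, min_reads=10)
print(list(zip(depths, curve)))
```

Plotting `depths` against `curve` (for example with matplotlib) gives the curve to inspect for a plateau. With real data the depths would be in millions of reads.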

Edit: I might have misread your question. Are you saying that 50% of the genes in your sample have an FPKM of 0, or that one specific gene has an FPKM of 0 in 50% of your samples?

score 0 · Answer 2 · 2013-06-27

0

11.4 years ago

Charles Warden 8.3k

I agree - FPKM (or RPKM) expression values are already normalized for sequencing depth and transcript length. Quantile normalization probably isn't necessary, and it is much more common in microarray analysis than in RNA-Seq.

If you see a lot of 0 values, then they may already be rounded to a certain number of significant figures. This is actually somewhat good because genes with low coverage can show artificially high fold-change values (if you think about it, some reads are infinitely more than no reads). I would usually just add a small value between 0.01 and 1 to every FPKM before computing fold changes, but rounding down to 0.0 or 0.00 is actually a similar idea.
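To make the "infinitely more" point concrete, here is a small Python sketch comparing a log2 fold change with and without an added constant. The function name and the example values are made up for illustration, and the constant of 1.0 is just one choice from the 0.01 to 1 range mentioned above.

```python
import numpy as np

def log2_fold_change(treated, control, pseudocount=0.0):
    """log2(treated / control) after adding the same constant to both values."""
    a = np.asarray(treated, dtype=float) + pseudocount
    b = np.asarray(control, dtype=float) + pseudocount
    # Silence the divide-by-zero warning so the infinite result can be shown.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(a / b)

# Hypothetical gene: FPKM 0.5 in one condition, 0.0 in the other.
raw = log2_fold_change(0.5, 0.0)                      # infinite fold change
damped = log2_fold_change(0.5, 0.0, pseudocount=1.0)  # log2(1.5 / 1.0)
print(raw, damped)
```

The larger the added constant, the more strongly fold changes for low-expression genes are pulled toward zero.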

ADD COMMENT • link 11.4 years ago by Charles Warden 8.3k

0

I figured out the normalization. Quantile normalization really is not necessary for RNA-seq gene expression in general; however, it is quite important when I want to do the eQTL calculation.

So my question is how to deal with transcripts that have many missing FPKM values.

Thanks for your comment
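One transformation sometimes used before eQTL mapping is a rank-based inverse normal transform applied to each gene across samples. The Python sketch below is a hedged illustration, not a recommendation from this thread: the function names, the 0.5 rank offset, and the example values are assumptions. It also shows why zero-heavy genes are a problem: all the zeros tie, so they collapse onto a single value and the result cannot look normal.

```python
from statistics import NormalDist
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values all receive the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks

def inverse_normal_transform(values, offset=0.5):
    """Map one gene's values to standard normal quantiles via (rank - offset) / n."""
    ranks = average_ranks(values)
    n = len(ranks)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf((r - offset) / n) for r in ranks])

# Hypothetical gene with FPKM = 0 in 3 of 5 samples.
gene = [0.0, 0.0, 0.0, 2.5, 7.1]
transformed = inverse_normal_transform(gene)
print(transformed)
```

Here the three zero samples all map to the same quantile, so for a gene that is zero in most samples the transformed values are dominated by one repeated number.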

ADD REPLY • link 11.4 years ago by J.F.Jiang ▴ 930