RNA-seq FPKM Quantile Normalization
11.4 years ago
J.F.Jiang ▴ 930

Hi all,

Here is the situation: I have gene expression values from RNA-seq, quantified as FPKM.

However, for some genes, more than 50% of the samples have no value, i.e. FPKM = 0.

In this situation, how should I do the quantile normalization?
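For reference, here is a minimal sketch of standard (Bolstad-style) quantile normalization across samples, assuming numpy and scipy, with genes in rows and samples in columns. Tied values are handled by averaging their ranks and interpolating the reference distribution; this is one common convention, not the only one. Note that this normalization works column by column, so zeros only create ties within a sample; the per-gene problem described above appears when each gene is transformed across samples.

```python
import numpy as np
from scipy.stats import rankdata


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix."""
    x = np.asarray(x, dtype=float)
    n_genes = x.shape[0]
    # Reference distribution: mean across samples of the sorted values at each rank.
    reference = np.sort(x, axis=0).mean(axis=1)
    positions = np.arange(1, n_genes + 1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # Average ranks keep tied values (e.g. several FPKM = 0) tied after
        # normalization; fractional ranks are mapped onto the reference by
        # linear interpolation.
        ranks = rankdata(x[:, j], method="average")
        out[:, j] = np.interp(ranks, positions, reference)
    return out


# Hypothetical 4 genes x 2 samples FPKM matrix.
fpkm = np.array([[0.0, 0.0],
                 [0.0, 1.0],
                 [2.0, 3.0],
                 [5.0, 4.0]])
out = quantile_normalize(fpkm)
# Column 0 becomes [0.25, 0.25, 2.5, 4.5]; the two zeros stay tied.
print(out)
```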

Thanks

rnaseq fpkm normalization

Why do you want to do quantile normalization? Also: how many genes does this happen to?


I want to run an eQTL analysis, which needs quantile normalization to make the expression distribution normal.

However, some genes have many missing values across the samples, so I cannot apply quantile normalization to them.

My question is how to handle this: should I simply remove those genes, or is there another approach?
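A minimal sketch of the "remove those genes" option, followed by a per-gene rank-based inverse normal transform, a transformation commonly used before eQTL mapping. This assumes that "quantile normalization to make the distribution normal" means mapping each gene's values across samples onto a standard normal. The 50% zero cutoff comes from the question; the Blom offset c = 3/8 and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, rankdata


def filter_sparse_genes(fpkm, max_zero_fraction=0.5):
    """Drop genes (rows) whose fraction of zero-FPKM samples exceeds the cutoff."""
    fpkm = np.asarray(fpkm, dtype=float)
    zero_fraction = (fpkm == 0).mean(axis=1)
    keep = zero_fraction <= max_zero_fraction
    return fpkm[keep], keep


def inverse_normal_transform(values, c=3.0 / 8):
    """Rank-based inverse normal transform of one gene across samples.

    Uses Phi^-1((r - c) / (n - 2c + 1)); c = 3/8 is Blom's offset (an assumption).
    Tied values (e.g. zeros) get the same average rank and hence the same score.
    """
    ranks = rankdata(values, method="average")
    n = len(values)
    return norm.ppf((ranks - c) / (n - 2 * c + 1))


# Hypothetical 3 genes x 4 samples FPKM matrix.
fpkm = np.array([[0.0, 0.0, 0.0, 1.0],   # 75% zeros -> removed
                 [1.0, 2.0, 3.0, 4.0],
                 [0.0, 5.0, 0.0, 2.0]])  # 50% zeros -> kept
kept, keep = filter_sparse_genes(fpkm)
z = np.apply_along_axis(inverse_normal_transform, 1, kept)
print(keep)
print(z)
```

Genes that pass the filter but still contain many zeros end up with a block of identical scores at the low end, which is worth keeping in mind when choosing the cutoff.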


You could consider a GLM-based approach, such as edgeR, for your QTL analysis.


Thanks, I will consider that if my current approach fails. By the way, are you the one from NCI? You look familiar.

11.4 years ago

What you are seeing is either 1) insufficient sequencing depth to resolve the expression of lowly expressed genes, or 2) genuinely that many genes that are not expressed. How many reads do you have per sample?

I would generate a rarefaction plot: take increasing subsets of your reads and plot subset size against the number of genes with more than X reads. For example, take 1, 2, 3, 4, ... million reads and count how many genes have more than 10 reads mapped in each subset.
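A minimal sketch of that rarefaction curve, assuming you already have per-gene read counts for one sample and that each counted read maps to exactly one gene (unassigned reads are ignored). Reads are drawn without replacement with numpy's `Generator.multivariate_hypergeometric`; subsampling the aligned reads directly would be the more faithful version. The function name and demo counts are hypothetical.

```python
import numpy as np


def rarefaction_curve(gene_counts, depths, min_reads=10, seed=0):
    """Number of genes with more than `min_reads` reads at each subsampling depth."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(gene_counts, dtype=np.int64)
    total = int(counts.sum())
    curve = []
    for depth in depths:
        if depth > total:
            continue  # skip depths larger than the sample's library
        # Draw `depth` reads without replacement from the pool of gene-assigned reads.
        sub = rng.multivariate_hypergeometric(counts, depth)
        curve.append((depth, int((sub > min_reads).sum())))
    return curve


# Hypothetical per-gene counts; real depths would be in the millions.
counts = [100, 50, 5, 0, 20]
print(rarefaction_curve(counts, depths=[50, 100, 175, 1000]))
```

At full depth (175 here) the subsample equals the original counts, so the last point is exact; plotting depth against the gene count shows whether the curve levels off.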

If the curve plateaus, you probably have enough read coverage, and the zeros are probably a biological effect. If it does not plateau, you may not have enough read depth.

Edit: I might have misread your question. Are you saying that 50% of the genes in your sample have an FPKM of 0, or that one specific gene has an FPKM of 0 in 50% of your samples?


Thanks for your reply. Only some specific transcripts (~2,000) have missing (zero) FPKM values in more than 50% of the samples.

11.4 years ago

I agree: FPKM (or RPKM) expression values are already normalized. Quantile normalization probably isn't necessary; it is much more common in microarray analysis than in RNA-Seq.

If you see a lot of 0 values, they may have been rounded to a fixed number of significant figures. This is actually somewhat helpful, because genes with low coverage can show artificially high fold-change values (if you think about it, a few reads is infinitely more than no reads). I would usually just add a small value between 0.01 and 1 (a pseudocount), but rounding down to 0.0 or 0.00 is actually a similar idea.
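A small sketch of the pseudocount idea, assuming a simple log2 fold change between two hypothetical conditions. It shows that a zero no longer produces an infinite value, and that the size of the pseudocount (0.01 vs 1, the range mentioned above) strongly affects the fold change for low-coverage genes.

```python
import numpy as np


def log2_fold_change(a, b, pseudocount=1.0):
    """log2(b / a) with a pseudocount added to both so zero values stay finite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.log2((b + pseudocount) / (a + pseudocount))


# Hypothetical FPKM values for three genes in two conditions.
a = [0.0, 0.0, 100.0]
b = [1.0, 0.0, 200.0]
print(log2_fold_change(a, b, pseudocount=1.0))   # first gene: log2(2 / 1) = 1.0
print(log2_fold_change(a, b, pseudocount=0.01))  # first gene: log2(1.01 / 0.01), about 6.66
```

The highly expressed third gene stays close to a 2-fold change (log2 of about 1) either way; only the low-coverage gene is sensitive to the choice.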


I understand the normalization. Quantile normalization is indeed not necessary for RNA-seq gene expression itself; however, it is quite important for the eQTL analysis I want to do.

So my question is how to handle those transcripts with many missing FPKM values.

Thanks for your comment
