What you are seeing is either 1) insufficient sequencing depth to resolve the expression of lowly expressed genes, or 2) a genuinely large number of genes that just aren't being expressed. How many reads do you have for the sample?
I would generate a rarefaction plot of increasing subsets of your reads vs. the number of genes with more than X reads. For example, take 1, 2, 3, 4, ... million reads and, for each subset, count how many genes have more than 10 reads mapping to them.
If you see a plateau, then you probably do have enough read coverage and what you are seeing is probably a biological effect. If no plateau, then you might not have enough read depth.
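Here is a minimal sketch of that rarefaction idea, assuming you already have a per-gene read count table. The function name `rarefaction_curve`, the toy counts, and the 10-read threshold are illustrative assumptions, not part of any particular tool. For real data with millions of reads you would more likely subsample the alignment file itself rather than expand reads in memory.

```python
import random

def rarefaction_curve(gene_counts, depths, min_reads=10, seed=0):
    """Return [(depth, n_genes_detected)] for each subsample depth.

    gene_counts: dict mapping gene ID -> number of reads assigned to that gene.
    depths: iterable of subsample sizes (number of reads to draw).
    A gene counts as detected when more than `min_reads` subsampled reads hit it.
    """
    rng = random.Random(seed)
    # Expand to one entry per read, labelled with the gene it maps to.
    reads = [gene for gene, n in gene_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        depth = min(depth, len(reads))
        sub = rng.sample(reads, depth)  # sampling without replacement
        tally = {}
        for gene in sub:
            tally[gene] = tally.get(gene, 0) + 1
        detected = sum(1 for n in tally.values() if n > min_reads)
        curve.append((depth, detected))
    return curve

# Toy data (hypothetical): two well-covered genes, one weak gene, one silent gene.
toy_counts = {"geneA": 500, "geneB": 300, "geneC": 15, "geneD": 0}
curve = rarefaction_curve(toy_counts, depths=[100, 200, 400, 815])
for depth, detected in curve:
    print(depth, detected)
```

Plot depth against the detected-gene count; if the curve flattens well before your full depth, more reads are unlikely to rescue the zero-FPKM genes.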
**edit
I might have read your question incorrectly. Are you saying 50% of the genes in your sample have FPKM of 0 or one specific gene has FPKM of 0 in 50% of your samples?
I agree - FPKM (or RPKM) expression values are already normalized. Quantile normalization probably isn't necessary, and it is much more common for microarray analysis than RNA-Seq.
If you see a lot of 0 values, they may already have been rounded to a fixed number of decimal places. This is actually somewhat good, because genes with low coverage can show artificially high fold-change values (if you think about it, some reads are infinitely more than no reads). I would usually just add a small constant between 0.01 and 1 to every value before computing ratios, but rounding down to 0.0 or 0.00 is a similar idea.
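To illustrate the added-constant idea, here is a small sketch. The function name `log2_fold_change` and the FPKM values are made up for the example; the only point is how the choice of constant changes the ratio for low-coverage genes.

```python
import math

def log2_fold_change(fpkm_a, fpkm_b, pseudocount=0.1):
    """log2 ratio of two FPKM values after adding a small constant to both."""
    return math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))

# Hypothetical values: a barely detected gene versus an undetected one.
print(log2_fold_change(0.5, 0.0, pseudocount=1.0))
print(log2_fold_change(0.5, 0.0, pseudocount=0.01))
# A well-expressed gene is barely affected by the choice of pseudocount.
print(log2_fold_change(200.0, 100.0, pseudocount=1.0))
```

A larger constant shrinks the fold change of the low-coverage gene toward zero, while a very small one still lets it look like a large change, so the value you pick is effectively a noise floor.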
I figured out the normalization. Quantile normalization really isn't necessary for RNA-seq gene expression in general; however, it is quite important when I want to do the eQTL calculation.
So, my focus is how to deal with those transcripts having lots of missing FPKM values.
Why do you want to do quantile normalization? Also: how many genes does this happen to?
I just want to do the eQTL calculation, which needs quantile normalization to make the expression distribution approximately normal.
However, some genes have missing values in many of the samples, so the quantile normalization cannot be applied to them.
My question is how to deal with this situation: should I just remove those genes, or is there another method?
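One possible way to handle this, sketched below, is to drop genes that are zero in too many samples and apply a rank-based inverse normal transform to the rest. This assumes that "quantile normalization" here means mapping each gene's values across samples onto normal quantiles, and that missing values are encoded as 0. The 50% cutoff, the function names, and the toy FPKM values are all illustrative assumptions.

```python
from statistics import NormalDist

def keep_gene(values, max_zero_fraction=0.5):
    """True if at most `max_zero_fraction` of samples have a zero FPKM."""
    n_zero = sum(1 for v in values if v == 0)
    return n_zero / len(values) <= max_zero_fraction

def inverse_normal_transform(values):
    """Map one gene's values across samples onto standard normal quantiles.

    Ties share their average rank; rank r of n maps to the (r - 0.5) / n quantile.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]

# Hypothetical FPKM matrix: gene -> values across six samples.
fpkm = {
    "geneA": [5.2, 3.1, 8.4, 4.0, 6.6, 2.9],
    "geneB": [0.0, 0.0, 0.0, 0.0, 1.2, 0.0],
}
transformed = {
    gene: inverse_normal_transform(vals)
    for gene, vals in fpkm.items()
    if keep_gene(vals)
}
print(transformed)
```

Genes that are mostly zero carry little ranking information anyway (the zeros are all tied), so filtering them before the transform is a common-sense choice rather than the only option.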
You could consider using a GLM approach such as edgeR for doing your QTL analysis.
Thanks, I will consider that if the approach I am using fails. By the way, are you the one from NCI? Your name looks familiar.