Hi,
I am calculating RPKM/FPKM to make a heatmap of differentially expressed genes and have a few questions. I have of course done the differential expression analysis starting with raw counts. This is only about visualization.
Question 1: Should I normalize by gene length or by effective length? I think effective length is the right choice, but I cannot articulate why. Why is effective length preferable?
Question 2: I am using the following code to transform raw counts for visualization only:
data.set.RPKM <- rpkm(y, log=TRUE, prior.count=1, gene.length = y$genes$effective_length)
where y is a DGEList object.
Since I have paired-end reads, can I call this (log2) FPKM directly without doing any conversion?
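To make clear what I mean by the transformation, here is a base R sketch of what I understand the call above to compute. This is my reading of the edgeR behaviour, not its actual source: the toy counts and effective lengths are made up, the variable names are hypothetical, and I am assuming each read pair was counted once, so that a count is a fragment.

```r
# Toy data, invented for illustration only (4 genes x 2 samples)
counts <- matrix(c(10, 200, 0, 35,
                   20, 180, 5, 40),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
eff_len <- c(1500, 800, 2200, 1000)  # effective lengths in bp (made up)

# Assumption: plain column sums stand in for the effective library size;
# with a DGEList I believe edgeR uses lib.size * norm.factors instead.
lib_size <- colSums(counts)
prior_count <- 1

# Assumption: the prior count is scaled by relative library size and the
# library size is increased by twice that amount before taking logs.
prior_scaled <- prior_count * lib_size / mean(lib_size)
adj_lib <- lib_size + 2 * prior_scaled

# log2 counts per million, computed sample by sample
log_cpm <- log2(t((t(counts) + prior_scaled) / adj_lib) * 1e6)

# Divide by length in kilobases, i.e. subtract log2(length / 1000)
log_fpkm <- log_cpm - log2(eff_len / 1000)
```

If this reading is right, the only difference between the CPM and RPKM/FPKM versions is a per-gene offset, log2(length / 1000), so a gene with an effective length of exactly 1000 bp gets the same value in both.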
Question 3: I have tried this visualization with CPM as well. Any reason to prefer one over the other?
Here are Lior Pachter's thoughts on FPKM: https://www.reddit.com/r/bioinformatics/comments/25yopp/lior_pachter_who_invented_the_fpkm_unit_for/