RNA-seq heatmaps: effective length, RPKM vs. CPM
8 weeks ago
bioinfo2345 ▴ 40

Hi,

I am calculating RPKM/FPKM to make a heatmap of differentially expressed genes and have a few questions. I have of course done the differential expression analysis starting with raw counts. This is only about visualization.

Question 1: Should I normalize by gene length or by effective length? I think I should use effective length, but I cannot articulate why. Why is effective length preferable?

Question 2: I am using the following code to transform raw counts for visualization only:

data.set.RPKM <- rpkm(y, log=TRUE, prior.count=1, gene.length = y$genes$effective_length)

where y is a DGEList object.

Since I have paired-end reads, can I call this (log2) FPKM directly without doing any conversion?

Question 3: I have tried this visualization with CPM as well. Any reason to prefer one over the other?

edgeR RNA-seq • 311 views
8 weeks ago
ATpoint 86k

A heatmap meant to emphasize differences between samples is usually converted to Z-scores first. Because the Z-score is computed per gene across all samples, and the length correction is a per-gene constant, the length correction does not matter; I think the difference will be negligible. I always use CPM, since that is also what testing frameworks such as limma-trend work with, so it keeps my standard workflows consistent.
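To make the Z-score argument concrete, here is a minimal base-R sketch. The counts and gene lengths are made up for illustration, and the log-CPM formula is a simplified stand-in (a fixed pseudocount rather than edgeR's library-size-scaled prior count). It only shows that subtracting a per-gene constant on the log scale leaves row-wise Z-scores unchanged.

```r
# Toy data: 5 genes x 4 samples (hypothetical counts and lengths)
set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))
gene_length <- c(500, 1200, 3000, 800, 10000)  # bp, hypothetical

# Simplified log2-CPM with a small pseudocount (not edgeR's exact formula)
lib_size <- colSums(counts)
log_cpm <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

# log2-RPKM is log2-CPM minus a per-gene constant, log2(length in kb)
log_rpkm <- log_cpm - log2(gene_length / 1000)

# Row-wise Z-score: center and scale each gene across samples
row_z <- function(x) t(scale(t(x)))

z_cpm  <- row_z(log_cpm)
z_rpkm <- row_z(log_rpkm)

# Compare the two Z-score matrices, ignoring scale()'s bookkeeping attributes
all.equal(z_cpm, z_rpkm, check.attributes = FALSE)
```

The per-gene offset shifts each row's mean but not its deviations from that mean, so the two heatmap inputs should agree up to floating-point error.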

That said, the rpkm function from edgeR also uses the normalization factors (provided calcNormFactors has been run), which makes the per-million scaling more robust than a naive RPKM calculation, so I think it is just as fine.
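As a rough sketch of the two options side by side, assuming edgeR is installed: the counts and the `effective_length` values below are simulated placeholders, not real data. Both calls use the normalization factors stored in `y` once `calcNormFactors` has been run.

```r
library(edgeR)

# Simulated counts: 100 genes x 4 samples (placeholder data)
set.seed(1)
counts <- matrix(rnbinom(400, mu = 100, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:4)))

# Hypothetical per-gene effective lengths in bp
genes <- data.frame(effective_length = sample(500:5000, 100))

y <- DGEList(counts = counts, genes = genes)
y <- calcNormFactors(y)  # TMM normalization factors by default

# Both functions use the effective library sizes (lib.size * norm.factors)
log_cpm  <- cpm(y, log = TRUE, prior.count = 1)
log_rpkm <- rpkm(y, gene.length = y$genes$effective_length,
                 log = TRUE, prior.count = 1)

# Row-wise Z-scores, as typically passed to a heatmap function
z_cpm  <- t(scale(t(log_cpm)))
z_rpkm <- t(scale(t(log_rpkm)))
```

Either `z_cpm` or `z_rpkm` could then be handed to the heatmap function of choice.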
