Question

Clustering/Heatmap of count RNAseq data

0

4.6 years ago

english.server ▴ 300

Dear all,

I think this is a pretty trivial question, but how should one normalize count RNAseq data for clustering/MDS/etc.? Will this suffice?

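library(DESeq2)
# Simulated counts: 100 genes x 10 samples
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1 / 0.5), ncol = 10)
# Variance-stabilizing transformation of the count matrix
normed <- varianceStabilizingTransformation(cnts)
# Heatmap of sample-to-sample Pearson correlations
pheatmap::pheatmap(cor(normed))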

RNA-Seq Normalization • 1.5k views

updated 4.6 years ago by ATpoint 89k • written 4.6 years ago by english.server ▴ 300

1

From the DESeq2 help page on rlog:

The rlog transformation produces a similar variance stabilizing effect as varianceStabilizingTransformation, though rlog is more robust in the case when the size factors vary widely. The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.

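library(DESeq2)
# Simulated counts: 100 genes x 10 samples
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1 / 0.5), ncol = 10)
# Regularized log transformation of the count matrix
normed <- rlog(cnts)
# Heatmap of sample-to-sample Pearson correlations
pheatmap::pheatmap(cor(normed))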

4.6 years ago by english.server ▴ 300

score 3 · Accepted Answer · 2020-12-13

For heatmaps and other downstream analyses such as PCA or any kind of classification/machine learning, one commonly uses vst/rlog output or something like the normalized counts on the log2 scale. For Pearson correlation (cor), it depends on what you want to show. The correlation obviously changes when you apply a log transformation, because the log scale is not linear (which is the whole point of logs). If you want to see how your samples compare in terms of a traditional Pearson correlation, then I'd use the raw counts or the output of counts(dds, normalized=TRUE). Both give the same correlations, because they are on the same linear scale: normalization by DESeq2 just divides each sample's raw counts by a single size factor.
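To illustrate the point about linear scaling, here is a minimal base-R sketch. It does not call DESeq2. The `size_factors` vector is a made-up stand-in for the per-sample size factors DESeq2 would estimate, and the seed and the log2(x + 1) transform are likewise just assumptions for the demo. It only shows that dividing each sample by a positive constant leaves the sample-to-sample Pearson correlations unchanged, while a log transform does not.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated counts as in the question: 100 genes x 10 samples
cnts <- matrix(rnbinom(n = 1000, mu = 100, size = 1 / 0.5), ncol = 10)

# Hypothetical per-sample scaling factors (stand-in for DESeq2 size factors)
size_factors <- runif(ncol(cnts), min = 0.5, max = 2)

# Divide each column (sample) by its own factor
norm_cnts <- sweep(cnts, 2, size_factors, "/")

# Sample-to-sample Pearson correlations on three scales
cor_raw  <- cor(cnts)
cor_norm <- cor(norm_cnts)
cor_log  <- cor(log2(norm_cnts + 1))

# Per-sample linear scaling leaves Pearson correlation unchanged
same_linear <- isTRUE(all.equal(cor_raw, cor_norm))
# The log transform is not linear, so the correlations differ
same_log <- isTRUE(all.equal(cor_raw, cor_log))

print(same_linear)
print(same_log)
```

So if the heatmap is meant to show classical Pearson correlation, raw and size-factor-normalized counts are interchangeable. For clustering or PCA, the transformed values (vst/rlog/log2) are the usual choice.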