3.3 years ago
randalljellis
I am using single-cell RNA-seq data from the Allen Institute, and I want to look at gene co-expression in different cell populations. They provide raw UMI counts, so I'm wondering what normalization method to use (e.g., CPM, TPM, VST) to look at these correlations. Any rationale/justification is appreciated.
Using scTransform is not a bad idea; it normalizes for sequencing depth and applies a variance-stabilizing transformation (VST).
You can simply take the log of the CPMs -- but there are some problems with it (see the scTransform paper).
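For reference, the log-CPM option can be sketched in a few lines. This is a minimal sketch, not Seurat's or scTransform's implementation: the function name `log_cpm`, the genes x cells layout, the base-2 log, and the pseudocount of 1 are all assumptions made for illustration, and it assumes every cell has at least one count.

```python
import numpy as np

def log_cpm(counts, pseudocount=1.0):
    """Log2 counts-per-million for a genes x cells matrix of raw UMI counts.

    Hypothetical helper for illustration; assumes no cell has zero total counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Per-cell sequencing depth (total UMIs in each column).
    depth = counts.sum(axis=0, keepdims=True)
    # Scale every cell to one million counts, then log-transform.
    cpm = counts / depth * 1e6
    return np.log2(cpm + pseudocount)

# Toy matrix: 3 genes (rows) x 3 cells (columns).
counts = np.array([[10, 0, 5],
                   [90, 50, 45],
                   [0, 50, 50]])
norm = log_cpm(counts)
```

The pseudocount keeps zeros finite, but it is also one of the reasons log-CPM can behave poorly for sparse UMI data, which is the kind of problem the scTransform paper discusses.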
I wouldn't use TPMs -- UMIs generally shouldn't exhibit length biases (i.e. longer genes = more counts) that require TPM correction.
The authors of Seurat now recommend against using the SCTransform-normalized counts outside of integration and dimension reduction. Instead, they recommend using NormalizeData.
Thank you. I have another question. If I want to compare correlations between populations (e.g., compare the correlation of Gene1 and Gene2 in Population 1 with that of Gene1 and Gene2 in Population 2), should I normalize each population separately, or together?
If they're cell populations from the same sequenced sample, I'd normalize them together (see https://github.com/ChristophH/sctransform/issues/55 )
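The "normalize together, then compare" workflow might look roughly like the sketch below. It is an illustration under stated assumptions, not a recommended pipeline: the data are simulated Poisson counts, the population labels `P1`/`P2` are made up, `log_cpm` and `per_population_correlation` are hypothetical helpers, and plain log-CPM with Pearson correlation stands in for whichever normalization and correlation measure you settle on.

```python
import numpy as np

def log_cpm(counts, pseudocount=1.0):
    # Hypothetical helper: log2 counts-per-million, genes x cells layout.
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=0, keepdims=True)
    return np.log2(counts / depth * 1e6 + pseudocount)

def per_population_correlation(norm, labels, gene_i, gene_j):
    """Pearson correlation of two genes (row indices) within each population."""
    labels = np.asarray(labels)
    result = {}
    for pop in np.unique(labels):
        mask = labels == pop  # cells belonging to this population
        r = np.corrcoef(norm[gene_i, mask], norm[gene_j, mask])[0, 1]
        result[str(pop)] = float(r)
    return result

# Simulated data (assumption): 50 genes x 60 cells from one sample.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(50, 60))
labels = np.array(["P1"] * 30 + ["P2"] * 30)

# Normalize all cells together first, then split by population.
norm = log_cpm(counts)
corr = per_population_correlation(norm, labels, gene_i=0, gene_j=1)
```

The point of the design is that normalization happens once on the full matrix, so both populations share the same scaling before the per-population correlations are computed.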