Question

normalization in single cell RNAseq

0

6.1 years ago

kanwarjag ★ 1.2k

There are various methods to normalize single-cell RNA-seq data (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4848819/). Is TPM a reasonable approach for normalizing scRNA-seq data?

RNA-Seq • 6.7k views

ADD COMMENT • link updated 4.6 years ago by ATpoint 87k • written 6.1 years ago by kanwarjag ★ 1.2k

score 4 · Answer 1 · 2019-02-19

4

6.1 years ago

igor 13k

There is still a lot of debate in the field regarding the best way to normalize scRNA-seq data. It seems that the most popular tool right now is Seurat. The normalization it uses by default is like TPM, except that it scales to 10K reads instead of 1M. Thus, TPM may not be the best option, but it is certainly a reasonable approach.
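For concreteness, here is a minimal sketch of that kind of per-cell scaling, assuming the scheme Seurat documents as its default (LogNormalize: divide each cell's counts by the cell total, multiply by 10,000, then take log1p). No gene-length term is involved. The function name and the toy matrix are made up for illustration; this is not Seurat's own code.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """Scale each cell (row) to `scale_factor` total counts, then log1p.

    counts: cells x genes array of raw counts (hypothetical toy input).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero cells as zeros instead of dividing by 0
    return np.log1p(counts / totals * scale_factor)

# Two toy cells with very different sequencing depth
raw = np.array([[10, 0, 90],
                [1, 1, 8]])
norm = log_normalize(raw)
```

Undoing the log with `np.expm1(norm)` gives rows that each sum to 10,000, which is the "10K instead of 1M" part.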

ADD COMMENT • link 6.1 years ago by igor 13k

0

I completely agree and also the limitations of having less than 80% zeros are there. Denoising tools can help to solve this problem. But yes, still a very new field.

ADD REPLY • link 6.1 years ago by Gjain 5.8k

0

It's not TPM, actually, it's CPM. Since most scRNA-seq datasets are now 3' Chromium 10x, it would be wrong to normalize to gene length.

ADD REPLY • link 4.2 years ago by predeus ★ 2.1k

0

In the case of 3' datasets, theoretically one transcript results in one count regardless of length, so TPM would be the same as CPM. You would not need to perform additional gene length normalization.

Caveat: if you want to account for technical variables like internal priming, things get more complicated.
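To illustrate the point about length, here is a small sketch comparing the two calculations for a single cell. It assumes the usual definitions (CPM: counts scaled to one million; TPM: counts divided by gene length first, then scaled to one million). The counts and gene lengths are invented toy values, and treating every gene as "length 1" is just a way of expressing the one-transcript-one-count idealization.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale raw counts so they sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum() * 1e6

def tpm(counts, lengths_kb):
    """Transcripts per million: divide by length first, then scale to 1e6."""
    rate = np.asarray(counts, dtype=float) / np.asarray(lengths_kb, dtype=float)
    return rate / rate.sum() * 1e6

cell = np.array([5, 20, 75])             # toy counts for three genes
lengths_kb = np.array([0.5, 2.0, 10.0])  # toy gene lengths in kb

# Full-length protocol: longer genes yield more reads, so length matters
full_length_tpm = tpm(cell, lengths_kb)

# Idealized 3' protocol: one count per transcript, so length drops out
three_prime_tpm = tpm(cell, np.ones_like(lengths_kb))
```

With constant length, `three_prime_tpm` matches `cpm(cell)`, while `full_length_tpm` does not.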

ADD REPLY • link 4.2 years ago by igor 13k

score 1 · Answer 2 · 2019-02-19

Hi,

I would look into the SCnorm package (https://github.com/rhondabacher/SCnorm).

Paper link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5473255/

Important point:

To evaluate the extent to which biases introduced during normalization affect the identification of DE genes, we applied MAST (FDR = 0.05) to identify DE genes between the H1-1M and H1-4M conditions. Normalization with SCnorm resulted in the identification of no DE genes, whereas MR, TPM, scran, SCDE, and BASiCS resulted in 530, 315, 684, 401, and 1147 DE genes, respectively, being identified. The majority of DE calls made using data normalized from these latter approaches are lowly expressed genes (Fig. 2 (b)), which appear to be over-normalized (Fig. 2 (a)). Supplementary Fig. S4 shows similar results using H9 cells.

Fold-changes and DE genes calculated from the H1 case study data. For each gene, the fold-change of non-zero counts between the H1-4M and H1-1M groups was computed for data following normalization via SCnorm, MR, TPM, scran, SCDE, and BASiCS. Box-plots of gene-specific fold-changes are shown in panel (a) for data normalized by each method. The number of genes identified as DE using MAST is shown in panel (b). Genes are divided into four equally sized expression groups based on their median among non-zero un-normalized expression measurements and results are shown as a function of expression group.

In my case, it actually worked better in comparison to SCDE and TPM normalized counts.

I hope this helps.