Normalization in single-cell RNA-seq
5.8 years ago
kanwarjag ★ 1.2k

There are various methods to normalize single-cell RNA-seq data: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4848819/ Is TPM a reasonable approach for normalizing scRNA-seq data?

RNA-Seq • 6.4k views
5.8 years ago
igor 13k

There is still a lot of debate in the field about the best way to normalize scRNA-seq data. The most popular tool right now seems to be Seurat. Its default normalization is essentially TPM, except that each cell is scaled to 10,000 reads instead of 1 million. So TPM may not be the best option, but it is certainly a reasonable approach.
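To make the "scale to 10K" idea concrete, here is a minimal Python sketch. It is not Seurat's code; the function name `log_normalize` and the toy counts are made up for illustration. It assumes the behaviour documented for Seurat's default `LogNormalize` method: divide each gene's count by the cell's total, multiply by a scale factor of 10,000, then take log1p.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Scale one cell's gene counts to `scale_factor` total, then log1p.

    `counts` is a list of raw counts for a single cell (hypothetical input).
    """
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]

# Toy cell with 10 counts spread over 5 genes (made-up numbers).
cell = [0, 3, 1, 0, 6]
normalized = log_normalize(cell)
```

Note that gene length never enters the calculation, which is why this is closer to a per-10K version of CPM than to TPM in the strict sense.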


I completely agree. There are also the limitations that come with the data being mostly zeros (around 80%). Denoising tools can help with this problem. But yes, it is still a very new field.


It's actually CPM, not TPM. Since most scRNA-seq datasets are now 3' 10x Chromium, it would be wrong to normalize by gene length.


In the case of 3' datasets, one transcript theoretically yields one count regardless of its length, so TPM would be the same as CPM. You would not need to perform any additional gene length normalization.

Caveat: if you want to account for technical variables like internal priming, things get more complicated.
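The TPM/CPM equivalence above can be checked with a small Python sketch. The function names, counts, and gene lengths are all made up for illustration, and the "effective length is the same for every gene" setup is just the idealized 3' assumption from this reply, not a model of real data.

```python
def cpm(counts):
    """Counts per million: scale counts so they sum to 1e6."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def tpm(counts, lengths_kb):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Toy counts for three genes (hypothetical values).
counts = [10, 30, 60]

# Full-length protocol: genes differ in length, so TPM and CPM differ.
full_length = tpm(counts, [1.0, 2.0, 4.0])

# Idealized 3' protocol: every transcript contributes one count,
# so the effective length is identical across genes.
three_prime = tpm(counts, [1.0, 1.0, 1.0])

same_as_cpm = all(abs(a - b) < 1e-6 for a, b in zip(three_prime, cpm(counts)))
```

With equal effective lengths the length term cancels out of the ratio, so `three_prime` matches `cpm(counts)`, while `full_length` does not.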

5.8 years ago
Gjain 5.8k

Hi,

I would look into the SCnorm package (https://github.com/rhondabacher/SCnorm).

Paper link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5473255/

Important point (quoted from the paper):

To evaluate the extent to which biases introduced during normalization affect the identification of DE genes, we applied MAST [9] (FDR = 0.05) to identify DE genes between the H1-1M and H1-4M conditions. Normalization with SCnorm resulted in the identification of no DE genes, whereas MR, TPM, scran, SCDE, and BASiCS resulted in 530, 315, 684, 401, and 1147 DE genes, respectively, being identified. The majority of DE calls made using data normalized from these latter approaches are lowly expressed genes (Fig. 2 (b)), which appear to be over-normalized (Fig. 2 (a)). Supplementary Fig. S4 shows similar results using H9 cells.

[Figure from the paper, image not shown]

Figure caption: Fold-changes and DE genes calculated from the H1 case study data. For each gene, the fold-change of non-zero counts between the H1-4M and H1-1M groups was computed for data following normalization via SCnorm, MR, TPM, scran, SCDE, and BASiCS. Box-plots of gene-specific fold-changes are shown in panel (a) for data normalized by each method. The number of genes identified as DE using MAST is shown in panel (b). Genes are divided into four equally sized expression groups based on their median among non-zero un-normalized expression measurements and results are shown as a function of expression group.

In my case, it actually worked better than SCDE- and TPM-normalized counts.

I hope this helps.
