4.8 years ago
JulianC
Hi!
I am working on a single-cell RNA-seq dataset. The expression matrix the authors provided contains unique molecular identifier (UMI)-filtered counts per cell, as detected in the raw data. No normalization has been performed, and I assume that normalization is needed. How can I properly normalize these data? Another question: what is the lowest value at which I can say that a gene is expressed? I know that for FPKM and TPM a cutoff of 0.5 is commonly used, but I do not know what the equivalent would be for UMI counts.
Thank you in advance!
There are many normalization methods for scRNA-seq data, and I suggest you choose one based on your downstream analysis. I find SCTransform from Seurat a good start, as it is fully integrated into their tutorials and workflows. Check the manual. There are also other methods; check this review. As always, there is no gold standard: it depends on the data and the library prep. Whatever you do, please DO NOT use naive TPM or FPKM. These are already problematic for bulk RNA-seq data and even worse for scRNA-seq with all its confounding effects. You have to correct for library composition, not only for total counts. See the literature for details.
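To make the point about library composition concrete, here is a minimal toy sketch in plain NumPy. It is not SCTransform and not a recommended pipeline. The function names, the 2 x 4 count matrix, and the `max_fraction=0.5` threshold are all made up for illustration. It compares size factors from raw total counts with size factors computed after leaving out genes that dominate a cell's library.

```python
import numpy as np


def total_count_factors(counts):
    """Per-cell size factors from raw total counts (cells are rows)."""
    totals = counts.sum(axis=1)
    return totals / totals.mean()


def trimmed_factors(counts, max_fraction=0.5):
    """Size factors that ignore genes dominating any single cell.

    A gene is dropped from the totals if it accounts for more than
    `max_fraction` of the counts in at least one cell. The threshold is
    an illustrative assumption, not a recommended default.
    """
    totals = counts.sum(axis=1, keepdims=True)
    dominant = (counts > max_fraction * totals).any(axis=0)
    kept_totals = counts[:, ~dominant].sum(axis=1)
    return kept_totals / kept_totals.mean()


def normalize(counts, factors):
    """Divide each cell (row) by its size factor."""
    return counts / factors[:, None]


# Hypothetical UMI counts: 2 cells x 4 genes. Genes 0-2 are identical in
# both cells; gene 3 is strongly induced in the second cell only.
counts = np.array([
    [100.0, 100.0, 100.0, 10.0],
    [100.0, 100.0, 100.0, 700.0],
])

naive = normalize(counts, total_count_factors(counts))
trimmed = normalize(counts, trimmed_factors(counts))

print("total-count scaling, gene 0:", naive[:, 0])
print("trimmed scaling, gene 0:    ", trimmed[:, 0])
```

With total-count scaling, genes 0-2 look lower in the second cell even though their raw counts are identical, purely because gene 3 inflates that cell's library size. The trimmed factors leave them equal. Real data needs a proper method (and this sketch would divide by zero if every gene were flagged as dominant), but this is the kind of artifact that composition-aware normalization is meant to avoid.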