I have found several posts on Biostars and in other online communities where the general advice is that TPM values can be used instead of raw counts in Seurat. I have a single-cell dataset with TPM values only, and there is no way to get the raw count data. To make sure the values really are TPM, I checked them with colSums() and every column summed to 10^6. So I started the analysis by creating the Seurat object as follows (I didn't perform any log transformation):
ctrl <- CreateSeuratObject(counts = TPM_matrix, min.cells = 3, min.features = 200)
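For reference, the column-sum check mentioned above looks roughly like the sketch below. The small `toy_tpm` matrix and its values are made up and only stand in for my real `TPM_matrix` so that the example is self-contained.

```r
# Toy genes-by-cells matrix (hypothetical values, not my real data)
toy_counts <- matrix(c(10, 0, 5, 85,
                       20, 30, 0, 50),
                     nrow = 4,
                     dimnames = list(paste0("gene", 1:4), c("cell1", "cell2")))

# Rescale each column so it sums to one million, as TPM values do
toy_tpm <- sweep(toy_counts, 2, colSums(toy_counts), "/") * 1e6

# Every cell (column) of a TPM matrix should sum to 10^6,
# up to a small floating-point tolerance
cell_totals <- colSums(toy_tpm)
is_tpm <- all(abs(cell_totals - 1e6) < 1e-3)
print(is_tpm)
```

On the real data, the same check would be run directly on `TPM_matrix` instead of the toy matrix.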
Next, I wanted to look at the top variable features, which I expected to reflect the signal in the data. I used the following command:
ctrl <- FindVariableFeatures(ctrl, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
When I check the top 30 features, I get a lot of noncoding RNA genes, such as SNORD47, MIR1244-1, MIR1244-2, and MIR1244-3. None of these are among the top biomarker genes implicated in the disease of interest. I am not sure whether this is because I am missing a step in between.
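To be explicit about the step I skipped: the log transformation I did not perform would look roughly like the sketch below, shown on a made-up matrix (`toy_values` and its numbers are hypothetical). I am assuming a plain log(x + 1) transform here. As far as I understand, Seurat's `NormalizeData()` with `normalization.method = "LogNormalize"` and `scale.factor = 1e6` should give the same result when every column already sums to 10^6, but I have not verified that on my data.

```r
# Toy genes-by-cells matrix (hypothetical values, not my real data)
toy_values <- matrix(c(0, 9, 99, 999),
                     nrow = 2,
                     dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

# log(x + 1) keeps zeros at zero and compresses very large values
log_values <- log1p(toy_values)
print(log_values)
```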