Question

Single Cell RNAseq Imputation in Seurat

7.0 years ago

bioinformatics.cancer ▴ 260

Hi,

I was writing to find out if anyone has experience doing imputation in Seurat.

Where in the workflow would you suggest calling “AddImputedScore”, and once that is done, will the clustering function (FindClusters) pick up the imputed values? I can only see the flag “use.imputed” in the “RunPCA” function, so after adding the imputed expression values, do I need to run RunPCA with “use.imputed” = TRUE, and will the subsequent functions then work with the imputed values?

Secondly, would you suggest using the HVGs for “genes.use” and perhaps a list of GOIs (such as marker genes for CD8 T cells) for “genes.fit” in the AddImputedScore function?

Thanks,

Pankaj

RNA-Seq single cell imputation • 6.8k views

updated 6.9 years ago by Kristoffer Vitting-Seerup ★ 4.2k • written 7.0 years ago by bioinformatics.cancer ▴ 260

score 3 · Accepted Answer · 2018-11-16

Hello Pankaj,

I would generally advise not to use imputed scores to process your dataset. I always calculate PCs, perform clustering and reduce dimensions using the variable genes you discover using FindVariableGenes(), without any imputation. That being said, the imputation that Seurat offers is a practical solution to generate output plots. use.imputed is not always specifically noted in the help files of functions, but you can try adding it to functions that generate output, and it will work in a lot of cases.

What I normally do is this:

# "GOI1" and "GOI2" are placeholders; list your own genes of interest here
dataset <- Seurat::AddImputedScore(object = dataset, genes.use = dataset@var.genes, genes.fit = c("GOI1", "GOI2"), do.print = TRUE)

The do.print argument prints a message for each gene that is imputed, which I personally like. You can set it to FALSE if you want.

This imputes only the specific genes you are interested in (genes.fit), modelling them on the variable genes you identified in your dataset (genes.use). This normally works quite well for me. The imputed scores are stored in the @imputed slot of your Seurat object and can be queried from there. You can add more genes as you work. I would advise against imputing all genes, as the process is not very fast.

Functions that accept "use.imputed = TRUE", as far as I know: FeaturePlot(), VlnPlot(), GenePlot(), RidgePlot(). There may be more; check the scripts on the Seurat GitHub, or just test as you work.
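The impute-then-plot approach described above could be sketched roughly as follows. This assumes Seurat v2.x (the S4 object with @data, @var.genes and @imputed slots; later major versions changed this API), and that FindVariableGenes() has already been run. The helper name impute_genes is made up for illustration and is not part of Seurat.

```r
# Hypothetical helper (not a Seurat function); assumes Seurat v2.x and
# an object whose @var.genes slot is already populated.
impute_genes <- function(dataset, genes_of_interest) {
  # Fail early on gene names that are not in the expression matrix.
  missing_genes <- setdiff(genes_of_interest, rownames(dataset@data))
  if (length(missing_genes) > 0) {
    stop("Genes not found in dataset: ", paste(missing_genes, collapse = ", "))
  }
  # Impute only the genes of interest, using the variable genes as predictors.
  dataset <- Seurat::AddImputedScore(
    object    = dataset,
    genes.use = dataset@var.genes,
    genes.fit = genes_of_interest,
    do.print  = TRUE
  )
  # The imputed values are now stored in dataset@imputed (one row per gene).
  dataset
}

# Usage (not run here; needs Seurat v2.x and a processed object,
# "GOI1"/"GOI2" are placeholder gene names):
# dataset <- impute_genes(dataset, c("GOI1", "GOI2"))
# Seurat::VlnPlot(dataset, features.plot = "GOI1", use.imputed = TRUE)
# Seurat::FeaturePlot(dataset, features.plot = "GOI2", use.imputed = TRUE)
```

Calling the helper again with new genes should simply add rows to @imputed, so the list of imputed genes can grow as the analysis goes on.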

Hope this helps, Leon

score 2 · Accepted Answer · 2018-11-16

I would be a bit careful with imputation. Imputation tools are still quite new and we do not yet know what a good way of doing it is. The risk is that imputation introduces a bias. Recently a set of problems with imputation was described in the article "False signals induced by single-cell imputation".

Please note that the standard workflow in Seurat, up to and including clustering, is very robust to dropouts since it works on Principal Components (PCs), hence the whole section about finding the right number of PCs to do the clustering on. For that reason I would NOT use imputation before clustering.
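For reference, clustering on PCs without any imputation might look like the sketch below. It assumes Seurat v2.x and an object that has already been normalized and scaled (NormalizeData(), ScaleData()); the helper name cluster_without_imputation and the default values for n_pcs and resolution are illustrative assumptions, not recommendations.

```r
# Hypothetical helper (not a Seurat function); assumes Seurat v2.x and
# a normalized, scaled object.
cluster_without_imputation <- function(dataset, n_pcs = 10, resolution = 0.6) {
  # Pick the variable genes from the observed (non-imputed) data.
  dataset <- Seurat::FindVariableGenes(dataset, do.plot = FALSE)
  # PCA on the variable genes; use.imputed stays FALSE on purpose.
  dataset <- Seurat::RunPCA(
    dataset,
    pc.genes    = dataset@var.genes,
    pcs.compute = n_pcs,
    do.print    = FALSE,
    use.imputed = FALSE
  )
  # Cluster in PC space; choose n_pcs with e.g. an elbow plot beforehand.
  dataset <- Seurat::FindClusters(
    dataset,
    reduction.type = "pca",
    dims.use       = seq_len(n_pcs),
    resolution     = resolution,
    print.output   = FALSE
  )
  dataset
}

# Usage (not run here; needs Seurat v2.x):
# dataset <- cluster_without_imputation(dataset, n_pcs = 10)
```

Nothing in this path reads the @imputed slot, so any genes imputed later for plotting do not affect the clusters.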

Since dropout could be a problem for differential expression analysis, you could consider doing as Leon suggests: impute only the specific genes you are interested in.

Hope this helps, Kristoffer

score 1 · Accepted Answer · 2018-09-19

I would typically filter out cells with low gene coverage (counting a gene as missing if it has no reads, or if its CPM is below a certain threshold), as well as cells below some minimum total number of reads.

If you only look at cells with, say, >50% gene coverage, I don't think imputation would be as important (presumably, the missing genes are mostly the ones that aren't expressed or have low expression in that cell). If the gene / read coverage is really low, I don't think the imputed values would be reliable anyway. However, I would be interested to see other people's thoughts.
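As a rough illustration of the filtering idea above, here is a base R sketch on a made-up count matrix. The counts and both thresholds (min_coverage, min_reads) are invented for the example; a gene counts as covered when it has at least one read (the CPM variant is not shown).

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 2, 1,
    1, 0, 3, 0,
    4, 0, 1, 2,
    0, 1, 6, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

# Fraction of genes with at least one read in each cell.
gene_coverage <- colSums(counts > 0) / nrow(counts)
# Total number of reads in each cell.
total_reads <- colSums(counts)

# Illustrative thresholds only.
min_coverage <- 0.5
min_reads <- 5

# Keep cells that pass both the gene coverage and the read depth filter.
keep <- gene_coverage > min_coverage & total_reads >= min_reads
filtered <- counts[, keep, drop = FALSE]
```

In Seurat v2 the analogous step is, as far as I know, FilterCells() with thresholds on nGene and nUMI.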