Question

Deviance for feature selection in Seurat

18 months ago

Immreg ▴ 10

I am very new to bioinformatics and began with scanpy to follow a standard scRNA-seq workflow. Because I will need tools that are only available in R, I have now started working with Seurat.

In scanpy, I used deviance for feature selection, as recommended in the "Best practices for single-cell analysis across modalities" paper. I am trying to do the same in Seurat but cannot figure out how. Specifically, I would like to generate a plot like the one shown here at the bottom of the page: mean expression vs. variance, with the highly deviant features highlighted by color.

To calculate deviance, I am using the devianceFeatureSelection() function of the scry package. I have tried various ways of integrating the resulting feature metadata into the Seurat object for plotting; this is the one that seems most promising so far (only the relevant lines are shown):

library(Seurat)
library(scry)  # provides devianceFeatureSelection()
obj <- CreateSeuratObject(counts = obj.data, project = "scRNAseq", min.cells = 3, min.features = 200)
deviance.feat <- devianceFeatureSelection(obj[["RNA"]]$counts)
obj[["RNA"]] <- AddMetaData(object = obj[["RNA"]], metadata = deviance.feat, col.name = 'deviance')

With this, I at least manage to add the deviance values to the feature metadata. I then apply the default normalization and run FindVariableFeatures to calculate the mean and variance. Since the variable features determined by Seurat do not include the top 2000 highly deviant features, I simply flag all features as highly variable (hence nfeatures = dim(obj)[1]):

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, nfeatures = dim(obj)[1], selection.method = "vst")

I should now have the data required for the plot. I extract it from the feature-level metadata and determine the top 2000 highly deviant genes, so that I can color them in the mean expression vs. variance plot:

feature.mean <- obj[["RNA"]][["vf_vst_counts_mean"]]
feature.var <- obj[["RNA"]][["vf_vst_counts_variance"]]
feature.dev <- obj[["RNA"]][["deviance"]] # same as deviance.feat but extracted from meta data (in case it is relevant for plotting?)

feature.dev.ranked <- Features(obj)[order(deviance.feat, decreasing = TRUE)]
top2000 <- head(feature.dev.ranked, 2000)
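
For reference, the ranking and the intended plot can be sketched without Seurat on simulated values. This is only a sketch under assumptions: the data frame `feat`, its columns, and the cutoff `n_top` are hypothetical stand-ins for the per-feature mean, variance, and deviance above, and base R `plot()` is swapped in for any Seurat plotting function.

```r
# Toy per-feature table (hypothetical); in the real workflow these columns
# would come from the feature-level metadata (mean, variance) and scry (deviance).
set.seed(1)
n_features <- 500
feat <- data.frame(
  mean = rexp(n_features, rate = 1),
  row.names = paste0("gene", seq_len(n_features))
)
feat$variance <- feat$mean * (1 + rexp(n_features, rate = 2))
feat$deviance <- feat$variance * runif(n_features, 0.5, 1.5)  # arbitrary toy values

# Rank features by deviance and flag the top n_top (2000 in the post).
n_top <- 50
ranked <- rownames(feat)[order(feat$deviance, decreasing = TRUE)]
top_features <- head(ranked, n_top)
feat$highly_deviant <- rownames(feat) %in% top_features

# Mean vs. variance on log axes, highly deviant features in red.
plot(
  feat$mean, feat$variance, log = "xy", pch = 16, cex = 0.6,
  col = ifelse(feat$highly_deviant, "red", "grey60"),
  xlab = "Mean expression", ylab = "Variance"
)
legend("topleft", legend = c("highly deviant", "other"),
       col = c("red", "grey60"), pch = 16, bty = "n")
```

The point of the sketch is that each dot is a feature rather than a cell, so everything needed for the plot lives in a single per-feature table.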

If I am not mistaken, VariableFeaturePlot cannot use other metadata columns to group the plotted features. So I tried Seurat's FeatureScatter function instead:

plot1 <- FeatureScatter(obj, feature1 = feature.mean, feature2 = feature.var, group.by = feature.dev)
plot1

However, I get the following error message:

Error in FetchData(object = object, vars = c(feature1, feature2, group.by),  :

Even when I omit the group.by argument and only try to plot feature.mean vs. feature.var, I get:

Error in .subset(x, j) : invalid subscript type 'list'

This suggests that the data cannot be fetched from a list, yet both class(feature.mean) and class(feature.var) return "data.frame". I probably lack some basic understanding of the Seurat object structure and therefore cannot figure out how to integrate the calculated deviance as "usable" feature metadata. I also only started working with R when I started using Seurat, so I may be missing other obvious solutions.
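
A small base R sketch may help pin down the 'list' message. It uses toy objects only (`df`, `x`, `res`, and `values` are hypothetical names, not part of Seurat) and shows that an object whose class is "data.frame" is still a list underneath, so passing it where a name or a plain vector is expected can plausibly trigger this kind of subscript error. Whether this is exactly what happens inside FeatureScatter is an assumption.

```r
# A one-column data frame, similar in shape to the extracted metadata above.
df <- data.frame(vf_mean = c(0.1, 0.5, 2.0), row.names = c("g1", "g2", "g3"))

class(df)    # "data.frame"
typeof(df)   # "list": a data frame is stored as a list of columns
is.list(df)  # TRUE

# Using the data frame itself as a subscript fails; capture the message.
x <- c(vf_mean = 1, other = 2)
res <- tryCatch(x[df], error = function(e) conditionMessage(e))
print(res)

# Subscripting by the column name (a character vector) works,
# as does pulling the column out as a plain numeric vector.
x[colnames(df)]
values <- df[["vf_mean"]]
is.numeric(values)
```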

Why does this approach not work? And how could I replicate the figure (or the whole process) described in "Single-cell best practices" using Seurat?

Thanks already in advance!

feature-selection Seurat • 812 views